sparklyr joins LF AI as its Newest Incubation Project – Scaling data science and machine learning workflows using Apache Spark and R

January 29, 2020

The LF AI Foundation (LF AI), the organization building an ecosystem to sustain open source innovation in artificial intelligence (AI), machine learning (ML), and deep learning (DL), today is announcing sparklyr as its newest Incubation Project. sparklyr is an R Language package that lets you analyze data in Apache Spark, the well-known engine for big data processing, while using familiar tools in R. The R Language is widely used by data scientists and statisticians around the world and is known for its advanced features in statistical computing and graphics. 

sparklyr makes using and extending Apache Spark more accessible by providing access to core functionality such as installing, connecting to, and managing Spark, and using Spark’s MLlib, Spark Structured Streaming, and Spark Pipelines from R. sparklyr supports connecting to local and remote Apache Spark clusters, provides an interface to Spark’s built-in machine learning algorithms, supports executing custom R code across Spark clusters, and enables building extensions that use Spark from R, such as those for H2O, XGBoost, GraphFrames, MLeap, and more.

“We are extremely pleased to welcome sparklyr to LF AI. Connecting key tools and functionality is critical to growing the open source AI ecosystem, as well as supporting them into long term sustainability under a neutral, vendor-free, and open governance,” said Dr. Ibrahim Haddad, Executive Director of LF AI. “We look forward to fostering new collaborations and possible integrations with existing LF AI projects, such as enabling support for Horovod in R with sparklyr and enabling support to export models and pipelines with ONNX.”

LF AI provides a wide range of services to projects, and the first step is starting as an Incubation Project. Full details on why you should host your open source AI project with us are available here.

“We are very excited to have sparklyr join LF AI and to renew our ongoing commitment to the project. Hosting sparklyr with LF AI within the Linux Foundation enables more organizations to contribute to sparklyr, which benefits the R community by bringing additional talent, ideas, and shared components from other Linux Foundation projects like Delta Lake, Horovod, ONNX, and so on,” said Javier Luraschi, Software Engineer at RStudio, Inc. “Joining LF AI is a big step forward for sparklyr and a great time for you to join this community, contribute, and help sparklyr grow in 2020 and beyond!”

sparklyr supports a complete backend for dplyr, a popular tool for working with data frame objects both in memory and out of memory. This enables you to use dplyr code to analyze large datasets in Apache Spark without having to rewrite R code.
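
As a minimal sketch of what this can look like in practice, the same dplyr pipeline can be written once and applied to either an in-memory data frame or a Spark table. The helper names `mean_mpg_by_cyl` and `spark_demo`, the table name `mtcars_spark`, and the use of R's built-in `mtcars` dataset are illustrative assumptions and not part of the announcement. The Spark portion also assumes that the sparklyr package and a local Spark installation are available (for example via `sparklyr::spark_install()`).

```r
library(dplyr)

# Hypothetical helper: an ordinary dplyr pipeline with nothing Spark-specific in it.
mean_mpg_by_cyl <- function(tbl) {
  tbl %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
    arrange(cyl)
}

# In memory: the pipeline runs against a regular R data frame.
local_result <- mean_mpg_by_cyl(mtcars)
print(local_result)

# In Spark: the same helper is applied to a Spark table.
# Assumption: sparklyr is installed and a local Spark installation exists.
# This function is only defined here, not called.
spark_demo <- function() {
  sc <- sparklyr::spark_connect(master = "local")
  on.exit(sparklyr::spark_disconnect(sc))

  # Copy the example data into Spark and get a dplyr table reference to it.
  cars_tbl <- copy_to(sc, mtcars, "mtcars_spark", overwrite = TRUE)

  # The unchanged pipeline is translated and executed by Spark;
  # collect() brings the small summary back into R.
  mean_mpg_by_cyl(cars_tbl) %>% collect()
}
```

Calling `spark_demo()` on a machine with Spark installed should return the same per-cylinder summary as the in-memory call. The pipeline itself is not modified; only the table it receives changes.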

For more information on getting involved immediately with sparklyr, please see the following resources.

LF AI Resources


  • Andrew Bringaze

    Andrew Bringaze is the senior developer for The Linux Foundation. With over 10 years of experience his focus is on open source code, WordPress, React, and site security.