LF AI & Data Blog

Sparklyr 1.4.0 Release Now Available!

October 6, 2020

Sparklyr, an LF AI Foundation Incubation Project, has released version 1.4.0! Sparklyr is an R package that lets you analyze data in Apache Spark, the well-known engine for big data processing, while using familiar tools in R. The R language is widely used by data scientists and statisticians around the world and is known for its advanced features in statistical computing and graphics.

In version 1.4.0, sparklyr adds a variety of improvements. Highlights include:

  • Efficient, parallelizable weighted sampling methods for Spark data frames, built on a technique known as “exponential variates” (see https://blogs.rstudio.com/ai/posts/2020-07-29-parallelized-sampling)
  • Tidyr verbs such as `pivot_wider`, `pivot_longer`, `nest`, `unnest`, `separate`, `unite`, and `fill` now have specialized implementations in `sparklyr` for working with Spark data frames
  • Support for newly introduced higher-order functions in Spark 3.0 (e.g., `array_sort`, `map_filter`, `map_zip_with`, and many others)
  • Dplyr-related improvements:
    • All higher-order functions and sampling methods are made directly accessible through `dplyr` verbs
    • Made `grepl` part of the `dplyr` interface for Spark data frames
    • Thanks to a pull request from @wkdavis, `dplyr::inner_join`, `dplyr::left_join`, `dplyr::right_join`, and `dplyr::full_join` now correctly replace `'.'` with `'_'` in the `suffix` parameter when working with Spark data frames (see https://github.com/sparklyr/sparklyr/issues/2648)
  • Support for newly introduced functionalities in Spark 3.0:
    • Thanks to a pull request from @zero323, the `RobustScaler` functionality in Spark 3.0 is now supported in sparklyr through `ft_robust_scaler`
    • RAPIDS GPU acceleration plugin can now be enabled with `spark_connect(..., package = "rapids")` and configured with `spark_config` options prefixed with `spark.rapids.`
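
To give a feel for the “exponential variates” idea behind the weighted sampling highlight, here is a minimal base R sketch that works on a plain numeric vector rather than a Spark data frame. It is an illustration only and not sparklyr's implementation: `weighted_sample_indices` is a hypothetical helper, it covers only sampling without replacement, and it assumes all weights are strictly positive.

```r
# Illustrative sketch of weighted sampling without replacement using
# exponential variates. `weighted_sample_indices` is a hypothetical helper
# written for this example; it is not part of sparklyr.
weighted_sample_indices <- function(weights, k) {
  stopifnot(
    is.numeric(weights),
    all(weights > 0),          # assumption: strictly positive weights
    k >= 0,
    k <= length(weights)
  )
  # Draw one independent Exp(rate = w_i) variate per row. Each draw depends
  # only on that row's own weight, so this step can run in parallel.
  keys <- -log(runif(length(weights))) / weights
  # Rows with larger weights tend to get smaller keys; keep the k smallest.
  order(keys)[seq_len(k)]
}

set.seed(42)
weights <- c(1, 1, 1, 10)
idx <- weighted_sample_indices(weights, k = 2)
print(idx)
```

Because each row's key is computed independently, the only coordination needed is the final selection of the `k` smallest keys, which is why this style of sampling suits a distributed engine such as Spark.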

The power of open source projects lies in the aggregate contributions of community members and organizations that collectively drive the advancement of the projects and their roadmaps. The sparklyr community is a great example of this process and was instrumental in producing this release. The sparklyr team extends a special thank-you to the community members who contributed to this release via pull requests.

To learn more about the sparklyr 1.4.0 release, check out the full release notes. Want to get involved with sparklyr? Be sure to join the sparklyr-Announce and sparklyr Technical-Discuss mailing lists to join the community and stay connected on the latest updates.

Congratulations to the sparklyr team and we look forward to continued growth and success as part of the LF AI Foundation! To learn about hosting an open source project with us, visit the LF AI Foundation website.

Author

  • Andrew Bringaze

    Andrew Bringaze is the senior developer for The Linux Foundation. With over 10 years of experience, his focus is on open source code, WordPress, React, and site security.
