LF AI & Data Foundation, the organization building an ecosystem to sustain open source innovation in artificial intelligence (AI) and data, today announced LakeSoul as its latest Sandbox Project.

Dr. Ibrahim Haddad, Executive Director of LF AI & Data, said: “We are thrilled to announce that LakeSoul, a cloud-native Lakehouse framework, is joining the LF AI & Data Foundation in incubation. This marks a significant milestone for our ecosystem as we welcome LakeSoul and its impressive array of cutting-edge features, including scalable metadata management, ACID transactions, and unified processing. By embracing an elastic architecture and optimizing cloud storage utilization, LakeSoul brings tremendous value to the open source community. We believe that by joining forces with the LF AI & Data Foundation, LakeSoul will receive the necessary guidance and opportunities to flourish. Together, we will actively contribute to the advancement of open source technologies and pave the way for future innovations in the field of AI and data.”  

LakeSoul, developed by DMetaSoul, is a cloud-native lakehouse framework with a wide range of features and applications. It offers an end-to-end real-time lakehouse whose incremental streaming pipeline keeps data flowing without the need for additional scheduling. With its unlimited storage capacity, users can access and update historical data while using the lakehouse for business intelligence and advanced analytics. LakeSoul also enables real-time generation of machine learning datasets for classification, forecasting, and recommendations. Its integration with machine learning frameworks such as MLlib, PyTorch, and Flink ML allows for direct data ingestion, which supports efficient AI model development and analysis.

Key Features of LakeSoul:

  • Highly Concurrent ACID Transactions: LakeSoul ensures write consistency under high concurrency through a two-phase commit protocol and automatic handling of commit conflicts using its metadata transaction mechanism.
  • Efficient Upsert Operations: LakeSoul supports range and hash partitions, enabling flexible update operations at row and column levels. It also supports native multi-stream concatenation with the same primary keys.
  • High Performance: LakeSoul leverages Rust and Arrow for asynchronous file reading and writing, including upsert and merge on read. It has made significant performance optimizations in storage layers such as object storage and HDFS (Hadoop Distributed File System), giving it a performance advantage over similar frameworks.
  • Real-time Data Warehousing: LakeSoul facilitates stream or batch reading and writing of tables in the lakehouse using SQL and Python. It supports automatic data synchronization from message queues and online databases through CDC collectors. LakeSoul tables can be read as incremental streams compatible with the Flink Changelog Stream format for incremental computation.
  • BI & AI Support: LakeSoul provides native IO layer implementation and interfaces in C, Java, Python, and other languages, making it easy to connect with various big data computing and AI frameworks.
  • Collaboration with LF AI & Data: LakeSoul actively collaborates with LF AI & Data projects to build a better open-source AI and data infrastructure ecosystem.
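
To make the upsert and merge-on-read features above more concrete, here is a minimal conceptual sketch in plain Python. It is not LakeSoul's API or implementation: the names `bucket_of`, `write_batch`, `read_merged`, and `NUM_BUCKETS` are hypothetical, and the in-memory dictionary only stands in for files on object storage. The idea it illustrates is that writes append small delta files to a hash bucket chosen by primary key, and a reader merges those deltas per key, with later column values overriding earlier ones.

```python
from collections import defaultdict
from zlib import crc32

NUM_BUCKETS = 4  # hypothetical number of hash buckets


def bucket_of(primary_key: str) -> int:
    """Map a primary key to a hash bucket (deterministic across runs)."""
    return crc32(primary_key.encode("utf-8")) % NUM_BUCKETS


def write_batch(table, rows, key="id"):
    """Append a batch as a new delta file per affected bucket; nothing is rewritten."""
    per_bucket = defaultdict(list)
    for row in rows:
        per_bucket[bucket_of(row[key])].append(row)
    for bucket, bucket_rows in per_bucket.items():
        table[bucket].append(bucket_rows)


def read_merged(table, key="id"):
    """Merge on read: replay delta files in commit order; later column values win."""
    merged = {}
    for bucket in sorted(table):
        for delta in table[bucket]:
            for row in delta:
                merged.setdefault(row[key], {}).update(row)
    return merged


table = defaultdict(list)  # bucket id -> list of delta files (assumed layout)
write_batch(table, [{"id": "u1", "name": "Ada", "city": "Paris"},
                    {"id": "u2", "name": "Bo", "city": "Oslo"}])
write_batch(table, [{"id": "u1", "city": "Rome"}])  # column-level update
write_batch(table, [{"id": "u2", "score": 7}])      # second stream, same primary key
result = read_merged(table)
print(result["u1"])
print(result["u2"])
```

Because rows with the same primary key always land in the same bucket, a reader only needs to merge deltas within a bucket, which is what lets several streams that share primary keys be combined into one wide row without a separate join.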

Zhu Yadong, CEO of DMetaSoul, said: “We are honored to be able to join the LF AI & Data Foundation and donate the LakeSoul project. We hope that more users and developers could join the LF AI & Data community and build a better open source lakehouse framework under the open governance of the foundation.”

LF AI & Data supports projects via a wide range of services, and the first step is joining as a Sandbox Project. Learn more about LakeSoul on its GitHub and join the LakeSoul-Announce Mailing List.

A warm welcome to LakeSoul! We are excited to see the project’s continued growth and success as part of the LF AI & Data Foundation. If you are interested in hosting an open source project with us, please visit the LF AI & Data website to learn more.

LF AI & Data Resources

Access other resources on LF AI & Data’s GitHub or Wiki