
Substra Joins LF AI & Data as New Incubation Project

July 14, 2021

LF AI & Data Foundation, the organization building an ecosystem to sustain open source innovation in artificial intelligence (AI) and data, today announces Substra as its latest Incubation Project.

Substra is a framework offering distributed orchestration of machine learning tasks among partners while guaranteeing secure and trustless traceability of all operations. The Substra project was released and open sourced by OWKIN under the Apache-2.0 license. 

Substra enables privacy-preserving federated learning projects, where multiple parties collaborate on a Machine Learning objective while each one keeps their private datasets behind their own firewall. Its ambition is to make new scientific and economic data science collaborations possible.

Data scientists using the Substra framework are able to:

  • Use their own ML algorithm with any Python ML framework
  • Ship their algorithm to remote data for training and/or prediction and monitor its performance
  • Build advanced Federated Learning strategies for learning across several remote datasets

Data controllers using the Substra framework are able to:

  • Make their dataset(s) available to other partners for training and evaluation, while ensuring the data cannot be viewed or downloaded
  • Choose fine-tuned permissions for their dataset to control its lifecycle
  • Monitor how the data was used
  • Engage in advanced multi-partner data science collaborations, even with partners owning competing datasets

Dr. Ibrahim Haddad, Executive Director of LF AI & Data, said: “We’re excited to welcome the Substra project in LF AI & Data. The project enables data scientists to use their own ML algorithm with any Python framework, deploy their algorithm on remote data for training and/or prediction and monitor their performances, and build advanced Federated Learning strategies for learning across several remote datasets. We look forward to working with the community to grow the project’s footprint and to create new collaboration opportunities for it with our members and other hosted projects.” 

Substra orchestrates distributed machine learning and aims to provide tools for traceable data science. It is built around four principles:

  • Data Locality: Data remains in the owner’s data stores and is never transferred. AI models travel from one dataset to another.
  • Decentralized Trust: All operations are orchestrated by a distributed ledger technology. There is no need for a single trusted actor or third party; security arises from the network.
  • Traceability: An immutable audit trail records every operation performed on the platform, which simplifies model certification.
  • Modularity: Substra is highly flexible; various permission regimes and workflow structures can be enforced to fit each use case.
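
To make the data locality and traceability ideas more concrete, here is a minimal Python sketch. It is not Substra's API and does not reflect how Substra is implemented. The names `Partner`, `AuditLog`, and `federated_round` are hypothetical, the "model" is a single weight fitted to y ≈ w·x, and an in-memory hash-chained list stands in for a distributed ledger. It only illustrates the pattern: the model travels to each partner, raw data never leaves, and every operation is recorded in a tamper-evident log.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AuditLog:
    """Append-only, hash-chained log (illustrative stand-in for a ledger)."""
    entries: list = field(default_factory=list)

    def record(self, operation: dict) -> str:
        # Each entry's hash covers the previous hash, chaining the history.
        prev_hash = self.entries[-1]["hash"] if self.entries else "0" * 64
        payload = json.dumps(operation, sort_keys=True)
        digest = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
        self.entries.append({"op": operation, "prev": prev_hash, "hash": digest})
        return digest

    def verify(self) -> bool:
        # Recompute the chain; any edited entry breaks it.
        prev_hash = "0" * 64
        for entry in self.entries:
            payload = json.dumps(entry["op"], sort_keys=True)
            expected = hashlib.sha256((prev_hash + payload).encode()).hexdigest()
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True


@dataclass
class Partner:
    """A data owner (hypothetical). `data` holds (x, y) pairs and stays local."""
    name: str
    data: List[Tuple[float, float]]

    def train(self, weight: float, lr: float = 0.05) -> Tuple[float, int]:
        # One gradient step on the local mean squared error of y ≈ weight * x.
        n = len(self.data)
        grad = sum(2 * (weight * x - y) * x for x, y in self.data) / n
        return weight - lr * grad, n


def federated_round(weight: float, partners: List[Partner], log: AuditLog) -> float:
    """Send the current weight to every partner, then average the results."""
    updates = []
    for partner in partners:
        new_weight, n = partner.train(weight)
        # Only metadata about the operation is logged, never the data itself.
        log.record({"partner": partner.name, "task": "train", "samples": n})
        updates.append((new_weight, n))
    total = sum(n for _, n in updates)
    # Sample-weighted average of the locally updated weights.
    return sum(w * n for w, n in updates) / total


# Toy data following y = 2x, split across two made-up partners.
partners = [
    Partner("partner_a", [(1.0, 2.0), (2.0, 4.0)]),
    Partner("partner_b", [(3.0, 6.0)]),
]
log = AuditLog()
w = 0.0
for _ in range(50):
    w = federated_round(w, partners, log)
```

In this toy setup the shared weight `w` approaches 2 without either partner exposing its samples, and `log.verify()` checks that the recorded history has not been altered after the fact. A real deployment would add what this sketch leaves out, such as permissions, authenticated partners, and a genuinely distributed ledger.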

Camille Marini, Founder of the Substra project, said: “On behalf of all people who contributed to the Substra framework, I am thrilled and proud that it has been accepted as an incubation project in the LF AI & Data Foundation. Substra has been designed to enable the collaboration / cooperation around the creation of ML models from distributed sources of sensitive data. Indeed, we believe that making discoveries using ML cannot be done without making sure that data privacy and governance are not compromised. We also believe that collaboration between data owners and data scientists is key to be able to create good ML models. These values are shared with the Linux Foundation AI and Data, which thus appears as the perfect host for the Substra project. We hope that it will bring value in the AI & Data community.”

Eric Boniface, General Manager of Substra Foundation, said: “We are very happy and proud at Substra Foundation to see the Substra project becoming an LF AI & Data hosted project. Having been its first umbrella for the open source community, hosting the repositories, elaborating the documentation, animating community workgroups and contributing to first real-world flagship use cases like the HealthChain and MELLODDY projects was an incredible experience shared with the amazing Owkin team developing the framework. It was only a first step at a moderate scale, and we are convinced that joining an experienced and global foundation like the LF AI & Data as an incubation project is a great opportunity and the perfect next chapter for the Substra project, its community, and many more privacy-preserving federated learning use cases to come!”

LF AI & Data supports projects via a wide range of services, and the first step is joining as an Incubation Project. LF AI & Data will support neutral open governance for Substra to help foster the growth of the project. Learn more about Substra on its GitHub, and join the Substra-Announce and Substra-Technical-Discuss mailing lists to connect with the community and stay up to date on the latest news.

A warm welcome to Substra! We look forward to the project’s continued growth and success as part of the LF AI & Data Foundation. To learn about how to host an open source project with us, visit the LF AI & Data website.

  • Andrew Bringaze

    Andrew Bringaze is the senior developer for The Linux Foundation. With over 10 years of experience, his focus is on open source code, WordPress, React, and site security.
