By: Animesh Singh, Christian Kadner, Tommy Chaoping Li
Three pillars of AI lifecycle: Datasets, Models and Pipelines
In the AI lifecycle, we use data to build models that automate decisions. Datasets, Models, and Pipelines (which take us from raw Datasets to deployed Models) are therefore the three most critical pillars of the AI lifecycle. Because the Data and AI lifecycle involves so many steps, the work of building a model is often split across multiple teams, and large amounts of duplication can arise when teams create similar Datasets, Features, Models, Pipelines, and Pipeline tasks. This fragmentation also poses a serious challenge for traceability, governance, risk management, lineage tracking, and metadata collection.
Announcing Machine Learning eXchange (MLX)
To solve these problems, we need a central repository where the different asset types, such as Datasets, Models, and Pipelines, are stored so they can be shared and reused across organizational boundaries. Having opinionated, tested Datasets, Models, and Pipelines with high-quality checks, proper licenses, and lineage tracking speeds up the AI lifecycle tremendously.
To address these challenges, IBM and the Linux Foundation AI and Data (LF AI and Data) are joining forces to announce Machine Learning eXchange (MLX), a Data and AI Asset Catalog and Execution Engine, in open source and open governance.
Machine Learning eXchange (MLX) allows upload, registration, execution, and deployment of:
- AI pipelines and pipeline components
- Models
- Datasets
- Notebooks
MLX Architecture
MLX provides:
- Automated sample pipeline code generation to execute registered models, datasets, and notebooks
- Pipelines engine powered by Kubeflow Pipelines on Tekton, the core of Watson Studio Pipelines
- Registry for Kubeflow Pipeline Components
- Dataset management by Datashim
- Serving engine by KFServing
MLX Katalog Assets
Pipelines
In machine learning, it is common to run a sequence of tasks to process and learn from data, all of which can be packaged into a pipeline.
ML Pipelines are:
- A consistent way for collaborating on data science projects across team and organization boundaries
- A collection of coarse-grained tasks encapsulated as pipeline components that snap together like Lego bricks
- A one-stop shop for people interested in training, validating, deploying, and monitoring AI models
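The idea of snapping coarse-grained tasks together can be sketched in plain Python. This is an illustrative stand-in, not MLX's actual pipeline DSL (MLX uses Kubeflow Pipelines on Tekton); the step functions here are hypothetical:

```python
# Illustrative sketch of a pipeline as a chain of coarse-grained tasks.
# In MLX these would be Kubeflow Pipeline components running in
# containers; here each step is a plain Python function so the idea
# stays self-contained.

def load_data():
    # Stand-in for a data-acquisition component: (feature, label) rows.
    return [(0.0, 0), (1.0, 1), (2.0, 1), (3.0, 0)]

def preprocess(rows):
    # Stand-in for a preprocessing component: scale features to [0, 1].
    xs = [x for x, _ in rows]
    lo, hi = min(xs), max(xs)
    return [((x - lo) / (hi - lo), y) for x, y in rows]

def train(rows):
    # Stand-in for a training component: the "model" is just the label mean.
    return sum(y for _, y in rows) / len(rows)

def pipeline():
    # Snap the components together: the output of one step becomes
    # the input of the next.
    return train(preprocess(load_data()))

model = pipeline()
```

Each function boundary corresponds to a component boundary, which is what lets teams swap or reuse individual steps without touching the rest of the pipeline.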
Some sample Pipelines included in the MLX catalog:
- Trusted AI Pipeline (with AI Fairness 360 and Adversarial Robustness 360)
- Hyperparameter Tuning
- Nested Pipeline
Pipeline Components
A pipeline component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data acquisition, data preprocessing, data transformation, or model training. A component is a block of code that performs an atomic task; it can be written in any programming language and use any framework.
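A component can be thought of as code plus a declared interface: a name, inputs, outputs, and an implementation. The `Component` class below is hypothetical, loosely modeled on how Kubeflow Pipelines describes components in YAML; it only captures the shape of the idea:

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical, minimal model of a pipeline component: a named, atomic
# task with a declared interface. Real Kubeflow Pipeline components are
# described in YAML specs and run as containers; this is only a sketch.

@dataclass
class Component:
    name: str
    inputs: List[str]
    outputs: List[str]
    run: Callable[..., Dict[str, object]]

def _transform(raw: List[int]) -> Dict[str, object]:
    # Atomic task: one step of the workflow, nothing more.
    return {"transformed": [v * 2 for v in raw]}

transform = Component(
    name="data-transformation",
    inputs=["raw"],
    outputs=["transformed"],
    run=_transform,
)

result = transform.run([1, 2, 3])
```

Because the interface is explicit, a catalog like MLX can register the component once and let any pipeline that matches its inputs and outputs reuse it.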
Some sample pipeline components included in the MLX catalog:
- Create Dataset Volume with DataShim
- Deploy a Model on Kubernetes
- Adversarial Robustness Evaluation
- Model Fairness Check
Models
MLX provides a collection of free, open source, state-of-the-art deep learning models for common application domains. The curated list includes deployable models that can be run as a microservice on Kubernetes or OpenShift and trainable models where users can provide their own data to train the models.
Some sample models included in the MLX catalog:
Datasets
The MLX catalog contains reusable datasets and leverages Datashim to make the datasets available to other MLX assets like notebooks, models, and pipelines in the form of Kubernetes volumes.
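From a consumer's point of view, a Datashim-managed dataset simply appears as files under a mount path inside the pod. The sketch below fakes such a mount with a temporary directory; the path and file names are hypothetical, and the actual mounting is done by Datashim and Kubernetes, not by user code:

```python
import csv
import tempfile
from pathlib import Path

# Simulate the volume Datashim would mount into a pod (for example at
# /mnt/datasets/noaa-jfk). A temp directory stands in for the mount.
mount = Path(tempfile.mkdtemp(prefix="mlx-dataset-"))
(mount / "weather.csv").write_text("temp_c,wind_kph\n5,20\n7,15\n")

# A notebook, model, or pipeline step then just reads ordinary files;
# it does not need to know where the data physically lives (S3, NFS, ...).
with open(mount / "weather.csv", newline="") as f:
    rows = list(csv.DictReader(f))
```

This is the point of the volume abstraction: the same notebook or pipeline step works unchanged whether the dataset lives in object storage or on a local disk.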
Sample datasets contained in the MLX catalog include:
- Finance Proposition Bank
- NOAA Weather Data – JFK Airport
- Thematic Clustering of Sentences
- TensorFlow Speech Commands
Notebooks
Jupyter notebook is an open-source web application that allows data scientists to create and share documents that contain runnable code, equations, visualizations, and narrative text. MLX can run Jupyter notebooks as self-contained pipeline components by leveraging the Elyra-AI project.
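At its core, running a notebook as a pipeline step means executing its code cells in order. Elyra (together with tooling such as papermill) does this robustly with parameters, container images, and output capture; the sketch below only shows the underlying idea on a minimal in-memory notebook in nbformat's JSON shape:

```python
# Minimal illustration of "a notebook as a pipeline step": execute the
# code cells of a notebook, skipping narrative cells, in one shared
# namespace. Elyra and papermill handle the real, robust version.
notebook = {
    "cells": [
        {"cell_type": "markdown", "source": "# Narrative text is skipped"},
        {"cell_type": "code", "source": "x = 21"},
        {"cell_type": "code", "source": "result = x * 2"},
    ]
}

namespace: dict = {}
for cell in notebook["cells"]:
    if cell["cell_type"] == "code":
        # Each code cell sees the state left behind by earlier cells.
        exec(cell["source"], namespace)
```

Because the cells share one namespace, the notebook behaves like a single self-contained task, which is what lets MLX treat it as a pipeline component.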
Sample notebooks contained in the MLX catalog include:
- AIF360 Bias Detection
- ART Poisoning Attack
- JFK Airport Analysis
- Project CodeNet Language Classification
Join us to build a cloud-native AI Marketplace on Kubernetes
The Machine Learning eXchange provides a marketplace and platform for data scientists to share, run, and collaborate on their assets. You can now use it to host and collaborate on Data and AI assets within your team and across teams. Please join us on the Machine Learning eXchange GitHub repo, try it out, give feedback, and raise issues. Additionally, you can connect with us via the following:
- To contribute and build end-to-end Machine Learning Pipelines on OpenShift and Kubernetes, please join the Kubeflow Pipelines on Tekton project and reach out with any questions, comments, and feedback!
- To deploy Machine Learning Models in production, check out the KFServing project.
MLX Key Links
Thank You
Thanks to the many contributors of Machine Learning eXchange, including:
- Andrew Butler
- Animesh Singh
- Christian Kadner
- Ibrahim Haddad
- Karthik Muthuraman
- Patrick Titzler
- Romeo Kienzler
- Saishruthi Swaminathan
- Srishti Pithadia
- Tommy Chaoping Li
- Yihong Wang
LF AI & Data Resources
- Learn about membership opportunities
- Explore the interactive landscape
- Check out our technical projects
- Join us at upcoming events
- Read the latest announcements on the blog
- Subscribe to the mailing lists
- Follow us on Twitter or LinkedIn
- Access other resources on LF AI & Data’s GitHub or Wiki