Acumos AI

Open source framework to build, share and deploy AI applications

Acumos is an open source platform that supports the design, integration and deployment of AI models. It also offers an AI marketplace that lets data scientists publish adaptive AI models while shielding them from the need to custom-develop fully integrated solutions.

Learn More

Adlik

Open source toolkit for accelerating deep learning inference

Adlik is an end-to-end optimizing framework for deep learning models. The goal of Adlik is to accelerate the deep learning inference process in both cloud and embedded environments.

Learn More


AI Explainability 360

Open source toolkit that can help users better understand the ways that machine learning models predict labels

AI Explainability 360 is an open source toolkit that can help users better understand the ways that machine learning models predict labels using a wide variety of techniques throughout the AI application lifecycle.

Learn More


AI Fairness 360

Open source toolkit that can help users understand and mitigate bias in machine learning models throughout the AI application lifecycle

AI Fairness 360 is an extensible open source toolkit that can help users understand and mitigate bias in machine learning models throughout the AI application lifecycle.

Learn More


Adversarial Robustness Toolbox

Open source tools to evaluate, defend, certify and verify Machine Learning models and applications against adversarial threats

Adversarial Robustness Toolbox (ART) provides tools that enable developers and researchers to evaluate, defend, certify and verify Machine Learning models and applications against adversarial threats.

Learn More


Amundsen

Amundsen is a data discovery and metadata engine for improving the productivity of data analysts, data scientists and engineers when interacting with data

Amundsen is a data discovery and metadata engine for improving the productivity of data analysts, data scientists and engineers when interacting with data.

Learn More

Angel ML

Open source high-performance distributed machine learning platform

Angel is a high-performance distributed machine learning platform. It has been tuned for performance on big data from Tencent, offers wide applicability and stability, and shows a growing advantage when handling higher-dimensional models.

Learn More

DataPractices

An effort to increase awareness and data literacy in the data ecosystem

DataPractices is a “Manifesto for Data Practices,” made up of values and principles that illustrate the most effective, modern, and ethical approach to data teamwork.

Learn More

Datashim

Open source enablement and acceleration of data access for Kubernetes/OpenShift workloads in a transparent and declarative way

Datashim enables and accelerates data access for Kubernetes/OpenShift workloads in a transparent and declarative way. It has been open source since September 2019 and is growing to support use cases related to data access in AI projects.

Learn More


Delta

DELTA is a deep learning-based end-to-end natural language and speech processing platform

DELTA is a deep learning-based end-to-end natural language and speech processing platform. It aims to make using, deploying, and developing natural language processing and speech models easy and fast for both academic and industry use cases. DELTA is mainly implemented using TensorFlow and Python 3.

Learn More

Elastic Deep Learning

Open source deep learning framework to build cluster cloud services

EDL optimizes the global utilization of clusters running deep learning jobs and the waiting time of job submitters. It includes two parts: a Kubernetes controller for the elastic scheduling of distributed deep learning jobs, and a fault-tolerant deep learning framework.

Learn More


Egeria

The open standard that simplifies sharing and exchanging metadata.

Egeria is the world’s first open source metadata standard. It provides open APIs, event formats, types and integration logic so organizations can share data management and governance across the entire enterprise without reformatting or restricting the data to a single format, platform, or vendor product.

Learn More


FEAST

Open source feature store for machine learning

Feast is an open source feature store for machine learning. It was developed as a collaboration between Gojek and Google in 2018. Feast aims to:
– Provide scalable and performant access to feature data for ML models during training or serving.
– Provide a consistent view of features for both training and serving.
– Enable re-use of features through discovery, documentation, and metadata tracking.
– Ensure model performance by tracking, validating, and monitoring features in production.

Learn More

Flyte

Open source acceleration for machine learning and data workflows to production

Flyte is a production-grade, declarative, structured and highly scalable cloud-native workflow orchestration platform. It allows users to describe their ML and data pipelines using Python, Java or (in the future) other languages, and Flyte manages the data flow, parallelization, scaling and orchestration of these pipelines. Flyte builds on top of Docker containers and Kubernetes.

Learn More


ForestFlow

An open source scalable policy-based cloud-native machine learning model server

ForestFlow is a scalable policy-based cloud-native machine learning model server. ForestFlow strives to strike a balance between the flexibility it offers data scientists and the adoption of standards while reducing friction between Data Science, Engineering and Operations teams.

Learn More


Horovod

Open source distributed training framework for TensorFlow, Keras and PyTorch

Horovod, a distributed training framework for TensorFlow, Keras and PyTorch, improves speed, scale and resource allocation in machine learning training activities. Uber uses Horovod for self-driving vehicles, fraud detection, and trip forecasting. It is also being used by Alibaba, Amazon and NVIDIA.

Learn More

JanusGraph

Distributed, open source, massively scalable graph database

JanusGraph is a scalable graph database optimized for storing and querying graphs containing hundreds of billions of vertices and edges distributed across a multi-machine cluster.

Learn More

Ludwig

Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code

Ludwig is a toolbox built on top of TensorFlow that allows users to train and test deep learning models without the need to write code. All you need to provide is your data, a list of fields to use as inputs, and a list of fields to use as outputs; Ludwig will do the rest. Simple commands can be used to train models both locally and in a distributed way, and to use them to predict on new data.
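
As a rough illustration of the "list of input fields, list of output fields" idea, the sketch below builds a declarative configuration as plain data using only the Python standard library. The column names (review_text, sentiment) and the helper build_config are hypothetical; the input_features and output_features keys follow the layout used in Ludwig's documentation, but the exact schema should be checked against the Ludwig version in use. The sketch does not call Ludwig itself.

```python
import json


def build_config(input_fields, output_fields):
    """Build a declarative config dict from (column name, feature type) pairs.

    Hypothetical helper for illustration only; it is not part of Ludwig.
    """
    return {
        "input_features": [{"name": n, "type": t} for n, t in input_fields],
        "output_features": [{"name": n, "type": t} for n, t in output_fields],
    }


# Hypothetical dataset columns: a free-text review and a sentiment label.
config = build_config(
    input_fields=[("review_text", "text")],
    output_fields=[("sentiment", "category")],
)

# JSON is a subset of YAML, so this text can be saved as a config file.
config_text = json.dumps(config, indent=2)
print(config_text)
```

The point of the design is that the model is described as data rather than code: changing the task means editing the two lists, not writing a training loop.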

Learn More


Machine Learning eXchange

Open source project, data and AI assets catalog and execution engine.

Machine Learning eXchange (MLX) is a Data and AI Assets Catalog and Execution Engine. It allows upload, registration, execution, and deployment of: AI pipelines and pipeline components, models, datasets, and notebooks.

Learn More


Marquez

Open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata

Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralizes dataset lifecycle management, and much more.

Learn More

Milvus

Vector database that is highly flexible, reliable, and blazing fast.

Milvus is an open-source vector database that is highly flexible, reliable, and blazing fast. It supports adding, deleting, updating, and near real-time search of vectors on a trillion-byte scale.

Learn More

NNStreamer

GStreamer plugins supporting ease and efficiency with neural network models and pipelines

NNStreamer is a set of GStreamer plugins that make it easier and more efficient for GStreamer developers to adopt neural network models and for neural network developers to manage neural network pipelines and their filters.

Learn More

ONNX

Open source format to represent deep learning models

With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them.

Learn More

OpenDS4All

Enables the creation of educational Data Science programs.

OpenDS4All is an open source project built to accelerate the creation of data science curricula at academic institutions.

Learn More

OpenLineage

Open standard for metadata and lineage collection designed to instrument jobs as they are running.

OpenLineage proposes an open standard and API for lineage collection that data processing engines can implement to publish, at run time, details of the data sources they are reading, the types of processing they are performing and the destination of the results.
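
To make the idea of a run-time lineage record concrete, here is a minimal sketch, using only the Python standard library, of an event that names a job, the datasets it reads and the datasets it writes. The field names are modelled on the OpenLineage run event, but this is not a complete or validated event (for example, it omits the schema reference), and the helper make_run_event, the namespace, the producer URL and the dataset names are all hypothetical.

```python
import json
import uuid
from datetime import datetime, timezone


def make_run_event(event_type, job_name, inputs, outputs,
                   namespace="example-namespace"):
    """Assemble a lineage event as a plain dict (hypothetical helper)."""
    return {
        "eventType": event_type,  # e.g. "START" or "COMPLETE"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": str(uuid.uuid4())},  # unique id for this job run
        "job": {"namespace": namespace, "name": job_name},
        # Data sources being read and destinations being written.
        "inputs": [{"namespace": namespace, "name": n} for n in inputs],
        "outputs": [{"namespace": namespace, "name": n} for n in outputs],
        "producer": "https://example.com/hypothetical-producer",
    }


event = make_run_event(
    "COMPLETE",
    "daily_orders_rollup",
    inputs=["raw.orders"],
    outputs=["analytics.daily_orders"],
)
print(json.dumps(event, indent=2))
```

A processing engine would emit records like this as each job starts and finishes, so that a lineage backend can connect datasets through the jobs that read and write them.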

Learn More

Pyro

Open source universal probabilistic programming language

Pyro is a universal probabilistic programming language (PPL) written in Python and supported by PyTorch on the backend. Pyro enables flexible and expressive deep probabilistic modeling, unifying the best of modern deep learning and Bayesian modeling.

Learn More

RosaeNLG

Open source, template-based Natural Language Generation (NLG) project that automates the production of relatively repetitive texts from structured input data and textual templates, run by an NLG engine

RosaeNLG is an open source, template-based Natural Language Generation (NLG) project that automates the production of relatively repetitive texts from structured input data and textual templates, run by an NLG engine. Production usage is widespread in large corporations, especially in the financial industry.

Learn More

SOAJS

Open source microservices and API management platform

SOAJS is an open source microservices and API management platform. It eliminates IT plumbing challenges, so you can deploy microservices significantly earlier and faster. IT initiatives such as digital transformation are simplified and accelerated, with reduced cost and mitigated risk. Our fully integrated, world-class API lifecycle management, multi-cloud orchestration, release management, and IT Ops automation capabilities eliminate your IT organization’s modernization pain.

Learn More

Sparklyr

Open source and modern interface to scale data science and machine learning workflows using Apache Spark™, R, and a rich extension ecosystem

sparklyr is an open-source and modern interface to scale data science and machine learning workflows using Apache Spark™, R, and a rich extension ecosystem. It makes Apache Spark easy to use from R by providing access to core functionality such as installing, connecting to and managing Spark, and using Spark’s MLlib, Spark Structured Streaming and Spark Pipelines from R.

Learn More