Angel 3.0 Available Now – Major Milestone in Providing Full ML Stack

By Blog

Angel 3.0 is now available. Angel offers a full-stack machine learning platform designed for sparse data and huge model scenarios, built on a high-performance Parameter Server (PS). Angel is used by Tencent and more than 100 companies, either in products or internally within their organizations. It boasts 4200+ stars on GitHub, 7 sub-projects, 1100+ forks, and 2000+ commits.

Angel joined the LF AI Foundation in August 2018 as an incubation project from Tencent, a Premier member of the Foundation. 

Angel 3.0 Features

Angel 3.0 adds auto feature engineering and new or enhanced computation engines, including Angel native, Spark on Angel (SONA), and PyTorch on Angel (PyTONA). Angel 3.0 therefore allows users to switch to Angel from Spark or PyTorch smoothly, at nearly zero cost.

A detailed white paper on Angel 3.0, authored by Fitz Wang, Ph.D., Senior Researcher at Tencent and Angel’s maintainer and core developer, introduces the new features of Angel 3.0. It shows what distinguishes Angel from existing machine learning platforms such as TensorFlow, PyTorch, MXNet, PaddlePaddle and Spark. It is available here:

LF AI Foundation Projects 

LF AI is an LF umbrella foundation that was founded in March 2018 to support and sustain collaboration and open source innovation in AI, machine learning and deep learning. It offers a neutral environment to its hosted open source projects and supports them with a number of services to help the projects gain wider adoption. Current projects include Acumos AI, Angel, Elastic Deep Learning (EDL), Horovod, and Pyro. For more information on these projects, please visit:

The LF AI Foundation supports open source AI developers and organizations around the world. We are constantly looking to host and support additional projects. People interested in hosting their projects under the LF AI Foundation are encouraged to email us. Details on proposing projects for hosting are available from LF AI.

Meet Angel’s Developers at OSS NA

Angel core maintainers and other developers are presenting on August 20 at the LF AI Meetings in San Diego, co-located at the Open Source Summit NA, and will also be at the LF AI booth Aug 21-23 to show demos and answer questions.

For more information, including both schedules, please see:

LF AI Meetings, San Diego – How to Register 

LF AI Meetings, San Diego – Agenda

LF AI Booth #43 – Developer Schedule at Open Source Summit

Integration Among Tools – Key to Machine Learning Implementation Success


Guest Author(s): Dr. Ofer Hermoni, Director of Product Strategy in Amdocs’ CTO office

Use and adoption of Machine Learning (ML) and Deep Learning (DL) technologies is exploding with the availability of dozens of open source libraries, frameworks and platforms, not to mention all the proprietary solutions. While there are many applications and tools out there, integrating them can be complicated, can pose additional challenges for long-term sustainability, and may present a barrier to adoption as part of a commercial product or service.

To help developers and data scientists make sense of the diversity of projects, the LF AI landscape (Figure 1) was originally published in December 2018 and has been continuously updated ever since. The LF AI landscape is an interactive tool that shows both how fragmented the space is and the wide range of projects in each technology category.

Figure 1: LF AI Landscape

Most open source AI projects started as proprietary efforts and are the result of years of investment and talent acquisition. At some point, the founding company (or companies) behind such an effort decides to open source the project in order to build an ecosystem around it and to collaborate with others on constructing a platform. The end result of that phenomenon is a large ecosystem of open source projects.

The important question from an adoption perspective is which open source project to adopt and how to integrate it with other open source solutions (libraries, frameworks, etc.) and internal proprietary stacks.  

Goal: Better Integration among Projects and Tools

One of the goals of the LF AI Foundation is building integration among LF AI projects and generally available and open source solutions, so users can easily take advantage of a wide array of options and further the adoption of open source for AI solutions. This effort to improve integration and collaboration is aimed at bringing everyone up to the same level of understanding of common ML workflow deployments. Few companies are willing or able to provide this. This filtering and analysis is uniquely suited to a foundation like the LF AI Foundation, since we can look across specialties and provide help and guidance.

In his talk “How Linux Foundation is Changing the (Machine-Learning) World,” Ofer Hermoni, Ph.D., Director of Product Strategy, CTO Office, Amdocs, and Chairperson of the LF AI Technical Advisory Council, highlights one of the key goals of LF AI:

“Harmonization, Interoperability – Increase efforts to harmonize open source projects and reduce fragmentation; increase the interoperability among projects”

This has led the LF AI Technical Advisory Council (TAC) to push to clarify the current landscape. First, what is a typical workflow? Second, what projects are already available under the LF AI umbrella that can implement parts of that workflow? Finally, what open source projects are out there that help fill the gaps and provide good alternatives? This way, users can quickly grasp the larger picture (landscape) and understand not just the available open source components in the AI/ML/DL space but also how to integrate them to implement an end-to-end ML workflow. At the same time, LF AI can better evaluate where integration is already strong and where there are gaps that present opportunities to collaborate, following the open source approach, for the benefit of the broader open source AI community.

The reference ML workflow produced by TAC is summed up in three main layers. 

First, we reviewed existing published flows. We then built on them and extended them to create a complete workflow that covers the lifetime of ML integration across three major phases: data preparation, including data governance; model creation, including ethics management; and solution rollout, including security management.

Figure 2: ML Workflow as defined by the LF AI TAC

Second, we identified the existing LF AI hosted projects and where they fit in the ML workflow.

Figure 3: ML Workflow showcasing the fit of the LF AI hosted projects (Acumos, EDL, Angel, Horovod and Pyro)  

Third, we highlighted other open source projects and where they fit in the ML workflow, such as TensorFlow, Keras, PyTorch, Kubeflow and many more.

Figure 4: Same ML Workflow highlighting the fit of other existing open source projects

The figures are a great way to quickly grasp the entire process and identify the scope of the applications and tools that are needed, and they are especially helpful in identifying integration opportunities across these different projects. The result is a better understanding of the connections, or lack of connections, and a path to create these connections or integration points.

Who should use this?

We would like to hear from as many developers and data scientists as possible, since we are just getting started. There are certainly more connections and gaps to be identified. Integration work takes time, and the workflow has been built up over the past year. This activity is open not only to LF AI members but to the entire community, and many companies already participate in the discussions.

How Does My Project Get Involved?

The ML Workflow effort is open for participation and we are soliciting feedback to improve our reference workflow. There are various ways in which you can participate and get involved:

Meet the LF AI Team in San Diego (August 20, 2019)

LF AI is hosting an open meeting in San Diego on August 20th to discuss the ongoing projects, explore new collaboration opportunities, and provide face-to-face feedback and updates on the Foundation’s ongoing technical efforts. We welcome you to join, meet our members, projects, and staff, and explore ways to get involved in our efforts.

For more information please visit:

About the author

As the Director of Product Strategy in Amdocs’ CTO office, Dr. Ofer Hermoni is responsible for leading all of Amdocs’ activities in the Machine-Learning open-source community, including defining Amdocs’ product strategy in the area of AI/machine learning. In addition, he is the Chairperson of the LF AI Foundation Technical Advisory Council and a member of the LF AI Foundation Governing Board. Ofer is also an active contributor to the Acumos AI project, and a member of the Acumos AI Technical Steering Committee.

Let’s Talk Open Source AI! Open Invitation to the LF AI Meetings on August 20th in San Diego


Come join us! The LF AI Meetings are being held in San Diego, Aug 20, 9am-12:30pm one day prior to Open Source Summit North America, San Diego (Aug 21-23). LF AI members meet and discuss ongoing projects, explore new collaboration opportunities, and provide face-to-face feedback and updates.

It’s a great opportunity to meet with AI developers working on LF AI hosted projects and LF AI staff, too!

Meet the developers! – Who’s at the LF AI Booth #43?

Wednesday – Aug 21
10:30 am – 12:30 pm: Angel, Acumos
2:00 pm – 4:00 pm: Horovod, Acumos
4:00 pm – 6:00 pm: Horovod, Acumos
6:00 pm – 7:00 pm: Acumos

*Booth Crawl & ELC Tech Showcase – 5:30 – 7:00 pm

Thursday – Aug 22
10:30 am – 12:30 pm: Acumos
2:00 pm – 4:00 pm: Horovod, Acumos
4:00 pm – 5:30 pm: Horovod, Acumos

Friday – Aug 23
10:30 am – 12:30 pm: Angel, Acumos
2:00 pm – 4:00 pm: Angel, Acumos

For registration information and location details, please see:

Looking forward to seeing you there!

Registration for LF AI Day – Paris Now Open! Expanding Open Source AI Engagement Across the Globe


Register Now! 

Orange and the LF AI Foundation are excited to announce LF AI Day – Paris, coming up on September 16 in Paris-Châtillon. LF AI Days are regional, one-day events hosted and organized by local members with support from LF AI and its hosted projects. These events are open to all and free to attend.

Hosted at the beautiful Orange Gardens, 44 Avenue de la République, LF AI Day – Paris will feature keynote speakers from leading operators and AI industry, including Orange, NTT, Nokia, Deutsche Telekom, LF AI Foundation, and more. The agenda will focus on open source strategies and ongoing technical developments in the areas of open source machine learning and deep learning. During this event, various AI topics will be covered, including technical presentations, demonstrations of Orange AI Marketplace based on Acumos, an LF AI Graduate project, and a Startups panel discussion.

Agenda (updated Sept 12)

The agenda for the full-day free event is as follows. 

    • Check-in and registration
    • Welcome Message: Nicolas Demassieux, SVP, Orange Labs Research
    • Building Sustainable Open Source AI Ecosystem: Ibrahim Haddad, Executive Director, LF AI Foundation
    • Orange AI activities: Steve Jarrett, VP, Orange Data & AI
    • NTT’s Challenges of AI for Innovative Network Operation: Masakatsu Fujiwara, Project Manager, NTT Network Technology Laboratories
    • Coffee Break
    • Acumos AI – Platform Overview, Releases and Use Cases: Anwar Aftab, Director, Inventive Science, AT&T Labs
    • We make AI accessible: Jamil Chawki, Intrapreneur-CEO, Orange AI Marketplace, and Chair of the LF AI Outreach Committee
    • Trusted AI – Reproducible, Unbiased and Robust AI Pipelines using Open Source: Romeo Kienzler, Chief Data Scientist, IBM Center for Open Source Data and AI Technologies
    • Activities in LF AI and Acumos: Sahar Tahvili, PhD, Lead Data Scientist, Ericsson, Global Artificial Intelligence Accelerator (GAIA), Sweden
    • Acumos & Orange AI Marketplace demonstration: Philippe Dooze, Project Technical Lead, Orange Labs Networks
    • Nokia, AI and Open Source: Philippe Carré, Senior Specialist Open Source, Nokia Bell-Labs & CTO
    • Startups Panel Discussion, Barriers for AI development: François Tillerot, Intrapreneur-CMO, Orange AI Marketplace; Rahul Chakkara, Co-Founder, Manas AI; Laurent Depersin, Research & Innovation Home Lab Director, Interdigital; Marion Carré, CEO, Ask Mona; Sana Ben Jemaa, Project Manager Radio & AI, Orange Labs Networks
    • Open Discussion and Closing Session

There will be a welcome reception after the event. Details will be posted on the event’s page.

For questions, please contact

To view LF AI Days happening in other geographical regions, please visit the LF AI Events page.


AT&T, Orange, Tech Mahindra Adoption of Acumos AI Builds Foundation for Growth


by John Murray, Assistant Vice President of Inventive Science, Intelligent Systems and Incubation, AT&T

With the release of Acumos AI in late 2018, the core idea was to create a sustainable open source AI ecosystem by making it easy to create AI products and services using open source technologies in a neutral environment. Acumos AI was aimed squarely at reducing the need for specialists and lowering the barriers to AI.

Fundamentally, lowering barriers to AI means making it easier to create and train models.

The new Boreas release, just announced in June, does exactly that. Users now have readily available tools to create and train models, enabling the full lifecycle of development from model onboarding and designing to sharing and deploying. Jupyter Notebooks and NiFi, two popular and well-known tools for interactive notebooks and data flow pipelines, are now integrated in the pipeline. An enhanced UX in the portal gives users access to publishing, unpublishing, deploying and onboarding, model building, chaining, and more.

At the same time, AI model suppliers will be able to provide a software license with their models to ensure that the user has acquired the right to use the model. This is key for marketplace-like transactions. Boreas explicitly supports licenses and Right-To-Use for proprietary models. It also now supports license scans of models and metadata.

The new features in Boreas move AI development forward significantly, allowing developers and data scientists who are not AI specialists to develop and deploy apps.

Leadership and Real World Implementations

The LF AI Foundation charter promises to connect members and contributors with the innovative technical projects, companies, and developer communities that are transforming AI and Machine Learning. But the question is always, is it being used? And how does it perform in the real world?

AT&T, Orange and Tech Mahindra are three great examples of how Acumos AI has jumped forward quickly in the last 6 months. All three companies are founding members of the LF AI Foundation and have been providing leadership in both development resources and real world implementations of the Acumos AI framework and marketplace. The reach of their current deployments is distinctly international and extremely ambitious.

AT&T – Infusing AI Across Operation – Big and Small

Two years ago, AT&T saw an opportunity to make AI more accessible and reduce barriers to this exciting industry. Together with Tech Mahindra and The Linux Foundation, AT&T developed Acumos AI to serve as an open marketplace for innovators to create and exchange the best AI solutions possible. Two years and two releases later, we’ve seen firsthand the success of this open approach. It’s led to the creation of new solutions from students, developers, startups and several groups across AT&T’s varied business.

At AT&T, we’re not only helping to improve the Acumos AI code for the public, we’re also using it to improve efficiencies in our own organization. In the past year, AT&T has leveraged Acumos models across customer care, network security, and a variety of different aspects of the business. And, with each release comes additional enhancements, capabilities and opportunities to infuse AI across operations – big and small.

Orange – We make AI accessible and ready for 5G

Orange is using Acumos for its new AI Marketplace. Orange is a leading telecommunications company with 273 million customers worldwide and revenue of €41B (2017). The Orange AI Marketplace is an AI app store where developers can publish and share AI services that can be quickly and easily deployed by customers.

Orange has increased its involvement in Acumos significantly. Orange’s contributions to Acumos AI include the onboarding enhancements seen in the new Acumos Boreas release. After testing the publication and export of AI models for operations use cases, such as incident detection and ticket classification, Acumos was deployed as the basis for the Orange AI Marketplace.

The second half of 2019 will see even more implementations and further growth by Acumos AI. Please come back to find out more information here on the LF AI Foundation blog covering innovative use cases and key implementations of Acumos AI worldwide.

Acumos was also proposed by Orange as an AI platform for the European research project AI4EU. The goals of AI4EU are ambitious, including making the promise of AI “real” for the EU and creating a collaborative European AI platform to nurture economic growth. Involving 80 partners across 21 countries, the project kicked off in January 2019 and will run for three years. It is expected to implement Acumos by the end of 2019.

Tech Mahindra GAiA – Democratizing AI

Tech Mahindra GAiA is the first enterprise-grade open source AI platform. It hosts a marketplace of AI models which can be applied to use cases in multiple industry verticals. These are used as the basis for building, sharing and rapidly deploying AI-driven services and applications to solve business critical problems.

GAiA is available for commercial products and services and supports open source distribution at the same time. Tech Mahindra is aiming to fully democratize AI. The core concept behind GAiA is that the knowledge and expertise around AI should be universally accessible.

The launch of the GAiA platform is in line with Tech Mahindra’s TechMNxt charter which focuses on leveraging next generation technologies like AI to address real world problems and meet the customer’s evolving and dynamic needs.

Getting Involved in the LF AI Foundation and Acumos

Want to get involved? It’s easy to get started! You can get involved with specific projects through development, review, events, documentation, and much more. You can participate in the Technical Advisory Council (TAC) by joining the discussions on bi-weekly calls, identifying collaboration opportunities, inviting speakers to outside events, evaluating new projects, and more. And you can take advantage of marketing and outreach provided by the LF AI Foundation.

The full “Getting Involved Guide” is available for current and prospective members.

“We’ve written this guide to provide you a complete reference to the LF AI community. You will learn how to engage with your communities of interest, all the different ways you can contribute, and how to get help when you need it. If you have suggestions for enhancing this guide, please get in touch with LF AI staff.”

If you are interested in joining the LF AI Foundation:

John Murray Bio

John Murray is the Assistant Vice President of Inventive Science, Intelligent Systems and Incubation at AT&T. He leads the Intelligent Systems and Incubation organization, which uses software, platforms, data, analytics, AI and machine learning to deliver solutions that address AT&T’s needs. He is an expert in designing and building advanced communications systems and is involved in key initiatives such as ONAP, Acumos, data management, and automation and communications systems.

Are you in Government or the Public Sector? The Call for Participation for the AAAI AI in Government and Public Sector Fall Symposium is Open!


Government is at the front lines of the democratization of AI. The scale of participation and the importance in citizens’ lives mean that government and public sector approaches to open source AI will be a central component of how development changes and evolves in the coming years.

The Association for the Advancement of Artificial Intelligence (AAAI) is holding its 2019 Fall Symposium Series in Washington, DC, Nov 7–9, 2019.

This symposium will focus on a wide array of government and public sector AI topics. From the Call for Papers (see attached PDF for more information):

“There are hundreds of open source AI related projects focusing on several AI sub-domains such as deep learning, machine learning, models, natural language processing, speech recognition, data, reinforcement learning, notebook environments, ethics and many more.  How can government entities leverage the abundance of open source AI projects and solutions in building their own platforms and services? Based on which criteria should we evaluate various projects aiming to solve same or similar problems? What kind of framework should be in place to validate these projects, and allowing the trust in AI code that will be deployed for public service?”

Submit your proposal by July 26 through the AAAI site choosing the AAAI/FSS-19 Artificial Intelligence in Government and Public Sector track:

Contact Frank Stein with any questions.

LF Deep Learning Becomes LF AI Foundation to Encompass Growing Portfolio of Technologies


Today we’re announcing a name change to our Foundation, but it’s really about so much more than a name. It’s about reflecting the growing scope of our organization and the increasing number of technologies being built in our community. That’s why the new name is LF AI Foundation, which encompasses AI (artificial intelligence), machine learning, deep learning and more.

We are on the precipice of a major technological shift with AI, which is exactly the point in any technology evolution where open source software and community come into play. Interest in and contributions to our work are accelerating, and the name change reflects that.

Over the past year, we’ve seen new projects hosted with us, rapid code releases within those projects, and additional members supporting, adopting and contributing to this work. Our portfolio of projects, in particular, is expanding in ways that support developer communities across AI, all under our stewardship. From Acumos to Angel, Elastic Deep Learning, Horovod and Pyro, we are building an upstream technical open source community that crosses artificial intelligence, machine learning, deep learning and other AI sub-domains. It’s a natural time to more accurately reflect the intensive and comprehensive collaboration at work within our community.

The LF AI Foundation will formally expand its scope to support a growing ecosystem of AI, machine learning and deep learning technologies. In just the last six months, the overall ecosystem captured in our landscape has grown from 80 to more than 170 projects with a combined 350 million lines of code from more than 80 different organizations around the world. This level and pace of collaborative open source development is similar to the earliest days of Linux, blockchain, cloud and containers. The time to put the proper infrastructure and scope in place is at hand.

Join Us at the Open Source Summit NA

We’re hosting our LF AI members and community for meetings and discussion sessions on August 20th, one day before the Open Source Summit NA. Please join us in exploring and discussing LF AI and our projects. You can register to attend as part of your OSS NA registration.

Additional LF AI Resources:

About LF AI Foundation

The LF AI Foundation, a Linux Foundation project, accelerates and sustains the growth of Artificial Intelligence (AI), Machine Learning (ML) and Deep Learning (DL) open source projects. Backed by many of the world’s largest technology leaders, LF AI is a neutral space for harmonization and ecosystem engagement to advance AI, ML and DL innovation. To get involved with the LF AI Foundation, please visit

The LF Deep Learning Foundation Welcomes New Member Sylabs


Sylabs, the company offering products, services, and solutions based on Singularity, the open source container solution for compute based workloads, is the latest new member to join the LF Deep Learning Foundation at the Linux Foundation.

Singularity was developed originally for high-performance computing (HPC) and is rooted in the open source community.

“We’re developing a solution for containerization that targets those with compute-driven workloads, so the fit with LF Deep Learning is highly relevant,” said Gregory Kurtzer, CEO and founder of Sylabs. “There’s a massive need to properly containerize and support workflows related to artificial intelligence and machine/deep learning.”

With a focus on compute-centric containerization, Sylabs also joined the Linux Foundation initiative behind Kubernetes, the Cloud Native Computing Foundation.

“AI in containerization is evolving rapidly and we are pleased to welcome forward-thinking new member companies like Sylabs,” said Ibrahim Haddad, executive director of LF Deep Learning Foundation. “Sylabs brings experience in both containers and AI to the LFDL and we are looking forward to working together to benefit the open source community.”    

Sylabs’ integration between Singularity and Kubernetes leverages the Open Containers Initiative (OCI) image and runtime specifications as of the recent Singularity 3.1.0 release. Sylabs’ recent blog shares a demonstration of this based on a deep learning use case.

Sylabs is excited to get more involved in the LF DL community and advance cloud native computing and AI innovation and efficiency around the compute-driven model. Sylabs will be attending KubeCon+CloudNativeCon North America later this year, while LF DL community members Huawei and Uber are taking part in KubeCon+CloudNativeCon+Open Source Summit China, June 24-26, 2019, in Shanghai, where the latest open source AI/DL/ML developments will be featured.

LF Deep Learning is building a sustainable ecosystem that makes it easy to create new AI products and services using open source technologies. Today, LF DL includes the following projects:

    • Acumos, a platform to build, share and deploy AI apps;
    • Angel ML, a flexible and powerful parameter server for large-scale machine learning;
    • EDL, an Elastic Deep Learning framework designed to build cluster cloud services;
    • Horovod, a framework for distributed training across multiple machines; and
    • Pyro, a deep probabilistic programming framework that facilitates large-scale exploration of AI models.

For more on LF DL news and progress, join our mailing list and follow us on Twitter.

Horovod Adds Support for PySpark and Apache MXNet and Additional Features for Faster Training


Carsten Jacobsen, Open Source Developer Advocate @ Uber 

Excerpt: Horovod adds support for more frameworks in the latest release and introduces new features to improve versatility and productivity.

Horovod, a distributed deep learning framework created by Uber, makes distributed deep learning fast and easy to use. Horovod improves the speed, scale, and resource allocation for training machine learning (ML) models with TensorFlow, Keras, PyTorch, and Apache MXNet. LF Deep Learning, a Linux Foundation project which supports and sustains open source innovation in artificial intelligence and machine learning, accepted Horovod as one of its hosted projects in December 2018. Once the project was accepted as a hosted project, additional contributions and collaboration beyond Uber followed immediately, thanks to LF Deep Learning’s neutral environment, open governance and the set of enablers that the foundation offered the project.

The updates in this latest release improve Horovod in three key ways: adding support and integration for more frameworks, improving existing features, and preparing the framework for changes coming with TensorFlow 2.0. Combined, these new functionalities and capabilities make Horovod easier, faster, and more versatile for its growing base of users, including NVIDIA and the Oak Ridge National Laboratory. Horovod has also been integrated with various deep learning ecosystems, including AWS, Google, Azure, and IBM Watson.

With this release, a number of new use cases for Horovod have been added with the purpose of making the framework a more versatile tool for training deep learning models. As the list of integrations and supported frameworks grows, users can leverage Horovod to accelerate a larger number of open source models, and use the same techniques across multiple frameworks.

PySpark and Petastorm support

Capable of handling a massive volume of data, Apache Spark is used across many machine learning environments. Its ease of use, in-memory processing capabilities, near real-time analytics, and rich set of integration options, like Spark MLlib and Spark SQL, have made Spark a popular choice.

Given its scalability and ease-of-use, Horovod has received interest from broader, Python-based machine learning communities, including Apache Spark. With the release of PySpark support and integration, Horovod becomes useful to a wider set of users.

A typical workflow for PySpark before Horovod was to do data preparation in PySpark, save the results in the intermediate storage, run a different deep learning training job using a different cluster solution, export the trained model, and run evaluation in PySpark. Horovod’s integration with PySpark allows performing all these steps in the same environment.

In order to smooth out data transfer between PySpark and Horovod in Spark clusters, Horovod relies on Petastorm, an open source data access library for deep learning developed by Uber Advanced Technologies Group (ATG). Petastorm, open sourced in September 2018, enables single machine or distributed training and evaluation of deep learning models directly from multi-terabyte datasets.

A typical Petastorm use case entails preprocessing the data in PySpark, writing it out to storage in Apache Parquet, a highly efficient columnar storage format, and reading the data in TensorFlow or PyTorch using Petastorm.

Both Apache Spark and Petastorm are also used in some applications internally at Uber, so extending Horovod’s support to include PySpark and Petastorm has been a natural step in the process of making Horovod a more versatile tool.

Apache MXNet support

Apache MXNet (incubating) is an open source deep learning framework that facilitates more flexible and efficient neural network training. Amazon is a large contributor to both Horovod and MXNet, and natively supports both frameworks on Amazon EC2 P3 instances and Amazon SageMaker.

Like its recent support of PySpark, Horovod’s integration with MXNet is part of a larger effort to make Horovod available to a broader community, further expanding access to faster and easier model training.


Autotuning

The third update in this latest release is Horovod’s introduction of an alpha version of autotuning. In this release, autotuning is optional, but it will be turned on by default in future releases.

Horovod supports a number of internal parameters that can be adjusted to improve performance for variations in hardware and model architecture. Such parameters include the fusion buffer threshold for determining how many tensors can be batched together into a single allreduce, cycle time for controlling the frequency of allreduce batches, and hierarchical allreduce as an alternative to single-ring allreduce when the number of hosts becomes very large.
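To make the fusion buffer threshold concrete, here is a minimal sketch in plain Python of how a size threshold can turn many small tensors into a few batched allreduce calls. It is an illustration only: the function name `fuse_tensors` is hypothetical, the greedy grouping is a simplification, and Horovod's actual fusion logic (which also accounts for cycle time, data types and tensor readiness) is not reproduced here.

```python
def fuse_tensors(tensor_sizes, threshold_bytes):
    """Greedily group tensor sizes (in bytes) into batches.

    Hypothetical helper for illustration: each batch stands for one
    allreduce call. A batch is closed as soon as adding the next tensor
    would push it past the threshold. A tensor larger than the threshold
    ends up in a batch of its own.
    """
    batches = []
    current, current_bytes = [], 0
    for size in tensor_sizes:
        # Close the current batch if this tensor would overflow it.
        if current and current_bytes + size > threshold_bytes:
            batches.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        batches.append(current)
    return batches


# Six gradient tensors, sizes in bytes (made-up numbers).
sizes = [4, 8, 16, 64, 2, 2]
batches = fuse_tensors(sizes, threshold_bytes=32)

# Without fusion there would be one allreduce per tensor (6 calls);
# with this threshold the same tensors need 3 calls.
print(len(sizes), "tensors ->", len(batches), "allreduce calls:", batches)
```

A larger threshold means fewer, bigger allreduce calls; a smaller one means more, smaller calls. Which is faster depends on hardware and model, which is why the parameter is worth tuning.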

Finding the right values for these parameters can yield performance improvements of as much as 30 percent. However, trying different parameters by hand is a time-consuming exercise in trial and error.

Horovod’s autotuning system removes the guesswork by dynamically exploring and selecting the best internal parameter values using Bayesian optimization.

Autotuning automates the otherwise manual process of trying different options and parameter values to identify the best configuration, which must be repeated if there are changes in hardware, scale, or models. Courtesy of automation, autotuning makes parameter optimization more efficient for faster model training.
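The loop that autotuning automates can be sketched in a few lines of Python. Note the substitution: Horovod's autotuner uses Bayesian optimization, whereas this sketch uses a plain exhaustive search over a small grid, only to show the shape of the process (try a configuration, score it, keep the best). The function `measure_throughput` is a made-up stand-in for a real training benchmark, and the parameter values are hypothetical.

```python
import itertools
from typing import Callable, Iterable, Tuple

Config = Tuple[int, float]  # (fusion threshold in MB, cycle time in ms)


def measure_throughput(config: Config) -> float:
    """Synthetic score standing in for a real training benchmark (hypothetical)."""
    fusion_mb, cycle_ms = config
    # Made-up surface that peaks at 64 MB and 5 ms.
    return 1000.0 - (fusion_mb - 64) ** 2 / 10.0 - (cycle_ms - 5.0) ** 2 * 4.0


def tune(candidates: Iterable[Config], score: Callable[[Config], float]) -> Config:
    """Return the candidate with the highest score (plain exhaustive search)."""
    return max(candidates, key=score)


if __name__ == "__main__":
    grid = list(itertools.product([16, 32, 64, 128], [1.0, 5.0, 10.0]))
    best = tune(grid, measure_throughput)
    print("best (fusion MB, cycle ms):", best)
```

An exhaustive search scores every candidate, which becomes expensive when each measurement is a real training run. Bayesian optimization aims to find good values with far fewer measurements.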

Embedding improvements

Embeddings are commonly used in machine learning use cases involving natural language processing (NLP) and learning from tabular data. In Uber’s datastore, trip data is stored as tabular data with some categorical bounds. In a use case like Uber’s, both the number and the size of embeddings will grow. With this latest release, Horovod has enhanced its ability to scale deep learning models that make heavy use of embeddings, such as Transformer and BERT.

In addition, these improvements speed up the handling of large embedding gradients and enable the fusion of small embedding gradients, so that a larger number of embeddings can be processed faster.

Eager execution support in TensorFlow

Eager execution will be the default mode in TensorFlow 2.0. Eager execution allows developers to create models in an imperative programming environment, where operations are evaluated immediately and their results are returned as concrete values. Eager execution eliminates the need to create sessions and work with graphs.

With eager execution’s support for dynamic models, model evaluation and debugging are made easier and faster. Eager execution also makes working with TensorFlow more intuitive for less experienced developers.

In the past, running Horovod with eager execution meant calculating each tensor gradient across all workers sequentially, without any tensor batching or parallelism. With the latest release, eager execution is fully supported. Tensor batching with eager execution improved performance by over 6x in our experiments. Additionally, users can now make use of a distributed implementation of TensorFlow’s GradientTape to record operations for automatic differentiation.

Mixed precision training

Mixed precision is the combined use of different numerical precisions in a computational method. Using precision lower than FP32 reduces memory requirements by using smaller tensors, allowing deployment of larger networks. In addition, data transfers take less time, and compute performance increases dramatically. GPUs with Tensor Cores support mixed precision and enable users to capitalize on the benefits of lower memory usage and faster data transfers.

Mixed precision training of deep neural networks achieves two main objectives:

  1. Decreases the required amount of memory, enabling training of larger models or training with larger mini-batches.
  2. Shortens training or inference time, since lower-precision arithmetic requires fewer resources.
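The memory saving in the first point is simple arithmetic: an FP32 value occupies 4 bytes and an FP16 value occupies 2. The short Python sketch below works this out for a single tensor. The helper name `tensor_bytes` and the example tensor size are illustrative assumptions.

```python
BYTES_PER_ELEMENT = {"fp32": 4, "fp16": 2}  # storage size of one value


def tensor_bytes(num_elements: int, dtype: str) -> int:
    """Return the storage needed for a tensor of the given element count and dtype."""
    return num_elements * BYTES_PER_ELEMENT[dtype]


if __name__ == "__main__":
    elements = 1_000_000  # hypothetical tensor with one million values
    fp32 = tensor_bytes(elements, "fp32")
    fp16 = tensor_bytes(elements, "fp16")
    print(f"FP32: {fp32} bytes, FP16: {fp16} bytes, ratio: {fp32 / fp16:.1f}x")
```

This is a per-tensor figure. Mixed precision setups typically keep some values in FP32, so the saving across a whole model is usually smaller than a factor of two.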

In the past, mixed precision training broke Horovod’s fusion logic, since the sequence of FP16 tensors would frequently be interrupted by FP32 tensors, and tensors of different precisions could not participate in a single fusion transaction.

With the latest release, NVIDIA contributed an improvement to tensor fusion logic that allows FP16 and FP32 tensor sequences to be processed independently via a look-ahead mechanism. We have seen a performance improvement of up to 26 percent with this change.
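To make the difference concrete, here is a minimal Python sketch that contrasts the two behaviors on a queue of tensors tagged with their precision. Both functions, the tensor names, and the simplification of ignoring the fusion buffer size limit are assumptions for illustration; this is not the code NVIDIA contributed.

```python
from typing import List, Tuple

Tensor = Tuple[str, str]  # (name, dtype)


def fuse_consecutive(queue: List[Tensor]) -> List[List[str]]:
    """Fuse only adjacent tensors of the same dtype; a dtype change ends the batch."""
    batches: List[List[str]] = []
    previous_dtype = None
    for name, dtype in queue:
        if dtype != previous_dtype:
            batches.append([])
            previous_dtype = dtype
        batches[-1].append(name)
    return batches


def fuse_with_lookahead(queue: List[Tensor]) -> List[List[str]]:
    """Fuse all queued tensors sharing the head tensor's dtype, skipping over others."""
    batches: List[List[str]] = []
    pending = list(queue)
    while pending:
        head_dtype = pending[0][1]
        # Look ahead past tensors of other dtypes and collect every match.
        batches.append([name for name, dtype in pending if dtype == head_dtype])
        pending = [t for t in pending if t[1] != head_dtype]
    return batches


if __name__ == "__main__":
    queue = [("a", "fp16"), ("b", "fp16"), ("c", "fp32"),
             ("d", "fp16"), ("e", "fp32"), ("f", "fp16")]
    print("consecutive:", fuse_consecutive(queue))
    print("look-ahead: ", fuse_with_lookahead(queue))
```

On this sample queue, the consecutive strategy produces five small batches, while the look-ahead strategy produces two, one per precision.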

Curious about how Horovod can make your model training faster and more scalable? Check out these new updates and try out the framework for yourself, and be sure to join the Deep Learning Foundation’s Horovod announcement and technical discussion mailing lists.

Introducing the Interactive Deep Learning Landscape

By Blog

The artificial intelligence (AI), deep learning (DL), and machine learning (ML) space is changing rapidly, with new projects and companies launching and existing ones growing, expanding, and consolidating. More companies are also releasing their internal AI, ML, and DL efforts under open source licenses to leverage the power of collaborative development, benefit from the innovation multiplier effect of open source, and provide faster, more agile development and accelerated time to market.

To make sense of it all and keep up to date on an ongoing basis, the LF Deep Learning Foundation has created an interactive Deep Learning Landscape, based on the Cloud Native Landscape pioneered by CNCF. This landscape is intended as a map for exploring open source AI, ML, and DL projects. It also showcases the member companies of the LF Deep Learning Foundation that contribute heavily to open source AI, ML, and DL and bring in their own projects to be housed at the Foundation.

This tool allows viewers to filter, obtain detailed information on a specific project or technology, and easily share via stateful URLs. It is intended to help developers, end users and others navigate the complex AI, DL and ML landscape.

All data is also available in a GitHub repo, and anyone may update or add to the landscape by submitting a pull request on GitHub.

We encourage you to spend some time with this tool, learn more about the current AI, DL and ML space, and begin contributing to it.