Sylabs, the company offering products, services, and solutions based on Singularity, the open source container solution for compute-based workloads, is the newest member of the LF Deep Learning Foundation at the Linux Foundation.
Singularity was developed originally for high-performance computing (HPC) and is rooted in the open source community.
“We’re developing a solution for containerization that targets those with compute-driven workloads, so the fit with LF Deep Learning is highly relevant,” said Gregory Kurtzer, CEO and founder of Sylabs. “There’s a massive need to properly containerize and support workflows related to artificial intelligence and machine/deep learning.”
“AI in containerization is evolving rapidly and we are pleased to welcome forward-thinking new member companies like Sylabs,” said Ibrahim Haddad, executive director of LF Deep Learning Foundation. “Sylabs brings experience in both containers and AI to the LFDL and we are looking forward to working together to benefit the open source community.”
As of the recent Singularity 3.1.0 release, Sylabs’ integration between Singularity and Kubernetes leverages the Open Container Initiative (OCI) image and runtime specifications. A recent Sylabs blog post demonstrates this integration with a deep learning use case.
Sylabs is excited to get more involved in the LF DL community and to advance cloud native computing and AI innovation and efficiency for compute-driven workloads. Sylabs will be attending KubeCon+CloudNativeCon North America later this year, while LF DL community members Huawei and Uber are taking part in KubeCon+CloudNativeCon+Open Source Summit China, June 24-26, 2019, in Shanghai, where the latest open source AI/DL/ML developments will be featured.
LF Deep Learning is building a sustainable ecosystem that makes it easy to create new AI products and services using open source technologies. Today, LF DL includes the following projects:
Acumos, a platform to build, share and deploy AI apps;
Angel ML, a flexible and powerful parameter server for large-scale machine learning;
EDL, an Elastic Deep Learning framework designed to build cluster cloud services;
Horovod, a framework for distributed training across multiple machines; and
Pyro, a deep probabilistic programming framework that facilitates large-scale exploration of AI models.
Contributed by Uber, Pyro enables flexible and expressive deep probabilistic modeling.
SAN FRANCISCO – February 21, 2019 – The LF Deep Learning Foundation (LF DL), a Linux Foundation project that supports and sustains open source innovation in artificial intelligence (AI), machine learning (ML), and deep learning (DL), announces the Pyro project, started by Uber, as its newest incubation project. Built on top of the PyTorch framework, Pyro is a deep probabilistic programming framework that facilitates large-scale exploration of AI models, making deep learning model development and testing quicker and more seamless. This is the second project LF DL has voted in from Uber, following last December’s Horovod announcement.
Pyro is used by large companies like Siemens, IBM, and Uber, and startups like Noodle.AI, in addition to Harvard University, MIT, Stanford University, University of Oxford, University of Cambridge, and The Broad Institute. At Uber, Pyro solves a range of problems including sensor fusion, time series forecasting, ad campaign optimization and data augmentation for deep image understanding.
Pyro is the fifth project to join LF DL, which provides financial and intellectual resources, infrastructure, marketing, research, creative services and events support. This rich neutral environment spurs the rapid advancement of its projects, including Acumos AI, the Angel project, EDL project and Horovod, by encouraging additional contributors as well as broader collaboration across the open source community.
“The LF Deep Learning Foundation is excited to welcome Pyro to our family of projects. Today’s announcement of Uber’s contribution of the project brings us closer to our goal of building a comprehensive ecosystem of AI, machine learning and deep learning projects,” said Ibrahim Haddad, Executive Director of the LF DL. “We look forward to helping to grow the community contributing to and using Pyro to further improve forecasting and other capabilities.”
Pyro was designed with four key principles in mind:
Universal: Pyro can represent any computable probability distribution.
Scalable: Pyro scales to large data sets with little overhead.
Minimal: Pyro is implemented with a small core of powerful, composable abstractions.
Flexible: Pyro aims for automation when you want it, control when you need it.
“Pyro was originally created at Uber AI Labs to help make deep probabilistic programming faster and more seamless for AI practitioners in both industry and academia,” said Zoubin Ghahramani, head of Uber AI Labs. “By incorporating Pyro into the LF DL portfolio, we hope to facilitate greater opportunities for researchers worldwide and make deep learning and Bayesian modeling more accessible.”
Pyro joins existing LF DL projects: Acumos AI, a platform and open source AI framework; Angel, a high-performance distributed machine learning platform based on Parameter Server; EDL, an Elastic Deep Learning framework designed to help cloud service providers to build cluster cloud services using deep learning frameworks; and Horovod, a distributed training framework for TensorFlow, Keras, and PyTorch.
Pyro Background
Pyro provides a language for probabilistic modeling and inference, together with well-tested scalable implementations of inference algorithms including Stochastic Variational Inference and Hamiltonian Monte Carlo. The project was developed at Uber AI Labs as a platform for research in deep Bayesian models, including Bayesian neural nets and amortized Bayesian inference. The project currently has nearly 1,500 commits from 50 committers, and is licensed under the MIT license. More information on Pyro can be found on the Uber Engineering Blog. Uber also recently joined the Linux Foundation as a Gold member and contributed Jaeger, an open source distributed tracing system, to the Cloud Native Computing Foundation.
About LF Deep Learning
The LF Deep Learning Foundation, a Linux Foundation project, accelerates and sustains the growth of artificial intelligence (AI), machine learning (ML) and deep learning (DL) open source projects. Backed by many of the world’s largest technology leaders, LF Deep Learning is a neutral space for harmonization and ecosystem engagement to advance AI, DL and ML innovation. To get involved with the LF Deep Learning Foundation, please visit https://www.deeplearningfoundation.org.
About The Linux Foundation
The Linux Foundation is the organization of choice for the world’s top developers and companies to build ecosystems that accelerate open technology development and industry adoption. Together with the worldwide open source community, it is solving the hardest technology problems by creating the largest shared technology investment in history. Founded in 2000, The Linux Foundation today provides tools, training and events to scale any open source project, which together deliver an economic impact not achievable by any one company. More information can be found at www.linuxfoundation.org.
# # #
The Linux Foundation has registered trademarks and uses trademarks. For a list of trademarks of The Linux Foundation, please see our trademark usage page: https://www.linuxfoundation.org/trademark-usage. Linux is a registered trademark of Linus Torvalds.
SAN FRANCISCO – February 21, 2019 – The LF Deep Learning Foundation (LF DL), a Linux Foundation project that supports and sustains open source innovation in artificial intelligence (AI), machine learning (ML), and deep learning (DL), announces Ericsson has become its newest Premier Member. Ericsson, a global leader in delivering ICT solutions, has been at the forefront of communications technology for 140 years.
Ericsson has already begun contributing to the LF Deep Learning Foundation through the Acumos project, working with partners like AT&T, Orange and the broader community to solve complex problems surrounding 5G and IoT through AI and ML.
In addition to participating in LF DL, Ericsson is also a member of LF Networking, DPDK, the Cloud Native Computing Foundation and LF Edge Foundation. Ericsson is strongly committed to these future-forward technologies, and to that end the company has built a Global AI Accelerator focused on tackling the complex business problems of today and tomorrow.
Joining the LF Deep Learning Foundation as a Premier Member gives Ericsson a seat on the Board of Directors, as well as on the Technical Advisory Council and Outreach Committee. Membership gives the company the opportunity to contribute to and benefit from a neutral space where all players work together to make rapid advances on capabilities and tools that solve real problems for customers as our world becomes increasingly connected.
“Today and in the future, as we accelerate 5G commercialization, AI and ML technologies are necessary to solve operational efficiencies and real-time decision making challenges in a trustworthy, reliable and secure way,” said Anita Frisell, VP, Head of Technology Development & Execution at Ericsson. “Collaboration is in our DNA at Ericsson and we see real value in being part of the LF Deep Learning Foundation and the open source community. We look forward to taking an active role in supporting the growth and evolution of the fast-growing open source AI and ML ecosystem, working with leaders across the industry to solve the complex challenges ahead.”
“We are very pleased to welcome Ericsson to the LF Deep Learning Foundation,” said Ibrahim Haddad, LF Deep Learning Foundation Executive Director. “We are creating a sustainable open source AI ecosystem that makes it easier to create AI products and services using open technologies and look forward to having Ericsson play a role in contributing to that effort.”
With the addition of Ericsson, the LF DL board includes: Amdocs, AT&T, Baidu, Ericsson, Huawei, Nokia, Tech Mahindra, Tencent and ZTE. The LF Deep Learning Foundation is currently focused on a variety of projects including Acumos, a platform and open source framework that makes it easy to build, share and deploy AI apps; Angel ML, a flexible and powerful parameter server for large-scale machine learning; EDL, an Elastic Deep Learning framework designed to help cloud service providers build cluster cloud services using deep learning frameworks; Horovod, a framework for distributed training across multiple machines; and Pyro, a deep probabilistic programming framework that facilitates large-scale exploration of AI models.
About Ericsson
Ericsson enables communications service providers to capture the full value of connectivity. The company’s portfolio spans Networks, Digital Services, Managed Services, and Emerging Business and is designed to help our customers go digital, increase efficiency and find new revenue streams. Ericsson’s investments in innovation have delivered the benefits of telephony and mobile broadband to billions of people around the world. The Ericsson stock is listed on Nasdaq Stockholm and on Nasdaq New York. www.ericsson.com.
About LF Deep Learning
The LF Deep Learning Foundation, a Linux Foundation project, accelerates and sustains the growth of artificial intelligence, machine learning and deep learning open source projects. Backed by many of the world’s largest technology leaders, LF Deep Learning is a neutral space for harmonization and ecosystem engagement to advance AI, DL and ML innovation. To get involved with the LF Deep Learning Foundation, please visit https://www.deeplearningfoundation.org. Our open source AI landscape is available at https://l.lfdl.io
Carsten Jacobsen, Open Source Developer Advocate @ Uber
Excerpt: Horovod adds support for more frameworks in the latest release and introduces new features to improve versatility and productivity.
Horovod, a distributed deep learning framework created by Uber, makes distributed deep learning fast and easy to use. Horovod improves the speed, scale, and resource allocation of training machine learning (ML) models with TensorFlow, Keras, PyTorch, and Apache MXNet. LF Deep Learning, a Linux Foundation project which supports and sustains open source innovation in artificial intelligence and machine learning, accepted Horovod as one of its hosted projects in December 2018. Since being accepted as a hosted project, Horovod has attracted additional contributions and collaboration beyond Uber, thanks to LF Deep Learning’s neutral environment, open governance, and the set of enablers the foundation offers its projects.
The updates in this latest release improve Horovod in three key ways: adding support and integration for more frameworks, improving existing features, and preparing the framework for changes coming with TensorFlow 2.0. Combined, these new functionalities and capabilities make Horovod easier, faster, and more versatile for its growing base of users, including NVIDIA and the Oak Ridge National Laboratory. Horovod has also been integrated with various deep learning ecosystems, including AWS, Google, Azure, and IBM Watson.
With this release, a number of new use cases for Horovod have been added with the purpose of making the framework a more versatile tool for training deep learning models. As the list of integrations and supported frameworks grows, users can leverage Horovod to accelerate a larger number of open source models, and use the same techniques across multiple frameworks.
PySpark and Petastorm support
Capable of handling massive volumes of data, Apache Spark is used across many machine learning environments. Its ease of use, in-memory processing capabilities, near real-time analytics, and rich set of integration options, like Spark MLlib and Spark SQL, have made Spark a popular choice.
Given its scalability and ease of use, Horovod has received interest from the broader, Python-based machine learning community, including Apache Spark users. With the release of PySpark support and integration, Horovod becomes useful to a wider set of users.
A typical PySpark workflow before Horovod was to do data preparation in PySpark, save the results to intermediate storage, run a separate deep learning training job using a different cluster solution, export the trained model, and run evaluation back in PySpark. Horovod’s PySpark integration allows all of these steps to be performed in a single environment.
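Under Horovod's horovod.spark API, that unified workflow looks roughly like the following sketch (assumes Horovod >= 0.16 with a running Spark session; the training function body and parameter values are hypothetical). Imports are deferred into the functions because horovod.spark ships the training function to each Spark executor:

```python
def train(learning_rate):
    # Runs once per Spark executor; build and fit a model here.
    # Imports happen inside the function because horovod.spark
    # serializes it and executes it on the workers.
    import horovod.tensorflow.keras as hvd
    hvd.init()
    # ... data loading and model.fit(...) would go here ...
    return hvd.rank()

def run_on_spark(num_workers=4):
    # horovod.spark.run launches `train` on num_workers executors and
    # returns the list of per-worker return values.
    import horovod.spark
    return horovod.spark.run(train, args=(0.001,), num_proc=num_workers)
```

Data preparation, training, and evaluation can then all live in one PySpark application instead of spanning three separate systems.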
In order to smooth out data transfer between PySpark and Horovod in Spark clusters, Horovod relies on Petastorm, an open source data access library for deep learning developed by Uber Advanced Technologies Group (ATG). Petastorm, open sourced in September 2018, enables single machine or distributed training and evaluation of deep learning models directly from multi-terabyte datasets.
A typical Petastorm use case entails preprocessing the data in PySpark, writing it out to storage in Apache Parquet, a highly efficient columnar storage format, and reading the data in TensorFlow or PyTorch using Petastorm.
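The read side of that workflow can be sketched as follows (assumes petastorm is installed; the dataset URL is hypothetical, and imports are deferred so the sketch stays inert until called):

```python
def read_as_tf_dataset(dataset_url="file:///tmp/trips_parquet"):
    # make_reader streams rows out of a Petastorm-materialized Parquet
    # store; make_petastorm_dataset wraps the reader as a
    # tf.data.Dataset for use in a TensorFlow training loop.
    from petastorm import make_reader
    from petastorm.tf_utils import make_petastorm_dataset
    reader = make_reader(dataset_url)
    return make_petastorm_dataset(reader)
```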
Both Apache Spark and Petastorm are also used in some applications internally at Uber, so extending Horovod’s support to include PySpark and Petastorm has been a natural step in the process of making Horovod a more versatile tool.
Apache MXNet support
Apache MXNet (incubating) is an open source deep learning framework that facilitates more flexible and efficient neural network training. Amazon is a large contributor to both Horovod and MXNet, and natively supports both frameworks on Amazon EC2 P3 instances and Amazon SageMaker.
Like its recent support of PySpark, Horovod’s integration with MXNet is part of a larger effort to make Horovod available to a broader community, further expanding access to faster and easier model training.
Autotuning
The third update in this latest release is Horovod’s introduction of an alpha version of autotuning. In this release, autotuning is optional, but it will be turned on by default in future releases.
Horovod supports a number of internal parameters that can be adjusted to improve performance for variations in hardware and model architecture. Such parameters include the fusion buffer threshold for determining how many tensors can be batched together into a single allreduce, cycle time for controlling the frequency of allreduce batches, and hierarchical allreduce as an alternative to single-ring allreduce when the number of hosts becomes very large.
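Before autotuning, these parameters are typically set by hand through environment variables. The variable names below exist in Horovod, though the values are purely illustrative:

```python
import os

# Manual tuning via Horovod's environment variables (values are
# illustrative, not recommendations); these must be set before
# Horovod initializes. Setting HOROVOD_AUTOTUNE=1 instead hands
# control to the autotuner.
os.environ["HOROVOD_FUSION_THRESHOLD"] = str(64 * 1024 * 1024)  # bytes
os.environ["HOROVOD_CYCLE_TIME"] = "5"                          # milliseconds
os.environ["HOROVOD_HIERARCHICAL_ALLREDUCE"] = "1"
```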
Finding the right values for these parameters can yield performance improvements of as much as 30 percent. However, trying different parameters by hand is a time-consuming exercise in trial and error.
Horovod’s autotuning system removes the guesswork by dynamically exploring and selecting the best internal parameter values using Bayesian optimization.
Autotuning automates the otherwise manual process of trying different options and parameter values to identify the best configuration, a process that must be repeated whenever hardware, scale, or models change. By automating it, autotuning makes parameter optimization more efficient, leading to faster model training.
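The shape of the search can be illustrated with a toy example. Horovod's real autotuner measures live training throughput and uses Bayesian optimization; the sketch below substitutes a made-up scoring function and a plain grid search just to show what is being optimized:

```python
import itertools

def simulated_throughput(fusion_mb, cycle_ms):
    # Hypothetical response surface: throughput (images/sec) peaks at a
    # 64 MB fusion buffer and 5 ms cycle time, falling off on either side.
    return 1000 - 2 * abs(fusion_mb - 64) - 30 * abs(cycle_ms - 5)

def autotune(fusion_candidates, cycle_candidates):
    # Try each parameter combination and keep the best-scoring one.
    return max(
        itertools.product(fusion_candidates, cycle_candidates),
        key=lambda params: simulated_throughput(*params),
    )

best = autotune([16, 32, 64, 128], [1, 2.5, 5, 10])
print(best)  # -> (64, 5)
```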
Embedding improvements
Embeddings are commonly used in machine learning use cases involving natural language processing (NLP) and learning from tabular data. At Uber, trip data is stored as tabular data with categorical columns, and in use cases like this both the number and the size of embeddings scale. With this latest release, Horovod has enhanced its capability of scaling deep learning models that make heavy use of embeddings, such as Transformer and BERT.
In addition, these improvements speed up the processing of large embedding gradients and enable the fusion of small embedding gradients, allowing a larger number of embeddings to be processed faster.
Eager execution support in TensorFlow
Eager execution will be the default mode in TensorFlow 2.0. Eager execution allows developers to create models in an imperative programming environment, where operations are evaluated immediately, and the result is returned as real values. Eager execution eliminates the need to create sessions and work with graphs.
With eager execution’s support for dynamic models, model evaluation and debugging is made easier and faster. Eager execution also makes working with TensorFlow more intuitive for less experienced developers.
In the past, running Horovod with eager execution meant calculating each tensor gradient across all workers sequentially, without any tensor batching or parallelism. With the latest release, eager execution is fully supported. Tensor batching with eager execution improved performance by over 6x in our experiments. Additionally, users can now make use of a distributed implementation of TensorFlow’s GradientTape to record operations for automatic differentiation.
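A minimal sketch of the tape-wrapping pattern follows (assumes TensorFlow and Horovod are installed; the model, loss function, and batch are hypothetical, and imports are deferred into the function):

```python
def train_step(model, loss_fn, batch):
    # Record operations eagerly, then wrap the tape so that gradient()
    # returns gradients allreduced (averaged) across Horovod workers.
    import tensorflow as tf
    import horovod.tensorflow as hvd
    with tf.GradientTape() as tape:
        loss = loss_fn(model(batch))
    tape = hvd.DistributedGradientTape(tape)
    return tape.gradient(loss, model.trainable_variables)
```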
Mixed precision training
Mixed precision is the combined use of different numerical precisions in a computational method. Using precision lower than FP32 reduces memory requirements by using smaller tensors, allowing deployment of larger networks. In addition, data transfers take less time, and compute performance increases dramatically. GPUs with Tensor Cores support mixed precision and enable users to capitalize on the benefits of lower memory usage and faster data transfers.
Mixed precision training of deep neural networks achieves two main objectives:
Decreases the required amount of memory, enabling training of larger models or training with larger mini-batches; and
Shortens training or inference time by using lower-precision arithmetic, which reduces the required resources.
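The memory point is easy to see with Python's struct module, which can pack IEEE half-precision (binary16) and single-precision (binary32) values:

```python
import struct

# A value stored in FP16 occupies 2 bytes versus 4 bytes in FP32, so
# tensors kept in half precision need half the memory.
fp16_bytes = len(struct.pack("e", 1.0))  # "e" = IEEE binary16
fp32_bytes = len(struct.pack("f", 1.0))  # "f" = IEEE binary32
print(fp16_bytes, fp32_bytes)  # -> 2 4
```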
In the past, mixed precision training broke Horovod’s fusion logic: sequences of FP16 tensors would frequently be interrupted by FP32 tensors, and tensors of different precisions could not participate in a single fusion transaction.
With the latest release, NVIDIA contributed an improvement to tensor fusion logic that allows FP16 and FP32 tensor sequences to be processed independently via a look-ahead mechanism. We have seen up to 26 percent performance improvement with this change.
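The idea behind the fix can be sketched in a few lines of Python. This illustrates dtype-bucketed fusion, not Horovod's actual C++ implementation: rather than fusing tensors strictly in arrival order, where a stray FP32 tensor would cut an FP16 run short, pending tensors are bucketed by dtype so each precision fuses independently:

```python
from collections import defaultdict

def fuse_by_dtype(pending):
    # `pending` is a list of (tensor_name, dtype) pairs in arrival
    # order. Each resulting bucket becomes one fusion transaction of
    # uniform precision.
    buckets = defaultdict(list)
    for name, dtype in pending:
        buckets[dtype].append(name)
    return dict(buckets)

batches = fuse_by_dtype(
    [("g1", "fp16"), ("g2", "fp32"), ("g3", "fp16"), ("g4", "fp16")]
)
print(batches)  # -> {'fp16': ['g1', 'g3', 'g4'], 'fp32': ['g2']}
```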