The LF AI & Data Foundation is pleased to share that the Open Source, Accelerating AI Innovation webinar of the ITU AI for Good Summit will be held online on October 21, 2021.
The “Open Source Accelerating AI Innovation” session will bring together experts from the AI open source community to highlight state-of-the-art breakthroughs in their open source work. The development of open source technologies is a key driver of sustainable AI innovation, and this webinar is a great opportunity to hear thoughtful perspectives on why open source is so important to AI and how open source can power AI.
Registration is now open and the event is free to attend. Please see the agenda below and register today.
The LF AI & Data Foundation would like to take this opportunity to thank all of the attendees, speakers/presenters, and booth staff volunteers for a great experience at OSS+ELC 2021. Of course none of this would be possible without the amazing Events Team at The Linux Foundation and all of the Program Committee members and their hard work. With the ongoing COVID-19 situation, we were thrilled to be able to participate in this hybrid onsite/virtual event.
The AI & Data Track during the conference was successful, with a great showing of attendees and great subject matter experts as speakers. If you didn’t get a chance to visit our Bronze booth in the onsite or virtual exhibit hall, please take a moment to explore our resources below.
Be sure to visit our virtual booth at our next events, KubeCon+CloudNativeCon+OSS China (December 9th & 10th) and OSS Japan (December 14th & 15th). Thank you and we look forward to connecting again soon!
The LF AI & Data Foundation is pleased to announce the upcoming LF AI & Data Day* – ONNX Community Virtual Meetup – October 2021, to be held via Zoom on October 21st.
ONNX, an LF AI & Data Foundation Graduated Project, is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them.
The virtual meetup will cover ONNX Community updates, partner/end-user stories, and SIG/WG updates. Check out the schedule of events here. If you are using ONNX in your services and applications, building software or hardware that supports ONNX, or contributing to ONNX, you should attend! This is a great opportunity to connect with and hear from people working with ONNX across many companies.
Registration is now open and the event is free to attend. Capacity will be 500 attendees. For up-to-date information on this virtual meetup, please visit the event website.
Want to get involved with ONNX? Be sure to join the ONNX-Announce mailing list to join the community and stay connected on the latest updates. You can join technical discussions on GitHub and more conversations with the community on LF AI & Data Slack’s ONNX channels.
Note: In order to ensure the safety of our event participants and staff due to the Novel Coronavirus situation (COVID-19) the ONNX Steering Committee decided to make this a virtual-only event via Zoom.
*LF AI & Data Day is a regional, one-day event hosted and organized by local members with support from LF AI & Data and its Projects. Learn more about the LF AI & Data Foundation here.
LF AI & Data Foundation is proud to support the release of NNStreamer 2.0, one of our incubation-stage projects. NNStreamer is a set of GStreamer plugins that make it easy and efficient for GStreamer developers to adopt neural network models, and for neural network developers to manage neural network pipelines and their filters. Features of the new release include Edge-AI capabilities and new stream types. The NNStreamer project also achieved the CII Best Practices passing badge last month, recognizing that the community maintains high-quality code, documentation, testing, and, of course, a high level of security in line with open source best practices. With the new release, NNStreamer has also cleared all 19 security vulnerabilities detected by LGTM analysis.
Key Features of 2.0.y LTS
With Edge-AI capabilities, users may connect independent and remote pipelines with NNStreamer-provided protocols designed for AI data streams. In other words, a lightweight IoT device may offload its AI workloads to its neighbor devices with GStreamer/NNStreamer pipeline descriptions, and an AI service may publish its output data streams for other AI pipelines easily. Several features and optimizations are scheduled to follow with subsequent version releases.
As new stream types are introduced with this release, the “single tensor” stream type (the “other/tensor” MIME type) becomes obsolete with NNStreamer 2.0; compatibility will remain, with possible warning messages, in subsequent releases. Users are recommended to use “other/tensors” instead. The standard tensor data streams may have different data types: static (the default), dynamic, and sparse. Conventional tensor streams are static. A dynamic tensor stream may have different dimensions in each data frame. A sparse tensor stream assumes that most elements are zeros.
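The space-saving idea behind a sparse tensor stream can be sketched in a few lines of plain Python. Note this coordinate-list encoding is purely illustrative; NNStreamer's actual sparse tensor layout is defined by the project's GStreamer caps, not by this sketch:

```python
def to_sparse(dense):
    """Store only the non-zero elements of a 2-D tensor as (index, value) pairs.

    Illustrative coordinate-list (COO) encoding; NNStreamer's actual sparse
    tensor format is defined by the project, not by this sketch.
    """
    shape = (len(dense), len(dense[0]))
    entries = [((i, j), v)
               for i, row in enumerate(dense)
               for j, v in enumerate(row) if v != 0.0]
    return shape, entries

def to_dense(shape, entries):
    """Reconstruct the dense ('static') tensor from the sparse form."""
    rows, cols = shape
    dense = [[0.0] * cols for _ in range(rows)]
    for (i, j), v in entries:
        dense[i][j] = v
    return dense

# A mostly-zero 4x4 frame: the sparse form carries 2 entries instead of 16 values
frame = [[0.0] * 4 for _ in range(4)]
frame[0][1] = 3.5
frame[2][3] = -1.0

shape, entries = to_sparse(frame)
assert len(entries) == 2
assert to_dense(shape, entries) == frame
```

The trade-off is the usual one: each stored element now costs extra index bytes, so the sparse form only wins when most elements really are zero.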
Other major features include:
New hardware accelerators and AI frameworks supported: TVM, TensorRT, NNTrainer, TensorFlow Lite delegation (GPU/NNAPI/XNNPACK)
Tensor-converter/decoder support for subplugins and custom functions
New elements: stream branch (Tensor-if), stream join (Join), crop (Tensor-crop), rate and QoS control (Tensor-rate)
The session is for anyone interested in understanding how Egeria brings value to your data and how you manage it, enabling data-centric, metadata-driven integration. The session will start with the core Egeria constructs, including entities, and will explain the principles behind them. We will then go through the layers and aspects of the Egeria architecture, at each stage talking about their applicability to solving real-world problems.
The session will explain the benefits of solving the main integration problem organizations face, the solution being to use the Egeria ecosystem. It will then talk about how you can use this coherent view of your information to add value and save time for your organization.
At the end of the session, you should have a high-level awareness of the parts of Egeria, why they have been implemented as they are, and the value that each piece brings.
Three pillars of AI lifecycle: Datasets, Models and Pipelines
In the AI lifecycle, we use data to build models for decision automation. Datasets, Models, and Pipelines (which take us from raw Datasets to deployed Models) become the three most critical pillars of the AI lifecycle. Due to the large number of steps that need to be worked on in the Data and AI lifecycle, the process of building a model can become fragmented across various teams, and large amounts of duplication can arise when creating similar Datasets, Features, Models, Pipelines, Pipeline tasks, etc. This also poses a strong challenge for traceability, governance, risk management, lineage tracking, and metadata collection.
Announcing Machine Learning eXchange (MLX)
To solve the problems mentioned above, we need a central repository where all the different asset types like Datasets, Models, and Pipelines are stored to be shared and reused across organizational boundaries. Having opinionated and tested Datasets, Models, and Pipelines with high quality checks, proper licenses, and lineage tracking increases the speed and efficiency of the AI lifecycle tremendously.
A pipeline component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data acquisition, data preprocessing, data transformation, model training, and so on. A component is a block of code performing an atomic task and can be written in any programming language and using any framework.
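As a sketch of the idea, a self-contained component is essentially one function performing one atomic task. The function below is illustrative only, not an actual MLX catalog component; in a pipeline system such as Kubeflow Pipelines it would additionally be packaged with a component spec declaring its container image, inputs, and outputs:

```python
def normalize_features(rows):
    """A single atomic pipeline step: min-max scale each numeric column.

    Hypothetical data-transformation component shown as plain Python for
    illustration; a real pipeline system would wrap it with an I/O spec.
    """
    columns = list(zip(*rows))                   # column-wise view of the data
    scaled_cols = []
    for col in columns:
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0                  # avoid divide-by-zero on constant columns
        scaled_cols.append([(v - lo) / span for v in col])
    return [list(r) for r in zip(*scaled_cols)]  # back to row-wise

# Example: two features with very different ranges become comparable
data = [[1.0, 100.0], [2.0, 300.0], [3.0, 500.0]]
print(normalize_features(data))  # [[0.0, 0.0], [0.5, 0.5], [1.0, 1.0]]
```

Because the step is self-contained, a pipeline engine can schedule, cache, and reuse it independently of the components before and after it.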
Some sample pipeline components included in the MLX catalog:
MLX provides a collection of free, open source, state-of-the-art deep learning models for common application domains. The curated list includes deployable models that can be run as a microservice on Kubernetes or OpenShift and trainable models where users can provide their own data to train the models.
Jupyter notebook is an open-source web application that allows data scientists to create and share documents that contain runnable code, equations, visualizations, and narrative text. MLX can run Jupyter notebooks as self-contained pipeline components by leveraging the Elyra-AI project.
Sample notebooks contained in the MLX catalog include:
Join us to build cloud-native AI Marketplace on Kubernetes
The Machine Learning eXchange provides a marketplace and platform for data scientists to share, run, and collaborate on their assets. You can now use it to host and collaborate on Data and AI assets within your team and across teams. Please join us on the Machine Learning eXchange GitHub repo, try it out, give feedback, and raise issues. Additionally, you can connect with us via the following:
To contribute and build end-to-end Machine Learning Pipelines on OpenShift and Kubernetes, please join the Kubeflow Pipelines on Tekton project and reach out with any questions, comments, and feedback!
To deploy Machine Learning Models in production, check out the KFServing project.
Guest Authors: Utpal Mangla, VP & Senior Partner, Global Leader, IBM’s Telecom Media Entertainment Industry Center of Competency at IBM; Luca Marchi, AI Innovation, Center of Competence for Telco, Media and Entertainment, IBM; Kush Varshney, Distinguished Research Staff Member, Manager at IBM Thomas J. Watson Research Center; Shikhar Kwatra, Data & AI Architect, AI/ML Operationalization Leader at IBM; and Mathews Thomas, Executive IT Architect, IBM
Effectively Governing AI
Artificial intelligence systems have become increasingly prevalent in everyday life and enterprise settings, and they’re now often being used to support human decision-making.
When we understand how a technology works and we can assess that it’s safe and reliable, we’re far more inclined to trust it. But even when we don’t understand the technology (do you understand how a modern automobile works?), if it has been tested and certified by a respectable body, we are inclined to trust it. Many AI systems today are black boxes, where data is fed in and results come out. To trust a decision made by an algorithm, we need to know that it is fair, that it’s reliable and can be accounted for, and that it will cause no harm. We need assurances that AI cannot be tampered with and that the system itself is secure. We need to be able to look inside AI systems, to understand the rationale behind the algorithmic outcome, and even ask it questions as to how it came to its decision.
Hence, enterprises creating such AI services are being challenged by an emerging problem: How to effectively govern the creation and deployment of these services. Enterprises want to understand and gain control over their current AI lifecycle processes, often motivated by internal policies or external regulation.
The AI lifecycle includes a variety of roles, performed by people with different specialized skills and knowledge that collectively produce an AI service. Each role contributes in a unique way, using different tools. Figure 1 specifies some common roles.
Figure 1: A common AI lifecycle involving different personas. Image taken from the AI FactSheets 360 website.
Data flows throughout this lifecycle, as raw input data, engineered features, model predictions, and performance metric results. Data governance relies on the overall management of data availability, relevancy, usability, integrity, and security in an enterprise. It helps organizations manage their information knowledge and answer questions, such as:
What data do we have?
What do we know about our information?
Where do different datasets come from?
Does this data adhere to company policies and rules?
What is the quality of our data?
Various enterprises are developing theoretical and algorithmic frameworks for generative AI to synthesize realistic, diverse, and targeted data. In order to increase the accountability of high-risk AI systems, we need to develop technologies to increase their end-to-end transparency and fairness.
Tools like AI Fairness 360, AI Explainability 360, Adversarial Robustness 360, and Uncertainty Quantification 360 are open-source software toolkits that help users uncover and mitigate various biases in machine learning models that lead to bad or unequal performance. Tools and technologies being developed by AI enterprises must be adept at tracking and mitigating biases at multiple points along the machine learning pipeline, using the appropriate metric for the circumstances, with results captured in transparent documentation such as an AI FactSheet. They should help an AI development team perform systematic checking for biases, similar to checks for development bugs or security violations in a continuous integration pipeline.
Bringing together mitigation techniques appropriate for different points in the pipeline to address different biases (social, temporal, etc.) will help developers produce real-world deployments that are safe and secure.
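To make the kind of group-fairness check these toolkits automate concrete, here is a from-scratch sketch of the classic disparate-impact ratio. This is an illustration of the metric only, not the AI Fairness 360 API, and the data is invented:

```python
def disparate_impact(outcomes, groups, privileged_group):
    """Ratio of favorable-outcome rates: unprivileged group / privileged group.

    A value near 1.0 suggests parity; the common 'four-fifths rule' flags
    values below 0.8. Sketch only -- toolkits such as AI Fairness 360
    provide many more metrics plus bias-mitigation algorithms.
    """
    def favorable_rate(label):
        selected = [o for o, g in zip(outcomes, groups) if g == label]
        return sum(selected) / len(selected)

    unprivileged = next(g for g in set(groups) if g != privileged_group)
    return favorable_rate(unprivileged) / favorable_rate(privileged_group)

# 1 = favorable model outcome (e.g. loan approved); groups A and B are hypothetical
outcomes = [1, 1, 1, 0, 1, 0, 0, 0]
groups   = ["A", "A", "A", "A", "B", "B", "B", "B"]
ratio = disparate_impact(outcomes, groups, privileged_group="A")
print(round(ratio, 2))  # A approved 3/4, B approved 1/4 -> ratio 0.33
```

Wiring a check like this into a CI pipeline, with a threshold assertion alongside the unit tests, is exactly the "systematic checking for biases" pattern described above.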
500 GitHub Star Milestone, Edge-Device Support, CNN Implementations, Variable Types, MatMul Benchmarks, and Binary Optimisations
Kompute, an LF AI & Data Foundation Sandbox-Stage Project advancing the cross-vendor GPU acceleration ecosystem, has released version 0.8.0, which includes major milestones including reaching 500 GitHub stars, edge-device extensions, convolutional neural network (CNN) implementations, variable data types, and more. Kompute is a general purpose GPU compute framework for AI & Machine Learning applications which works across cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). The Kompute framework provides a flexible interface that can be adopted by mobile, desktop, cloud and edge applications to enable highly optimizable GPU acceleration. The framework includes a high-level Python interface for advanced data processing use-cases, as well as an extensible low-level C++ interface that provides high performance device-specific optimizations.
The newly released 0.8.0 version of Kompute introduces major improvements to its general cross-platform compatibility and GPU acceleration features. A high-level summary of the highlights is as follows:
Milestone of 500 GitHub repo stars
Broader edge-device support with mesa-driver integration
We will also be giving a talk at CppCon2021 this year, so if you are around please drop by our talk and say hello, or feel free to ask any questions during the Q&A.
The Kompute repo reaches 500 GitHub stars
We are thrilled to see the fantastic growth and adoption of the Kompute Project, as well as the great discourse that it has continuously encouraged to further the cross-vendor GPU acceleration ecosystem. Today we celebrate the Kompute Project reaching 500 stars on GitHub, a major milestone following Kompute’s first birthday last month. GitHub stars can become a shallow metric if they are the only thing used to gauge a project’s growth, so we will be keen to identify other metrics that allow us to ensure our community grows steadily, including the number of contributors, contributions, community interactions in our Discord, etc.
Broader edge-device support with mesa-driver integration
As part of our 0.8.0 release we have significantly extended edge-device support to hundreds of devices by supporting mesa-drivers as first-class components thanks to this great external contribution. We have added an official tutorial that showcases the integration with the mesa Broadcom drivers running in a Raspberry Pi, which can be adopted across other edge devices for GPU acceleration implementations.
We have introduced a high-level example that provides an implementation of a convolutional neural network (CNN) that enables image resolution upscaling, meaning image quality can be improved purely through the machine learning implementation. This is another fantastic external contribution from the great Kompute community.
This example showcases how to import a pre-trained deep learning model. We create the Kompute code that loads the model weights, then we create the Kompute logic that performs inference on an image, and we run the model against the image to upscale its resolution.
Support for variable types across GPU parameters
By default, the simplified interfaces of Kompute will expose the ability to deal with float scalar types, which may be enough to get through the basic conceptual examples. However, as you develop real-world applications, more specialized types may be required for the different components that Kompute exposes to perform computation in the GPUs.
In version 0.8.0 of Kompute we introduce richer support for variable types across the Python and C++ interfaces, allowing users to set different scalar values and, in some cases, user-defined structs for their Kompute resources. More specifically, we have added support for multiple scalar types for the Kompute Tensor resource, multiple scalar types and arbitrary user-defined struct support for Kompute Push Constants, and multiple scalar types for Specialization Constants.
Semi-optimized Matrix Multiplication Kernel Benchmark example
In this release of Kompute we have received another great external contribution: an example that starts off with a naive implementation of a matrix multiplication algorithm and then shows how to iteratively improve performance with high-level benchmarking techniques. It highlights how performance in GFLOPS increases with matrix size under the specific optimizations introduced. The initial experimentation was based on the SGEMM in WebGL2-compute article on the public library ibiblio.org, and explores some initial improvements with basic and slightly more optimized tiling. This is an area we would be interested to explore further, and further contributions would be very welcome.
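The contributed kernel itself is GPU compute shader code, but the benchmarking idea is easy to sketch: start from a naive triple-loop matrix multiply, count roughly 2·N³ floating-point operations, and report GFLOPS. The Python below is only an illustrative CPU baseline of that methodology, not the Kompute shader:

```python
import time

def matmul_naive(a, b):
    """Naive O(N^3) matrix multiply: the baseline such benchmarks start from."""
    n, m, p = len(a), len(b), len(b[0])
    out = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):            # hoist a[i][k] out of the inner loop
            aik = a[i][k]
            for j in range(p):
                out[i][j] += aik * b[k][j]
    return out

def gflops(n, seconds):
    """An NxN matrix multiply costs ~2*N^3 flops (one mul + one add per term)."""
    return (2 * n ** 3) / seconds / 1e9

n = 64
a = [[float(i + j) for j in range(n)] for i in range(n)]
b = [[float(i - j) for j in range(n)] for i in range(n)]

start = time.perf_counter()
c = matmul_naive(a, b)
elapsed = time.perf_counter() - start
print(f"{n}x{n} matmul: {gflops(n, elapsed):.3f} GFLOPS")
```

Plotting this GFLOPS figure against growing N, for the naive kernel versus each tiled variant, is exactly the comparison the contributed example walks through on the GPU.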
Significant reduction on 3rd party dependencies
The Kompute project has now been updated to reduce its 3rd party dependencies. This release removes some dependencies in favour of modularised functional utilities that are only used in the testing framework. This results in a staggering optimization of the binary, reducing its size by an order of magnitude, from 15MB down to 1MB. It also simplifies cross-platform compatibility, as fewer dependencies are needed to build on different architectures.
The main dependency that has been removed is GLSLang, which was being used to provide a single function to perform online shader compilation primarily for the tests and simple examples. Instead we have now moved to allowing users to bring in their preferred method of performing compilation of shaders to SPIR-V, whilst still providing guidance on how Kompute users would be able to do it through simple methods.
Join the Kompute Project
The core objective of the Kompute project is to contribute to and further the GPU computing ecosystem across both scientific and industry applications, through cross-vendor graphics card tooling and capabilities. We have seen very positive reception and adoption of Kompute across various development communities, including advanced data processing use-cases in mobile applications, game development engines, edge devices, and the cloud, and we would love to engage with the broader community to hear thoughts, suggestions, and improvements.
The Kompute Project invites you to adopt or upgrade to version 0.8.0 and welcomes feedback. For details on the additional features and improvements, please refer to the release notes here.
As mentioned previously, if you are interested in learning more, you can join us at our next GPU Acceleration call on September 28th at 9:00 EST / 13:00 UTC / 20:00 CST, where we will be covering Kompute updates as well as general cross-vendor GPU acceleration topics.
The LF AI & Data Foundation, which supports and builds a sustainable ecosystem for open source AI and Data software, today announced the launch of the DataOps Committee. The committee consists of a diverse group of organizations from various industries and individuals interested in collaborating on a set of practices that aims to deliver trusted and business-ready data to accelerate journeys to build AI powered applications. This committee was formally approved by the Technical Advisory Council and the Governing Board.
Dr. Ibrahim Haddad, Executive Director of LF AI & Data, said: “We are very excited to expand our efforts in LF AI & Data into DataOps. Collaborating and advancing innovation in DataOps will have a direct impact on improving the quality and reducing the cycle time of data analytics. Our goal is to leverage the mindshare of the broader AI and Data community to further innovate in this area and create new collaboration opportunities for LF AI & Data hosted projects. The DataOps committee will be an open and collaborative space for anyone interested to join the effort and be part of our growing community”.
Based on the initial discussion among the founders of the committee, its focus for the first 6-8 months will revolve around:
Identifying projects and tools in the DataOps space, and exposing the community to how these DataOps tools work together and where to use them in the pipeline (with pros and cons).
Exposing the community to industrial approaches for dataset metadata management, governance, and automation of flow.
Understanding the usage of DataOps tools and practices through industrial use cases (by domain), identifying gaps in use case implementations, and discussing solutions to bridge those gaps.
Exposing the community to tools and technologies that can help control the usage of data and securely access it across the enterprise in a cloud-native platform.
Providing an opportunity for committee members to perform research in the DataOps space.
Educating the community about new developments in the DataOps space.
Over time, we expect the focus areas to shift into a deeper technical focus, with an emphasis on filling in the gaps in terms of the implementation of needed functionalities and launching technical efforts to provide bridging across various projects. As this is a member-driven effort, we invite you to participate in the committee, contribute to its efforts, and influence its direction.
Saishruthi Swaminathan, Technical Lead & Data Scientist, IBM, said: “We are very happy to experience the support from the LF AI & Data membership for this effort, which led to the formalization of the DataOps Committee. We’re excited to be leading this effort with the goal to generate and support open standards around toolchain interoperability for DataOps”.
Learn more about the DataOps Committee here. To participate in the committee be sure to subscribe to the mailing list to stay up to date on activities and also subscribe to the group calendar for upcoming meetings.
For a full list of LF AI & Data Foundation Committees, visit our website. For questions about participating in the LF AI & Data Foundation, please email us at firstname.lastname@example.org.