AI for Good Summit: Open Source, Accelerating AI Innovation Webinar – October 21 Virtual Event

By Blog

The LF AI & Data Foundation is pleased to share that the “Open Source, Accelerating AI Innovation” webinar of the ITU AI for Good Summit will be held online on October 21, 2021.

The “Open Source, Accelerating AI Innovation” session will bring together experts from the AI open source community to bring to light state-of-the-art breakthroughs in their open source work. The development of open source technologies is a key driver of sustainable AI innovation, and this webinar is a great opportunity to hear thoughts and opinions on why open source is so important to AI and how open source can power it.

Registration is now open and the event is free to attend. Please see the agenda below and register today.

LF AI & Data Resources

Thank You for a Great OSS+ELC 2021!

By Blog

The LF AI & Data Foundation would like to take this opportunity to thank all of the attendees, speakers/presenters, and booth staff volunteers for a great experience at OSS+ELC 2021. Of course, none of this would have been possible without the amazing Events Team at The Linux Foundation and the hard work of all of the Program Committee members. With the ongoing COVID-19 situation, we were thrilled to be able to participate in this hybrid onsite/virtual event.

The AI & Data Track during the conference was a success, with a strong showing of attendees and excellent subject matter experts as speakers. If you didn’t get a chance to visit our Bronze booth in the onsite or virtual exhibit hall, please take a moment to explore our resources below.

Be sure to visit our virtual booth at our next events: KubeCon+CloudNativeCon+OSS China (December 9th & 10th) and OSS Japan (December 14th & 15th). Thank you, and we look forward to connecting again soon!

LF AI & Data Resources

LF AI & Data Day ONNX Community Virtual Meetup – October 2021

By Blog

The LF AI & Data Foundation is pleased to announce the upcoming LF AI & Data Day* – ONNX Community Virtual Meetup – October 2021, to be held via Zoom on October 21st.

ONNX, an LF AI & Data Foundation Graduated Project, is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. 
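To illustrate that portability, here is a minimal sketch (untested, with a hypothetical file name and a toy model) of exporting a PyTorch model to the ONNX format and running it with ONNX Runtime, a different tool in the ecosystem:

    import torch
    import onnxruntime

    # Any PyTorch model works; a tiny linear layer stands in here.
    model = torch.nn.Linear(4, 2)
    model.eval()

    # Export to the ONNX format, tracing the model with a sample input.
    dummy_input = torch.randn(1, 4)
    torch.onnx.export(model, dummy_input, "model.onnx")

    # Load and run the exported model with a different tool: ONNX Runtime.
    session = onnxruntime.InferenceSession("model.onnx")
    input_name = session.get_inputs()[0].name
    outputs = session.run(None, {input_name: dummy_input.numpy()})
    print(outputs[0])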

The virtual meetup will cover ONNX Community updates, partner/end-user stories, and SIG/WG updates. Check out the schedule of events here. If you are using ONNX in your services and applications, building software or hardware that supports ONNX, or contributing to ONNX, you should attend! This is a great opportunity to connect with and hear from people working with ONNX across many companies. 

Registration is now open and the event is free to attend. Capacity will be 500 attendees. For up-to-date information on this virtual meetup, please visit the event website.

Want to get involved with ONNX? Be sure to join the ONNX-Announce mailing list to become part of the community and stay connected on the latest updates. You can join technical discussions on GitHub and further conversations with the community on the LF AI & Data Slack’s ONNX channels.

Note: In order to ensure the safety of our event participants and staff due to the Novel Coronavirus (COVID-19) situation, the ONNX Steering Committee decided to make this a virtual-only event via Zoom.

*LF AI & Data Day is a regional, one-day event hosted and organized by local members with support from LF AI & Data and its Projects. Learn more about the LF AI & Data Foundation here.

ONNX Key Links

LF AI & Data Resources

NNStreamer 2.0 Release Now Available

By Blog

The LF AI & Data Foundation is proud to support the release of NNStreamer 2.0, one of our incubation-stage projects. NNStreamer is a set of GStreamer plugins that make it easy and efficient for GStreamer developers to adopt neural network models, and for neural network developers to manage neural network pipelines and their filters. Features of the new release include Edge-AI capabilities and new stream types. NNStreamer also achieved the CII Best Practices passing badge last month, recognizing that the community maintains high-quality code, documentation, testing, and, of course, a high level of security, in line with open source best practices. With the new release, NNStreamer now clears all 19 of the security vulnerabilities detected by LGTM analysis.

Key Features of 2.0.y LTS 

With Edge-AI capabilities, users may connect independent and remote pipelines with NNStreamer-provided protocols designed for AI data streams. In other words, a lightweight IoT device may offload its AI workloads to neighboring devices with GStreamer/NNStreamer pipeline descriptions, and an AI service may easily publish its output data streams for other AI pipelines. Several features and optimizations are scheduled to follow in subsequent version releases.

As new stream types are introduced with this release, the “single tensor” stream type (the “other/tensor” MIME type) becomes obsolete in NNStreamer 2.0, although compatibility will remain, with possible warning messages, in subsequent releases. Users are recommended to use “other/tensors” instead. The standard tensor data streams may have different data types: static (default), dynamic, and sparse. The conventional and default tensor streams are static. A dynamic tensor stream may have different dimensions with each data frame. A sparse tensor stream assumes that most elements are zeros.
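As a concrete illustration, here is a minimal sketch (untested, with a hypothetical model file) of launching an NNStreamer pipeline from Python that converts video frames into the recommended “other/tensors” stream type and feeds them to a TensorFlow Lite model:

    import gi
    gi.require_version("Gst", "1.0")
    from gi.repository import Gst

    Gst.init(None)

    # videotestsrc stands in for a real camera; model.tflite is hypothetical.
    pipeline = Gst.parse_launch(
        "videotestsrc ! videoconvert ! videoscale "
        "! video/x-raw,width=224,height=224,format=RGB "
        "! tensor_converter "  # emits the recommended other/tensors stream type
        "! tensor_filter framework=tensorflow-lite model=model.tflite "
        "! fakesink"
    )
    pipeline.set_state(Gst.State.PLAYING)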

  Other major features include:

  • Support for new hardware accelerators and AI frameworks: TVM, TensorRT, NNTrainer, and TensorFlow Lite delegation (GPU/NNAPI/XNNPACK)
  • Tensor-converter and tensor-decoder support for subplugins and custom functions
  • New elements: stream branch (Tensor-if), stream join (Join), crop (Tensor-crop), and rate/QoS control (Tensor-rate)

NNStreamer Key Links

LF AI & Data Resources

The Value Egeria Brings to an Organization

By Blog

Guest Author : Mandy Chessell

Update your calendars! The popular monthly Egeria Webinar program continues. The next session is on the 4th of October 2021 at 15:00 UTC via Zoom and will focus on understanding the value of Egeria.

The session is for anyone interested in understanding how Egeria brings value to your data and how you manage it, enabling data-centric, metadata-driven integration. The session will start with the core Egeria constructs, including entities, and will explain the principles behind them. We will then go through the layers and aspects of the Egeria architecture, at each stage talking about their applicability to solving real-world problems.

The session will explain the benefits of solving the main integration problem organizations face, with the Egeria ecosystem as the solution. It will then talk about how you can use this coherent view of your information to add value and save time for your organization.

At the end of the session, you should have a high-level awareness of the parts of Egeria, why they have been implemented as they are, and the value that each of the pieces brings.

Be sure to put the other Webinar dates in your calendar.

Egeria Key Links

LF AI & Data Resources

Machine Learning eXchange (MLX): A One-Stop Shop for Trusted Data and AI Artifacts

By Blog

By: Animesh Singh, Christian Kadner, Tommy Chaoping Li

Three pillars of AI lifecycle: Datasets, Models and Pipelines

In the AI lifecycle, we use data to build models for decision automation. Datasets, Models, and Pipelines (which take us from raw Datasets to deployed Models) are thus the three most critical pillars of the AI lifecycle. Due to the large number of steps in the Data and AI lifecycle, the process of building a model can be split across various teams, and large amounts of duplication can arise when creating similar Datasets, Features, Models, Pipelines, Pipeline tasks, etc. This also poses a strong challenge for traceability, governance, risk management, lineage tracking, and metadata collection.

Announcing Machine Learning eXchange (MLX)

To solve the problems mentioned above, we need a central repository where all the different asset types, like Datasets, Models, and Pipelines, are stored to be shared and reused across organizational boundaries. Having opinionated and tested Datasets, Models, and Pipelines with high-quality checks, proper licenses, and lineage tracking increases the speed and efficiency of the AI lifecycle tremendously.

To address these challenges, IBM and Linux Foundation AI & Data (LF AI & Data) are joining hands to announce Machine Learning eXchange (MLX), a Data and AI Asset Catalog and Execution Engine, in open source and open governance.

Machine Learning eXchange (MLX) allows upload, registration, execution, and deployment of:

  • AI pipelines and pipeline components
  • Models
  • Datasets
  • Notebooks

MLX Architecture

MLX provides:

  • Automated sample pipeline code generation to execute registered models, datasets, and notebooks
  • Pipelines engine powered by Kubeflow Pipelines on Tekton, the core of Watson Studio Pipelines
  • Registry for Kubeflow Pipeline Components
  • Dataset management by Datashim
  • Serving engine by KFServing

MLX Katalog Assets

Pipelines

In machine learning, it is common to run a sequence of tasks to process and learn from data, all of which can be packaged into a pipeline.

ML Pipelines are:

  • A consistent way to collaborate on data science projects across team and organization boundaries
  • A collection of coarse-grained tasks encapsulated as pipeline components that can be snapped together like Lego bricks
  • A one-stop shop for people interested in training, validating, deploying, and monitoring AI models
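For a sense of what this looks like in code, here is a minimal sketch (illustrative and untested) of packaging two tasks into a pipeline with the Kubeflow Pipelines SDK; the function names and logic are assumptions for illustration:

    import kfp
    from kfp.components import create_component_from_func

    def load_data() -> str:
        # One coarse-grained task: acquire the raw data.
        return "raw data"

    def train_model(data: str) -> str:
        # A second task that consumes the first task's output.
        return "model trained on: " + data

    load_op = create_component_from_func(load_data)
    train_op = create_component_from_func(train_model)

    @kfp.dsl.pipeline(name="example-pipeline")
    def example_pipeline():
        data_task = load_op()
        train_task = train_op(data=data_task.output)

    # Compile to a workflow spec; the Tekton-backed engine used by MLX has its
    # own compiler (kfp-tekton), but the pipeline definition is the same.
    kfp.compiler.Compiler().compile(example_pipeline, "example_pipeline.yaml")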

Some sample Pipelines included in the MLX catalog:

Pipeline Components

A pipeline component is a self-contained set of code that performs one step in the ML workflow (pipeline), such as data acquisition, data preprocessing, data transformation, or model training. A component is a block of code performing an atomic task and can be written in any programming language, using any framework.
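As a minimal sketch (illustrative, with an assumed function name and logic), a Python function can be packaged as such an atomic, self-contained component using the Kubeflow Pipelines SDK:

    from kfp.components import create_component_from_func

    def normalize(value: float, mean: float, std: float) -> float:
        # One atomic step in the ML workflow: standardize a numeric feature.
        return (value - mean) / std

    # Package the function as a reusable, self-contained component; the base
    # image pins the runtime environment the step executes in.
    normalize_op = create_component_from_func(normalize, base_image="python:3.9")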

Some sample pipeline components included in the MLX catalog:

Models

MLX provides a collection of free, open source, state-of-the-art deep learning models for common application domains. The curated list includes deployable models that can be run as a microservice on Kubernetes or OpenShift and trainable models where users can provide their own data to train the models.

Some sample models included in the MLX catalog:

Datasets

The MLX catalog contains reusable datasets and leverages Datashim to make the datasets available to other MLX assets like notebooks, models, and pipelines in the form of Kubernetes volumes.

Sample datasets contained in the MLX catalog include:

Notebooks

Jupyter notebook is an open-source web application that allows data scientists to create and share documents that contain runnable code, equations, visualizations, and narrative text. MLX can run Jupyter notebooks as self-contained pipeline components by leveraging the Elyra-AI project.

Sample notebooks contained in the MLX catalog include:

Join us to build a cloud-native AI Marketplace on Kubernetes

The Machine Learning eXchange provides a marketplace and platform for data scientists to share, run, and collaborate on their assets. You can now use it to host and collaborate on Data and AI assets within your team and across teams. Please join us on the Machine Learning eXchange GitHub repo, try it out, give feedback, and raise issues. Additionally, you can connect with us via the following:

  • To contribute and build end-to-end Machine Learning Pipelines on OpenShift and Kubernetes, please join the Kubeflow Pipelines on Tekton project and reach out with any questions, comments, and feedback!
  • To deploy Machine Learning Models in production, check out the KFServing project.

MLX Key Links


Thank You

Thanks to the many contributors of Machine Learning eXchange, mainly:

  • Andrew Butler
  • Animesh Singh
  • Christian Kadner
  • Ibrahim Haddad
  • Karthik Muthuraman
  • Patrick Titzler
  • Romeo Kienzler
  • Saishruthi Swaminathan
  • Srishti Pithadia
  • Tommy Chaoping Li
  • Yihong Wang

LF AI & Data Resources

AI Governance: Gain Control Over the AI Lifecycle

By Blog

Guest Authors: Utpal Mangla, VP & Senior Partner, Global Leader of IBM’s Telecom Media Entertainment Industry Center of Competency, IBM; Luca Marchi, AI Innovation, Center of Competence for Telco, Media and Entertainment, IBM; Kush Varshney, Distinguished Research Staff Member and Manager, IBM Thomas J. Watson Research Center; Shikhar Kwatra, Data & AI Architect, AI/ML Operationalization Leader, IBM; and Mathews Thomas, Executive IT Architect, IBM


Effectively Governing AI

Artificial intelligence systems have become increasingly prevalent in everyday life and enterprise settings, and they’re now often being used to support human decision-making. 

When we understand how a technology works and we can assess that it’s safe and reliable, we’re far more inclined to trust it. But even when we don’t understand the technology (do you understand how a modern automobile works?), if it has been tested and certified by a respectable body, we are inclined to trust it. Many AI systems today are black boxes, where data is fed in and results come out. To trust a decision made by an algorithm, we need to know that it is fair, that it’s reliable and can be accounted for, and that it will cause no harm. We need assurances that AI cannot be tampered with and that the system itself is secure. We need to be able to look inside AI systems, to understand the rationale behind the algorithmic outcome, and even ask it questions as to how it came to its decision.

Hence, enterprises creating such AI services are being challenged by an emerging problem: How to effectively govern the creation and deployment of these services. Enterprises want to understand and gain control over their current AI lifecycle processes, often motivated by internal policies or external regulation.

The AI lifecycle includes a variety of roles, performed by people with different specialized skills and knowledge that collectively produce an AI service. Each role contributes in a unique way, using different tools. Figure 1 specifies some common roles.


Figure 1: A common AI lifecycle involving different personas. Image taken from the AI FactSheets 360 website.

Data flows throughout this lifecycle, as raw input data, engineered features, model predictions, and performance metric results. Data governance relies on the overall management of data availability, relevancy, usability, integrity, and security in an enterprise. It helps organizations manage their information knowledge and answer questions, such as:

  • What data do we have?
  • What do we know about our information?
  • Where do different datasets come from?
  • Does this data adhere to company policies and rules?
  • What is the quality of our data?

Various enterprises are developing theoretical and algorithmic frameworks for generative AI to synthesize realistic, diverse, and targeted data. In order to increase the accountability of high-risk AI systems, we need to develop technologies to increase their end-to-end transparency and fairness.

Tools like AI Fairness 360, AI Explainability 360, Adversarial Robustness 360, and Uncertainty Quantification 360 are open-source software toolkits that help users uncover and mitigate biases in machine learning models that lead to bad or unequal performance. Tools and technologies being developed by AI enterprises must be adept at tracking and mitigating biases at multiple points along the machine learning pipeline, using the appropriate metric for the circumstances, with the results captured in transparent documentation such as an AI FactSheet. They should help an AI development team perform systematic checking for biases, similar to checks for development bugs or security violations in a continuous integration pipeline.
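As one sketch of what such a systematic check can look like, the example below (illustrative and untested; the column names and group definitions are assumptions) uses the AI Fairness 360 toolkit to compute a disparate impact metric on a toy dataset:

    import pandas as pd
    from aif360.datasets import BinaryLabelDataset
    from aif360.metrics import BinaryLabelDatasetMetric

    # A toy dataset: "sex" is the protected attribute, "hired" the label.
    df = pd.DataFrame({
        "sex":   [0, 0, 0, 1, 1, 1],
        "score": [0.4, 0.6, 0.5, 0.7, 0.8, 0.6],
        "hired": [0, 1, 0, 1, 1, 1],
    })
    dataset = BinaryLabelDataset(
        df=df, label_names=["hired"], protected_attribute_names=["sex"]
    )

    # A disparate impact ratio well below 1.0 flags unequal outcomes; a CI
    # pipeline could fail the build when it drops under a chosen threshold.
    metric = BinaryLabelDatasetMetric(
        dataset,
        unprivileged_groups=[{"sex": 0}],
        privileged_groups=[{"sex": 1}],
    )
    print(metric.disparate_impact())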

Bringing together mitigation techniques appropriate for different points in the pipeline to address different biases (social, temporal, etc.) will help developers produce real-world deployments that are safe and secure. 

LF AI & Data Resources


Kompute Releases v0.8.0 to Continue Advancing Cross-Vendor GPU Acceleration

By Blog

500 GitHub Star Milestone, Edge-Device Support, CNN Implementations, Variable Types, MatMul Benchmarks, and Binary Optimisations

Kompute, an LF AI & Data Foundation Sandbox-Stage Project advancing the cross-vendor GPU acceleration ecosystem, has released version 0.8.0, which includes major milestones such as reaching 500 GitHub stars, edge-device extensions, convolutional neural network (CNN) implementations, variable data types, and more. Kompute is a general purpose GPU compute framework for AI & Machine Learning applications which works across cross-vendor graphics cards (AMD, Qualcomm, NVIDIA & friends). The Kompute framework provides a flexible interface that can be adopted by mobile, desktop, cloud, and edge applications to enable highly optimizable GPU acceleration. The framework includes a high-level Python interface for advanced data processing use-cases, as well as an extensible low-level C++ interface that provides high performance device-specific optimizations.
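To give a flavor of the high-level Python interface, here is a minimal sketch (untested, following the project’s documented usage patterns); it assumes a compute shader already compiled offline to SPIR-V, since online shader compilation is no longer bundled (see the dependency changes below):

    import kp
    import numpy as np

    mgr = kp.Manager()  # selects a default GPU device

    # GPU tensors holding the input and output data.
    tensor_a = mgr.tensor(np.array([2.0, 4.0, 6.0], dtype=np.float32))
    tensor_b = mgr.tensor(np.array([1.0, 2.0, 3.0], dtype=np.float32))
    tensor_out = mgr.tensor(np.zeros(3, dtype=np.float32))
    params = [tensor_a, tensor_b, tensor_out]

    # Hypothetical precompiled SPIR-V for an elementwise-multiply shader.
    spirv = open("multiply.comp.spv", "rb").read()
    algo = mgr.algorithm(params, spirv)

    (mgr.sequence()
        .record(kp.OpTensorSyncDevice(params))  # copy inputs to the GPU
        .record(kp.OpAlgoDispatch(algo))        # run the shader
        .record(kp.OpTensorSyncLocal(params))   # copy results back
        .eval())

    print(tensor_out.data())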

The newly released 0.8.0 version of Kompute introduces major improvements to the general cross-platform compatibility and GPU acceleration features of Kompute. A high-level summary of the highlights follows:

  • Milestone of 500 GitHub repo stars
  • Broader edge-device support with mesa-driver integration
  • Convolutional Neural Network (CNN) implementations
  • Support for variable types across GPU parameters
  • Semi-optimized matrix multiplication kernel benchmark implementation
  • Significant reduction in 3rd-party dependencies (binary size reduced from 15 MB to ~1 MB)

If you are interested in learning more, you can join us at our next “GPU Acceleration” monthly call on September 28th at 9:00 EST / 13:00 UTC / 20:00 CST, where we will be covering Kompute updates as well as general cross-vendor GPU acceleration topics.

We will also be giving a talk at CppCon 2021 this year, so if you are around please drop by our talk and say hello, or feel free to ask any questions during the Q&A.

The Kompute repo reaches 500 GitHub stars

We are thrilled to see the fantastic growth and adoption of the Kompute Project, as well as the great discourse it has continuously encouraged to further the cross-vendor GPU acceleration ecosystem. Today we celebrate the Kompute Project reaching 500 stars on GitHub, a major milestone following Kompute’s one-year birthday last month. GitHub stars can be a shallow metric if they are the only thing used to gauge a project’s growth, so we will be keen to identify other metrics that allow us to ensure our community grows steadily, including the number of contributors, contributions, community interactions in our Discord, etc.

Broader edge-device support with mesa-driver integration

As part of our 0.8.0 release we have significantly extended edge-device support to hundreds of devices by supporting mesa drivers as first-class components, thanks to this great external contribution. We have added an official tutorial that showcases the integration with the mesa Broadcom drivers running on a Raspberry Pi, which can be adopted across other edge devices for GPU acceleration implementations.

This is a fantastic addition, as it showcases the flexibility of Kompute’s capabilities. The example required advanced GPU computing concepts to address some shortcomings of limited hardware, such as the need to expose means to add GPU extensions explicitly, as well as adding flexible memory barrier operations that can be used to ensure consistency on more limited devices with non-coherent GPU memory.

Convolutional Neural Network (CNN) Implementations

We have introduced a high-level example that provides an implementation of a convolutional neural network (CNN) for image resolution upscaling, which means that image quality can be improved purely through the machine learning implementation. This is another fantastic external contribution from the great Kompute community.

This example showcases how to import a pre-trained deep learning model: we create the Kompute code that loads the model weights, build the Kompute logic that performs inference on an image, and then run the model against any image to upscale its resolution.

Figure: a small input image passes through VGG7 inference to produce a larger, upscaled image.

Support for variable types across GPU parameters

By default, the simplified interfaces of Kompute will expose the ability to deal with float scalar types, which may be enough to get through the basic conceptual examples. However, as you develop real-world applications, more specialized types may be required for the different components that Kompute exposes to perform computation in the GPUs.

In version 0.8.0 of Kompute we introduce richer support for variable types across the Python and C++ interfaces, allowing users to set different scalar values and, in some cases, user-defined structs for their Kompute resources. More specifically, we have added support for multiple scalar types for the Kompute Tensor resource, multiple scalar types and arbitrary user-defined struct support for Kompute Push Constants, and multiple scalar types for Specialization Constants.

Semi-optimized Matrix Multiplication Kernel Benchmark example

In this release of Kompute we have received another great external contribution: an example that starts off with a naive implementation of a matrix multiplication algorithm and then shows how to iteratively improve performance with high-level benchmarking techniques. It highlights how increasing the matrix size can also increase performance in GFLOPS with the specific optimizations introduced. The initial experimentation was based on the SGEMM in WebGL2-compute article on the public library ibiblio.org, and explores some initial improvements with basic and slightly more optimized tiling. This is work we would be interested in exploring further, and it would be great to receive further contributions.

Significant reduction on 3rd party dependencies

The Kompute project has now been updated to reduce its 3rd-party dependencies. This release removes some dependencies in favour of modularised functional utilities that are only used in the testing framework. This results in a staggering optimization of the binary, reducing its size by an order of magnitude, from 15 MB down to 1 MB. It also simplifies cross-platform compatibility, as fewer dependencies are required to build on different architectures.

The main dependency that has been removed is GLSLang, which was being used to provide a single function that performed online shader compilation, primarily for the tests and simple examples. Instead, we have now moved to allowing users to bring their preferred method of compiling shaders to SPIR-V, whilst still providing guidance on how Kompute users can do this through simple methods.

Join the Kompute Project

The core objective of the Kompute project is to contribute to and further the GPU computing ecosystem across both scientific and industry applications, through cross-vendor graphics card tooling and capabilities. We have seen very positive reception and adoption of Kompute across various development communities, including advanced data processing use-cases in mobile applications, game development engines, edge devices, and the cloud, and we would love to engage with the broader community to hear thoughts, suggestions, and improvements.

The Kompute Project invites you to adopt or upgrade to version 0.8.0 and welcomes feedback. For details on the additional features and improvements, please refer to the release notes here.

As mentioned previously, if you are interested in learning more, you can join us at our next GPU Acceleration call on September 28th at 9:00 EST / 13:00 UTC / 20:00 CST, where we will be covering Kompute updates as well as general cross-vendor GPU acceleration topics.

Kompute Key Links

LF AI & Data Resources

LF AI & Data Foundation Announces DataOps Committee

By Blog

The LF AI & Data Foundation, which supports and builds a sustainable ecosystem for open source AI and Data software, today announced the launch of the DataOps Committee. The committee consists of a diverse group of organizations from various industries, along with individuals, interested in collaborating on a set of practices that aims to deliver trusted and business-ready data to accelerate journeys to build AI-powered applications. This committee was formally approved by the Technical Advisory Council and the Governing Board.

Dr. Ibrahim Haddad, Executive Director of LF AI & Data, said: “We are very excited to expand our efforts in LF AI & Data into DataOps. Collaborating and advancing innovation in DataOps will have a direct impact on improving the quality and reducing the cycle time of data analytics. Our goal is to leverage the mindshare of the broader AI and Data community to further innovate in this area and create new collaboration opportunities for LF AI & Data hosted projects. The DataOps committee will be an open and collaborative space for anyone interested to join the effort and be part of our growing community”. 

Based on initial discussions among the founders of the committee, its focus for the first 6-8 months will revolve around:

  1. Identifying projects and tools in the DataOps space and exposing the community to how these DataOps tools work together and where to use them in the pipeline (with pros and cons).
  2. Exposing the community to industrial approaches for dataset metadata management, governance, and automation of data flow.
  3. Understanding the usage of DataOps tools and practices through industrial use cases (by domain), identifying gaps in use case implementations, and discussing solutions to bridge them.
  4. Exploring tools and technologies that can help control the usage of data and securely access it across the enterprise in a cloud-native platform.
  5. Providing an opportunity for committee members to perform research in the DataOps space.
  6. Educating the community about new developments in the DataOps space.

Over time, we expect the focus areas to shift to a deeper technical focus, with an emphasis on filling in the gaps in the implementation of needed functionalities and launching technical efforts to provide bridging across various projects. As this is a member-driven effort, we extend you an invitation to participate in the committee, contribute to its efforts, and influence its direction.

Saishruthi Swaminathan, Technical Lead & Data Scientist, IBM, said: “We are very happy to experience the support from the LF AI & Data membership for this effort that led to the formalization of the DataOps Committee. We’re excited to be leading this effort with the goal to generate and support open standards around toolchain interoperability for DataOps”.

Learn more about the DataOps Committee here. To participate in the committee be sure to subscribe to the mailing list to stay up to date on activities and also subscribe to the group calendar for upcoming meetings.

For a full list of LF AI & Data Foundation Committees, visit our website. For questions about participating in the LF AI & Data Foundation, please email us at info@lfaidata.foundation.

DataOps Committee Key Links

LF AI & Data Resources

Egeria Webinar: Visualising a Metadata Ecosystem, 13th September 2021

By Blog
Guest Author: David Radley

IMPORTANT UPDATE: The date for this Webinar has changed. It is now scheduled for Monday, September 13, 2021 at 15:00 UTC.

Update your calendars! The popular monthly Egeria Webinar program is restarting on September the 13th, 2021. Full program details are here: https://wiki.lfaidata.foundation/display/EG/Egeria+Webinar+program.

The next session is on the 13th of September 2021 at 15:00 UTC and is about visualising a metadata ecosystem. The session will cover:

  • An overview of the open types in Egeria and how they facilitate integration between sources of metadata without having one central metadata repository.
  • Why understanding the types is important when developing connectors and new APIs such as the OMASs.
  • A look at the concepts exposed in a higher-level API, to compare them with the low-level open types.
  • The visualisations Egeria provides around the types, so you can explore how they relate to each other.

Example visualisation

At the end of the session:

  • You should have a good grasp of the Egeria open types, why they are so important, and how to explore them visually.
  • You will want to explore the benefits of connecting your metadata sources into Egeria by mapping your types to the open types.

Be sure to put the other Webinar dates in your calendar.

Egeria Key Links

LF AI & Data Resources