
LF AI & Data Day EU Virtual – June 10, 2021


Orange and the LF AI & Data Foundation are pleased to announce the upcoming LF AI & Data Day* EU Virtual, to be held via Zoom on June 10, 2021.

This virtual event will feature keynote speakers from leading AI companies with a focus on open source strategies for scaling machine learning and deep learning. The event schedule covers various AI topics, including technical presentations on MLOps, Trusted AI, and new LF AI & Data projects.

Registration is now open and the event is free to attend; capacity is limited to 100 attendees. Please see the schedule below and visit the event website for up-to-date information.

Thursday, June 10, 1:00 – 5:00 PM CEST

  • 1:00 – 1:10: Setup video conferencing
  • Welcome Message & Agenda, Orange
  • 5 Breakthroughs to scale ML beyond the limits of experiments, Jamil Chawki, PhD, AI Program Director & Co-founder of Orange AI Marketplace, Orange Internal Networks Infrastructure & Services | Chair, LF AI & Data Outreach Committee
  • Open source for Advancing Education in AI Technology & Business, Tawhid Chtioui, PhD, President-founder & Dean, aivancity School for Technology, Business & Society
  • LF AI & Data Updates and the Road Ahead, Ibrahim Haddad, PhD, Executive Director, LF AI & Data Foundation
  • A visual and scalable DL component library for Trusted AI using AI Explainability 360, AI Fairness 360, the Adversarial Robustness Toolbox, and the Elyra AI pipeline editor, Romeo Kienzler, CTO and Chief Data Scientist, STSM | IBM Centre for Open Source Data and AI Technologies
  • Trusted AI principles RREPEATS, François Jezequel, Business Development, Orange Fab France | LF AI & Data board member, & Souad Ouali, Head of Inter-operator Relationships, Orange
  • RosaeNLG, an LF AI & Data Sandbox project on Natural Language Generation, Ludan Stoecklé, Founder of RosaeNLG
  • Accelerating AI maturity with MLOps, François Tillerot, Data-AI Product Business Owner & Co-founder of Orange AI Marketplace, Orange Business Services
  • Datashim, an LF AI & Data project to accelerate data access for Kubernetes/OpenShift workloads, Yiannis Gkoufas, Research Software Engineer, IBM Research
  • Why ONNX Runtime matters for deploying AI in Institutions, Xavier Tao, Data Engineer, Banque de France
  • Closing Session

Event host Orange is a leading telecommunications company headquartered in France. It is the largest telecom operator in France, with the bulk of its operations in Europe, Africa, and the Middle East.

As an LF AI & Data General Member, Orange is involved with the LF AI & Data Governing Board, Outreach Committee, Trusted AI Committee, and is an active contributor to the LF AI & Data Acumos project.  

Note: In order to ensure the safety of our event participants and staff due to the Novel Coronavirus situation (COVID-19), the event hosts have decided to make this a virtual-only event via Zoom.

*LF AI & Data Day is a regional, one-day event hosted and organized by local members with support from LF AI & Data and its Projects. Learn more about the LF AI & Data Foundation here.

LF AI & Data Resources

What You Cannot Miss in Any AI Implementation: Fairness


Guest Authors: Utpal Mangla, VP & Senior Partner, Global Leader, IBM’s Telecom, Media & Entertainment Industry Center of Competency at IBM; Luca Marchi, AI Innovation, Center of Competence for Telco, Media and Entertainment, IBM; Kush Varshney, Distinguished Research Staff Member and Manager at the IBM Thomas J. Watson Research Center; and Shikhar Kwatra, Data & AI Architect, AI/ML Operationalization Leader at IBM

AI Fairness

Artificial Intelligence (AI) is becoming a key cog in how the world works and how it lives. But the reality is that AI is not as widespread in critical enterprise workflows as it could be because it is not perceived to be safe, reliable, fair, and trustworthy. With increasing regulation, concern about brand reputation, burgeoning complexity, and a renewed focus on social justice, companies are not ready and willing to deploy a “science experiment” at scale in their operations. As Thomas J. Watson, Sr., an early chief executive of IBM said, “The toughest thing about the power of trust is that it’s very difficult to build.”

We’ve seen many newsworthy examples of AI producing unfair outcomes: Black defendants being discriminated against in criminal recidivism prediction, low-income students systematically receiving low “predicted” exam scores when the coronavirus pandemic cancelled the real exam, men and women receiving different lending decisions despite having exactly the same assets, and many more. Why is this happening and what can we do about it?

Lessons from Commercial Aviation

It is instructive to look at the history of commercial aviation to understand what is happening with AI today. The first flights by the Wright brothers and Santos-Dumont during 1903-1906 to the introduction of the commercial jetliner, the Boeing 707, in 1957 can be considered as the first 50 years of aviation. This period was all about just understanding how to make planes fly with limited commercialization. In the second 50 years of aviation that followed, the fundamental nature of airplanes did not change—today’s commercial jets are basically the same as the Boeing 707—but there was a heavy emphasis on safety, efficiency, and automation. Now commercial airlines operate almost everywhere with safety records hundreds of times better than fifty years ago.

What is the lesson for AI? We are just at the beginning of the second 50 years of AI. We can trace the beginnings of AI to a 1956 conference at Dartmouth. We can say that the first 50 years concluded when deep learning won the ImageNet competition in 2012. Just like in aviation, the first 50 years were spent on getting AI to simply work—to be competent and accurate at narrow tasks—with limited commercialization. Now our job is to work on making AI more safe, reliable, fair, trustworthy, efficient, and automated, and bring commercialization everywhere.

Accuracy Isn’t All You Need

To make AI trustworthy, we need it to be more than accurate. We need it to be fair so that it doesn’t discriminate against certain groups and individuals based on their race, gender, or other protected social attributes. We need it to be reliable and robust so that it can be used in different settings and contexts without spectacularly falling apart. We need it to be explainable or interpretable so that people can understand how AI makes its predictions. We need it to realize when it is unsure.

The LF AI & Data Foundation’s three open source toolkits, AI Fairness 360, AI Explainability 360, and the Adversarial Robustness 360 Toolbox, give practicing data scientists the means to address these needs and make AI more trustworthy. Let’s dig into fairness in more detail: where do the problems come from, and how can we mitigate them?

Where Do Fairness Issues Come From?

AI, specifically machine learning, tends to reflect back and sometimes amplify unwanted biases that are already present in society. There are four main reasons why there can be unfairness in AI:

  1. Problem misspecification – when the problem owner and data scientist pose the problem they are going to be creating a solution for, they may make choices that introduce unwanted behaviors. For example, if they want to predict whether someone will commit a crime in the future, but they design an AI system to predict whether someone will be arrested in the future, they can introduce unfairness. First, being arrested does not imply that a person is guilty of a crime. Second, there are more arrests made in neighborhoods where police patrol more often, and that is not done equally.
  2. Features containing social biases – some attributes in a dataset already contain traces of structural biases that provide systematic disadvantage to certain groups. For example, the SAT score may be used as a feature for predicting an applicant’s success in college, but it is known to already contain biases so that some minority groups do worse because of cultural knowledge embedded in the questions.
  3. Sampling biases – sometimes datasets overrepresent privileged groups and underrepresent unprivileged groups. For example, face attribute classification datasets are known to be skewed towards white males.
  4. Data preparation – one key step in AI development pipelines is feature engineering, where raw data is transformed before being fed to the AI. Several subjective choices are made in this process, some of which can lead to unfairness.

Measuring and Mitigating Unfairness

Just as there are many reasons why AI can yield unfairness, there are many ways to measure it and mitigate it. Choosing how to measure unfairness is not as easy as it sounds because different fairness metrics encode different worldviews and politics. As one option, you can measure the difference in selection rates of an AI, say the difference between the fraction of black applicants who got accepted to a college and the fraction of white applicants. As a different option, you can measure the difference in accuracy rates between the same two groups. They both sound about the same at face value but are actually quite different. In the first option, you are implicitly assuming that features have social biases (like the SAT score), but in the second option, you assume that all the unfairness is due to other reasons like sampling biases.
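To make the contrast concrete, here is a minimal sketch in plain NumPy (hypothetical data; AI Fairness 360 provides these and many more metrics out of the box):

```python
import numpy as np

# Toy admissions data (hypothetical): 1 = privileged group, 0 = unprivileged.
group = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y_true = np.array([1, 0, 1, 1, 1, 0, 1, 0])   # actual success in college
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])   # model's accept/reject decision

def selection_rate_difference(y_pred, group):
    """Option 1: difference in selection (acceptance) rates between groups."""
    return y_pred[group == 1].mean() - y_pred[group == 0].mean()

def accuracy_difference(y_true, y_pred, group):
    """Option 2: difference in accuracy between the same two groups."""
    acc = lambda g: (y_true[group == g] == y_pred[group == g]).mean()
    return acc(1) - acc(0)

print(selection_rate_difference(y_pred, group))      # 0.5
print(accuracy_difference(y_true, y_pred, group))    # 0.25
```

On this toy data, the model selects privileged applicants far more often (option 1) while also being more accurate for them (option 2); which number you treat as "the" unfairness depends on the worldview described above.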

If you measure that an AI system is behaving unfairly, what can you do about it? You can apply one of many possible bias mitigation algorithms. The basic idea of bias mitigation is that you want a sort of statistical independence between protected attributes like ethnicity or gender and the predicted outcome like success in college. Statistical independence is the notion that two dimensions are unlinked and have no relationship with each other. There are many statistical methods that encourage independence, but that is a longer discussion for another day. Feel free to check out the AI Fairness 360 documentation for more details about bias mitigation if you can’t wait!
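As a sketch of one such method, the reweighing idea (the basis of AI Fairness 360's Reweighing pre-processor) assigns each training instance the weight P(group) * P(outcome) / P(group, outcome), which makes the protected attribute and the outcome statistically independent under the weighted distribution. A minimal NumPy version on hypothetical data:

```python
import numpy as np

# Hypothetical dataset: protected attribute g and favorable outcome y.
g = np.array([1, 1, 1, 1, 0, 0, 0, 0])
y = np.array([1, 1, 1, 0, 1, 0, 0, 0])

def reweighing(g, y):
    """Instance weights w(g_i, y_i) = P(g_i) * P(y_i) / P(g_i, y_i),
    which enforce independence between g and y in the weighted data."""
    n = len(y)
    w = np.empty(n)
    for i in range(n):
        p_g = np.mean(g == g[i])
        p_y = np.mean(y == y[i])
        p_gy = np.mean((g == g[i]) & (y == y[i]))
        w[i] = p_g * p_y / p_gy
    return w

w = reweighing(g, y)
# After weighting, the favorable-outcome rate is equal across groups:
rate = lambda grp: np.sum(w[(g == grp) & (y == 1)]) / np.sum(w[g == grp])
print(rate(1), rate(0))  # both 0.5
```

Before weighting, the privileged group's favorable rate is 0.75 versus 0.25 for the unprivileged group; the weights equalize them without touching the labels themselves.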


Milvus 1.1 Release Now Available!


Milvus, an LF AI & Data Foundation Incubation Project, has released version 1.1! Milvus is an open source vector database that is highly flexible, reliable, and blazing fast. It supports adding, deleting, updating, and near real-time search of vectors on a trillion-byte scale.
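To illustrate the core operation a vector database performs, here is a toy brute-force cosine-similarity search in NumPy; Milvus itself relies on approximate indexes (e.g., IVF, HNSW) and distributed storage to go far beyond this scale:

```python
import numpy as np

rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 64)).astype(np.float32)  # the "collection"

def search(query, vectors, top_k=5):
    """Return indices and scores of the top_k most similar vectors (cosine)."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    q = query / np.linalg.norm(query)
    scores = v @ q
    idx = np.argsort(scores)[::-1][:top_k]
    return idx, scores[idx]

query = vectors[42]
idx, scores = search(query, vectors)
print(idx[0])  # 42 — a vector is most similar to itself
```

Brute force is O(n) per query; the index types tuned in this release (IVF, HNSW) trade a little recall for orders-of-magnitude faster search on the same operation.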

In version 1.1, Milvus adds a variety of improvements. Highlights include:


  • #4756 Improves the performance of the get_entity_by_id() method call.
  • #4856 Upgrades hnswlib to v0.5.0.
  • #4958 Improves the performance of IVF index training.

New Features

  • #4564 Supports specifying partition in a get_entity_by_id() method call.
  • #4806 Supports specifying partition in a delete_entity_by_id() method call.
  • #4905 Adds the release_collection() method, which unloads a specific collection from cache.

Fixed Issues

  • #4778 Fails to access vector index in Mishards.
  • #4797 The system returns incorrect results after merging search requests with different topK parameters.
  • #4838 The server does not respond immediately to an index building request on an empty collection.
  • #4858 For GPU-enabled Milvus, the system crashes on a search request with a large topK (> 2048).
  • #4862 A read-only node merges segments during startup.
  • #4894 The capacity of a Bloom filter does not equal the row count of the segment it belongs to.
  • #4908 The GPU cache is not cleaned up after a collection is dropped.
  • #4933 It takes a long while for the system to build an index for a small segment.
  • #4952 Fails to set timezone as “UTC + 5:30”.
  • #5008 The system crashes randomly during continuous, concurrent delete, insert, and search operations.
  • #5010 For GPU-enabled Milvus, query fails on IVF_PQ if nbits ≠ 8.
  • #5050 get_collection_stats() returns an incorrect index type for segments still in the process of index building.
  • #5063 The system crashes when an empty segment is flushed.
  • #5078 For GPU-enabled Milvus, the system crashes when creating an IVF index on vectors of 2048, 4096, or 8192 dimensions.

As usual, there is strong support for Milvus from our fantastic open source community! We thank the following individuals for making their pull request part of Milvus 1.1:

To learn more about the Milvus 1.1 release, check out the full release notes. Want to get involved with Milvus? Be sure to join the Milvus-Announce and Milvus-Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

Congratulations to the Milvus team and we look forward to continued growth and success as part of the LF AI & Data Foundation! To learn about hosting an open source project with us, visit the LF AI & Data Foundation website.

Milvus Key Links


Trusted AI Principles – RREPEATS Practical Examples Review


Guest Author: Susan Malaika, LF AI & Data Trusted AI Committee Member

On April 28, 2021, the LF AI & Data Trusted AI Principles Working Group hosted a session about applying the eight RREPEATS principles to two examples: one in the context of network providers, the other in banking. The Practical Examples session is a follow-up to the February session that introduced the RREPEATS principles.

After the presentation of the two examples, a round table discussion took place, covering the application of trusted AI principles in companies. A key outcome from the discussion was the recognition that low-level tactical tools can be used to implement trusted AI principles. Another outcome was the importance of having an ethics board in corporations.

Example: Classification of Encrypted Traffic Application

Iman Akbari Azirani and Noura Limam from the University of Waterloo in Canada, and Bertrand Mathieu from Orange Labs in France, explained that Internet data payloads and, increasingly, headers are encrypted. Typically, 85% of Internet traffic is encrypted, and Google services are 100% encrypted, making it difficult to classify network traffic. In order to anticipate workloads and offer good services to customers, network operators need to understand the traffic by:

  • Providing accurate capacity planning
  • Detecting fraud, such as attempting to mask data that would normally be paid for as free traffic. A common example of fraud is transmitting video over a text service.

Bertrand, Noura, and Iman discussed three of the RREPEATS principles in the context of the encrypted traffic application; Equitability, Privacy, and Explainability.

To ensure equitability and privacy, significant amounts of information (e.g., temporal information, IP addresses, payloads, security certificates) are removed from the data. For explainability, it is important to understand that classification is applied at the global level (service and application) and not at an individual level. Deep neural networks are used for classification and focus on three types of features (a three-faceted model): Transport Layer Security (TLS) handshake bytes, traffic shape (size, direction, inter-arrival times), and statistical features. In spite of being restricted to protocol-agnostic features, the classification accuracy of this approach is high.
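As a rough illustration of the statistical-features facet (the flow data and feature set below are hypothetical, not the presenters' actual model), such protocol-agnostic features can be computed from nothing more than packet sizes, directions, and timing:

```python
import statistics

# Hypothetical encrypted flow: (packet_size_bytes, direction, timestamp_s);
# direction +1 = upstream, -1 = downstream. Payloads are never inspected.
flow = [(1500, +1, 0.00), (52, -1, 0.03), (1500, +1, 0.05),
        (1500, +1, 0.06), (52, -1, 0.11)]

def flow_features(flow):
    """Statistical features usable even when traffic is fully encrypted:
    only sizes, directions, and timing are observed."""
    sizes = [s for s, _, _ in flow]
    times = [t for _, _, t in flow]
    iats = [b - a for a, b in zip(times, times[1:])]  # inter-arrival times
    up = sum(1 for _, d, _ in flow if d > 0)
    return {
        "mean_size": statistics.mean(sizes),
        "std_size": statistics.pstdev(sizes),
        "mean_iat": statistics.mean(iats),
        "upstream_ratio": up / len(flow),
    }

print(flow_features(flow))
```

A classifier trained on vectors like these never sees payloads or addresses, which is part of how the privacy and equitability concerns above are addressed.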

This section ended with a request for tools that:

  • Help with network classification 
  • Support efficient/speedier models running in the network
  • Offer better explanations 
  • Prevent attacks on models 

Example: RosaeNLG Framework (an LF AI & Data project)

Ludan Stoecklé, CTO of the Data & AI Lab at BNP Paribas CIB and initial author of RosaeNLG, outlined how to explain a decision to a non-expert user, with a focus on trusting the decision and being transparent. Reportedly, non-expert users most often find textual explanations easier to understand than the output of corporate-style dashboards, such as tables and graphs. Computer-generated texts are also often preferred over human-written text, as the generated texts are clearer, less ambiguous, and more concise.

Ludan illustrated his points with a credit application rejection, showing simple text generated via Natural Language Generation to explain the reason for the rejection. The following considerations apply in preparing an explanation for a decision:

  • Interpret the decision
  • Define what to say
  • Define how to say it 

The RosaeNLG framework (an LF AI & Data Sandbox project) provides template-based Natural Language Generation. It automates the production of relatively repetitive texts based on structured input data and textual templates, and is widely used in the financial industry.
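The idea can be sketched as follows (illustrative Python, not RosaeNLG's actual Pug-based template syntax): structured input data plus a textual template yield a reader-friendly explanation:

```python
# Toy template-based NLG: structured decision data in, explanatory text out.
# Hypothetical field names; RosaeNLG adds per-language linguistic resources
# (agreement, synonyms, referring expressions) on top of this basic idea.
decision = {
    "applicant": "Ms. Smith",
    "outcome": "rejected",
    "reasons": ["a debt-to-income ratio above 40%", "a short credit history"],
}

def explain(decision):
    reasons = decision["reasons"]
    # Join the reasons grammatically: "A", "A and B", "A, B, and C".
    if len(reasons) == 1:
        text = reasons[0]
    else:
        text = ", ".join(reasons[:-1]) + " and " + reasons[-1]
    return (f'The application from {decision["applicant"]} was '
            f'{decision["outcome"]} because of {text}.')

print(explain(decision))
```

Because the template is fixed and the data is structured, the same pipeline produces consistent, auditable explanations for every decision, which is precisely why the approach suits regulated settings such as credit decisions.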

Dataset Discussion 

There was an important discussion about open datasets in the areas of network encryption and NLG, and the absence of open networking datasets that could move the field forward. Typically networking datasets are proprietary and complex. It is difficult to create a synthetic dataset that mimics real network traces, with many sessions from many users and with variable network conditions that end-users experience. However, Orange is investigating the possibility of opening a part of a completely anonymized dataset (and without any possibility to reverse engineer it and discover private information).

Round Table – Applying Trusted AI Principles in a Corporation

The first part of the round table discussion focused on the examples presented. One observation was that principles take on different meanings in the context of different examples, and that understanding the context of each scenario is key. Souad Ouali, the Trusted AI Principles Working Group Chair, responded that the RREPEATS principles were very carefully and deliberately worded to be applicable in a general way across international and varied settings. The round table panelists agreed that Trusted AI principles must be incorporated into the DNA of what we do throughout the AI life cycle, and that it is important to ask questions relating to the principles at every step. An important observation from the two practical example presentations is how tactical, low-level tools map to high-level Trusted AI principles.

The round table ended with a discussion on corporate ethics boards and whether they should be geographically diverse or distributed, incorporate domain experts, and extend into the operational aspect of a business. Another consideration is whether an ethics board should include representatives from other institutions.

Join Us for Our Next Session

Please join us for The Trusted AI Principles – Tools and Techniques webinar on September 29, 2021. Register here!

Stay connected with the Trusted AI Committee by joining the mailing list here and attend one of our upcoming meetings! Learn more here.



Data Intelligence – AI-based Automation


Guest Author: Dr. Jagreet Kaur, Chief AI Officer, Xenonstack

What is Data Intelligence?

The world is moving toward data-driven intelligence. To keep pace with evolving technology and competition, organizations must make data- and AI-based decisions; organizations that do not work with their data in this way find it difficult to know the facts and gain the insights needed for decision-making.

Data intelligence enables the processing of multisource data and generates meaningful insights that help drive valuable decisions. It allows unstructured data and text-analytics results to be combined with structured data for use in predictive analytics, and it can provide real-time statistical analysis of structured or unstructured data to reveal data patterns and dependencies.

Why Do You Need It?

Data intelligence is required to process and understand data, and it is rapidly becoming one of the most important elements of big data. It has progressed from an infantile stage to a point where it can handle vast amounts of data intelligently, and it is not about to fold its wings: the immediate positive results have attracted many organizations’ attention, and various entrepreneurs have expressed interest in using and developing data intelligence to make intelligent decisions that drive their business. The following instances help explain why data intelligence is needed:

  • Artificial Intelligence: Machine learning algorithms help with predictive analysis and with recognizing correlations, including domain-specific custom entities and word usage.
  • Intuitive Visualization: It allows us to understand data effectively in less time using informative, intuitive charts and graphs. Visualization helps us grasp complex data within seconds, rather than reading through a spreadsheet or other data file, and surfaces insights and clear data patterns that are difficult to find in tables or datasets. It also enables reports to be easily filtered and drilled down as required.
  • Insight generation: Based on the collected data, it allows insights to be drawn from the visualizations that help in understanding business progress and customer needs.
  • Data-driven decision-making: It supports better, data-driven decisions, so that the correct choices are made to gain customer satisfaction and revenue.

The Base Foundation For Data Intelligence

Data intelligence provides an unconventional 360-degree view of the business environment. It helps organizations better understand customer requirements, monitor their own performance, and, based on the data and insights, make decisions according to customer preferences that improve revenue and benefits. Data intelligence is based on several sets of techniques that enrich business decisions and processes:

  • Descriptive (“What happened”): It is used to review and examine the historical and real-time data to understand business performance and customer behavior. It detects a particular occurrence of a situation.
  • Diagnostic (“How it happened”): To know the reason for the occurrence of a particular instance or situation.
  • Predictive (“What could happen”): It uses historical data and, based on that, predicts future occurrences using ML algorithms.
  • Prescriptive (“What Should We Do”): To develop and analyze the alternative knowledge that can be applied in the course of action. It helps us to understand what to do in the future.
  • Decisive: Decisive analytics helps to measure data suitability and, when there are multiple possibilities, chooses the recommended action to implement in the environment and in real-time processes.

How Can I Use Data Intelligence?

Data intelligence performs the following steps to identify relations and mentions in unstructured or structured data.

  • Data Ingestion: It collects structured and unstructured data from different sources such as documents, emails, databases, websites, and data repositories. Data can be inserted into the application or platform manually or scheduled at fixed intervals of time. Data can be processed and used by that application to perform tasks.
  • Data Processing: Now, data collected from sources can be processed and used to generate insights. It makes it possible to find a relation between data. Several tools provide an easy-to-use interface for creating custom models to train and test the model to find entities and data relations. It allows using models for future predictive analytics.
  • Reporting and Visualizations: Reporting and Visualization is the final step that analyzes the data using charts and graphs. Visualization makes it easy to understand large and complex data effectively.
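The three steps above can be sketched as follows (hypothetical data and function names, stdlib only):

```python
import json
import statistics

# 1. Data Ingestion: collect records from heterogeneous sources (here, a
#    JSON string stands in for documents, databases, websites, etc.).
raw = ('[{"region": "South", "sales": 120},'
       ' {"region": "North", "sales": 80},'
       ' {"region": "South", "sales": 140}]')

def ingest(raw):
    return json.loads(raw)

# 2. Data Processing: transform the raw records into insight-ready form,
#    e.g., by aggregating sales per region.
def process(records):
    by_region = {}
    for r in records:
        by_region.setdefault(r["region"], []).append(r["sales"])
    return {region: statistics.mean(v) for region, v in by_region.items()}

# 3. Reporting and Visualization: render the insight (a charting library
#    would normally take over here; this prints a crude text bar chart).
def report(summary):
    return "\n".join(f"{region}: {'#' * int(avg // 10)} {avg:.0f}"
                     for region, avg in summary.items())

summary = process(ingest(raw))
print(report(summary))
```

In a real platform each step would be a separate, scheduled component, but the data flow (ingest, process, report) is the same.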

The Benefits Of Data Intelligence

Data intelligence gives technology wings by providing intelligence in daily tasks and decisions. Let’s discuss the benefits of data intelligence and why organizations should embrace it:

  • Changing demands: Data intelligence helps organizations adapt to the dynamic changes of their industries. Business today is continuously evolving; to stay competitive and reduce the chances of failure, organizations must accept and adopt newly emerging trends. For example, when the adoption of selfie cameras in smartphones was increasing, mobile businesses that did not capitalize on the trend were doomed to fail. Data intelligence helps organizations understand customer behavior and change: firms are informed of repeated changes and their patterns of occurrence, allowing the company to make informed decisions based on the analysis.
  • Strong Foundation of Data: Data intelligence strengthens big data by restructuring the process of data arrangement. It allows insights to be gathered from big data and then renders optimized engagement capability.
  • Useful Data: The world generates a large volume of data every day that can change the world and improve services according to customer demands and preferences, but most of that data is not in usable form and cannot be used directly. Data intelligence is in charge of converting raw data into cumulative information: it cleans and transforms data into smart capsules of ready-to-use data that companies can use to assess results, without having to define every particular case to the computers.
  • Augmented Analytics: Advanced statistical approaches are used in data intelligence to advance visualized predictive and prescriptive analytics. Instead of building a complete application every time, data processing is automated and can be completed in a few simple steps; further changes may be recommended based on the results. With such extensive planning against real-life scenarios, business plans are far less likely to fail, as advanced simulations enable businesses to predict potential outcomes and adjust prescriptions as needed.
  • Accelerate innovation: Data intelligence accelerates innovation through smart use of data, allowing data insights to drive business innovation and to develop services that reflect customer preferences and requirements.

What is the difference between data, information, and intelligence? 

  • Data: The raw record of truth at a point in time. It might be a conversation, a purchase, or an interaction with your company’s website. Data is the compilation of results from those incidents, quantifiably recorded so that companies can review them easily. 
  • Information: A collection of data, or a way of bringing data together. When data is picked from an event and put into narrative form, it helps to answer questions such as: 
    • What is the churn rate of employees?
    • How long is the sales cycle of an organization?
      The information that answers these questions moves the business.
  • Intelligence: A body of information used to derive decisions in an application or task. For instance, suppose you are selling more in southern regions; the intelligent question is why that might be. To find an answer, look at numbers such as the number of events, the amount spent on advertisements, and the marketing campaigns that southern-region clients receive, and then compare them with another region (the North). If this analysis shows more client interactions in the southern region, then increasing sales in the North may require doing the same.

Data Intelligence In The Real World

  • Healthcare: Healthcare systems are rapidly digitalizing, adopting technologies to create a connected healthcare environment. Hospitals need to synchronize with this technology to become smarter, more advanced, and more accurate. They use various sensors, apps, and digital equipment that regularly generate large volumes of data, which can be used to automate administrative, treatment, and clinical processes. Data intelligence capabilities allow ML, AI, and deep learning to make healthcare processes more accurate and fast, and help practitioners handle a growing number of cases and processes. These advanced technologies help extract real-time intelligence and support decisions on diagnosis, prescribing medicines, hospital management, laboratories, patient care, etc., leading to high operational efficiency and care delivery.
  • Supply chain management: Supply chain software generates and collects a vast amount of data, but many organizations are unaware of how best to use it to make their operations more effective. Data intelligence in the supply chain management network predicts business risk, minimizes loss, and makes automated, self-learning supply chains possible. As a result, it drives real-time coordination and innovation.
  • Human Resources: Organizations use HR software to manage internal HR functions, such as payroll, employee benefits, recruitment, training, talent management, attendance management, and employee engagement, and to enhance their features and capabilities. They must do many tasks to better understand employees, attract top talent, initiate retention programs, and analyze performance, and they have a lot of data generated by their HRMS (Human Resource Management System) software. Data intelligence can help them analyze and understand this data, gather insights, and make precise decisions that help the organization run healthier and faster.
  • E-commerce: One of the success secrets of an e-commerce website is using customer reviews to understand customers’ experiences and preferences, and then using them to make profitable decisions. ML and NLP techniques are used to interact with customers, gather data from them, and use it to drive performance and improve customer engagement, service quality, support quality, and ultimately sales. Data intelligence makes it possible to accomplish these tasks: recommend products, understand customer preferences, resolve queries, and improve quality and services. Harnessing this information yields a treasure trove of insights that can power products and processes, improve customer experience and marketing, and help manage store operations.


Akira AI is a data intelligence platform that provides intelligence using analysis and learning by processing data from various sources.


Flyte 0.13.0 Release Now Available!


Flyte, an LF AI & Data Foundation Incubation Project, has released version 0.13.0! Flyte is a Kubernetes-native workflow automation platform for complex, mission-critical data and ML processes at scale. It allows users to describe their ML/data pipelines using Python, Java, or Scala (with other languages planned), and manages the data flow, parallelization, scaling, and orchestration of these pipelines. 
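To give a flavor of describing a pipeline in Python, here is a toy sketch of the decorator pattern used by workflow SDKs like Flyte's (this is illustrative plain Python, not actual flytekit code):

```python
# Toy illustration of Python-described pipelines: tasks are plain
# functions, a workflow composes them, and the platform (not shown)
# would handle scaling, parallelization, retries, and data flow.
def task(fn):
    """Register a function as a pipeline step. A real SDK would also
    capture the type signature, resource requests, and caching policy."""
    fn.is_task = True
    return fn

@task
def clean(values: list) -> list:
    """Drop missing measurements."""
    return [v for v in values if v is not None]

@task
def mean(values: list) -> float:
    """Aggregate the cleaned values."""
    return sum(values) / len(values)

def pipeline(values: list) -> float:
    # In a real workflow engine this composition becomes a DAG that can
    # be scheduled and scaled across a Kubernetes cluster.
    return mean(clean(values))

print(pipeline([1.0, None, 2.0, 3.0]))  # 2.0
```

The value of the platform is everything around this code: versioning, caching, parallel fan-out, and recovery when a step fails mid-run.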

In version 0.13.0, Flyte adds a variety of improvements. Highlights include:


  • Support for the complete OAuth2 spec, including single sign-on, with configuration examples for popular IdPs now available in Flyte. Please see the updated information and description of the feature, and the setup information.
  • Backend improvements to support dynamic workflow visualization and simpler debugging for Kubernetes and external errors (UI updates in future releases).
  • flytectl: Cross-platform portable CLI for Flyte
  • Documentation site overhaul and redesign
  • Improved end-to-end platform performance


  • Beta: Updated API to interact with past executions, launch new executions. This makes it possible to have simplified programmatic access for all Flyte features and perform regular data science tasks like retrieving previous results, compare executions, etc. 
  • Beta: Support for prebuilt container plugins with faster user interactivity
  • Plugin: Interact with any SQL database using SqlAlchemy
  • Plugin: Use versioned datasets via DoltDB
  • Access secrets using standardized interaction pattern
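To illustrate the kind of programmatic access the beta execution API enables (retrieving previous results and comparing executions), here is a hypothetical, self-contained sketch. The class and method names are invented for illustration and are not Flyte's actual API.

```python
# Hypothetical sketch of programmatic access to past executions.
# All names here are illustrative, not Flyte's real remote-API signatures.

class ExecutionStore:
    def __init__(self):
        self._runs = {}

    def record(self, run_id: str, outputs: dict):
        """Store the outputs of a finished execution."""
        self._runs[run_id] = outputs

    def fetch(self, run_id: str) -> dict:
        """Retrieve the outputs of a previous execution."""
        return self._runs[run_id]

    def compare(self, a: str, b: str) -> dict:
        """Return the keys whose outputs differ between two executions."""
        ra, rb = self.fetch(a), self.fetch(b)
        return {
            k: (ra.get(k), rb.get(k))
            for k in set(ra) | set(rb)
            if ra.get(k) != rb.get(k)
        }
```

A real client would talk to the Flyte control plane over its API rather than an in-memory store, but the retrieve/compare workflow is the same shape.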

To learn more about the Flyte 0.13.0 release, check out the full release notes. Want to get involved with Flyte? Be sure to join the Flyte-Announce and Flyte-Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

Congratulations to the Flyte team and we look forward to continued growth and success as part of the LF AI & Data Foundation! To learn about hosting an open source project with us, visit the LF AI & Data Foundation website.

Flyte Key Links

LF AI & Data Resources

RosaeNLG Joins LF AI & Data as New Sandbox Project

By Blog

LF AI & Data Foundation, the organization building an ecosystem to sustain open source innovation in artificial intelligence (AI), machine learning (ML), deep learning (DL), and data open source projects, is announcing that RosaeNLG has joined the Foundation as its first Sandbox project. 

The Sandbox stage was recently added by the LF AI & Data Technical Advisory Council (TAC) to accommodate early-stage projects that meet one or more of the following requirements:

  • Any project that intends to join LF AI & Data Incubation in the future and wishes to lay the foundations for that.
  • New projects that are designed to extend one or more LF AI & Data projects with functionality or interoperability libraries. 
  • Independent projects that fit the LF AI & Data mission and provide the potential for a novel approach to existing functional areas (or are an attempt to meet an unfulfilled need).

RosaeNLG is a great fit for this stage and was voted into the Sandbox stage by the TAC. It is an open source Natural Language Generation (NLG) project that aims to offer the same NLG features as commercial NLG products while being developer- and IT-friendly for ease of integration and configuration. RosaeNLG was released and open sourced by Ludan Stoecklé, CTO of the Data & AI Lab at BNP Paribas CIB and Expert Professor at aivancity school. 

Dr. Ibrahim Haddad, Executive Director of LF AI & Data, said: “RosaeNLG is a great foundational project that aims to broaden the accessibility and understandability of AI. We’re excited to welcome RosaeNLG as our first Sandbox stage project and look forward to supporting its journey for increased adoption, growth, and collaboration with other projects.”  

Template-based Natural Language Generation (NLG) automates the production of relatively repetitive texts based on structured input data and textual templates, run by an NLG engine. Production usage is widespread in large corporations, especially in the financial industry.

Typical use cases are:

  • Describing a product based on its features for SEO purposes
  • Producing structured reports, such as risk reports or fund performance summaries, in the financial industry
  • Generating well-formed chatbot answers

RosaeNLG templates are developed in VS Code with a friendly syntax and are easy to integrate. The project currently ships with linguistic resources for English, French, German, Italian, and Spanish, and supports NLG both server-side (via a Node.js REST API) and in the browser.
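As a rough illustration of how template-based NLG turns structured data plus a template into text, here is a minimal plain-Python sketch. RosaeNLG's real templates use a richer, Pug-based syntax backed by proper linguistic resources; the function and rule below are invented purely for illustration.

```python
# Illustrative sketch of template-based NLG in plain Python.
# NOT RosaeNLG's actual template syntax or engine.

def pluralize(noun: str, count: int) -> str:
    # Toy English agreement rule; real NLG engines handle irregular
    # forms, gender, and many languages via linguistic resources.
    return noun if count == 1 else noun + "s"

def render_fund_report(fund: str, return_pct: float, holdings: int) -> str:
    """Fill a fixed textual template from structured input data."""
    direction = "gained" if return_pct >= 0 else "lost"
    return (
        f"{fund} {direction} {abs(return_pct):.1f}% this quarter "
        f"across {holdings} {pluralize('holding', holdings)}."
    )

print(render_fund_report("Alpha Fund", 2.5, 12))
```

Because the output is fully determined by the template and the input data, this style of generation cannot hallucinate facts, which is the trust property the project emphasizes.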

Ludan Stoecklé, the founder of RosaeNLG, said: “Non-expert users don’t understand long tables of figures and dashboards; they prefer simple textual explanations. NLG is key to the democratization and understandability of data in general and to trusted AI in particular. Template-based NLG is the only way to achieve complex data-to-text projects without any error or hallucination in the texts, which is mandatory for trust. The support of the LF AI & Data Foundation will foster adoption and community growth, as well as diversity in the NLG domain, with the goal of supporting more than 50 commonly spoken languages.”

LF AI & Data supports projects via a wide range of services, and the first step is joining the Foundation in incubation. Learn more about RosaeNLG on their GitHub and be sure to join the RosaeNLG-Announce and RosaeNLG-Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

A warm welcome to RosaeNLG! We look forward to the project’s continued growth and success as part of the LF AI & Data Foundation. To learn about how to host an open source project with us, visit the LF AI & Data website.

RosaeNLG Key Links

LF AI & Data Resources

ONNX 1.9 Release Now Available!

By Blog

ONNX, an LF AI & Data Foundation Graduated Project, has released version 1.9! ONNX is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. 

In version 1.9, ONNX adds a variety of improvements. Highlights include:

  • Selective schema loading of specific operator set versions to reduce memory usage in runtimes
  • New and updated operators supporting more data types and models, including advanced object detection models such as MobileNetV3 and YOLOv5
  • Improved tools for splitting large, multi-GB models into separate files
  • More details and sample tests added to operator documentation
  • Version converter enhanced to make it easier to upgrade models to newer operator sets

To learn more about the ONNX 1.9 release, check out the full release notes. Want to get involved with ONNX? Be sure to join the ONNX Announce mailing list to join the community and stay connected on the latest updates. 

Congratulations to the ONNX team and we look forward to continued growth and success as part of the LF AI & Data Foundation! To learn about hosting an open source project with us, visit the LF AI & Data Foundation website.

ONNX Key Links

LF AI & Data Resources

New LF AI & Data Member Welcome – Q1 2021

By Blog

We are excited to welcome six new members to the LF AI & Data Foundation. VMware has joined as a General Member, and Galgotias University, High School Technology Services, OpenUK, Ken Kennedy Institute, and the University of Washington – Tacoma have joined as Associate Members. 

The LF AI & Data Foundation builds and supports an open community and a growing ecosystem of open source AI, data, and analytics projects by accelerating development and innovation, enabling collaboration, and creating new opportunities for all members of the community.

Learn more about the new organizations in their own words below:

General Members

The LF AI & Data General membership is targeted for organizations that want to put their organization in full view in support of LF AI & Data and our mission. Organizations that join at the General level are committed to using open source technology, helping LF AI & Data grow, voicing the opinions of their customers, and giving back to the community.

VMware streamlines the journey for organizations to become digital businesses that deliver better experiences to their customers and empower employees to do their best work. Our software spans App Modernization, Cloud, Networking & Security, and Digital Workspace.

Associate Members

The LF AI & Data Associate membership is reserved for pre-approved non-profits, open source projects, and government entities. 

Galgotias University is devoted to excellence in teaching, research, and innovation, and to developing leaders who will make a difference in the world. The University, which is based in Greater Noida, has an enrollment of over 15,000 students across more than 100 undergraduate and postgraduate programs.

High School Technology Services strives to provide the highest quality information technology services to high schools, teenagers, and adults. From creating websites to building computer labs, from offering counseling sessions to designing preparation programs, we aim to provide an assortment of services to support the dreams of these teenagers, their parents, and the schools.


OpenUK advocates for Open Technology being open source software, open source hardware, and open data, “Open” in and for business communities across the UK. As an industry advocacy organization, OpenUK gives its participants greater influence than they could ever achieve alone. Collaboration is central to everything we do and we use it to bring together business, public sector, and community in the UK to collaborate locally and globally.


The Ken Kennedy Institute is the virtual home of over two hundred faculty members and senior researchers at Rice University spanning computer science, mathematics, statistics, engineering, natural sciences, humanities, social sciences, business, architecture, and music.

The Institute brings together the Rice community to collaboratively solve critical global challenges by fostering innovations in computing and harnessing the transformative power of data. We enable new conversations, drive interdisciplinary research in AI and data science, develop new technology to serve society, advance the current and future workforce, promote an ecosystem of innovation and entrepreneurship, and develop academic, industry, and community partnerships in the computational sciences.


University of Washington – Tacoma is an urban-serving university providing access to students in a way that transforms families and communities. We impact and inform economic development through community-engaged students and faculty. We conduct research that is of direct use to our community and region. And, most importantly, we seek to be connected to our community’s needs and aspirations.

Welcome New Members!

We look forward to partnering with these new LF AI & Data Foundation members to help support open source innovation and projects within the artificial intelligence (AI), machine learning (ML), deep learning (DL), and data space. Welcome to our new members!

Interested in joining the LF AI & Data community as a member? Learn more here and email for more information and/or questions. 

LF AI & Data Resources

Egeria 2.8 Release Now Available!

By Blog

Egeria, an LF AI & Data Foundation Graduated Project, has released version 2.8! Egeria is an open source project dedicated to making metadata open and automatically exchanged between tools and platforms, no matter which vendor they come from. 

In version 2.8, Egeria adds a variety of improvements. Highlights include:

  • New support for event and property filtering for the open metadata server security connector
    • The repository services support three filtering points for managing events for the OMRS Cohort Topic; however, these filtering points are set up in the server’s configuration document, which provides no control for filtering events for specific instances. Version 2.8 extends the metadata server security connector so it can be called at these same filter points.
    • The security server connector can now implement two new interfaces: one for cohort events and one for saving events to the local repository.
      • The event interface has two methods, one for sending and one for receiving. The parameters include the cohort name and the event contents. The connector can return the event unchanged, return a modified event (e.g. with sensitive content removed), or return null to indicate the event is filtered out.
      • The event-saving interface receives the instance header and returns a boolean indicating whether the local repository should store the instance. If true is returned, the refresh event sequence is initiated; the repository connector then has the final say, when the refreshed instance is returned from the home repository, on whether to store it.
  • Changes to metadata types
    • Updates to the location types in model 0025:
      • Add the mapProjection property to the FixedLocation classification
      • Change the address property to networkAddress in the CyberLocation classification
      • Deprecated HostLocation in favor of the AssetLocation relationship
    • Deprecate the RuntimeForProcess relationship since it is superfluous – use ServerAssetUse since Application is a SoftwareServerCapability.
    • Replace the deployedImplementationType property with the businessCapabilityType in the BusinessCapability since it is a more descriptive name.
  • New performance workbench for the CTS (technical preview)
    • The performance workbench tests the response time of all repository (metadata collection) methods for the technology under test. The volume of the test can easily be configured to also test scalability.
  • New interface for retrieving the complete history of a single metadata instance
    • Two new (optional) methods have been introduced to the metadata collection interface:
      • getEntityDetailHistory
      • getRelationshipHistory
    • Both methods take the GUID of the instance whose history is to be retrieved, an optional range of times between which to retrieve historical versions (if both bounds are null, all historical versions are retrieved), and a set of paging parameters.
    • Repositories that do not implement these methods will, by default, simply throw FunctionNotSupported exceptions to indicate they are not supported.
  • Splitting of CTS results into multiple smaller files
    • Up to this release, the detailed results of a CTS run could only be retrieved by pulling a huge file (hundreds of MB) across the CTS REST interface. Aside from not working with most REST clients (such as Postman), this caused a sudden, large hit on the JVM heap to serialize such a big JSON structure (immediately grabbing roughly 1 GB of heap). While the old interface still exists for backwards compatibility, the new default interface in this release lets users pull down an overall summary of the results separately from the full detailed results, which are now broken into separate files by profile and test case, each retrievable individually.
  • Bug fixes and other updates
    • Additional Bug Fixes
    • Dependency Updates
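The event-filtering contract described in the highlights above can be sketched as follows. This is a plain-Python illustration of the contract (return the event, a modified copy, or null/None; return a boolean for saving), with invented field names; it is not Egeria's actual Java connector API.

```python
# Hypothetical sketch of Egeria's event-filtering contract, in plain
# Python rather than the actual Java security connector interface.
# Field names ("classification", "sensitive", "type") are invented.

from typing import Optional

def filter_cohort_event(cohort_name: str, event: dict) -> Optional[dict]:
    """Return the event unchanged, a modified copy (e.g. with sensitive
    content removed), or None to filter the event out entirely."""
    if event.get("classification") == "confidential":
        return None  # event is filtered out
    if "sensitive" in event:
        redacted = dict(event)
        redacted.pop("sensitive")  # modified copy, sensitive content removed
        return redacted
    return event  # unchanged

def should_save_event(instance_header: dict) -> bool:
    """Boolean gate on whether the local repository stores the instance;
    returning True would initiate the refresh event sequence."""
    return instance_header.get("type") != "ExternalOnly"
```

The real connector implements these decision points in Java at the filter points the release notes describe, but the shape of the contract is the same: transform, pass through, or drop.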

To learn more about the Egeria 2.8 release, check out the full release notes. Want to get involved with Egeria? Be sure to join the Egeria-Announce and Egeria-Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

Congratulations to the Egeria team and we look forward to continued growth and success as part of the LF AI & Data Foundation! To learn about hosting an open source project with us, visit the LF AI & Data Foundation website.

Egeria Key Links

LF AI & Data Resources