A big thank you to Orange for hosting a great virtual meetup! LF AI & Data Day EU Virtual was held on June 10, 2021 with 76 attendees joining live.
This event featured keynote speakers from leading AI organizations, including IBM, Orange, AIvancity School, and Banque de France, with a focus on ML breakthroughs, open source strategies for scaling machine learning, and Trusted AI. Various AI topics were covered, including technical presentations on MLOps, AI learning, Trusted AI, and new LF AI & Data projects such as RosaeNLG, ONNX, Machine Learning Exchange, and Datashim. ITU’s AI activities were also presented during the closing session.
Missed the event? Check out all of the presentations and recording here.
This meetup took on a virtual format but we look forward to connecting again at another event in person soon. LF AI & Data Day is a regional, one-day event hosted and organized by local members with support from LF AI & Data, its members, and projects. If you are interested in hosting an LF AI & Data Day please email firstname.lastname@example.org to discuss.
Event host, Orange, is a leading telecommunications company with headquarters in France. They are the largest telecoms operator in France, with the bulk of their operations in Europe, Africa and the Middle East. As an LF AI & Data General Member, Orange is involved with the LF AI & Data Governing Board, Outreach Committee, Trusted AI Committee, and is an active contributor to the LF AI & Data Acumos project.
LF AI & Data Foundation, the organization building an ecosystem to sustain open source innovation in artificial intelligence (AI) and data open source projects, is today announcing DELTA as its latest Incubation Project.
DELTA is a deep learning based end-to-end natural language and speech processing platform. It aims to provide easy and fast experiences for using, deploying, and developing natural language processing (NLP) and speech models for both academia and industry use cases. DELTA is mainly implemented using TensorFlow and Python 3. It was released and open sourced by DiDi Global.
Dr. Ibrahim Haddad, Executive Director of LF AI & Data, said: “We are excited to welcome DELTA to LF AI & Data and help it thrive in a neutral, vendor-free environment under an open governance model. We look forward to helping the project grow its community of users and contributors, and to enabling collaboration and integration opportunities with other hosted projects to drive innovation in open source AI technologies.”
DELTA has been used to develop several state-of-the-art algorithms for publications and to deliver production systems that serve millions of users. It helps you to train, develop, and deploy NLP and/or speech models, featuring:
One command to train NLP and speech models, including:
NLP: text classification, named entity recognition, question answering, text summarization, etc.
Use configuration files to easily tune parameters and network structures
What you see in training is what you get in serving: all data processing and feature extraction are integrated into a model graph
Uniform I/O interfaces and no changes for new models
Easily build state-of-the-art models using modularized components
All modules are reliable and fully-tested
Yunbo Wang, co-creator of DELTA, said: “NLP and voice technology have been widely applied throughout DiDi’s business. For instance, DiDi has built an intelligent customer service system based on AI to assist the efficiency of human customer service and reduce repetitive effort. Based on voice recognition and natural language understanding, DiDi has built a voice assistant function for drivers and applied it to the contact-free ride-hailing services in Japan and Australia. In the future, DiDi will continue to actively promote the opening of related capabilities. Through one-stop natural language processing tools and platforms, DiDi will help its industrial partners realize better AI application landing.”
LF AI & Data supports projects via a wide range of services, and the first step is joining as an Incubation Project. LF AI & Data will support the neutral open governance for DELTA to help foster the growth of the project. Check out the Documentation to start working with DELTA today. Learn more about DELTA on their GitHub and be sure to join the DELTA-Announce and DELTA-Technical-Discuss mailing lists to join the community and stay connected on the latest updates.
A warm welcome to DELTA! We look forward to the project’s continued growth and success as part of the LF AI & Data Foundation. To learn about how to host an open source project with us, visit the LF AI & Data website.
We’re excited to announce the v0.4 release of Ludwig — the open source, low-code declarative deep learning framework created and open sourced by Uber and hosted by the LF AI & Data Foundation. Ludwig enables you to apply state-of-the-art tabular, NLP, and computer vision models to your existing data and put them into production with just a few short commands.
The focus of this release is to bring MLOps best practices through declarative deep learning with enhanced scalability for data processing, training, and hyperparameter search. The new features of this release include:
Integration with Ray for large-scale distributed training that combines Dask and Horovod
A new distributed hyperparameter search integration with Ray Tune
The addition of TabNet as a combiner for state-of-the-art deep learning on tabular data
MLflow integration for unified experiment tracking and model serving
Preconfigured datasets for a wide variety of different tasks, leveraging Kaggle
Ludwig combines all of these elements into a single toolkit that guides you through machine learning end-to-end:
Experimentation with different model architectures using Ray Tune
Data cleaning and preprocessing up to large out-of-memory datasets with Dask and Ray
Distributed training on multi-node clusters with Horovod and Ray
Deployment and serving the best model in production with MLflow
Ludwig abstracts away the complexity of combining all these disparate systems together through its declarative approach to structuring machine learning pipelines. Instead of writing code for your model, training loop, preprocessing, postprocessing, evaluation, and hyperparameter optimization, you only need to declare the schema of your data as a simple YAML configuration:
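As a minimal sketch of such a declaration, the snippet below builds a config as a Python dictionary, which Ludwig’s Python API accepts in place of a YAML file. The column names (`title`, `rating`, `recommended`) are hypothetical and chosen only for illustration, and the `train_model` helper is not called here because it assumes Ludwig is installed and a matching dataset exists.

```python
import json

# Hypothetical schema: the column names are illustrative assumptions,
# not taken from the release notes.
config = {
    "input_features": [
        {"name": "title", "type": "text"},
        {"name": "rating", "type": "numerical"},
    ],
    "output_features": [
        {"name": "recommended", "type": "binary"},
    ],
}


def train_model(dataset_path):
    """Train with Ludwig's Python API (assumes `pip install ludwig`)."""
    # Imported lazily so this sketch can be loaded without Ludwig installed.
    from ludwig.api import LudwigModel

    model = LudwigModel(config)
    return model.train(dataset=dataset_path)


# Print the declaration; the same structure can be written as YAML.
print(json.dumps(config, indent=2))
```

Only the data schema is declared here; the model architecture, preprocessing, and training loop are left to Ludwig’s defaults.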
Starting from a simple config like the one above, any and all aspects of the model architecture, training loop, hyperparameter search, and backend infrastructure can be modified as additional fields in the declarative configuration to customize the pipeline to meet your requirements:
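To illustrate how additional fields layer on top of a base declaration, here is a small sketch in Python. The feature names and the specific values are hypothetical; the `combiner` and `training` section names are assumed to match the v0.4 configuration schema, and `with_overrides` is a helper written for this example rather than part of Ludwig.

```python
import copy

# Base declaration: only the data schema (hypothetical column names).
base_config = {
    "input_features": [{"name": "title", "type": "text"}],
    "output_features": [{"name": "recommended", "type": "binary"}],
}

# Optional sections that customize the pipeline (illustrative values).
overrides = {
    "combiner": {"type": "concat", "num_fc_layers": 2},
    "training": {"learning_rate": 0.001, "epochs": 10},
}


def with_overrides(config, extra):
    """Return a new config with the extra top-level sections added."""
    merged = copy.deepcopy(config)
    merged.update(copy.deepcopy(extra))
    return merged


custom_config = with_overrides(base_config, overrides)
print(sorted(custom_config))
```

The base declaration is left untouched, so the same schema can be reused with different customizations.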
Why Declarative Machine Learning Systems?
Ludwig’s declarative approach to machine learning presents the simplicity of conventional AutoML solutions with the flexibility of full-featured frameworks like TensorFlow and PyTorch. This is achieved by creating an extensible, declarative configuration with optional parameters for every aspect of the pipeline. Ludwig’s declarative programming model allows for key features such as:
Multi-modal, multi-task learning in zero lines of code. Mix and match tabular data, text, imagery, and even audio into complex model configurations without writing code.
Integration with any structured data source. If it can be read into a SQL table or Pandas DataFrame, Ludwig can train a model on it.
Easily explore different model configurations and parameters with hyperopt. Automatically track all trials and metrics with tools like Comet ML, Weights & Biases, and MLflow.
Automatically scale training to multi-GPU, multi-node clusters. Go from training on your local machine to the cloud without code or config changes.
Fully customize any part of the training process. Every part of the model and training process is fully configurable in YAML, and easy to extend through custom TensorFlow modules with a simple interface.
Ludwig distributed training and data processing with Ray
Ludwig on Ray is a new backend introduced in v0.4 that illustrates the power of declarative machine learning. Starting from any existing Ludwig configuration like the one above, users can scale their training process from running on their local laptop, to running in the cloud on a GPU instance, to scaling across hundreds of machines in parallel, all without changing a single line of code.
By integrating with Ray, Ludwig is able to provide a unified way for doing distributed training:
Ray enables you to provision a cluster of machines in a single command through its cluster launcher.
Horovod on Ray enables you to do distributed training without needing to configure MPI in your environment.
Dask on Ray enables you to process large datasets that don’t fit in memory on a single machine.
Ray Tune enables you to easily run distributed hyperparameter search across many machines in parallel.
All of this comes for free without changing a single line of code in Ludwig. When Ludwig detects that you’re running within a Ray cluster, the Ray backend will be enabled automatically.
After launching a Ray cluster by running ray up on the command line, you need only ray submit your existing Ludwig training command to scale out across all the nodes in your Ray cluster.
Behind the scenes, Ludwig will do the work of determining what resources your Ray cluster has (number of nodes, GPUs, etc.) and spreading out the work to speed up the training process.
Ludwig on Ray will use Dask as a distributed DataFrame engine, allowing it to process large datasets that do not fit within the memory of a single machine. After processing the data into Parquet or TFRecord format, Ludwig on Ray will automatically spin up Horovod workers to distribute the TensorFlow training process across multiple GPUs.
To get you started, we provide Docker images for both CPU and GPU environments. These images come pre-installed with Ray, CUDA, Dask, Horovod, TensorFlow, and everything else you need to train any model with Ludwig on Ray. Just add one of these Docker images to your Ray cluster config and you can start doing large scale distributed deep learning in the cloud within minutes:
As with other aspects of Ludwig, the Ray backend can be configured through the Ludwig config YAML. For example, when running on large datasets in the cloud, it can be useful to customize the cache directory where Ludwig writes the preprocessed data to use a specific bucket in a cloud object storage system like Amazon S3:
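A hedged sketch of what such a backend section could look like, written as a Python dictionary. The bucket name is hypothetical, and the `backend`, `type`, and `cache_dir` keys are assumptions based on the description above that should be checked against the user guide. The `cache_location` helper is only there to show how the cache path breaks down into a storage protocol, bucket, and prefix.

```python
from urllib.parse import urlparse

# Assumed key names; "my_bucket" is a hypothetical S3 bucket.
backend_section = {
    "backend": {
        "type": "ray",
        "cache_dir": "s3://my_bucket/cache",
    }
}


def cache_location(section):
    """Split the cache directory into (protocol, bucket, prefix)."""
    parsed = urlparse(section["backend"]["cache_dir"])
    return parsed.scheme, parsed.netloc, parsed.path


print(cache_location(backend_section))
```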
In Ludwig v0.4, you can use cloud object storage like Amazon S3, Google Cloud Storage, Azure Data Lake Storage, and MinIO for datasets, processed data caches, config files, and training output. Just specify your filenames using the appropriate protocol and environment variables, and Ludwig will take care of the rest.
Check the Ludwig user guide for a complete description of available configuration options.
New distributed hyperparameter search with Ray Tune
Another new feature of the 0.4 release is the ability to do distributed hyperparameter search. With this release, Ludwig users will be able to execute hyperparameter search using cutting edge algorithms, including Population-Based Training, Bayesian Optimization, and HyperBand, among others.
We first introduced hyperparameter search capabilities for Ludwig in v0.3, but the integration with Ray Tune — a distributed hyperparameter tuning library native to Ray — makes it possible to distribute the search process across an entire cluster of machines, and use any search algorithms provided by Ray Tune within Ludwig out-of-the-box. Through Ludwig’s declarative configuration, you can start using Ray Tune to optimize over any of Ludwig’s configurable parameters with just a few additional lines in your config file:
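The following sketch shows what those additional lines might look like, expressed as a Python dictionary. The parameter ranges are illustrative, and the key names (`parameters`, `space`, `sampler`, `executor`, and so on) are assumed from the v0.4 hyperopt schema rather than confirmed here, so treat them as a starting point to verify against the user guide.

```python
# Hypothetical search space; key names are assumptions about the v0.4
# hyperopt schema and the numeric ranges are illustrative only.
hyperopt_section = {
    "hyperopt": {
        "goal": "minimize",
        "metric": "loss",
        "parameters": {
            "training.learning_rate": {
                "space": "loguniform",
                "lower": 0.001,
                "upper": 0.1,
            },
            "combiner.num_fc_layers": {
                "space": "randint",
                "lower": 2,
                "upper": 6,
            },
        },
        "sampler": {"type": "ray", "num_samples": 10},
        "executor": {"type": "ray"},
    }
}


def searched_parameters(section):
    """Return the dotted parameter names the search will vary, sorted."""
    return sorted(section["hyperopt"]["parameters"])


print(searched_parameters(hyperopt_section))
```

Each dotted name points at a field elsewhere in the same config, which is how the search can reach any configurable parameter.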
To run this on Ray across all the nodes in your cluster, you need only take the existing ludwig hyperopt command and ray submit it to the cluster:
Within the hyperopt.sampler section of the Ludwig config, you’re free to customize the hyperparameter search process with the full set of search algorithms and configuration settings provided by Ray Tune:
State-of-the-Art Tabular Models with TabNet
The first version of Ludwig released in 2019 supported tabular datasets using a concat combiner that implements the Wide and Deep learning architecture. When users specify numerical, category, and binary feature types, the concat combiner will concatenate the features together and build a stack of fully connected layers.
In this release we are extending Ludwig’s support for tabular data by adding a new TabNet combiner. TabNet is a state-of-the-art deep learning model architecture for tabular data that uses sparsity and multiple steps of feature transformations and attention to achieve high performance. The Ludwig implementation allows users to also use feature types other than the classic tabular ones as inputs.
Training a TabNet model is as easy as specifying a tabnet combiner and providing its hyperparameters in the Ludwig configuration.
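As a sketch of that config-only change, the snippet below starts from a concat-based tabular config and swaps in a `tabnet` combiner. The column names are hypothetical, the hyperparameter names (`size`, `output_size`, `num_steps`) and their values are assumptions to be checked against the user guide, and `use_combiner` is a helper written for this example.

```python
import copy

# Hypothetical tabular schema using the classic tabular feature types.
concat_config = {
    "input_features": [
        {"name": "age", "type": "numerical"},
        {"name": "occupation", "type": "category"},
        {"name": "is_member", "type": "binary"},
    ],
    "combiner": {"type": "concat"},
    "output_features": [{"name": "label", "type": "category"}],
}


def use_combiner(config, combiner_type, **hyperparameters):
    """Return a copy of the config with a different combiner section."""
    updated = copy.deepcopy(config)
    updated["combiner"] = {"type": combiner_type, **hyperparameters}
    return updated


# Assumed hyperparameter names; the values are illustrative only.
tabnet_config = use_combiner(
    concat_config, "tabnet", size=32, output_size=64, num_steps=5
)
print(tabnet_config["combiner"])
```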
We compared the performance achieved by the Ludwig TabNet implementation with the performance reported in the original paper, where the authors trained for longer and performed hyperparameter optimization, and confirmed it can achieve very comparable results in minimal time even when trained locally, as shown in the table below.
[Table: TabNet Paper Accuracy and Ludwig TabNet Accuracy by dataset, including Forest Tree Cover]
In addition to TabNet, we also added a new Transformer based combiner and improved upon the existing concat combiner by supporting optional skip connections. These additions make Ludwig a powerful and flexible option for training deep learning models on tabular data.
Experiment Tracking and Model Serving with MLflow
MLflow is an open source experiment tracking and model registry system.
Ludwig v0.4 introduces first-class support for tracking Ludwig train, experiment, and hyperopt runs in MLflow with just a single extra command-line argument: --mlflow.
The experiment_name you provide to Ludwig will map directly to an experiment in MLflow so you can organize multiple training or hyperopt runs together.
This functionality is also exposed through the Python API through a single callback:
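A minimal sketch of the callback usage, assuming both `ludwig` and `mlflow` are installed. The import path `ludwig.contribs.mlflow.MlflowCallback` and the `callbacks` argument reflect our understanding of the v0.4 API and should be verified against the user guide; `build_tracked_model` is a helper name introduced for this example, and the file names in the usage comment are hypothetical.

```python
def build_tracked_model(config):
    """Return a LudwigModel whose runs are logged to MLflow.

    Imports are deferred so this sketch can be loaded without Ludwig
    or MLflow installed.
    """
    from ludwig.api import LudwigModel
    from ludwig.contribs.mlflow import MlflowCallback

    return LudwigModel(config, callbacks=[MlflowCallback()])


# Example usage (hypothetical file and experiment names):
#   model = build_tracked_model("config.yaml")
#   model.train(dataset="train.csv", experiment_name="my_experiment")
```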
In addition to tracking experiment results, MLflow can also be used to store and serve models in production. Ludwig v0.4 makes it easy to take an existing Ludwig model (either saved as a directory or in an MLflow experiment) and register it with the MLflow model registry:
The Ludwig model will be converted automatically to MLflow’s model.pyfunc format, allowing it to be executed in a framework-agnostic way through a REST endpoint, Spark UDF, Python API with Pandas, etc.
Preconfigured datasets from Kaggle
Since its initial release, Ludwig has required datasets to be provided in tabular form, with a header containing names that can be referenced from the configuration file. In order to make it easy to get started with applying Ludwig to popular datasets and tasks, we’ve added a new datasets module in v0.4 that allows you to download datasets, process them into a tabular format ready for use with Ludwig, and load them into a DataFrame for training in a single line of code.
The Ludwig datasets module integrates with the Kaggle API to provide instant access to popular datasets used in Kaggle competitions. In v0.4, we provide access to popular competition datasets like Titanic, Rossmann Store Sales, Ames Housing and more. Here is an example of how to load the Titanic dataset:
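A hedged sketch of loading Titanic through the datasets module. It assumes Ludwig is installed and Kaggle API credentials are configured; the `titanic.load(split=True)` call and its three-way return value reflect our understanding of the v0.4 datasets API and should be checked against the user guide. `load_titanic_splits` is a helper name introduced for this example.

```python
def load_titanic_splits():
    """Download Titanic via Kaggle and return (training_set, test_set).

    The import is deferred so this sketch can be loaded without Ludwig
    installed; calling it requires Kaggle credentials.
    """
    from ludwig.datasets import titanic

    # With split=True the loader is assumed to return train, test, and
    # validation DataFrames; the third value is unused here.
    training_set, test_set, _ = titanic.load(split=True)
    return training_set, test_set


# Example usage:
#   training_set, test_set = load_titanic_splits()
#   print(training_set.head())
```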
Adding a new dataset is straightforward and just requires extending the Dataset abstract class and implementing minimal data manipulation code. This has allowed us to quickly expand the set of supported datasets to include SST, MNIST, Amazon Review, Yahoo Answers and many more. For a full list of the available datasets, please check the User Guide. We encourage you to contribute your own favorite datasets!
Our goal is to make machine learning easier and more accessible to a broader audience. We’re excited to continue to pursue this goal with features for Ludwig in the pipeline, including:
End-to-end AutoML with Neural Architecture Search – Offload part or all of the work of picking the optimal search strategy, tuning parameters, and choosing encoders/combiners/decoders for your given dataset and resources during model training.
Combined hyperopt & distributed training – Jointly run hyperopt and distributed training to find the best model within a provided time constraint.
Pure TensorFlow low-latency serving – Leverage a flexible and high-performance serving system designed for production machine learning environments using TensorFlow Serving.
PyTorch backend – Write custom Ludwig modules using all your favorite frameworks and take advantage of the rich ecosystem each provides.
We hope that these new capabilities will make it easier for our community to continue to build state-of-the-art models. If you are as excited about this direction as we are, join our community and get involved! We are building this open source project together. We’ll keep on pushing for a release of Ludwig v0.5, and we welcome contributions from anyone who is excited to see this happen!
We also recognize that for many organizations, success with machine learning means solving many challenges end-to-end; from connecting & accessing data, to training and deploying model pipelines, and then making those models easily available to the rest of the organization.
That’s why we’re also excited to announce that we are building a new solution called Predibase, a cohesive enterprise platform built on top of Ludwig, Horovod, and Ray to help realize the vision of making machine learning easier and more accessible. We’ll be sharing more details soon, and if you’re excited to get in touch in the meantime please feel free to reach out to us at email@example.com (we are hiring!).
We really hope that you find the new features in Ludwig 0.4 exciting, and want to thank our amazing community for the contributions and requests. Please drop us a comment or email with any feedback, and happy training!
A lot of work went into Ludwig v0.4, and we want to thank everyone who contributed and helped, in particular the main contributors and community members for this release: our co-maintainer Jim Thompson, Saikat Kanjilal, Avanika Narayan, Nimit Sohoni, Kanishk Kalra, Michael Zhu, Elias Castro-Hernandez, Debbie Yuen, Victor Dai. Special thanks for the immense support from Stanford’s Hazy Research group led by Prof. Chris Ré, to Richard Liaw, Hao Zhang and Micheal Chau from the Ray team, and the LF AI & Data staff.
On behalf of the LF AI & Data’s Trusted AI Committee Principles Working Group, I am pleased to announce LF AI & Data’s Principles for Trusted AI. LF AI & Data is an umbrella foundation of the Linux Foundation that supports open source innovation in artificial intelligence, machine learning, deep learning, and data.
With these principles, the LF AI & Data Foundation is not only joining other open source and AI communities in adopting a set of ethical, responsible, and trust-based principles; it is also inviting the larger Linux Foundation community (19K+ companies and 235K+ developers) to lead by trust and responsibility. According to its website, the “Linux Foundation enables global innovation by growing open technology ecosystems that transform industries: 100% of supercomputers use Linux, ~95% public cloud providers use Kubernetes, 70% global mobile subscribers run on networks built using ONAP, 50% of the Fortune Top enterprise blockchain deployments use Hyperledger.”
With such immense impact and scale, the responsibility to approach innovation with trust is immense. LF AI & Data’s AI principles are guided by a vision to expand access and invite innovation at all levels of engagement. The language of the principles has been kept simple and easy to understand, yet flexible, to encourage wider adoption. Not an easy task.
These principles were derived after over a year of deliberation, which included parsing through the AI principles, guidelines, and contributions of various industry groups, non-profits, and partner companies, while always keeping the community and social impact front and center. In addition to member companies’ and non-profit groups’ input, guidelines from OECD, EU, SoA, ACM, IEEE, and DoD were also referenced. The key criteria balanced competing interests across the industry and companies with the need for open and innovative technology built with trust and accountability.
LF AI & Data Foundation AI Principles: (R)REPEATS
The (R)REPEATS acronym captures the principles of Reproducibility, Robustness, Equitability, Privacy, Explainability, Accountability, Transparency, and Security. The image below illustrates that a cohesive approach to implementation is needed. The order in which the principles are listed is not meant to denote hierarchy. Neither is this a list to pick and choose what is convenient. Rather, as so many discussions and efforts to implement AI principles in the industry and committee members in their ecosystem have illustrated, all these principles are interconnected and interdependent, and important.
Artificial Intelligence (AI) in the following definitions refers to and implies any flavor and use of Artificial Intelligence or a derivative of Artificial Intelligence, including but not limited to software or hardware, and simple or complex systems that include machine learning, deep learning, or data integrated with other adjacent technologies like computer vision, whether created by people or another AI.
Reproducibility is the ability of an independent team to replicate in an equivalent AI environment, domain or area, the same experiences or results using the same AI methods, data, software, codes, algorithms, models, and documentation, to reach the same conclusions as the original research or activity. Adhering to this principle will ensure the reliability of the results or experiences produced by any AI.
Robustness refers to the stability, resilience, and performance of the systems and machines dealing with changing ecosystems. AI must function robustly throughout its life cycle and potential risks should be continually assessed and managed.
Equitability requires AI and the people behind AI to take deliberate steps, throughout the AI life cycle, to avoid intended or unintended bias and unfairness that would inadvertently cause harm.
Privacy requires AI systems to guarantee privacy and data protection throughout a system’s entire lifecycle. The lifecycle activities include the information initially collected from users, as well as information generated about users throughout their interaction with the system, e.g., outputs that are AI-generated for specific users or how users responded to recommendations. Any AI must ensure that data collected or inferred about individuals will not be used to unlawfully or unfairly discriminate against them. Privacy and transparency are especially needed when dealing with digital records that allow inferences such as identity, preferences, and future behavior.
Explainability is the ability to describe how AI works, i.e., makes decisions. Explanations should be produced regarding both the procedures followed by the AI (i.e., its inputs, methods, models, and outputs) and the specific decisions that are made. These explanations should be accessible to people with varying degrees of expertise and capabilities including the public. For the explainability principle to take effect, the AI engineering discipline should be sufficiently advanced such that technical experts possess an appropriate understanding of the technology, development processes, and operational methods of its AI systems, including the ability to explain the sources and triggers for decisions through transparent, traceable processes and auditable methodologies, data sources, and design procedure and documentation.
Accountability requires AI and people behind the AI to explain, justify, and take responsibility for any decision and action made by the AI. Mechanisms, such as governance and tools, are necessary to achieve accountability.
Transparency entails disclosure around AI systems to ensure that people understand AI-based outcomes, especially in high-risk AI domains. When relevant and not immediately obvious, users should be clearly informed when and how they are interacting with an AI and not a human being. Transparency requires clear information about the AI’s capabilities and limitations, in particular the purpose for which the systems are intended. Information about training and testing data sets (where feasible), the conditions under which the AI can be expected to function as intended, and the expected level of accuracy in achieving the specified purpose should also be supplied.
Security and safety of AI should be tested and assured across the entire life cycle within an explicit and well-defined domain of use. In addition, any AI should be designed to also safeguard the people who are impacted.
In addition to the definitions of the principles shared here, further descriptions and background can be accessed on the LF AI & Data wiki.
This blog announcement serves as an open call for AI Projects to examine and adopt the Principles – at various stages in their life-cycle. We invite the LF AI & Data community to engage with the principles and examine how to apply them to their projects and share the results and challenges within the wider community of the Linux Foundation as well as LF AI & Data.
Open source communities have a tradition of taking standards, code, or ideas, putting them into practice, sharing results, and evolving them. We also invite volunteers to help assess the relationship of the Principles with existing and emerging trusted AI toolkits and software to help identify any gaps and solutions. The LF AI & Data Trusted AI Committee is holding a webinar hosted by the Principles Working Group members to present the principles, solicit feedback, and continue to explore other options to engage the larger community.
Guest Authors: Utpal Mangla: VP & Senior Partner; Global Leader: IBM’s Telco Media Entertainment Industry Center of Competency at IBM (firstname.lastname@example.org) AND Luca Marchi: Artificial Intelligence Lead – Center of Competence – Telco, Media and Entertainment at IBM (email@example.com)
All telco companies that outperform their peers in terms of revenue growth and profitability have one thing in common. They apply AI throughout their organization under clear leadership: they have established a new path to value by integrating data into their strategy, operations, and culture.
Data is key but not enough
In the telecommunication world, data availability is exploding: Gartner forecasts that more than 20.4 billion devices will be connected in 2020, generating a constant flow of data that telcos can leverage to better understand their customers and grow their business. AI-powered natural language processing enables the analysis of unstructured data at large scale; unstructured data is estimated to make up as much as 80% of all existing data. Nonetheless, data is not enough. In order to make data usable, telcos need to solve two problems: information architecture (IA) and data privacy. Carriers have therefore moved data to the core of their strategy and invested in creating new data sources to provide the right information at the right time for the right purpose. Data privacy is a lever for telcos to gain customer trust and a key competitive advantage they need to achieve.
What can a telco with the right data strategy achieve? Improved customer experience and business expansion.
In fact, data-driven companies leverage AI to better identify unmet customer needs and deliver value at every customer touchpoint. Analytics systems powered by artificial intelligence use structured and unstructured data to identify behavior patterns and customer needs that would otherwise be missed. For example, telcos can understand when a customer is likely to churn and provide the best offer to make her stay, or they can push a personalized data package based on her data usage. Artificial intelligence scales rich customer interaction across channels. Virtual agents interact via text or voice with customers on an IVR, a mobile app, or WhatsApp, providing the same level of customer experience. Cognitive care is usually the starting point of a telco’s journey in AI, and many market leaders like Vodafone and CenturyLink have achieved enormous success in answering customer queries, personalizing the sales journey, improving customer completion rates and satisfaction, and increasing brand score and NPS. In terms of business expansion, the application of artificial intelligence and big data supports the development of new business models and the entry into new businesses and markets. In recent years, a common path followed by telcos internationally has been the extension into the fintech business. Thanks to the large amount of customer data they possess, telcos have a deep knowledge of their clients. When this knowledge is paired with the trust customers have in telcos, carriers are in the right spot to provide personalized financial services.
A great example is Orange Bank, the digital-native bank launched by French telco Orange: it provides unique offerings plus an innovative customer relationship model, and it implements a new “phygital” and omnichannel model, integrated with banking, CRM, concierge and advisory services.
Not just technology, but technology and humans together.
Enterprise success is fostered by decision making based on data. To get there, organizations need to collect all the data required to make decisions, and executives and employees need to have a data-oriented mindset to enable quality decision making.
Data-driven, or cognitive, telcos follow a four-step process that entails (1) transformation of the workforce, (2) data collection, (3) data purging: making data clean, current, curated and contextualized to create something profound, and (4) implementation of intelligent workflows and humanized experiences that require skills and architecture to use data streaming from IoT, social media, pictures and video. This approach allows cognitive telcos to infuse AI into any process across different divisions: network, human talent, marketing, sales, etc.
Transform the way Telcos manage their network
The network is a great example of how telcos are using AI to support process automation and to support executives and employees in key decision making.
Some recurring applications of artificial intelligence and automation in network operations are:
Customer Service Operations: A CSOC provides tools and processes to proactively monitor and manage end-to-end service quality with predictive insights, augmented with AI; thus, enabling operators to prioritize actions based on impact to services and customer experience.
Cognitive Network Operations: Generate efficiencies and optimization in the Network Operations Center for Level 1 and Level 2 operations engineers and managers. Applies analytics and cognitive capabilities to the network, allowing for simplified and focused operations.
Network 360: Get ahead of anomalous network activity and degradations with ML models to detect, prevent, and recommend repair for network performance.
NBN, an Australian wholesale network provider, is at the forefront of AI application for network management: they updated their network management operations with AI, analytics and robotics in order to improve efficiency and sustain an 8x growth.
A key enabler of such success cases is hybrid cloud, allowing telcos to run applications and access data from across multiple disparate platforms.
Become a Cognitive Telco: Data + Strategy + People
In order to become a Cognitive Enterprise and outperform their peers, telcos need to collect and leverage their data, to implement a strategy that bases decision making on data and to create a partnership between humans and technology.
For more content like this visit https://www.ibm.com/industries/telecom-media-entertainment
We are excited to welcome three new members to the LF AI Foundation: Montreal AI Ethics Institute, Pranveer Singh Institute of Technology, and Penn State Great Valley. These organizations join us as Associate members. The LF AI Associate membership is reserved for pre-approved non-profits, open source projects, and government entities. Learn a bit more about these organizations in their own words:
Montreal AI Ethics Institute
The Montreal AI Ethics Institute is an international, non-profit research institute dedicated to defining humanity’s place in a world increasingly characterized and driven by algorithms. We do this by creating tangible and applied technical and policy research in the ethical, safe, and inclusive development of AI. Our goal is to build public competence and understanding of the societal impacts of AI and to equip and empower diverse stakeholders to actively engage in the shaping of technical and policy measures in the development and deployment of AI systems. We are a digital-first civil society organization that brings together a diversity of individuals from different disciplines, areas of expertise, and geographic regions.
Pranveer Singh Institute of Technology
There is a future we believe in, fostered by strong souls and inventive minds. We believe, the way to usher in change, is to empower young minds with a stellar education, thus creating the enablers of tomorrow…PSIT stands out as a premier centre of higher learning with a mission of pursuing excellence in education and research. The various departments, with their diverse and dynamic community of students, accomplished faculty offer a distinctive combination of some of the finest undergraduate and postgraduate programs, world class facilities and a residential campus set on a sprawling 80 acres of sylvan surroundings.
Penn State – Great Valley
Penn State Great Valley offers master’s degrees and graduate certificates in data analytics, technology, engineering, leadership, finance, and business. We pride ourselves on our highly-rated programs that teach students valuable practical skills through immersive learning. Faculty engage in cutting-edge research and work with a variety of corporate, government, and educational organizations. Our diverse student population has a multitude of opportunities to engage in research and apply their knowledge outside of the classroom, bridging the gap between theory and practice. Courses are offered in a flexible format to accommodate the demands of work, family, and life in general.
Welcome New Members!
We look forward to partnering with these new LF AI Foundation members to help support open source innovation and projects within the artificial intelligence (AI), machine learning (ML), and deep learning (DL) space. Welcome to our new members!
Interested in joining the LF AI community as a member? Learn more here and email firstname.lastname@example.org for more information and/or questions.
2020 has been a busy year for the LF AI Foundation (LF AI) and we are thrilled to see the continued enthusiasm among the overall community and the growth of our hosted technical projects. With half the year behind us, we’re taking a moment to reflect on the key highlights.
LF AI launched two years ago with ten members, and now has a total of 24 members across our Premier, General, and Associate levels. In the first half of 2020, we've seen extra momentum in our Associate member category, with several institutions joining us, including Montreal AI Ethics Institute, Pranveer Singh Institute of Technology, and Penn State Great Valley. We also welcomed two non-profit organizations, AI for People and Ambianic.ai, both of which have been very active in the LF AI community from the start.
It’s been great to see a diverse group of companies getting involved within LF AI across various industries. We welcome those interested in contributing to the support of open source projects within the artificial intelligence (AI), machine learning (ML), and deep learning (DL) space to learn more about membership opportunities here.
Our technical project portfolio grew to twelve projects, of which three are Graduated and nine are Incubating. At the end of June, the LF AI Technical Advisory Council (TAC) approved three additional Incubating projects in the Trusted AI space; these projects are being onboarded into the Foundation and will be formally announced soon, so stay tuned! The TAC is continually working to bring in new open source projects. If you are interested in hosting a project with LF AI, check out the proposal process here and email email@example.com to discuss further.
The LF AI Interactive Landscape has continued to be a great tool for gaining insight into how LF AI projects, among many others, fit into the space of open source AI, ML, and DL. As of the end of June, the landscape covers 248 projects coming from over 130 founding organizations and universities. These projects have collectively earned over 1.4 million GitHub stars and comprise over 450 million lines of code from over 30 thousand developers! Explore the landscape and please reach out to help us expand it with your own open source project, or let us know of other projects that should be included, by emailing firstname.lastname@example.org.
We are excited to have seen participation increase in two key initiatives. The ML Workflow & Interop Committee is focused on defining an ML workflow and promoting cross-project integration and interoperability. The Trusted AI Committee is focused on creating policies, guidelines, tooling, and industry use cases in this very important space. Both committees are open for participation, and we welcome anyone interested to join the conversations by subscribing to the mailing lists or attending an upcoming meeting; check out their wiki pages for more information.
Despite the challenges that COVID-19 has presented for in-person gatherings, our community did not let that prevent them from moving forward with their planned events and instead pivoted to virtual formats. There have been two LF AI Days this year: the first was an ONNX Community Virtual Meetup, followed by a Virtual LF AI Day EU for those based in that region. LF AI Days are regional, one-day events hosted and organized by local members with support from LF AI and its projects. Visit our LF AI Events page for more details on upcoming events and be sure to join us for one soon!
The LF AI community continues to grow! If you haven’t already, check out below a few ways to stay connected with LF AI:
We are excited to see what the second half of 2020 brings and how LF AI can influence the AI, ML, and DL space; we hope you will be a part of the journey! Check out our How to Get Involved Guide or email us at email@example.com for any questions on how to participate.
Sparklyr, an LF AI Foundation Incubation Project, has released version 1.3.0! Sparklyr is an R Language package that lets you analyze data in Apache Spark, the well-known engine for big data processing, while using familiar tools in R. The R Language is widely used by data scientists and statisticians around the world and is known for its advanced features in statistical computing and graphics.
In version 1.3.0, sparklyr adds a variety of improvements; highlights include:
Now supports seamless integration of Spark higher-order functions with R (similar to how dplyr allows R users to compose clear and concise data-manipulation verbs instead of long SQL queries)
After seeing popular demand for Apache Avro functionalities in sparklyr, spark_read_avro, spark_write_avro, sdf_from_avro, and sdf_to_avro methods are implemented to make working with Apache Avro simpler for sparklyr users (context: Apache Avro is a popular data serialization format that combines flexibility of JSON schema definition with efficiency of binary serialization of data columns)
It is now also possible to run user-defined R serialization and deserialization procedures on Spark worker nodes through sparklyr
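To make the first highlight a little more concrete, the sketch below mimics in plain Python what a higher-order function over an array column does: a small lambda is applied to the elements inside each row's array, without first exploding the array into separate rows. It does not use Spark or sparklyr; the helper names `hof_transform` and `hof_filter` and the sample rows are hypothetical and only illustrate the semantics.

```python
from typing import Callable, Dict, List

Row = Dict[str, List[int]]


def hof_transform(rows: List[Row], col: str, fn: Callable[[int], int]) -> List[Row]:
    """Return new rows where fn has been applied to every element of the array in `col`."""
    return [{**row, col: [fn(x) for x in row[col]]} for row in rows]


def hof_filter(rows: List[Row], col: str, pred: Callable[[int], bool]) -> List[Row]:
    """Return new rows where the array in `col` keeps only elements satisfying pred."""
    return [{**row, col: [x for x in row[col] if pred(x)]} for row in rows]


# Hypothetical data: each row holds an array-valued column named "values".
rows = [{"values": [1, 2, 3]}, {"values": [4, 5]}]

squared = hof_transform(rows, "values", lambda x: x * x)
evens = hof_filter(squared, "values", lambda x: x % 2 == 0)

print(squared)
print(evens)
```

The point of the pattern is that the per-element logic is passed in as a function, so the same helper serves any element-wise transformation or filter.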
The power of open source projects is the aggregate contributions originating from different community members and organizations that collectively help drive the advancement of the projects and their roadmaps. The sparklyr community is a great example of this process and was instrumental in producing this release. The sparklyr team wanted to give a special THANK YOU to the following community members for their contributions via pull requests (listed in chronological order):
Contributions take many forms: roadmap input for sparklyr 1.3 from Javier Luraschi (#2434 and #2552), and great insight from @mattpollock and @benmwhite on several issues (#1773, #2514). Truly a great team effort for this release!
Congratulations to the sparklyr team and we look forward to continued growth and success as part of the LF AI Foundation! To learn about hosting an open source project with us, visit the LF AI Foundation website.
The LF AI Foundation (LF AI), the organization building an ecosystem to sustain open source innovation in artificial intelligence (AI), machine learning (ML), and deep learning (DL), today is announcing Marquez as its latest Incubation Project. Marquez is an open source metadata service for the collection, aggregation, and visualization of a data ecosystem’s metadata. It maintains the provenance of how datasets are consumed and produced, provides global visibility into job runtime and frequency of dataset access, centralization of dataset lifecycle management, and much more.
“The Marquez community is excited to join the LF AI. This is the next step for Marquez to become an integral part of the wider data community and be the standard for lineage and metadata collection,” said Julien Le Dem, CTO of Datakin.
“We are very pleased to welcome Marquez to LF AI. Machine learning requires high quality data pipelines and Marquez gives visibility into data quality, enables reproducibility, facilitates operations, and builds accountability and trust,” said Dr. Ibrahim Haddad, Executive Director of LF AI. “We look forward to supporting this project and helping it to thrive under a neutral, vendor-free, and open governance.”
LF AI supports projects via a wide range of benefits, and the first step is joining as an Incubation Project. Full details on why you should host your open source project with LF AI are available here.
Marquez enables highly flexible data lineage queries across all datasets, while reliably and efficiently associating (upstream, downstream) dependencies between jobs and the datasets they produce and consume.
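As a rough illustration of what an upstream or downstream lineage query involves, the sketch below models jobs and datasets as nodes in a directed graph and walks it breadth-first. This is not Marquez's implementation or API; the job names, dataset names, and the helpers `build_edges` and `reachable` are all hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, List, Set, Tuple

# Hypothetical jobs: job name -> (datasets it consumes, datasets it produces).
JOBS: Dict[str, Tuple[List[str], List[str]]] = {
    "ingest": (["raw.events"], ["staging.events"]),
    "clean": (["staging.events"], ["clean.events"]),
    "report": (["clean.events", "dim.users"], ["reports.daily"]),
}


def build_edges(jobs: Dict[str, Tuple[List[str], List[str]]]):
    """Build upstream and downstream adjacency maps over jobs and datasets."""
    upstream: Dict[str, Set[str]] = defaultdict(set)
    downstream: Dict[str, Set[str]] = defaultdict(set)
    for job, (inputs, outputs) in jobs.items():
        for ds in inputs:       # dataset -> job that consumes it
            downstream[ds].add(job)
            upstream[job].add(ds)
        for ds in outputs:      # job -> dataset it produces
            downstream[job].add(ds)
            upstream[ds].add(job)
    return upstream, downstream


def reachable(start: str, edges: Dict[str, Set[str]]) -> Set[str]:
    """Breadth-first walk returning every node reachable from start."""
    seen: Set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


upstream, downstream = build_edges(JOBS)
print(sorted(reachable("reports.daily", upstream)))     # everything the report depends on
print(sorted(reachable("staging.events", downstream)))  # everything affected by staging.events
```

Keeping both adjacency maps means either direction of the query is a single traversal, at the cost of storing each edge twice.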
Marquez is a modular system and has been designed as a highly scalable, highly extensible platform-agnostic solution for metadata management. It consists of the following system components:
Metadata Repository: Stores all job and dataset metadata, including a complete history of job runs and job-level statistics (e.g., total runs, average runtimes, and successes/failures).
Metadata API: RESTful API enabling a diverse set of clients to begin collecting metadata around dataset production and consumption.
Metadata UI: Used for dataset discovery, connecting multiple datasets and exploring their dependency graph.
Marquez’s data model emphasizes immutability and timely processing of datasets. Datasets are first-class values produced by job runs. A job run is linked to versioned code, and produces one or more immutable versioned outputs. Dataset changes are recorded at different points in job execution via lightweight API calls, including the success or failure of the run itself.
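One way to picture this data model is with immutable records: a run points at a code version and at the dataset versions it produced, and nothing is ever updated in place. The sketch below is a minimal illustration under that assumption, not Marquez's actual schema or API; the class names, the state strings, and the hash-based version identifier are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    dataset: str
    version: str      # hypothetical identifier derived from dataset, run, and code version
    produced_by: str  # id of the run that produced this version


@dataclass(frozen=True)
class JobRun:
    run_id: str
    job: str
    code_version: str  # e.g. a commit hash
    state: str         # "COMPLETED" or "FAILED" in this sketch
    outputs: Tuple[DatasetVersion, ...]


class MetadataStore:
    """Append-only store: versions are added, never modified."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[DatasetVersion]] = {}

    @staticmethod
    def _version_id(dataset: str, run_id: str, code_version: str) -> str:
        raw = f"{dataset}:{run_id}:{code_version}".encode("utf-8")
        return hashlib.sha256(raw).hexdigest()[:12]

    def record_run(self, run_id: str, job: str, code_version: str,
                   state: str, outputs: List[str]) -> JobRun:
        # Only a successful run yields new dataset versions.
        produced = outputs if state == "COMPLETED" else []
        versions = tuple(
            DatasetVersion(ds, self._version_id(ds, run_id, code_version), run_id)
            for ds in produced
        )
        for v in versions:
            self._versions.setdefault(v.dataset, []).append(v)
        return JobRun(run_id, job, code_version, state, versions)

    def history(self, dataset: str) -> Tuple[DatasetVersion, ...]:
        return tuple(self._versions.get(dataset, ()))


store = MetadataStore()
first = store.record_run("run-1", "clean", "abc123", "COMPLETED", ["clean.events"])
failed = store.record_run("run-2", "clean", "abc123", "FAILED", ["clean.events"])
second = store.record_run("run-3", "clean", "def456", "COMPLETED", ["clean.events"])

for v in store.history("clean.events"):
    print(v.dataset, v.version, v.produced_by)
```

Because the records are frozen, any attempt to change a past run or version raises an error, which is what makes the history trustworthy for reproducing a result later.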
A warm welcome to Marquez and we look forward to the project’s continued growth and success as part of the LF AI Foundation. To learn about how to host an open source project with us, visit the LF AI website.
Adlik, an LF AI Foundation Incubation-Stage Project, has released version 0.1.0. We're thrilled to see a release from this community, which has been hard at work over the past few months! Adlik is a toolkit for accelerating deep learning inference. It provides end-to-end support for bringing trained models into production and eases the learning curve for different kinds of inference frameworks. In Adlik, the Model Optimizer and Model Compiler deliver optimized and compiled models for a given hardware environment, and the Serving Engine provides deployment solutions for cloud, edge and device.
In version 0.1.0, Adlik enhances features, improves usability, and includes miscellaneous bug fixes. A few of the release highlights include the following:
A new framework which is easy to expand and maintain
Compilation of models trained with Keras, TensorFlow, and PyTorch for better execution on CPU/GPU
Multi-node, multi-GPU training and pruning
Configurable implementation of filter pruning to achieve smaller size of inference models
Small batch dataset quantization for TF-Lite and TF-TRT
Management of multiple models and multiple versions
HTTP/gRPC interfaces for inference service
Runtime scheduler that supports scheduling of multiple model instances
Integration of multiple DL inference runtimes, including TensorFlow Serving, OpenVINO, TensorRT and TF Lite
Integration of dlib to support ML runtime
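For readers unfamiliar with filter pruning, the sketch below shows the general idea: rank a layer's convolution filters by some importance score and drop the weakest fraction, which shrinks the model used for inference. The release notes do not say which criterion Adlik uses, so the L1-norm ranking here is an assumption chosen because it is a common choice; `prune_filters` and the toy weights are hypothetical.

```python
from typing import List

Filter = List[List[float]]  # one small 2-D kernel per filter, for brevity


def l1_norm(f: Filter) -> float:
    """Sum of absolute weights, used here as the importance score."""
    return sum(abs(w) for row in f for w in row)


def prune_filters(filters: List[Filter], prune_ratio: float) -> List[int]:
    """Return the indices of the filters to keep, in their original order."""
    if not 0.0 <= prune_ratio < 1.0:
        raise ValueError("prune_ratio must be in [0, 1)")
    n_prune = int(len(filters) * prune_ratio)
    # Rank filter indices from least to most important.
    ranked = sorted(range(len(filters)), key=lambda i: l1_norm(filters[i]))
    pruned = set(ranked[:n_prune])
    return [i for i in range(len(filters)) if i not in pruned]


# Toy layer with four filters of clearly different magnitudes.
filters = [
    [[0.9, -0.8], [0.7, 0.6]],
    [[0.01, 0.0], [-0.02, 0.01]],
    [[0.5, 0.5], [-0.5, 0.5]],
    [[0.1, -0.1], [0.0, 0.1]],
]

kept = prune_filters(filters, prune_ratio=0.5)
print(kept)
```

Making the ratio a parameter is what "configurable" would mean in a sketch like this; in practice a pruned model is usually fine-tuned afterwards to recover accuracy.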
This release also contains a Benchmark Test Framework for DL models, which enables standardized performance benchmarking of models running in the same hardware environment on the different runtimes supported by Adlik. In this framework, the whole testing pipeline is executed automatically using a containerized solution.
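The core of any such benchmark is running the same input through each runtime under identical conditions and comparing timings. The sketch below illustrates that loop only; it is not Adlik's framework, involves no containers, and the two "runtimes" are hypothetical stand-in functions rather than real inference engines.

```python
import statistics
import time
from typing import Callable, Dict, List

InferFn = Callable[[List[float]], List[float]]


def benchmark(runtimes: Dict[str, InferFn], batch: List[float],
              warmup: int = 2, repeats: int = 5) -> Dict[str, Dict[str, float]]:
    """Time each runtime on the same batch and return simple latency statistics."""
    results: Dict[str, Dict[str, float]] = {}
    for name, infer in runtimes.items():
        for _ in range(warmup):  # warm-up calls are not timed
            infer(batch)
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            infer(batch)
            timings.append(time.perf_counter() - start)
        results[name] = {"mean_s": statistics.mean(timings), "min_s": min(timings)}
    return results


# Hypothetical stand-ins for two runtimes serving the same model.
def runtime_a(batch: List[float]) -> List[float]:
    return [x * 2.0 for x in batch]


def runtime_b(batch: List[float]) -> List[float]:
    return list(map(lambda x: x * 2.0, batch))


report = benchmark(
    {"runtime_a": runtime_a, "runtime_b": runtime_b},
    batch=[float(i) for i in range(1000)],
)
for name, stats in report.items():
    print(f"{name}: mean={stats['mean_s']:.6f}s min={stats['min_s']:.6f}s")
```

Warm-up runs and repeated measurements matter because first calls often include one-off costs that would otherwise skew the comparison.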
The Adlik team expressed a special thank you to contributors from ZTE, China Mobile, and China Unicom for their extra hard work.
The Adlik Project invites you to adopt or upgrade to version 0.1.0, and welcomes feedback. To learn more about the Adlik 0.1.0 release, check out the full release notes. Want to get involved with Adlik? Be sure to join the Adlik-Announce and Adlik Technical-Discuss mailing lists to join the community and stay connected on the latest updates.
Congratulations to the Adlik team! We look forward to continued growth and success as part of the LF AI Foundation. To learn about hosting an open source project with us, visit the LF AI Foundation website.