RosaeNLG Joins LF AI & Data as New Sandbox Project

By Blog

LF AI & Data Foundation, the organization building an ecosystem to sustain open source innovation in artificial intelligence (AI), machine learning (ML), deep learning (DL), and data open source projects, is announcing that RosaeNLG is joining the Foundation as its first Sandbox Project.

The Sandbox stage was recently added by the LF AI & Data Technical Advisory Council (TAC) to accommodate early-stage projects that meet one or more of the following requirements:

  • Any project that intends to join LF AI & Data Incubation in the future and wishes to lay the foundations for that.
  • New projects that are designed to extend one or more LF AI & Data projects with functionality or interoperability libraries. 
  • Independent projects that fit the LF AI & Data mission and provide the potential for a novel approach to existing functional areas (or are an attempt to meet an unfulfilled need).

RosaeNLG is a great fit for this stage and was voted by the TAC into incubation at the Sandbox stage. It is an open source Natural Language Generation (NLG) project that aims to offer the same NLG features as product NLG solutions and to be developer and IT friendly for ease of integration and configuration. RosaeNLG was released and open sourced by Ludan Stoecklé, CTO of Data & AI Lab at BNP Paribas CIB and Expert Professor at aivancity school. 

Dr. Ibrahim Haddad, Executive Director of LF AI & Data, said: “RosaeNLG is a great foundational project that aims to broaden the accessibility and understandability of AI. We’re excited to welcome RosaeNLG as our first Sandbox stage project and look forward to supporting its journey for increased adoption, growth, and collaboration with other projects.”  

Template-based Natural Language Generation (NLG) automates the production of relatively repetitive texts based on structured input data and textual templates, run by an NLG engine. Production usage is widespread in large corporations, especially in the financial industry.

Typical use cases are:

  • Describing a product based on its features for SEO purposes
  • Producing structured reports, such as risk reports or fund performance reports, in the financial industry
  • Generating well-formed chatbot answers
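The idea of template-based generation can be illustrated with a minimal Python sketch of the first use case, describing a product from its features. This is not RosaeNLG's template syntax or API; the function names, the template text, and the sample product are all hypothetical and only show how structured data plus a template with small linguistic helpers yields text.

```python
from string import Template
from typing import Dict, List


def pluralize(count: int, singular: str, plural: str) -> str:
    """Pick the singular or plural noun form for a count."""
    return singular if count == 1 else plural


def join_with_and(items: List[str]) -> str:
    """Join items as 'a', 'a and b', or 'a, b and c'."""
    if not items:
        return ""
    if len(items) == 1:
        return items[0]
    return ", ".join(items[:-1]) + " and " + items[-1]


# Hypothetical template: the fixed wording lives here, the data fills the slots.
PRODUCT_TEMPLATE = Template(
    "The $name has $camera_count $camera_noun and comes in $colors."
)


def describe_product(product: Dict) -> str:
    """Render one product record (structured input) into a sentence."""
    cameras = product["cameras"]
    return PRODUCT_TEMPLATE.substitute(
        name=product["name"],
        camera_count=cameras,
        camera_noun=pluralize(cameras, "camera", "cameras"),
        colors=join_with_and(product["colors"]),
    )


# Made-up sample data for illustration only.
phone = {"name": "OnePhone X", "cameras": 2, "colors": ["black", "red", "white"]}
print(describe_product(phone))
```

Because every word comes either from the template or from the input record, the output cannot contain facts that are absent from the data, which is the property the announcement highlights for trust.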

RosaeNLG templates are developed in VSCode with a friendly syntax and are easy to integrate. RosaeNLG currently supports languages such as English, French, German, Italian, and Spanish, with linguistic resources. It also provides NLG on both the server side (using a Node.js REST API) and the browser side.

Ludan Stoecklé, the founder of RosaeNLG, said: “Non-expert users don’t understand long tables of figures and dashboards; they prefer simple textual explanations. NLG is key in the democratization and understandability of data in general and to trusted AI in particular. Template-based NLG is the only way to achieve complex data-to-text projects without any error or hallucination in the texts, which is mandatory for trust. The support of the LF AI & Data Foundation will foster adoption and community growth, as well as diversity in NLG domain, with the goal to support more than 50 commonly spoken languages.”

LF AI & Data supports projects via a wide range of services, and the first step is joining the Foundation in incubation. Learn more about RosaeNLG on their GitHub and be sure to join the RosaeNLG-Announce and RosaeNLG-Technical-Discuss mailing lists to join the community and stay connected on the latest updates.

A warm welcome to RosaeNLG! We look forward to the project’s continued growth and success as part of the LF AI & Data Foundation. To learn about how to host an open source project with us, visit the LF AI & Data website.

RosaeNLG Key Links

LF AI & Data Resources

ONNX 1.9 Release Now Available!


ONNX, an LF AI & Data Foundation Graduated Project, has released version 1.9! ONNX is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. 

In version 1.9, ONNX adds a variety of improvements. Highlights include:

  • Selective schema loading of specific operator set versions to reduce memory usage in runtimes
  • New and updated operators to support more data types and models, including advanced object detection models such as MobileNetV3 and YOLOv5
  • Improved tools for splitting large, multi-GB models into separate files
  • More details and sample tests added to operator documentation
  • Version converter enhanced to make it easier to upgrade models to newer operator sets

To learn more about the ONNX 1.9 release, check out the full release notes. Want to get involved with ONNX? Be sure to join the ONNX Announce mailing list to join the community and stay connected on the latest updates. 

Congratulations to the ONNX team and we look forward to continued growth and success as part of the LF AI & Data Foundation! To learn about hosting an open source project with us, visit the LF AI & Data Foundation website.


New LF AI & Data Member Welcome – Q1 2021


We are excited to welcome six new members to the LF AI & Data Foundation. VMware has joined as a General Member, and Galgotias University, High School Technology Services, OpenUK, Ken Kennedy Institute, and the University of Washington – Tacoma have joined as Associate Members. 

The LF AI & Data Foundation will build and support an open community and a growing ecosystem of open source AI, data and analytics projects, by accelerating development and innovation, enabling collaboration and the creation of new opportunities for all of the members of the community.

Learn more about the new organizations in their own words below:

General Members

The LF AI & Data General membership is targeted for organizations that want to put their organization in full view in support of LF AI & Data and our mission. Organizations that join at the General level are committed to using open source technology, helping LF AI & Data grow, voicing the opinions of their customers, and giving back to the community.

VMware streamlines the journey for organizations to become digital businesses that deliver better experiences to their customers and empower employees to do their best work. Our software spans App Modernization, Cloud, Networking & Security, and Digital Workspace.

Associate Members

The LF AI & Data Associate membership is reserved for pre-approved non-profits, open source projects, and government entities. 

Galgotias University is devoted to excellence in teaching, research and innovation, and to develop leaders who’ll make a difference to the world. The University, which is based in Greater Noida, has an enrollment of over 15,000 students across more than 100 Undergraduate and Postgraduate programs.

High School Technology Services strives to provide the highest quality information technology services to high schools, teenagers, and adults. From creating websites to building computer labs, from offering counseling sessions to designing preparation programs, we aim to provide an assortment of services to support the dreams of these teenagers, their parents, and the schools.


OpenUK advocates for Open Technology being open source software, open source hardware, and open data, “Open” in and for business communities across the UK. As an industry advocacy organization, OpenUK gives its participants greater influence than they could ever achieve alone. Collaboration is central to everything we do and we use it to bring together business, public sector, and community in the UK to collaborate locally and globally.


The Ken Kennedy Institute is the virtual home of over two hundred faculty members and senior researchers at Rice University spanning computer science, mathematics, statistics, engineering, natural sciences, humanities, social sciences, business, architecture, and music.

The Institute brings together the Rice community to collaboratively solve critical global challenges by fostering innovations in computing and harnessing the transformative power of data. We enable new conversations, drive interdisciplinary research in AI and data science, develop new technology to serve society, advance the current and future workforce, promote an ecosystem of innovation and entrepreneurship, and develop academic, industry, and community partnerships in the computational sciences.


University of Washington – Tacoma is an urban-serving university providing access to students in a way that transforms families and communities. We impact and inform economic development through community-engaged students and faculty. We conduct research that is of direct use to our community and region. And, most importantly, we seek to be connected to our community’s needs and aspirations.

Welcome New Members!

We look forward to partnering with these new LF AI & Data Foundation members to help support open source innovation and projects within the artificial intelligence (AI), machine learning (ML), deep learning (DL), and data space. Welcome to our new members!

Interested in joining the LF AI & Data community as a member? Learn more here and email for more information and/or questions. 


Egeria 2.8 Release Now Available!


Egeria, an LF AI & Data Foundation Graduate Project, has released version 2.8! Egeria is an open source project dedicated to making metadata open and automatically exchanged between tools and platforms, no matter which vendor they come from. 

In version 2.8, Egeria adds a variety of improvements. Highlights include:

  • New support for event and property filtering for the open metadata server security connector
    • The repository services support three filtering points for managing events for the OMRS Cohort Topic. However, these filtering points are set up in the server's configuration document, which provides no way to filter events for specific instances. Version 2.8 extends the metadata server security connector so that it can be called at these same filter points.
    • The security server connector will have two new interfaces that it can implement: one for the cohort events and one for saving events to the local repository.
      • The event interface will have two methods, one for sending and one for receiving. The parameters will include the cohort name and the event contents. It can return the event unchanged, return a modified event (e.g. with sensitive content removed) or return null to say that the event is filtered out.
      • The event saving interface will receive the instance header and can return a boolean to indicate if the local repository should store it. If true is returned, the refresh event sequence is initiated. The repository connector then has the ultimate choice when the refreshed instance is returned from the home repository as to whether to store it or not.
  • Changes to metadata types
    • Updates to the location types in model 0025:
      • Add the mapProjection property to the FixedLocation classification
      • Change the address property to networkAddress in the CyberLocation classification
      • Deprecate HostLocation in favor of the AssetLocation relationship
    • Deprecate the RuntimeForProcess relationship since it is superfluous – use ServerAssetUse since Application is a SoftwareServerCapability.
    • Replace the deployedImplementationType property with the businessCapabilityType in the BusinessCapability since it is a more descriptive name.
  • New performance workbench for the CTS (technical preview)
    • The performance workbench intends to test the response time of all repository (metadata collection) methods for the technology under test. The volume of the test can be easily configured to also test scalability.
  • New interface for retrieving the complete history of a single metadata instance
    • Two new (optional) methods have been introduced to the metadata collection interface:
      • getEntityDetailHistory
      • getRelationshipHistory
    • Both methods take the GUID of the instance for which to retrieve history, an optional range of times between which to retrieve the historical versions (or if both are null to retrieve all historical versions), and a set of paging parameters.
    • If not implemented by a repository, these will simply throw FunctionNotSupported exceptions by default to indicate that they are not implemented.
  • Splitting of CTS results into multiple smaller files
    • Up to this release, the detailed results of a CTS run could only be retrieved by pulling a huge (100’s of MB) file across the REST interface for the CTS. Aside from not typically working with most REST clients (like Postman), this had the additional impact of a sudden huge hit on the JVM heap to serialize such a large JSON structure (immediately grabbing ~1GB of the heap). While this old interface still exists for backwards compatibility, the new default interface provided in this release allows users to pull down just an overall summary of the results separately from the full detailed results, and the detailed results are now broken down into separate files by profile and test case: each of which can therefore be retrieved individually.
  • Bug fixes and other updates
    • Additional Bug Fixes
    • Dependency Updates
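The event-filtering contract described in the first highlight (return the event unchanged, return a modified event, or return null to filter it out, plus a boolean decision on saving) can be sketched as follows. This is a language-neutral illustration in Python, not Egeria's Java connector interfaces; the function names, the cohort name, the property keys, and the type names are hypothetical.

```python
import copy
from typing import Optional

# Hypothetical property names treated as sensitive in this sketch.
SENSITIVE_KEYS = {"ssn", "salary"}


def filter_outbound_event(cohort_name: str, event: dict) -> Optional[dict]:
    """Return the event unchanged, a redacted copy, or None to drop it."""
    if cohort_name == "quarantine":  # hypothetical cohort that receives nothing
        return None
    properties = event.get("properties", {})
    sensitive_present = properties.keys() & SENSITIVE_KEYS
    if not sensitive_present:
        return event  # unchanged
    redacted = copy.deepcopy(event)  # never mutate the caller's event
    for key in sensitive_present:
        del redacted["properties"][key]
    return redacted


def should_save_instance(instance_header: dict) -> bool:
    """Decide from the instance header whether the local repository stores it."""
    # Hypothetical rule: only keep instances of selected types.
    return instance_header.get("type") in {"Asset", "GlossaryTerm"}


event = {"guid": "1", "properties": {"name": "customer-table", "ssn": "123"}}
print(filter_outbound_event("production", event))
print(should_save_instance({"type": "Asset"}))
```

Returning a copy rather than editing the event in place keeps the original available to other consumers at the same filter point.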

To learn more about the Egeria 2.8 release, check out the full release notes. Want to get involved with Egeria? Be sure to join the Egeria-Announce and Egeria-Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

Congratulations to the Egeria team and we look forward to continued growth and success as part of the LF AI & Data Foundation! To learn about hosting an open source project with us, visit the LF AI & Data Foundation website.


SOAJS Joins LF AI & Data as New Incubation Project


LF AI & Data Foundation, the organization building an ecosystem to sustain open source innovation in artificial intelligence (AI), machine learning (ML), deep learning (DL), and Data open source projects, today is announcing SOAJS as its latest Incubation Project. 

SOAJS is an open source microservices and API management platform. It simplifies and accelerates the adoption of a multi-tenant microservices architecture by eliminating the pain of proliferation. The SOAJS platform empowers organizations to create and operate a microservices architecture capable of supporting any framework, while providing API productization, multi-tenancy, multi-layer security, cataloging, awareness, and adaptability to existing source code. It automatically catalogs and releases software components with multi-tenant, multi-version, and multi-platform capabilities. SOAJS integrates and orchestrates multiple infrastructures and technologies in a simple and secure way, while accelerating the release cycle with custom continuous integration and a smart continuous delivery pipeline. The platform can create and manage custom environments per product, per department, per team, per resource, and per technology, empowering every member of the organization.

SOAJS was released and open sourced by Herron Tech.

Dr. Ibrahim Haddad, Executive Director of LF AI & Data, said: “We are happy to welcome SOAJS to LF AI & Data and help it thrive in a neutral, vendor-free environment under an open governance model. With SOAJS, we’re able now to offer a Foundation project that offers a complete enterprise open source microservice management platform and already has ongoing collaborations with existing LF AI & Data projects such as Acumos. We look forward to tighter collaboration between SOAJS and all other projects to drive innovation in data, analytics, and AI open source technologies.” 

API Aware Pipeline can be a key contribution to LF AI ML Workflow and Interop Committee

DevOps automation that is limited to infrastructure deployment and source code release addresses only a fraction of the challenges of managing the development, deployment, and operation of large numbers of APIs and microservices in today’s complex environments. For teams to achieve the agility promised by modern application development, they need DevOps automation that is API and microservice aware.

SOAJS delivers API-Aware DevOps, with rich API- and Microservice-optimized automation capabilities that enable high-performance, agile execution.

The SOAJS API-Aware pipeline can be a key contribution to LF AI’s ML Workflow and Interop Committee by helping multiple projects close the loop and take advantage of its Multi Environment Marketplace, Automated Cataloging, Smart Deployment, Multi Tenant Authentication/Authorization Gateway, and Middleware to standardize, release, deploy and operate ML models the way Acumos is using it today.

Antoine Hage, co-creator of SOAJS and co-founder of Herron Tech, the sponsor of SOAJS, said: “We are thrilled to join the LF AI & Data community and look forward to growing SOAJS while helping other projects solve the interoperability challenges.”

SOAJS is the only end-to-end microservices management platform to help transform and achieve durable agility by taking advantage of our complete and adaptable solution. No other competitor has our unique features and capabilities. 

Key features of the SOAJS platform include:

  • API Management and Marketplace
    • API Builder: passthrough and smart endpoint
    • API Framework: build microservices tenfold faster
    • Heterogeneous Catalog with source code integration and adaptation 
    • Complete pipelines management for APIs and resources
      • API: service, microservices, passthrough, smart endpoint
      • Daemons: cron jobs, interval, parallel
      • Resource: clusters, bridge to existing, binary
      • Front end: nginx, Multi domain, Automated SSL
      • Custom: custom applications, custom packages, monolithic
      • Recipe: deploy recipe, standardization 
    • Easily adapt to existing APIs & legacy systems
  • Multi Environment Orchestration and Deployment
    • Cloud orchestration & distributed architecture
    • Smart multi environment deploy with ledger, rollback and multi version support
    • Deploy from any source code and binary
    • Infra as Code native & 3rd party support
    • Container & VM orchestration
    • Import/export/clone environments
    • Custom CI/CD with Smart Release
    • Support multi continuous integration server
    • Support multi GIT server
  • Multi Tenant Authentication and Authorization Gateway
    • API productization and packaging
    • Multi environment and multi version support 
    • Automatic awareness and mesh among microservices
    • Multi tenant authentication and authorization with roaming
    • Registry and resource configuration and management
    • API monitoring and performance measurement
  • CLI and Single Pane of Glass Management Console
    • Full monitoring, high availability, analytic and control
    • Correlation between resources and traffic analytic vs errors in logs
    • User management with federation and access control
    • Notification with trackability ledger

LF AI & Data supports projects via a wide range of services, and the first step is joining as an Incubation Project. LF AI & Data will support the neutral open governance for SOAJS to help foster the growth of the project. Check out the Get Started Guide and Demos to start working with SOAJS today. Learn more about SOAJS on their website and be sure to join the SOAJS-Announce and SOAJS-Technical-Discuss mailing lists to join the community and stay connected on the latest updates.

A warm welcome to SOAJS! We look forward to the project’s continued growth and success as part of the LF AI & Data Foundation. To learn about how to host an open source project with us, visit the LF AI & Data website.


Join LF AI & Data at Kubernetes AI Day!


The LF AI & Data Foundation is pleased to be a co-host at the upcoming Kubernetes AI Day! The event will be held virtually on May 4, 2021, and registration is only US$20.

Kubernetes is becoming a common substrate for AI, allowing workloads to run either in the cloud or in an organization's own data center and to scale easily. This event is great for developers who are interested in deploying AI at scale using Kubernetes.

The agenda is now live! Please note the times below are displayed in Pacific Daylight Time (PDT).

Tuesday, May 4, 2021

1:00 PDT

Opening Remarks

1:05 PDT

Scaling ML pipelines with KALE — the Kubeflow Automated Pipeline Engine

1:40 PDT

A K8s Based Reference Architecture for Streaming Inference in the Wild

2:15 PDT

Embrace DevOps Practices to ML Pipeline Development

2:45 PDT


3:05 PDT

Taming the Beast: Managing the day 2 operational complexity of Kubeflow

3:40 PDT

The SAME Project: A Cloud Native Approach to Reproducible Machine Learning

4:10 PDT


4:25 PDT

Stand up for ethical AI! How to detect and mitigate AI bias using Kubeflow

5:00 PDT

The production daily life: An end to end experience of serverless machine learning, MLOps and models explainability

6:30 PDT

Closing Remarks

Visit the event website for more information about the schedule and speakers. Join us by registering to attend Kubernetes AI Day – Register Now!

The LF AI & Data Foundation’s mission is to build and support an open AI community, and drive open source innovation in the AI, ML, and DL domains by enabling collaboration and the creation of new opportunities for all the members of the community. 

Want to get involved with the LF AI & Data Foundation? Be sure to subscribe to our mailing lists to join the community and stay connected to the latest updates.


Sparklyr 1.6 Release Now Available!


Sparklyr, an LF AI & Data Foundation Incubation Project, has released version 1.6! Sparklyr is an R Language package that lets you analyze data in Apache Spark, the well-known engine for big data processing, while using familiar tools in R. The R Language is widely used by data scientists and statisticians around the world and is known for its advanced features in statistical computing and graphics. 

In version 1.6, sparklyr adds a variety of improvements. Highlights include:

  • Sparklyr now has an R interface for Power Iteration Clustering
    • Power Iteration Clustering is a scalable and efficient graph clustering algorithm. It finds low-dimensional embedding of a dataset using truncated power iterations on a normalized pair-wise similarity matrix of all data points, and runs k-means algorithm on the embedded representation.
  • Support for approximate weighted quantiles in `sdf_quantile()` and `ft_quantile_discretizer()`
    • Sparklyr 1.6 features a generalized version of the Greenwald-Khanna algorithm that takes weights of sample data into account when approximating quantiles of a large number of data points.
    • Similar to its unweighted counterpart, the weighted version of the Greenwald-Khanna algorithm can be executed distributively on multiple Spark worker nodes, with each worker node summarizing some partition(s) of a Spark dataframe in parallel, and quantile summaries of all partitions can be merged efficiently. The merged result can then be used to approximate weighted quantiles of the dataset, with a fixed upper bound on relative error on all approximations.
  • `spark_write_rds()` was implemented to support exporting all partitions of a Spark dataframe in parallel into RDS (version 2) files. This functionality was designed and built to avoid high memory pressure on the Spark driver node when collecting large Spark dataframes.
    • RDS files will be written to the default file system of the Spark instance (i.e., local file if the Spark instance is running locally, or a distributed file system such as HDFS if the Spark instance is deployed over a cluster of machines).
    • The resulting RDS files, once downloaded onto the local file system, should be deserialized into R dataframes using `collect_from_rds()` (which calls `readRDS()` internally and also performs some important post-processing steps to support timestamp columns, date columns, and struct columns properly in R).
  • Dplyr-related improvements:
    • Dplyr verbs such as `select`, `mutate`, and `summarize` can now work with a set of Spark dataframe columns specified by `where()` predicates (e.g., `sdf %>% select(where(is.numeric))` and `sdf %>% summarize(across(starts_with("Petal"), mean))`)
    • Sparklyr 1.6 implemented support for `if_all()` and `if_any()` for Spark dataframes
    • Dbplyr integration in sparklyr has been revised substantially to be compatible with both dbplyr edition 1 and edition 2 APIs
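To make the notion of a weighted quantile in the highlights above concrete, here is a small exact reference computation in Python. It is not the Greenwald-Khanna algorithm and not sparklyr code: it sorts all the data in memory, which is precisely what the approximate, distributed version avoids. The function name and the sample values are hypothetical.

```python
from typing import Sequence


def weighted_quantile(values: Sequence[float], weights: Sequence[float], q: float) -> float:
    """Smallest value whose cumulative weight reaches a fraction q of the total weight."""
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and of equal length")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    if any(w < 0 for w in weights) or sum(weights) <= 0:
        raise ValueError("weights must be non-negative with a positive sum")

    pairs = sorted(zip(values, weights))  # order by value
    threshold = q * sum(weights)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= threshold:
            return value
    return pairs[-1][0]  # guards against floating-point shortfall at q = 1


# With equal weights this reduces to an ordinary (lower) median.
print(weighted_quantile([1, 2, 3, 4], [1, 1, 1, 1], 0.5))
# A heavy weight on the largest value pulls the median toward it.
print(weighted_quantile([1, 2, 3, 4], [1, 1, 1, 5], 0.5))
```

An approximate algorithm returns a value whose rank is close to the one this exact version would pick, in exchange for summarizing each partition in bounded memory.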

As usual, there is strong support for sparklyr from our fantastic open-source community! In chronological order, we thank the following individuals for making their pull request part of sparklyr 1.6:

To learn more about the sparklyr 1.6 release, check out the full release notes. Want to get involved with sparklyr? Be sure to join the sparklyr-Announce and sparklyr-Technical-Discuss mailing lists to join the community and stay connected on the latest updates. 

Congratulations to the sparklyr team and we look forward to continued growth and success as part of the LF AI & Data Foundation! To learn about hosting an open source project with us, visit the LF AI & Data Foundation website.


Thank you ONNX & Baidu Paddle Paddle for Hosting a Great LF AI & Data Day!


A big thank you to ONNX and Baidu Paddle Paddle for hosting a great virtual meetup! The LF AI & Data Day ONNX Community Virtual Meetup was held on March 24, 2021, and was a great success, with over 100 attendees joining for part of the three-hour event.

The meetup included ONNX Community updates, partner/end-user stories, and SIG/WG updates. The virtual meetup was an opportunity to connect with and hear from people working with ONNX across a variety of groups. A special thank you to Ti Zhou from Baidu Paddle Paddle for working closely with the ONNX Technical Steering Committee, SIGs, and ONNX community to curate the content. 

Missed the meetup? Check out all of the presentations and recordings here.

This meetup took on a virtual format but we look forward to connecting again at another event in person soon. LF AI & Data Day is a regional, one-day event hosted and organized by local members with support from LF AI & Data, its members, and projects. If you are interested in hosting an LF AI & Data Day please email to discuss.

ONNX, an LF AI & Data Foundation Graduated Project, is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them.  Be sure to join the ONNX-Announce mailing list to join the community and stay connected on the latest updates. You can join technical discussions on GitHub and more conversations with the community on LF AI & Data ONNX Slack channels.


Resources for Data Scientist and Machine Learning Professionals


Guest Author: Matt Zand, President of High School Technology Services

Whether you are new to the field of data science or would like to brush up on your current skills, the resources listed here should be a great help. Beginners should start by learning procedural coding in Python and then master Python Object-Oriented Programming (OOP). Python is a very powerful yet easy-to-learn programming language. If you are familiar with the logic of programming, learning Python will be easy.

Once you master Python, you can move on to learning how to use Python for data analytics. Techniques and tools you use for running data analytics are common among data scientists that run analytics on large data sets (or so-called “Big Data”) on a daily basis.

The other two popular applications of Python are machine learning and Artificial Intelligence (AI). In short, with Python, systems can learn from their data and users and begin to replicate business processes with little human interaction. Along similar lines, Python is used in AI to automate business processes and routine system transactions, often with help from Internet of Things (IoT) hardware devices.

In short, the resources provided in this article can serve as great guides for those interested in pursuing a career in Data Science, Machine Learning, Big Data, Data Analytics, and AI.

Python, Machine Learning and Data Science Resources

LF AI & Data Key Links

How Explainable AI is Changing the Bank and Finance Industry


Guest Author: Dr. Jagreet Kaur, Chief AI Officer, Xenonstack

Boost Banks Performance using XAI 

Machine learning has automated business operations, made them more efficient, improved services, and enriched customer interaction. However, AI systems have been observed to be biased, discriminating by gender, race, or ethnicity when providing services. Because most advanced ML algorithms are opaque, noticing biases and tracing model decisions is difficult. As a result, these systems lose the trust of customers and bankers alike. This issue is known as the black-box problem.

Hunting fraud: Using AI and ML to hunt fraud automates the task of detecting it. However, there are cases in which the system misidentifies a customer and mistakenly declines a credit card. This disappoints customers and costs the bank their trust, which also has a reputational impact, and customers may simply stop using the service. Developers and bankers cannot tell whether the system is working properly or why it declined a card. These failures stem from a lack of transparency in the system.

Explainable AI can solve these problems by providing transparency and giving answers: 

  • How does the system decide that card should be declined? 
  • What is the reason behind individual approval or decline of a customer’s card? 

Banks and financial institutions are investing in Explainable AI to solve these problems. We build AI systems with Explainable AI to make models transparent. Explainable AI makes model decisions more trustworthy and helps address the issue of bias.

Before/After: Before adopting Explainable AI, users receive an output but do not know how it was produced. Explainable AI builds trust in the algorithm and helps explain the system, so no one has to say, “I don’t know what happened.”

Implementation: Visualization interprets the model and explains it. Various libraries and packages explain the model decision process, such as how the software reaches its conclusion. There are two dimensions of an interpretable system: 

  • Transparency helps solve the black-box model problem by providing clarity: “how does the model work?”
  • Explainability helps organizations rationalize and understand AI decisions: “why did the model do that?”

Case Study to understand Explainable AI in Banks 

Banks have started automating their loan systems with AI (Artificial Intelligence) that decides on a loan application within a minute, using the customer’s data to predict creditworthiness. This can decrease overdue loans, reduce credit loss and risk, and decrease fraud.

There is some cost associated with the incorrect decision of the model. Most of the models used for AI systems are black box in nature, which increases the business risk. Understanding model decisions is challenging due to a lack of transparency. 

End customers may ask questions about the model that developers cannot answer because the model is opaque, which undermines customer trust.

Explainable AI in Loan Approval System 

Explainable AI builds customer trust by providing a transparent and clear account of the model’s methodology. It uses various frameworks and libraries to answer customers’ questions, such as:

  • How is data contributing to making a decision? 
  • Which feature influences the result more? 
  • How does changing the value of a particular feature affect the system output?
  • Why did the system decline the loan application of Mr. Jain? 
  • What income is required for a loan to be approved? 
  • How do models make decisions? 

To make the model interpretable, we divide our approach into three levels. Various questions are picked from each level. 

  • Global Explanation 
  • Local Explanation 
  • Feature interaction and distribution 

Some of these questions, and the methodologies used to answer them, are listed in Table 1.1:

| Questions of Stakeholder | Methodology to be used | Library / package |
| --- | --- | --- |
| Is it possible to enhance model explainability without damaging model performance? | Model accuracy vs. Model Explainability | Python and Visualization |
| How is data contributing to making a decision? | SHAP (SHapley Additive exPlanations) | SHAP library |
| How does model output vary by changing the Income of the borrower? | PDP (Partial Dependence Plot) / ICE (Individual Conditional Expectation) | PDP box |
| Why did the system decline the loan application of Mr. Jain? | | LIME library |
| What income is required for a loan to be approved? | | Anchors from Alibi |
| How do models make decisions? | defragTrees (for random forest) | defragTree Package |

Table 1.1 

Global Level Explanation 

Q1: How is data contributing to making a decision?

According to the model, ‘Credit history’, ‘Loan amount’ and ‘Total Income’ are the top three variables with the most impact on the application’s approval. 

Knowing how features contribute to a decision can help the customer trust the model. If the right parameters drive the results, it suggests that the model is working correctly. 

Figure 1.1 depicts the importance of the features in predicting the output. Features are sorted from top to bottom in decreasing order of their weight in the decision.

Figure 1.1 

The probability of approval or rejection of the loan application depends on the person’s credit history. 
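
As a rough illustration of how a ranking like this can be computed: for a linear scoring model whose features are treated as independent, the SHAP value of a feature for one applicant is the feature’s weight times the applicant’s deviation from the average value, and the global importance is the mean absolute SHAP value over all applicants. The sketch below is plain Python and does not use the SHAP library. The weights and the five applicants are made up for illustration and are not the model or data behind Figure 1.1.

```python
from statistics import mean

# Hypothetical linear model on the log-odds scale (illustrative weights only).
WEIGHTS = {"credit_history": 2.0, "loan_amount": -0.02, "total_income": 0.0003}

# Hypothetical applicants: credit_history is 1 (good) or 0 (poor).
APPLICANTS = [
    {"credit_history": 1, "loan_amount": 120, "total_income": 5000},
    {"credit_history": 0, "loan_amount": 150, "total_income": 3000},
    {"credit_history": 1, "loan_amount": 100, "total_income": 4000},
    {"credit_history": 1, "loan_amount": 200, "total_income": 9000},
    {"credit_history": 0, "loan_amount": 80, "total_income": 2500},
]


def linear_shap(row, weights, data):
    """SHAP values of a linear model: w_i * (x_i - mean of x_i over the data)."""
    return {f: w * (row[f] - mean(r[f] for r in data)) for f, w in weights.items()}


def global_importance(weights, data):
    """Mean absolute SHAP value per feature, most important first."""
    per_row = [linear_shap(row, weights, data) for row in data]
    scores = {f: mean(abs(p[f]) for p in per_row) for f in weights}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


ranking = global_importance(WEIGHTS, APPLICANTS)
for feature, score in ranking:
    print(f"{feature:15s} {score:.3f}")
```

With these made-up numbers the ranking comes out as credit history, loan amount, total income, but only because the weights were chosen that way. A real model would be explained with its own fitted parameters and data.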

Q2: How is data contributing to making a decision? 

This graph extends the previous one and gives more insight into the model. It shows the same ranking along with information about each feature’s value.  

  • Feature importance: Variables ranked in descending order of importance.
  • Impact: The horizontal location shows whether the effect of that value is associated with a higher or lower prediction.
  • Value: The color of each dot shows whether that variable is high or low for that observation. Red denotes a high value and blue a low value. 
  • Correlation: The first feature in Figure 1.2 shows that approval of an application depends heavily on credit history. An applicant with a good credit history has a better chance of having a loan application approved. 

Figure 1.2 

Feature interaction and distribution 

Q3: How does model output vary by changing the borrower’s income? 

After getting the answer to the first question, the customer may ask how a change in Income changes the system output when the other parameters stay the same. 

To answer this, let’s discuss the Partial Dependence Plot (PDP). A PDP shows the relation between the model output and a feature’s value, with the other features marginalized out. This graph shows how changing Income changes the system decision. 

Figure 1.3 
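
The computation behind a partial dependence curve can be sketched in a few lines: for each candidate income, force every applicant’s income to that value, keep the other features as they are, and average the model’s predictions. The per-applicant curves before averaging are the ICE curves. The scoring function and applicants below are hypothetical stand-ins, not the model behind Figure 1.3, and the code is plain Python and does not use the PDPbox API.

```python
import math


def approval_probability(row):
    """Hypothetical stand-in model: logistic score over three features."""
    z = (-1.0 + 2.0 * row["credit_history"]
         - 0.02 * row["loan_amount"] + 0.0003 * row["total_income"])
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical applicants used to average over the other features.
APPLICANTS = [
    {"credit_history": 1, "loan_amount": 120, "total_income": 5000},
    {"credit_history": 0, "loan_amount": 150, "total_income": 3000},
    {"credit_history": 1, "loan_amount": 100, "total_income": 4000},
    {"credit_history": 1, "loan_amount": 200, "total_income": 9000},
    {"credit_history": 0, "loan_amount": 80, "total_income": 2500},
]


def ice_curves(model, data, feature, grid):
    """One curve per applicant: predictions as `feature` sweeps the grid."""
    return [[model({**row, feature: value}) for value in grid] for row in data]


def partial_dependence(model, data, feature, grid):
    """Average of the ICE curves at each grid value."""
    curves = ice_curves(model, data, feature, grid)
    return [(value, sum(curve[i] for curve in curves) / len(curves))
            for i, value in enumerate(grid)]


income_grid = [2000, 4000, 6000, 8000, 10000]
pdp = partial_dependence(approval_probability, APPLICANTS, "total_income", income_grid)
for income, avg_prob in pdp:
    print(f"income={income:6d}  mean approval probability={avg_prob:.3f}")
```

Because the stand-in model gives income a positive weight, the curve rises with income. A fitted model may show a flatter or non-monotonic shape.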

Now that we have an idea of each feature’s effect on the model decision, we can move on to a local explanation to understand the prediction for an individual customer. 

Local Explanation 

Q4: Why did the system decline the loan application of Mr. Jain? 

Mr. Jain applied for a loan, but the system rejected his application, and he now wants to know why. Using SHAP, the system justifies its result. The SHAP value represents the impact of each feature on the model’s output. 

Mr. Jain has a poor credit history, he has not repaid previous debt, he has no income of his own, and the co-applicant’s income is also low. All these factors push the system’s decision towards declining the application.

Figure 1.4 Mr. Jain’s justification 
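
To make the idea of a per-feature justification concrete, the sketch below computes exact Shapley values for one applicant by enumerating every coalition of features, which is feasible only for a handful of features. The SHAP library uses faster approximations, so this is a from-scratch illustration rather than its API. The scoring function, the background applicants, and the rejected applicant are all hypothetical and are not Mr. Jain’s actual data.

```python
import math
from itertools import combinations


def approval_probability(row):
    """Hypothetical stand-in model: logistic score over three features."""
    z = (-1.0 + 2.0 * row["credit_history"]
         - 0.02 * row["loan_amount"] + 0.0003 * row["total_income"])
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical background data that defines the "average" applicant.
BACKGROUND = [
    {"credit_history": 1, "loan_amount": 120, "total_income": 5000},
    {"credit_history": 0, "loan_amount": 150, "total_income": 3000},
    {"credit_history": 1, "loan_amount": 100, "total_income": 4000},
    {"credit_history": 1, "loan_amount": 200, "total_income": 9000},
    {"credit_history": 0, "loan_amount": 80, "total_income": 2500},
]


def coalition_value(model, x, background, coalition):
    """Mean prediction when features in `coalition` take the applicant's
    values and all other features come from each background row."""
    known = {f: x[f] for f in coalition}
    preds = [model({**row, **known}) for row in background]
    return sum(preds) / len(preds)


def shapley_values(model, x, background):
    """Exact Shapley values by enumerating all coalitions of the other features."""
    features = list(x)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in combinations(others, size):
                gain = (coalition_value(model, x, background, subset + (f,))
                        - coalition_value(model, x, background, subset))
                total += weight * gain
        phi[f] = total
    return phi


# Hypothetical applicant with a poor credit history and a low income.
applicant = {"credit_history": 0, "loan_amount": 150, "total_income": 2000}
phi = shapley_values(approval_probability, applicant, BACKGROUND)
base_value = coalition_value(approval_probability, applicant, BACKGROUND, ())
prediction = approval_probability(applicant)

print(f"base value (average prediction): {base_value:.3f}")
for feature, contribution in sorted(phi.items(), key=lambda kv: kv[1]):
    print(f"{feature:15s} {contribution:+.3f}")
print(f"prediction for this applicant:   {prediction:.3f}")
```

The contributions add up to the gap between the average prediction and this applicant’s prediction, which is the property a SHAP force or waterfall plot visualizes: negative contributions push the decision towards decline.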

Q5: Mr. John and Mr. Herry have almost the same parameter values, such as total income and credit history; why, then, did the system decline Mr. Herry’s application and approve Mr. John’s? 

Mr. John and Mr. Herry have nearly the same values for the attributes, yet the AI system approves the loan application of Mr. John but not that of Mr. Herry. 

To answer this question, Explainable AI uses a SHAP waterfall chart. Comparing the justifications for Mr. Herry and Mr. John shows that both have a good credit history and similar values for the other parameters, except Income. Mr. Herry has a lower salary than Mr. John, so Mr. Herry’s total income is also lower. That is why the system decides that Mr. Herry may not repay the loan and therefore rejects his application.

Figure 1.5 Mr. John’s justification 

Figure 1.6 Mr. Herry’s justification

How does Explainable AI improve Bank AI systems? 

Explainable AI improves the AI systems that banks use: 

  • Builds trust by providing greater visibility to spot flaws and unknown vulnerabilities, and thus gives assurance that the system is operating as intended. 
  • Improves performance through an understanding of how the model works and makes decisions. 
  • Improves strategy and decision making, which in turn improves revenue, customer behavior, and employee turnover. 
  • Enhances control over the system. 
  • Helps identify mistakes and act on them quickly. 

Business Benefit of Explainable AI 

The business benefits of Explainable AI are shown in Figure 1.1: 

Figure 1.1 


  • Model Performance: Improves and optimizes AI systems by revealing how and why they make decisions. It verifies system outputs and enhances them by detecting bias and flaws. 
  • Decision Making: Predicting customer churn is a widespread ML use case; a model can predict that the churn rate will increase. Suppose the financial institution reduces its fees to lower the churn rate, but the actual cause of the rising churn is the customer service experience. The fee reduction cannot solve the problem, because the underlying cause is customer interaction, not the fee. Explainable AI is therefore needed to understand why the churn rate is increasing. 
  • Control: Helps retain control over AI. Visibility into an AI model’s data and features helps identify issues (such as drift) and solve them. 
  • Safety: Tracks unethical design and works with the cyber team to safeguard against these faults. 
  • Ethics: With clear governance and security safeguards, it brings ethical consideration into AI systems. 
  • Trust: Using Explainable AI helps ensure that the algorithms make correct decisions. It builds trust by strengthening the stability and predictability of interpretable models. 
  • Accountability: A clear understanding of an AI system’s accountability requires understanding how the model operates and evolves, which for black-box models only Explainable AI can provide. 
  • Regulation: Focuses on AI areas by establishing standards for governance, accuracy, transparency, and explainability. 


The contribution of Explainable AI to a loan approval AI system is that it makes it easy for the end user to understand the system’s complex workings. It provides a human-centered interface to the user. Explainability is key to producing a transparent, proficient, and accurate AI system that both bankers and borrowers can understand and use.

LF AI & Data Resources