Apache NiFi ‹› AI Fairness 360 (AIF360) Integration – Trusted AI Architecture Development Report 1

October 30, 2019

By Romeo Kienzler, Chief Data Scientist, IBM Center for Open Source Data and AI Technologies

We’re currently in the process of integrating the Trusted AI Toolkits into Acumos AI. There are of course many possibilities in doing so – therefore we’ve started the journey with an architecture development method. Maybe not many of you are familiar with TOGAF – The Open Group Architecture Framework – but we’re making use of it here in order to make sure the architectural choices of integrating the Trusted AI Toolkits are sound. As you can see below in Figure 1, the development of an architecture is an iterative process. Since we’ve completed one iteration, we want to give you an update on the actual development of those individual process steps.

Figure 1 – The Open Group Architecture Framework (TOGAF)

Preliminary

Many AI practitioners agree that Trusted AI is a key property for large-scale adoption of AI in the enterprise. A set of questions, illustrated in Figure 2 and taken from Todd Moore’s (IBM VP, Open Technology) keynote at the Linux Foundation Conference in Lyon 2019, needs to be asked about every AI model, apart from its generic performance metrics like accuracy, F1 score or area under the ROC curve.

Figure 2: Questions to be asked once AI model training is done

Since open source toolkits exist within the Linux Foundation AI to answer these questions, the task is to find out how they can be used most efficiently.

Architecture Vision

To answer the questions above, the following checks and mitigations must be applied to an AI model deployment candidate:

  • Adversarial Robustness Assessment and Mitigation
  • Bias Detection and Mitigation
  • Explainability
  • Accountability (Repeatability / Data and Model Lineage)

A set of open source tools exists for each of those tasks. In the following, we want to identify the correct tools and the correct ways of integrating them to maximize their positive impact. The overall value proposition of this architecture is therefore the removal of major roadblocks to bringing AI models into production by generating, qualifying and quantifying trust, which affects stakeholders like regulators, auditors and business representatives. Ease of use and adoption rate are the main drivers for transformation and can be seen as the main KPIs here.

Therefore, the main risk is complexity. The higher the complexity, the more adoption rate declines.

Information Systems Architecture

In this iteration we focus only on bias detection using the AIF360 toolkit. Bias mitigation, adversarial robustness, explainability and accountability will be covered in future versions of this document. Although such a component can be deployed in many ways across many possible information systems architectures, we’ve identified only two generic integration scenarios, which we call data driven integration and model driven integration.

Data driven integration

In data driven integration, the AI model takes part only insofar as it generates predictions, which are stored in a database. Using the model’s predictions and the ground truth, bias metrics can be computed. This works exactly the same way as ordinary model validation on a test set that hasn’t been used for training the model: the test set is applied to the model, and the model’s performance is then assessed on the resulting predictions using a metric algorithm. The same rule applies here, but instead of an ordinary metric algorithm, the algorithms for bias measurement are used. This process is illustrated in Figure 3.

Figure 3: Data driven integration uses predictions created by the model and ground truth data to compute bias metrics
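To make the data driven idea concrete, here is a minimal sketch in plain Python that computes three common group fairness metrics from stored predictions and ground truth. It does not call AIF360; the function names, the column values ("m", "f") and the sample data are assumptions made purely for illustration, and AIF360 provides its own implementations of metrics with these names.

```python
from typing import Sequence


def _rate(flags: Sequence[bool]) -> float:
    """Fraction of True entries; fails loudly on an empty group."""
    if not flags:
        raise ValueError("group has no records")
    return sum(flags) / len(flags)


def selection_rate(preds, groups, group, favorable=1):
    """Share of a group's records that received the favorable prediction."""
    return _rate([p == favorable for p, g in zip(preds, groups) if g == group])


def true_positive_rate(labels, preds, groups, group, favorable=1):
    """Share of a group's truly favorable records predicted as favorable."""
    return _rate([p == favorable
                  for y, p, g in zip(labels, preds, groups)
                  if g == group and y == favorable])


def bias_metrics(labels, preds, groups, privileged, unprivileged):
    """Compare the unprivileged group against the privileged group."""
    sr_priv = selection_rate(preds, groups, privileged)
    sr_unpriv = selection_rate(preds, groups, unprivileged)
    tpr_priv = true_positive_rate(labels, preds, groups, privileged)
    tpr_unpriv = true_positive_rate(labels, preds, groups, unprivileged)
    return {
        # Only predictions are needed for these two metrics.
        "statistical_parity_difference": sr_unpriv - sr_priv,
        "disparate_impact": sr_unpriv / sr_priv if sr_priv else float("nan"),
        # This metric additionally needs the ground truth.
        "equal_opportunity_difference": tpr_unpriv - tpr_priv,
    }


# Illustrative data: protected attribute, ground truth and stored predictions.
groups = ["m", "m", "m", "m", "f", "f", "f", "f"]
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 1, 1, 0, 1, 0, 0, 0]

metrics = bias_metrics(labels, preds, groups, privileged="m", unprivileged="f")
print(metrics)
```

In this toy data set the privileged group receives the favorable prediction far more often than the unprivileged group, so both difference metrics come out negative and the disparate impact is well below 1.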

Model driven integration

A second integration scenario, which we call “model driven integration”, might be feasible as well. In this case, no data is provided to the bias detection library, only the model and a configuration object that contains information on the protected attributes, the label columns and the schema. The model first has to be executed in an appropriate runtime using artificial data. Whether this is a feasible way of integration will be determined during the next iterations of this project. Figure 4 illustrates this.

Figure 4: The model is created using data but only the standalone model is assessed without any further need for data
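Since this scenario is not yet confirmed to work, the following is only a thought experiment in Python, not the project's implementation. It assumes a hypothetical configuration object (`BiasCheckConfig`), a model exposed as a plain callable, and uniformly sampled artificial data; all of these names and choices are illustrative.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


@dataclass
class BiasCheckConfig:
    """Hypothetical configuration object; field names are illustrative."""
    schema: Dict[str, Tuple[float, float]]  # numeric feature -> (min, max)
    protected_attribute: str                # binary column, 1 = privileged
    label_column: str                       # column that receives the prediction


def synthesize(config: BiasCheckConfig, n: int, seed: int = 0) -> List[Row]:
    """Draw artificial rows uniformly from the ranges given in the schema."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {name: rng.uniform(lo, hi)
               for name, (lo, hi) in config.schema.items()}
        row[config.protected_attribute] = rng.randint(0, 1)
        rows.append(row)
    return rows


def probe_model(model: Callable[[Row], int], config: BiasCheckConfig,
                n: int = 1000, seed: int = 0) -> float:
    """Run the model on artificial data; return statistical parity difference."""
    rows = synthesize(config, n, seed)
    for row in rows:
        row[config.label_column] = model(row)
    rates = {}
    for value in (0, 1):
        group = [r[config.label_column] for r in rows
                 if r[config.protected_attribute] == value]
        if not group:
            raise ValueError("artificial data lacks one of the groups")
        rates[value] = sum(group) / len(group)
    return rates[0] - rates[1]  # unprivileged minus privileged


config = BiasCheckConfig(schema={"income": (0.0, 100.0)},
                         protected_attribute="sex",
                         label_column="approved")

# Two stand-in models: one keyed on the protected attribute, one ignoring it.
biased = probe_model(lambda r: int(r["sex"] == 1), config)
neutral = probe_model(lambda r: int(r["income"] > 50.0), config)
print(biased, neutral)
```

A model that decides purely on the protected attribute shows the most extreme parity difference, while a model that ignores it stays close to zero. Whether uniformly sampled data is representative enough for a real assessment is exactly the open question of this scenario.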

Hybrid integration

Since model driven integration is not yet confirmed to work, we finally propose a hybrid integration, as illustrated in Figure 5, where the model is executed in an appropriate environment but has access to the test data set. This is very similar to the data driven approach, with the difference that model predictions are not needed beforehand but are created on the fly during execution of the validation process. This might have advantages in the area of data lineage / accountability or in facilitating operational aspects.

Figure 5: Hybrid integration gives the bias detection component access to the test data set
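A rough Python sketch of the hybrid idea follows. The predictions are created during validation rather than read from a database, and a fingerprint of the test set is recorded next to the metrics as one possible way to support lineage. The function name, the column names and the fingerprinting step are assumptions for illustration only.

```python
import hashlib
import json


def validate_on_the_fly(model, test_rows, protected, label):
    """Predict during validation, then compute metrics on those predictions."""
    # The label is withheld from the model; predictions are made on the fly.
    preds = [model({k: v for k, v in row.items() if k != label})
             for row in test_rows]

    def rate(value):
        group = [p for p, row in zip(preds, test_rows)
                 if row[protected] == value]
        if not group:
            raise ValueError("test set lacks one of the groups")
        return sum(group) / len(group)

    correct = sum(p == row[label] for p, row in zip(preds, test_rows))
    # Hash of the exact test data used, recorded alongside the metrics.
    fingerprint = hashlib.sha256(
        json.dumps(test_rows, sort_keys=True).encode("utf-8")).hexdigest()
    return {
        "statistical_parity_difference": rate(0) - rate(1),
        "accuracy": correct / len(test_rows),
        "test_set_sha256": fingerprint,
    }


# Illustrative test set; 1 marks the privileged group in the "sex" column.
test_rows = [
    {"score": 0.9, "sex": 1, "label": 1},
    {"score": 0.7, "sex": 1, "label": 0},
    {"score": 0.6, "sex": 0, "label": 1},
    {"score": 0.2, "sex": 0, "label": 0},
]

report = validate_on_the_fly(lambda f: int(f["score"] >= 0.5),
                             test_rows, protected="sex", label="label")
print(report)
```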

Technology Architecture

Integration into Acumos AI is one of the most important aspects of this project. Although other integration points exist, we started with an evaluation of integration into Apache NiFi, since NiFi will be part of the next release of Acumos AI and will play a central role in data integration tasks. Therefore, a POC was conducted on integrating the LF AI AIF360 (AI Fairness 360) toolkit as a custom processor into Apache NiFi using the data driven approach.

Opportunities and Solutions

For simplicity, we started with the ExecuteStreamCommand processor within NiFi, which allows wrapping any executable that reads from STDIN and writes to STDOUT into a NiFi processor.
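The kind of executable that fits this contract can be sketched in a few lines of Python. The example below is not the script from the POC repository; it assumes a hypothetical CSV layout with the columns `protected` and `prediction`, reads it from a stream, and writes a JSON document with one bias metric to another stream. For the demonstration, in-memory streams stand in for STDIN and STDOUT.

```python
import csv
import io
import json


def write_bias_metrics(stdin, stdout, privileged="1"):
    """Read CSV rows from stdin and write bias metrics as JSON to stdout."""
    favorable = {True: 0, False: 0}  # favorable predictions per group
    total = {True: 0, False: 0}      # records per group
    for row in csv.DictReader(stdin):
        is_privileged = row["protected"] == privileged
        total[is_privileged] += 1
        favorable[is_privileged] += int(row["prediction"] == "1")
    if not total[True] or not total[False]:
        raise ValueError("both groups must be present in the input")
    rate_priv = favorable[True] / total[True]
    rate_unpriv = favorable[False] / total[False]
    json.dump({"statistical_parity_difference": rate_unpriv - rate_priv},
              stdout)
    stdout.write("\n")


# Demonstration with in-memory streams standing in for STDIN and STDOUT.
sample = io.StringIO("protected,prediction\n1,1\n1,1\n0,1\n0,0\n")
out = io.StringIO()
write_bias_metrics(sample, out)
result = json.loads(out.getvalue())
print(result)
```

When such a script is wrapped by the processor, the call would be `write_bias_metrics(sys.stdin, sys.stdout)` instead of the in-memory demonstration.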

Implementation Details

Environment Setup

We’ve used an Ubuntu 18.04 Server LTS system and installed Apache NiFi on top of it. The following script installs Apache NiFi and takes care of the necessary configuration:

apt update

apt install -y openjdk-8-jdk unzip git

wget http://mirror.easyname.ch/apache/nifi/1.9.2/nifi-1.9.2-bin.zip

unzip nifi-1.9.2-bin.zip

apt install -y python3-pip

pip3 install --upgrade pip

apt install -y python3-venv

python3 -m venv venv

source venv/bin/activate

pip3 install aif360

git clone https://github.com/romeokienzler/lfai_nifi.git

./nifi-1.9.2/bin/nifi.sh start

Test

Now it’s time to test. In a browser, NiFi can be accessed on port 8080. The file https://github.com/romeokienzler/lfai_nifi/blob/master/AIF360.xml contains the template to create the flow. Once the flow is running, the “fair” and “unfair” test data can be copied into the “in” folder, where they are consumed by the NiFi flow. This flow is illustrated in Figure 6.

Figure 6: The sample NiFi flow which uses the bias detection processor

After the flow has run, the bias metrics are attached as attributes to the flowfile as shown in Figure 7.

Figure 7: Bias metrics are attached as attributes to the flowfile

Future Work

In the next steps, we’ll add bias mitigation to this prototype. Then we’ll evaluate the other integration scenarios mentioned and identify the best way of integrating them into Acumos AI. Finally, we’ll integrate the remaining toolkits for adversarial robustness, explainability and accountability into Acumos AI.

Conclusion

We’ve shown that, using a custom processor, Trusted AI toolkits can be integrated into Apache NiFi and Acumos AI.