Trusted AI Principles – RREPEATS Practical Examples Review

Guest Author: Susan Malaika, LF AI & Data Trusted AI Committee Member

On April 28, 2021, the LF AI & Data Trusted AI Principles Working Group hosted a session about applying the eight RREPEATS principles to two examples: one in the context of network providers, the other in banking. The Practical Examples session is a follow-up to the February session that introduced the RREPEATS principles.

After the presentation of the two examples, a round table discussion took place, covering the application of trusted AI principles in companies. A key outcome from the discussion was the recognition that low-level tactical tools can be used to implement trusted AI principles. Another outcome was the importance of having an ethics board in corporations.

Example: Classification of Encrypted Traffic Application

Iman Akbari Azirani and Noura Limam from the University of Waterloo in Canada, and Bertrand Mathieu from Orange Labs in France, explained that Internet data payloads and, increasingly, headers are encrypted. Typically, 85% of Internet traffic is encrypted, and Google services are 100% encrypted, which makes it difficult to classify network traffic. To anticipate workloads and offer good service to customers, network operators need to understand the traffic in order to:

  • Plan capacity accurately
  • Detect fraud, such as attempts to disguise traffic that would normally be paid for as free traffic. A common example is transmitting video over a text service.

Bertrand, Noura, and Iman discussed three of the RREPEATS principles in the context of the encrypted traffic application: Equitability, Privacy, and Explainability.

To ensure equitability and privacy, significant amounts of information (e.g., temporal information, IP addresses, payloads, and security certificates) are removed from the data. For explainability, it is important to understand that classification is applied at the global level (service and application) and not at the level of an individual. Deep neural networks are used for classification and focus on three types of features (a three-faceted model): Transport Layer Security (TLS) handshake bytes, traffic shape (size, direction, and inter-arrival times), and statistical features. Despite relying on protocol-agnostic features, the approach achieves high classification accuracy.
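The data preparation described above can be sketched in a few lines of Python. This is a minimal illustration and not the presenters' pipeline: the `Packet` record and the `anonymize_flow` and `statistical_features` helpers are hypothetical names, the TLS handshake facet and the neural network itself are left out, and the sketch assumes that absolute timestamps are dropped while relative inter-arrival gaps are kept as part of the traffic shape.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Dict, List, Tuple


@dataclass
class Packet:
    """Hypothetical raw packet record (not a real capture format)."""
    timestamp: float   # absolute capture time in seconds
    src_ip: str
    dst_ip: str
    size: int          # bytes on the wire
    outbound: bool     # True if sent by the client
    payload: bytes


# One traffic-shape entry: (size, direction, inter-arrival gap in seconds)
ShapeEntry = Tuple[int, int, float]


def anonymize_flow(packets: List[Packet]) -> List[ShapeEntry]:
    """Drop IP addresses, payloads, and absolute times; keep only traffic shape."""
    shape: List[ShapeEntry] = []
    previous = None
    for p in sorted(packets, key=lambda pkt: pkt.timestamp):
        gap = 0.0 if previous is None else p.timestamp - previous
        previous = p.timestamp
        direction = 1 if p.outbound else -1
        shape.append((p.size, direction, gap))
    return shape


def statistical_features(shape: List[ShapeEntry]) -> Dict[str, float]:
    """Summarize a flow with a few protocol-agnostic statistics."""
    if not shape:
        raise ValueError("flow has no packets")
    sizes = [size for size, _, _ in shape]
    gaps = [gap for _, _, gap in shape[1:]]  # the first packet has no gap
    outbound = sum(1 for _, direction, _ in shape if direction == 1)
    return {
        "packet_count": len(shape),
        "mean_size": mean(sizes),
        "std_size": pstdev(sizes),
        "mean_gap": mean(gaps) if gaps else 0.0,
        "outbound_ratio": outbound / len(shape),
    }
```

In a setup like this, the output of `anonymize_flow` would feed the traffic shape facet and the output of `statistical_features` would feed the statistical facet, so neither facet ever sees an address or a payload.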

This section ended with a request for tools that:

  • Help with network classification
  • Support more efficient, faster models running in the network
  • Offer better explanations
  • Prevent attacks on the model

Example: RosaeNLG Framework (an LF AI & Data project)

Ludan Stoecklé, CTO of Data & AI Lab BNP Paribas CIB and initial author of RosaeNLG, outlined how to explain a decision to a non-expert user, with a focus on building trust in the decision and being transparent. Apparently, it has been shown that non-expert users most often find textual explanations easier to understand than the output of corporate-style dashboards, such as tables and graphs. Computer-generated texts are also often preferred over human-written texts, because the generated texts are clearer, less ambiguous, and more concise.

Ludan illustrated his points with a credit application rejection and showed simple text, produced via Natural Language Generation, that explains the reason for the rejection. The following considerations apply when preparing an explanation for a decision:

  • Interpret the decision
  • Define what to say
  • Define how to say it 

The RosaeNLG framework (an LF AI & Data Sandbox project) provides template-based Natural Language Generation. It automates the production of relatively repetitive texts based on structured input data and textual templates, and is widely used in the financial industry.
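To make the idea of template-based generation concrete, here is a minimal sketch in plain Python that follows the three considerations listed above (interpret the decision, define what to say, define how to say it) for a credit rejection. It does not use RosaeNLG or mirror its API: the `Decision` fields, the thresholds, the `TEMPLATES` wording, and all function names are hypothetical and chosen only for illustration.

```python
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class Decision:
    """Hypothetical structured output of a credit decision system."""
    approved: bool
    debt_ratio: float       # applicant's debt-to-income ratio
    max_debt_ratio: float   # limit applied by the lender
    missed_payments: int


def interpret(decision: Decision) -> List[str]:
    """Step 1: interpret the decision by listing the reasons that apply."""
    reasons = []
    if decision.debt_ratio > decision.max_debt_ratio:
        reasons.append("debt_ratio")
    if decision.missed_payments > 0:
        reasons.append("missed_payments")
    return reasons


# Step 2: define what to say, with one textual template per reason.
TEMPLATES = {
    "debt_ratio": (
        "your debt-to-income ratio of {debt_ratio:.0%} "
        "is above our limit of {max_debt_ratio:.0%}"
    ),
    "missed_payments": (
        "your credit history shows {missed_payments} missed payment(s)"
    ),
}


def join_clauses(clauses: List[str]) -> str:
    """Step 3: define how to say it, joining clauses into readable prose."""
    if len(clauses) <= 2:
        return " and ".join(clauses)
    return ", ".join(clauses[:-1]) + ", and " + clauses[-1]


def explain(decision: Decision) -> str:
    """Produce a one-sentence explanation from structured input data."""
    if decision.approved:
        return "Your application was approved."
    values = asdict(decision)
    clauses = [TEMPLATES[r].format(**values) for r in interpret(decision)]
    if not clauses:
        return "Your application was declined."
    return f"Your application was declined because {join_clauses(clauses)}."
```

For example, `explain(Decision(False, 0.45, 0.35, 2))` returns "Your application was declined because your debt-to-income ratio of 45% is above our limit of 35% and your credit history shows 2 missed payment(s)." A dedicated framework would additionally handle linguistic details such as agreement, plurals, and variation, which this sketch sidesteps with "payment(s)".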

Dataset Discussion 

There was an important discussion about open datasets in the areas of network encryption and NLG, and about the absence of open networking datasets that could move the field forward. Networking datasets are typically proprietary and complex. It is difficult to create a synthetic dataset that mimics real network traces, with many sessions from many users and with the variable network conditions that end users experience. However, Orange is investigating the possibility of opening part of a completely anonymized dataset, one that cannot be reverse engineered to discover private information.

Round Table – Applying Trusted AI Principles in a Corporation

The first part of the round table discussion focused on the examples presented. One observation was that principles have different meanings in the context of different examples, and that understanding the context of each scenario is key. Souad Ouali, the Trusted AI Principles Working Group Chair, responded that the RREPEATS principles were very carefully and deliberately worded to be applicable in a general way in international and varied settings. The round table panelists agreed that Trusted AI principles must be incorporated into the DNA of what we do throughout the AI life cycle, and that it is important to ask questions relating to the principles at every step. An important observation from the two practical example presentations was how tactical low-level tools map to high-level Trusted AI principles.

The round table ended with a discussion on corporate ethics boards and whether they should be geographically diverse or distributed, incorporate domain experts, and extend into the operational aspect of a business. Another consideration is whether an ethics board should include representatives from other institutions.

Join Us for Our Next Session

Please join us for The Trusted AI Principles – Tools and Techniques webinar on October 27, 2021. Register here!

Stay connected with the Trusted AI Committee by joining the mailing list here and attending one of our upcoming meetings! Learn more here.

Author

  • Andrew Bringaze

    Andrew Bringaze is the senior developer for The Linux Foundation. With over 10 years of experience his focus is on open source code, WordPress, React, and site security.