By Ludan Stoecklé
RosaeNLG, a project under the LF AI & Data Foundation, automates the creation of repetitive reports and texts from structured data. It is commonly used in e-commerce to describe products and their features and in the financial industry for generating risk reports and financial fund performance reports.
Development of RosaeNLG began in 2017, with the first version released in 2019. In March 2021, RosaeNLG joined LF AI & Data as a Sandbox project.
This blog post discusses real-life applications of RosaeNLG and explores its future amid the growing use of Large Language Models (LLMs) for text generation.
Use cases: Busbud, Xanevo
Landing pages production for Busbud
Busbud is a company that sells bus and train tickets online, working with thousands of bus companies and serving over five million routes worldwide, with a website available in 18 locales.
Busbud uses RosaeNLG to create more varied texts for its landing pages. Writing a landing page by hand for each route, locale, city, and country is impossible at this scale, so Busbud generates the content programmatically with RosaeNLG. RosaeNLG is considered very easy to use and highly efficient at producing grammatically correct text variations in French, Spanish, English, and Portuguese.
Generation of e-commerce descriptions by Xanevo
Xanevo GmbH is an AI consulting firm specializing in e-commerce, known for automating high-volume SEO content. Xanevo uses a variety of AI technologies, including LLMs, and also uses RosaeNLG for some of its e-commerce customers.
RosaeNLG is used to generate meta descriptions and image alt texts in German, English and French. With RosaeNLG’s template-based approach, Xanevo can constrain the length of the texts to a given range, which is extremely important for these two types of text. If a generated text falls outside the range, the templates are called again; because the synonym function produces a different variation on each call, a text of acceptable length is obtained at some point.
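The retry-until-it-fits idea can be sketched in a few lines of Python. This is only an illustration of the principle, not RosaeNLG's API or Xanevo's implementation: the synonym lists, the `render_alt_text` template, and the `generate_within_range` helper are all hypothetical names invented for this example.

```python
import random
from typing import Optional

# Hypothetical synonym pools; a real template engine would manage these itself.
VERBS = ["Discover", "Explore", "Check out"]
ADJECTIVES = ["durable", "hard-wearing", "tough"]


def render_alt_text(product: dict, rng: random.Random) -> str:
    """Render one variation of the text, picking a random synonym per slot."""
    verb = rng.choice(VERBS)
    adjective = rng.choice(ADJECTIVES)
    return f"{verb} the {adjective} {product['name']} in {product['color']}"


def generate_within_range(
    product: dict,
    min_len: int,
    max_len: int,
    rng: random.Random,
    max_attempts: int = 50,
) -> Optional[str]:
    """Call the template again until the text length falls within the range.

    Returns None if no variation fits after max_attempts tries (an assumed
    safeguard, so that an unreachable range cannot loop forever).
    """
    for _ in range(max_attempts):
        text = render_alt_text(product, rng)
        if min_len <= len(text) <= max_len:
            return text
    return None


if __name__ == "__main__":
    product = {"name": "backpack", "color": "navy blue"}
    print(generate_within_range(product, 40, 50, random.Random()))
```

The attempt cap is a design choice of this sketch: if the synonyms cannot reach the requested range at all, the caller gets `None` and can fall back to a default text.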
Is RosaeNLG still useful now that we have LLMs and ChatGPT?
RosaeNLG is a template-based language generator that transforms structured data (e.g., a financial situation) into text using manually written templates. Unlike ChatGPT or any LLM, which generate text from prompts, RosaeNLG excels at converting data to text. This template-based approach ensures strict control over the generated content, preventing hallucination or fabrication of facts. However, the drawback is that users must manually write these templates for each use case.
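To make the data-to-text idea concrete, here is a minimal Python sketch of a hand-written template applied to structured data. It does not use RosaeNLG or its template syntax; the `describe_fund` function and the field names `name`, `return_pct`, and `period` are assumptions made up for this illustration.

```python
def describe_fund(fund: dict) -> str:
    """Turn one structured fund record into a sentence using a fixed template."""
    change = fund["return_pct"]
    # Every wording is written by hand, so the output can only state
    # what is actually present in the input data.
    if change > 0:
        trend = f"gained {change:.1f}%"
    elif change < 0:
        trend = f"lost {abs(change):.1f}%"
    else:
        trend = "was flat"
    return f"The {fund['name']} fund {trend} over {fund['period']}."


if __name__ == "__main__":
    record = {"name": "Alpha", "return_pct": 2.5, "period": "the last quarter"}
    print(describe_fund(record))
```

Because the template author decides every branch and every phrase, the generated text cannot contain a figure that is absent from the record; the cost is that each new kind of text needs its own template.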
In practice, template-based generators like RosaeNLG remain a valid data-to-text solution when the following criteria are met:
- The structured input data exists
- The texts to be generated remain relatively repetitive
- You need strict control over the output
- You cannot accept errors or hallucinations
Check this article for a more detailed comparison of RosaeNLG with LLMs.
RosaeNLG roadmap
Developed since 2017, RosaeNLG is now a mature product, supporting English, French, German, Italian and Spanish. The original author, Ludan Stoecklé, publishes releases two to three times a year, focusing on security updates and bug fixes. Most new features are now proposed and developed by the community.
Follow the example of Bernard Legaut, an NLG pioneer, who is both an active user of and an active contributor to RosaeNLG. Bernard currently uses RosaeNLG in four projects that automate the production of documents in French, and he contributes heavily to the roadmap and the code of RosaeNLG.
- To use RosaeNLG, the starting point is the documentation.
- To contribute to RosaeNLG, check the main repo and the contribution guide.