Generative AI adoption patterns are evolving quickly: from the initial LLM curiosity right after the ChatGPT release, to consolidated enterprise use cases, justified (and sometimes unjustified) ROI exercises, and the first internal LLMOps initiatives to track model performance and flows. At the same time, advanced architectures rely more and more on cloud native building blocks (e.g. containerization, load balancing, monitoring), and new open source components (especially LLMs and vector databases) are taking a well-deserved place on the Generative AI scene.

But coming back to the title of this article… Why cloud native? Obviously it is about scalability, resilience, faster development and deployment, cost efficiency, flexibility, portability, security… the classic cloud native arguments. Are they really related to Generative AI development? Yes: every single benefit of cloud native is quickly becoming a hard requirement for Generative AI. We all want our Gen AI developments to be secure, scalable, portable, and as cheap as possible. But what is cloud native really about? I highly recommend a “non-biased” option for you to learn about the cloud native ecosystem: one of my new O’Reilly books, co-authored with my colleague Jorge Valenzuela Jiménez:

Kubernetes and Cloud Native Associate (KCNA) Study Guide (oreilly.com)

In this Kubernetes and Cloud Native Associate Study Guide you will get all the background on cloud native development, the Kubernetes project, and the amazing world of The Linux Foundation and the Cloud Native Computing Foundation (CNCF), along with the fundamentals of what cloud native really means.

But how is this related to artificial intelligence, and to the new wave of Generative AI adoption? Notions such as model-as-a-service (MaaS) and managed Generative AI platforms rely on cloud native principles and technologies. Think about the first and probably most illustrative Generative AI case: ChatGPT. How did a user-facing platform scale to support not only the computational demands of GPT models, but also the high volume of users around the world connecting to the chat interface at the same time? Because its creator, OpenAI, relies on cloud native pieces such as Kubernetes, Prometheus, etcd, and gRPC.
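To make that observability piece concrete, here is a minimal sketch of how an LLM inference service can expose Prometheus metrics, using the prometheus_client Python library. The service, metric names, and handler are hypothetical illustrations for the pattern, not OpenAI's actual implementation:

```python
# Minimal sketch: a hypothetical LLM inference handler instrumented
# with Prometheus metrics via the prometheus_client library.
# Metric and function names are illustrative only.
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

REQUESTS = Counter(
    "llm_requests_total", "Total inference requests", ["model"]
)
LATENCY = Histogram(
    "llm_request_latency_seconds", "Inference latency in seconds", ["model"]
)

def handle_request(prompt: str, model: str = "my-llm") -> str:
    """Stand-in inference handler; replace the sleep with a real model call."""
    REQUESTS.labels(model=model).inc()
    with LATENCY.labels(model=model).time():
        time.sleep(random.uniform(0.05, 0.2))  # simulated model latency
        return f"response to: {prompt}"

if __name__ == "__main__":
    # Metrics become scrapable at http://localhost:8000/metrics
    start_http_server(8000)
    while True:
        handle_request("hello")
```

A Prometheus server (or any compatible scraper) can then collect these counters and latency histograms, which is exactly the kind of signal an autoscaler or dashboard needs.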

If you want to understand the nature of cloud native architectures for Generative AI applications, I recommend my other O’Reilly book:

Azure OpenAI for Cloud Native Applications (oreilly.com)

Even though the core technology for this book is the Azure OpenAI Service (Microsoft’s offering for enterprise GPT models), Chapter 2 focuses on agnostic cloud native principles you can leverage for your own Generative AI developments with amazing open source LLMs such as Mistral or Meta’s Llama 3, in both on-prem and cloud-based environments. This chapter and the rest of the book show the interconnection between all the building blocks, including application development services, vector databases, and open source orchestration engines such as Semantic Kernel and LangChain.
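As a taste of what such orchestration looks like in practice, here is a minimal LangChain sketch in Python. It assumes the langchain-openai package and an OPENAI_API_KEY environment variable, and the model name is just an example; LangChain's APIs evolve quickly, so treat this as illustrative rather than definitive:

```python
# Minimal orchestration sketch with LangChain (pipe-style composition).
# Assumes: pip install langchain-openai, and OPENAI_API_KEY set in the
# environment. The model name and prompt are illustrative.
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser

prompt = ChatPromptTemplate.from_messages([
    ("system", "You are a concise cloud native assistant."),
    ("human", "Explain why {topic} matters for Generative AI in two sentences."),
])

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)  # any chat model works here

# Compose prompt -> model -> plain-string output.
chain = prompt | llm | StrOutputParser()

print(chain.invoke({"topic": "autoscaling"}))
```

The same composition pattern extends naturally to retrieval steps against a vector database, which is where those two building blocks meet.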

Last but not least, bringing the cloud native and AI discussion into The Linux Foundation and CNCF context, the recently created CNCF AI Working Group released a Cloud Native AI Whitepaper that explores these topics and complements my two previous recommendations:

Cloud Native Artificial Intelligence Whitepaper | CNCF

More concretely, you may want to take a look at Figure 4 on page 21, as it maps the relevant open source solutions from the CNCF Project Landscape.

Now, what are the key streams that will guide the next stage of cloud native development for Generative AI? Here are some potential hints of what’s coming next:

  • The rise of open source LLMs → Ever-changing industry benchmarks such as the LMSYS Chatbot Arena show promising signals of how open source models are progressively catching up, and in some cases leading the way (e.g. Mistral adopting MoE techniques before other providers) for the new generation of LLMs. Proprietary models from OpenAI, Anthropic, and Google still lead the rankings, but the gap is closing quickly. Small Language Models (SLMs) such as Microsoft Phi-3, Google Gemma, and the smaller versions of Llama 3 and Mistral are also getting a lot of attention because of their efficiency and their potential for use cases on local devices.
  • True multimodality → Language, code, and images were just the beginning. Audio, and especially video capabilities like OpenAI’s Sora, are evolving exponentially, and the next generation of GPT-5, Gemini, and similar models will incorporate true multimodality to handle all sorts of information with one single model. This will enable new kinds of use cases that are unimaginable today.
  • Advanced AI agents → The notion of “agents” in Generative AI goes well beyond the LLMs themselves. Concretely, AI agents are autonomous entities that leverage LLMs to plan and connect with other systems in order to perform automated actions. Some preliminary examples are the plugins for ChatGPT and Microsoft Copilot, but the AI agency concept will certainly evolve in 2024.
  • More MaaS → One of the contributions of the main model providers is the notion of model-as-a-service: a fully managed deployment that lets developers consume API endpoints while minimizing the burden of infrastructure deployment. It is a complementary option to fully owned deployments of open source models, and a good way to start experimenting with low infrastructure requirements (see the consumption sketch after this list).
  • Cloud native landing zones → One of the signs of maturity within organizations adopting Generative AI. Concretely, the ability to plan and implement architectures that combine different kinds of models, providers, and subscription models by leveraging hybrid and multi-cloud capabilities, load balancing, API management, etc. These architectures are the next frontier for adopter organizations.
  • Evolution of Responsible AI, security, and regulatory approaches → For both open source and proprietary Generative AI technologies, the level of ethical and accountability requirements is skyrocketing. You can get some context from our previous article, and updated insights in my upcoming AI_dev Europe 2024 session, where I’ll share recommendations for Generative AI transparency and potential compliance specs for the EU AI Act (the new European regulation for Artificial Intelligence). If you want to learn more about these topics, feel free to enroll in my free LinkedIn Learning class on Generative AI Compliance and Regulations, and to join the upcoming Global Challenge to Build Trust in the Age of Generative AI.
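
To illustrate the MaaS item above, here is a minimal consumption sketch using the openai Python SDK (v1.x). Many managed providers expose OpenAI-compatible endpoints, but the base_url, environment variables, and model name below are placeholders rather than any specific service:

```python
# Minimal MaaS consumption sketch using the openai Python SDK (v1.x).
# The endpoint, environment variables, and model name are placeholders;
# swap in the values your managed provider gives you.
import os

from openai import OpenAI

client = OpenAI(
    base_url=os.environ.get("MAAS_ENDPOINT", "https://api.openai.com/v1"),
    api_key=os.environ["MAAS_API_KEY"],  # provider-issued key
)

response = client.chat.completions.create(
    model="my-hosted-model",  # placeholder deployment/model name
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "What does model-as-a-service mean?"},
    ],
)

print(response.choices[0].message.content)
```

Notice there is no infrastructure code at all: the provider handles the deployment, scaling, and GPU capacity, which is precisely the appeal of MaaS for early experimentation.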

About the author: Adrian Gonzalez Sanchez is part of The Linux Foundation ecosystem as a member of the Generative AI Commons – Responsible AI workstream. He is part of the Spanish AI Observatory OdiseIA, a university lecturer at HEC Montreal (Canada) and IE University (Spain), and a Data & AI Specialist at Microsoft. He is also the author of the free Data and AI Fundamentals (LFS115x) course with The Linux Foundation and edX.