By Deborah Dahl
Conversational assistants are becoming ubiquitous, from advanced AI systems like ChatGPT to everyday tools like Alexa. They all share the ability to understand and respond to natural language queries. But what if they could seamlessly hand off conversations to each other based on expertise? That’s the aim of the Open Voice Interoperability Initiative. By developing simple, versatile specifications, we’re enabling assistants to transfer conversations, regardless of the assistant’s internal structures.
Traditional systems share a common trait: they’re largely self-contained. Once users engage with a conversational assistant, they’re limited to it. If the assistant can’t assist, users must seek answers elsewhere, often resorting to another web search. But what if assistants could seamlessly pass conversations to other assistants with different expertise? The Open Voice Interoperability Initiative is crafting specifications to enable such transfers. We’re striving for simplicity, broad support across various assistants, and neutrality regarding their internal structures.
Interoperable conversational assistants hold great promise for large organizations such as governments, businesses, or universities that offer diverse services through AI. Governments, for instance, provide citizens with a wide array of services, from national parks to taxes and transportation. It would be far more convenient if users didn't have to navigate multiple websites for these inquiries; instead, one assistant could seamlessly transfer them to another as needed.
A common communication system among assistants has another advantage. In most cases, an organization's conversational assistants are developed over a long period of time by different teams using different platforms, and the technology can change dramatically over the years. For example, the Amtrak Julie AI assistant is over twenty years old, and you can imagine how much conversational assistant technology has changed since then. Adding a more sophisticated assistant to the Amtrak family shouldn't mean that Julie has to be replaced. This observation leads to an important feature of the Open Voice Interoperability specifications: they are neutral with respect to the underlying technologies of the assistants. Whether an assistant is built on an LLM, generative AI, a traditional rule-based system, or a proprietary platform, the specifications govern only the communication between assistants. Internally, each assistant can use whatever technology meets its requirements; externally, assistants communicate with each other using a common set of Open Voice Interoperability inter-agent messages.
Several specifications have been published, but the most important is the "Conversation Envelope," a message format that wraps all communication between assistants. The Conversation Envelope establishes the connection between two assistants and also carries several kinds of events that convey specific information between them. These events include inviting an assistant to take over a conversation, forwarding user utterances to an assistant that is better able to process them, and asking an assistant to provide a description of its capabilities.
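To make this concrete, here is a rough Python sketch of what constructing and serializing such an envelope might look like. The field names (`ovon`, `conversation`, `sender`, `events`, `eventType`), the URIs, and the version string are illustrative assumptions based on the general shape described above, not the authoritative schema; consult the published specifications for the real structure.

```python
import json


def make_envelope(sender_uri: str, conversation_id: str, events: list) -> dict:
    """Build a minimal, illustrative conversation envelope.

    The structure here is a hypothetical approximation of the
    Open Voice Interoperability envelope, for illustration only.
    """
    return {
        "ovon": {
            "schema": {"version": "0.9"},  # hypothetical version string
            "conversation": {"id": conversation_id},
            "sender": {"from": sender_uri},
            "events": events,
        }
    }


# One assistant invites a (hypothetical) travel assistant to take over,
# passing along the user's utterance as a second event.
envelope = make_envelope(
    sender_uri="https://example.com/assistants/general",  # hypothetical URI
    conversation_id="conv-1234",
    events=[
        {
            "eventType": "invite",
            "parameters": {"to": "https://example.com/assistants/travel"},
        },
        {
            "eventType": "utterance",
            "parameters": {"speaker": "user",
                           "text": "I need a train to Chicago"},
        },
    ],
)

# The envelope travels between assistants as ordinary JSON.
print(json.dumps(envelope, indent=2))
```

Because the envelope is plain structured data, any assistant can produce or consume it regardless of its internal technology, which is exactly the neutrality the specifications aim for.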
We’ll discuss the full set of specifications in more detail in future blog posts. It’s important to note that the specifications are under active development and will likely be refined as they are more widely implemented and tested. However, further development is expected to be backward compatible with earlier specifications, so there is no need to wait for a final version before the advantages can be realized.
We invite anyone who is interested in conversational AI to take a look at our GitHub repositories, try out the specifications, and join us in moving the specifications forward.