How Multimodal AI Interfaces Will Revolutionize CX


The next evolution of human-machine interaction could bring about real, unprecedented synergy between humans and machines – with CX as the obvious beneficiary, writes Avaya’s Tvrtko Stosic.

Traditionally, people have interacted with technology via strict, well-defined commands used to complete narrowly specified tasks. In customer service, this gave rise to the traditional IVR, through which customers issued commands via touchtone interfaces to retrieve information or complete a transaction.

As much as the CX industry has moved on, this traditional mode of interaction hasn’t changed all that much. We’ve seen impressive new technologies deployed, but the way they’re employed has kept most workflows tethered to old-school interfaces.

The good news is that this may be about to change with the advent of multimodal generative AI interfaces, which can not only understand inputs across different modalities and glean intent from them, but also generate content and outcomes across them.

Why is this useful? Let’s look at the problem, taking conversational AI as an example. This technology promised a new paradigm through which customers could simply state an intent – without specifying every single detail of how to reach the desired outcome – and the voice or chat bot would recognise that intent and complete the task. From a CX perspective, this removes the need for complex navigation through IVR trees, which is obviously preferable.

However, due to its often rigid, scripted nature and its dependency on preconfigured scenarios, conversational AI alone hasn’t delivered on the expectations many had that it would reduce customer effort and improve experiences. In fact, in most cases, it’s become just another command-based technology, mirroring the high effort and bad practices associated with traditional IVRs.

The Next Step in Gen-AI

With the emergence of generative AI, we saw hope of dramatic improvements in CX with regard to self-service: its spectacular conversational capabilities and ability to handle complex problems can finally humanise customers’ conversations with voice and chat bots. And it definitely solves the intent issue, albeit only across voice and text channels.

But what if we could reach the complexity of real human conversation – across many different levels and modalities? As well as verbally, people communicate with gestures, facial expressions, eye movements and more. Generative AI in its current form cannot match the complexity of this kind of communication, but with the advent of multimodal gen-AI, we might be getting close.

With multimodal LLMs, traditional voice and chat bots will evolve into a new type of UI: multimodal customer service avatars. An AI avatar is a human-like virtual persona created using text-to-video generation. Avatars will be able to process not just voice or text but also images, gestures, facial expressions and eye movements. Going forward, additional capabilities will be added by leveraging haptic equipment and bio-sensory data. As well as producing corresponding multimodal output, avatars will be able to hold simultaneous two-way conversations – in contrast to the request-response interactions of traditional bots.
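
To make this concrete, here’s a minimal sketch of what a single multimodal turn might look like behind the scenes. It assumes an OpenAI-style chat API with image input; the model name, prompt and classify_turn helper are purely illustrative, not a reference to any particular avatar platform.

```python
# Minimal sketch: combine a customer utterance with a camera frame in one
# multimodal model call to estimate intent and emotional state.
# Assumes the openai Python SDK (v1+) and OPENAI_API_KEY in the environment.
import base64

from openai import OpenAI

client = OpenAI()

def classify_turn(utterance: str, webcam_frame_path: str) -> str:
    """Return the model's estimate of intent and emotion for one turn."""
    with open(webcam_frame_path, "rb") as f:
        frame_b64 = base64.b64encode(f.read()).decode("utf-8")

    response = client.chat.completions.create(
        model="gpt-4o",  # any vision-capable chat model would do
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": (f"Customer said: '{utterance}'. Based on the words "
                          "and the facial expression in the image, describe "
                          "the likely intent and emotional state.")},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{frame_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

# Example: classify_turn("My card was charged twice!", "frame.jpg")
```

A production avatar would run this continuously over streaming audio and video rather than one frame at a time, but the principle is the same: one model consumes several modalities at once, and the extra signal sharpens intent and emotion recognition.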

Multimodal avatars will dramatically outperform current bots in their understanding of customers’ intent and emotions, reducing ambiguity and offering hyper-personalisation – including the ability to empathise with the customer’s emotional state, reading not just words and phrases but also facial expressions and posture, and adjusting speech accordingly. And the revolution in UIs powered by multimodal AI will not be limited to customer-facing functions: we expect a similar impact in AI-based agent assistants, which are already gaining popularity.

What’s really important here is that we are not talking about some distant future. Gartner expects that multimodal UIs will become a standard feature of virtual assistants within the next two years.

Putting Multimodal Gen-AI Into Practice

Despite these impressive capabilities, multimodal UIs will not be a magical solution for everything – and solutions built on them will not be without risks. For example, a voice-based UI is of limited use in crowded environments like airports or open-plan offices. What’s more, the power of multimodal generative AI could mislead customers into expecting human-like performance from self-service solutions – the appearance of genuine critical thinking, for example. Such unrealistic expectations can easily lead to frustration, loss of trust and worse. Going further, every additional input modality exposes another data source; this can improve a solution’s capabilities, but it also increases data privacy risks.

To mitigate these challenges, companies should carefully select use cases where multimodal UIs bring obvious improvements to the customer experience. In some scenarios, traditional unimodal UIs may still have the advantage, so each one is worth evaluating on its merits. Companies should also be transparent about any use of multimodal AI and educate customers about its capabilities and limitations. And it goes without saying that special attention should always be paid to security and data privacy.

In fact, at a deeper, strategic level, the most successful multimodal AI deployments could end up being built on a thorough understanding of human interaction. That means drawing on deep knowledge of behavioural science, cognitive science, psychology and sociology – knowledge that could prove more important than programming and other IT skills.

The evolution of human-machine interaction from command-line interfaces to multimodal UIs could bring about real, unprecedented synergy between humans and machines – a low-effort relationship that offers human-like, empathetic and outcome-based interactions. If and when that comes to pass, CX is the obvious beneficiary, and the result will be nothing short of a revolution.

About the Author

Tvrtko Stosic is a consultant with Avaya Customer Experience Services.
