The Importance of Emotion in VUI Design



TMCnews Featured Article


December 15, 2008


By Stefania Viscusi, Assignment Desk Editor


While most companies today are looking for ways to cut costs and still remain productive, the call center industry is particularly in need of automation that not only reduces costs but also improves customer relationships.
 
For customers calling into the contact center, the main goal is to have their needs met quickly and effectively. While this kind of service can be costly and hard to come by, automating some tasks is necessary so that both parties' needs are met.

 
Voice User Interface (VUI) technologies achieve this goal by using speech technologies to automate phone handling. In order to be successful, these vocal interactions must be as human-like as possible and provide a natural human-to-computer interaction.
 
To find out more about VUI design and its importance, I took some time to speak with Loquendo's Sheyla Mitello about the topic.
 
 
How do the goals of contact center solution providers differ from those of callers/end users when it comes to VUI Design?
 
The goals differ because service providers want to provide lots of information about their products and services and cover different topics, while callers aim to reach an answer to one specific topic of interest easily and directly. In VUIs, reading long lists of options is time-consuming, frustrating, and ineffective. Achieving a balance between these sometimes-conflicting needs is part of the design process. Probably the biggest challenge is keeping the person cued about what he or she can do next. In VUIs, this means knowing what you can say in response to a voice prompt. The other big challenge is presenting information in small enough chunks that people can absorb it and use it.
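As a rough illustration of the two challenges described above (cueing the caller about what can be said next, and presenting options in small chunks), here is a minimal Python sketch. The function names, the chunk size of three, and the prompt wording are all assumptions made for this example; they do not come from the interview or from any Loquendo product.

```python
from typing import List


def chunk_options(options: List[str], chunk_size: int = 3) -> List[List[str]]:
    """Split a long option list into small groups a caller can absorb."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [options[i:i + chunk_size] for i in range(0, len(options), chunk_size)]


def build_prompts(options: List[str], chunk_size: int = 3) -> List[str]:
    """Turn each group into one prompt that cues what the caller can say next."""
    chunks = chunk_options(options, chunk_size)
    prompts = []
    for index, chunk in enumerate(chunks):
        if len(chunk) == 1:
            listed = chunk[0]
        else:
            listed = ", ".join(chunk[:-1]) + " or " + chunk[-1]
        prompt = f"You can say {listed}."
        # Hypothetical cue: offer the next chunk only when one exists.
        if index < len(chunks) - 1:
            prompt += " Or say 'more' to hear other options."
        prompts.append(prompt)
    return prompts


if __name__ == "__main__":
    for line in build_prompts(["billing", "orders", "returns", "support"]):
        print(line)
```

Each prompt stays short and ends by telling the caller what is possible next, instead of reading the whole list in one pass.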
 
Consumers are becoming more demanding when searching for products and services to meet their needs, and in response to this, products and services are increasing in quantity and complexity each day. As a result, successful companies are increasing the use of self-service channels to remain competitive in the market.
 
 
Is there a middle ground that meets both their needs?
 
While consumers are the main source of revenue for a company, it can be challenging to keep contact with them and assure their loyalty.
 
For most customers, interactions that deal with other humans are still needed.
 
Customers who are less "technologically oriented" can become dissatisfied when using self-service tools and applications, while customers who are more "technology oriented" are becoming increasingly indifferent to traditional means of contact (banner, mail, etc.), so it is of the utmost importance to move toward newer means of contact.

To be successful, companies should improve the self-service interactivity experience and focus not only on the VUI element.
 
In principle, the VUI was an important element: a voice service enables callers to access information using nothing but their voices. Speech-based user interfaces must be easy for users to control and speech recognition interfaces need to meet complex and important requirements, including understanding the basic laws of communication, eliminating the need for training, and responding appropriately to requests.
 
By implementing a conversational VUI, service providers can expect greater acceptance of voice-based products and services because users can use them naturally and immediately. Nowadays, other differentiating elements are also needed, most of them related to emotional design.
 
 
Can you talk briefly about the Market Validation findings on "Vocal Browsing"?

That EU Market Validation project was less focused on emotional traits in human-computer interaction, and dealt mostly with investigating the more attractive automated and voice-enabled applications. It did, however, yield some interesting findings.
 
There were three main motivations for users' mass adoption of voice browsing services:

1. Necessity: in mobile conditions, the services respond to important needs for which they have no competitors;
2. The exception: there are rare situations in which the demand for this information is immediate (e.g. confirmation of, or changes to, public transport strikes). Normally, competing channels are less expensive, and people who need to consume this information in bulk for their job tend to use other channels;
3. The pleasure of sharing: the motivations for use vary; users do not ask themselves how useful the information is to them, and use is driven by pleasure.
 
It is by exploiting "pleasure" as a driver in the adoption of automated services that more recent implementations have made wider use of a more life-like approach to human-computer interaction.
Several Human Digital Assistants (also known as Virtual Agents or simply Avatars) have been developed, for example by companies such as H-Care (www.h-care.it), using their animation technology, which enables the eyes and face to move in a truly realistic way, fully synchronized with the voice, alongside our TTS, which provides fluency and an emotional range.
 
 
What did the project conclude about the use of emotions in speech solutions?
 
Investigations of human-computer interaction in voice services revealed a tension between the standard menu style, which is relatively simple but arbitrary, and the potential usability benefits of systematically deploying suitable metaphors for the service interface. Such metaphors greatly benefit from the adoption of emotional cues. Initially, acoustic cues enabled users to make use of one of the major input modalities available to the visually impaired (the auditory channel). But pleasant sounds in space alone do not solve the complexity of a usable and friendly information system. We proposed a special hypermedia model in which each node of information has a feature that reflects the kind of information it contains. Each type of node was mapped to a certain speaker. The voices were generated to simulate different real persons and were enriched with emotional auditory cues. Each one gave the information a particular point of view and, in some way, assigned an anthropomorphic characteristic to the information content.

In more complex interfaces, users can be supported by the use of a different speaker. To carry out task control, there can be a special speaker, called the assistant, who stays at all times in a fixed position relative to the user's position. This assistant's function covers tasks such as backtracking and orienting the user by means of context-dependent advice. In this way, control tasks and information content are presented homogeneously, as the user interacts with different persons.
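The hypermedia model described above can be sketched in a few lines of Python: each node type is mapped to a speaker, and a separate assistant voice handles control tasks such as backtracking. This is only an illustrative sketch. The node types, speaker names, and class names are hypothetical, and the fixed spatial position of the assistant is not modeled here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical node types and speaker names; the interview does not list them.
SPEAKER_FOR_TYPE: Dict[str, str] = {
    "news": "speaker_a",
    "timetable": "speaker_b",
    "help": "speaker_c",
}
ASSISTANT = "assistant"  # dedicated voice reserved for control tasks


@dataclass
class Node:
    name: str
    node_type: str
    text: str


@dataclass
class Session:
    history: List[Node] = field(default_factory=list)

    def visit(self, node: Node) -> Tuple[str, str]:
        """Present a node in the voice assigned to its type."""
        self.history.append(node)
        return (SPEAKER_FOR_TYPE[node.node_type], node.text)

    def back(self) -> Tuple[str, str]:
        """Backtrack one step; the assistant voice tells the user where they are."""
        if len(self.history) < 2:
            return (ASSISTANT, "You are at the start.")
        self.history.pop()
        current = self.history[-1]
        return (ASSISTANT, f"You are back at {current.name}.")


if __name__ == "__main__":
    session = Session()
    print(session.visit(Node("headlines", "news", "Today's headlines.")))
    print(session.visit(Node("bus times", "timetable", "Next bus at nine.")))
    print(session.back())
```

Keeping control messages on one voice and content on others means the user can tell, from the speaker alone, whether they are hearing information or navigation advice.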
 
What role do avatars and digital assistants play in the use of emotion? How important is it for them to have emotional elements?
 
The use of Virtual Agents has revolutionized the customer's online experience by providing a range of customer care solutions for diverse sectors across the globe.
 
With avatars and digital assistants, a virtual guide simplifies and enriches the customer's online experience by focusing their attention on targeted products and information and allowing informative kiosks to be transformed into a truly interactive experience. These solutions also allow the creation of truly personalized interactions between the avatar and the targeted customer and create customer care solutions that are informative, expressive and fun to use.
 
What are some of the latest developments being used to improve VUI design and make human-computer interaction even more life-like?

A conversational VUI that follows person-to-person communication guidelines is the most critical element of a voice service. Successful systems allow users to interact with them in a natural and intuitive way that mimics the flow of ordinary conversation. When humanizing virtual agents, the role of speech technology in effective human-machine interaction is especially important, not only because more efficient algorithms improve voice quality and minimize the need for fine-tuning in the application design, but also because of its expressiveness features.
 
Expressive TTS uses sophisticated signal manipulation algorithms that, based on prosodic models for different expressive styles (angry, happy, sad, etc.), are able to generate expressive speech.
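To give a rough sense of what a per-style prosodic model might look like at the application level, the sketch below maps each expressive style to `rate` and `pitch` values and wraps the text in an SSML `prosody` element. SSML is swapped in here purely for illustration: the interview does not say that Loquendo's TTS is driven this way, the style-to-value table is an assumption, and real expressive TTS manipulates the speech signal in far more detail than two attributes. The `speak` element is also simplified and omits the version and namespace attributes a complete SSML document carries.

```python
from xml.sax.saxutils import escape

# Assumed mapping from expressive style to SSML prosody labels (illustrative only).
STYLE_PROSODY = {
    "neutral": {"rate": "medium", "pitch": "medium"},
    "happy": {"rate": "fast", "pitch": "high"},
    "sad": {"rate": "slow", "pitch": "low"},
    "angry": {"rate": "fast", "pitch": "low"},
}


def to_ssml(text: str, style: str = "neutral") -> str:
    """Wrap text in a prosody element for the given style.

    Unknown styles fall back to the neutral settings.
    """
    params = STYLE_PROSODY.get(style, STYLE_PROSODY["neutral"])
    return (
        f'<speak><prosody rate="{params["rate"]}" pitch="{params["pitch"]}">'
        f"{escape(text)}</prosody></speak>"
    )


if __name__ == "__main__":
    print(to_ssml("Your order has shipped.", "happy"))
```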
 
Another frontier, multimodality, improves usability by exploiting different modalities for interacting with (small) devices, and improves accessibility by providing extended access to an application or a service.
 
Speech accompanied by other modalities provides a richer and more robust way of communicating. For instance, voice combined with gestures and facial expressions can make services easier to understand. All this can lead to friendlier applications (IVR, gaming, virtual operator, etc.) and benefit industries beyond the contact center.
 
 
For more, be sure to check out the Speech Recognition and Text to Speech channel on TMCnet.
 
 

Stefania Viscusi is an assignment editor for TMCnet, covering VoIP, CRM, call center and wireless technologies. To read more of Stefania’s articles, please visit her columnist page.

Edited by Stefania Viscusi






