TMCnet - World's Largest Communications and Technology Community





Loquendo Q&A: From One Speech Luminary to Another

TMCnews


TMCnews Featured Article


October 21, 2009


By TMCnet Special Guest


SpeechCycle’s chief technology officer, Roberto Pieraccini, who was named Speech Technology’s Speech Luminary 2008, talks with 2009 award recipient Paolo Baggia, Loquendo’s director of international standards, about the current state of the speech market, next year’s event, and how to change people’s perception of speech technology, in this exclusive interview.

 
 
 
RP: Congratulations on being named Speech Luminary 2009, Paolo! Were you surprised to win the award?
 
PB: Well yes, I was. It was quite an honor. I saw the award as an acknowledgement of my involvement with the W3C over the last few years, and my longstanding presence in research and in the deployment of speech technologies along with my various academic commitments. And you, Roberto, I hear you were given an academic award just recently.
 
RP: Yes, I was very happy to be elected ISCA Fellow at the INTERSPEECH 2009 conference in Brighton, U.K. It is a great privilege. I actively try to create links between the academic conferences and the more business-oriented ones, and this year I was very pleased to see that there were more people from the academic world in New York City for SpeechTEK 2009 and more people from the industry at INTERSPEECH. I hope that this exchange of ideas between the two communities will continue to expand and spread, because there is much to learn from one another.

PB: I strongly support your attempts to bring the two worlds together, and in a similar way, I’m working to promote speech standards within universities to encourage them to leverage existing and highly effective means of integration, such as VoiceXML and other related standards developed by the W3C and IETF.
 
RP: Given that Loquendo is based in Europe, what are your feelings on the upcoming SpeechTEK Europe 2010? Do you think it will be a success?

PB: I hope so. Loquendo is involved with the committee for setting up the conference, and I think it’s a great idea to export a large speech conference over to Europe, where existing events are generally fairly small-scale and tend to be more focused on the technological issues and R&D than on commercial needs. I think a large event will put speech in the spotlight and hopefully convince the reluctant European markets of its value.
 
RP: I think in the U.S., we tend to think that Europe has fallen behind when it comes to the adoption of speech technologies. In North America, speech-enabled apps are now very mainstream and there is a huge variety of them in use. I read recently that the U.K. is about a year behind the U.S. in terms of the take-up of speech, with other European countries even further behind. And yet wasn’t Europe ahead of the U.S. not very long ago?
 
 
PB: Yes, you’re right. In telecommunications, for instance, 2G and then 3G mobile penetration started off stronger in Europe than in the U.S. and, in the nineties, the diffusion of speech applications was relatively rapid. But then adoption slowed down, and now we have been left a little behind.
 
RP: Why do you think that is?
 
PB: I think there are many socio-economic reasons. For starters, there hasn’t been a sufficient level of investment in improving the quality of speech-enabled services, and therefore those that exist are not always as advanced as they might be.
 
Businesses are also not often motivated to invest in speech, perhaps because of long-held European perceptions about the limitations of the technology. Because Europe was an early adopter of speech, it could be argued that both consumers and businesses over here have outdated ideas about the quality of speech recognition and speech synthesis.

RP: Yes, exactly, we need to get that message out. Speech-enabled contact centers, for example, can actually provide a superior service compared to exclusively human-operated services. In the U.S., there has been a lot of work, and SpeechCycle is closely involved with this, to demonstrate to customers that automated services complement human operators.
 
SpeechCycle’s aim is to provide an equal or better caller experience compared to human operators. This is actually possible today when sophisticated platforms that embed the most advanced speech science, VUIs, and Web 2.0 concepts are used, such as SpeechCycle’s RPA (Rich Phone Apps) and Loquendo’s VoxNauta (its VoiceXML/CCXML platform).
 
There has also been a great deal of negative press about spoken dialogue systems. The reality, however, is different from people’s perceptions, since modern IVR actually helps callers solve problems over the phone in just a few minutes, rather than leaving them waiting at least 20 minutes just to get through to a human operator.
 
PB: What do you think of initiatives like GetHuman.com? Are they helping, or only creating obstacles?
 
RP: I think GetHuman did a good job in spreading the word about which applications were really performing badly. Applications that lock you in and don’t give you a way out when the voice interface fails have been poorly designed and are not giving customers enough choices.
 
On the other hand, GetHuman also contributed to the negative view that the general public has always had towards speech applications.
 
PB: I think the same is true in Europe. There are few attempts to talk about speech technology in an educational context, whereas on the news or in the media in general I often see people discussing and getting excited about other forms of technology. Speech, however, is generally left out of the discussion.
 
RP: We were talking about slow adoption a moment ago, but I think it’s interesting to remember that telephone switching was once all done by hand, until automated switching was invented. And the catalyst for that was lack of manpower – AT&T calculated that there simply weren’t going to be enough people to do the job, not in the whole U.S. Also, take the ATM, for example. When these devices first came out, people didn’t really trust them, but now more or less everybody prefers to get cash from a machine than over the counter in the bank. People understand the advantages of ATMs, but they also accept their limitations. This is not the case with speech. We need to help people to overcome their mistrust.
 
PB: Perhaps it is the continual comparison with human operators, the eternal search for natural, life-like speech, which creates problems for speech technologies? We shouldn’t create expectations which cannot be fulfilled.
 
RP: Yes, I think you’re right. No one’s suggesting we abandon the goal of human-like speech, but we should make more use of speech for performing simple, repetitive tasks to a really high standard, thereby avoiding unfavorable comparisons with human operators.
Do you think that the advent of small, handheld devices with a screen will help speech applications to improve their image?
 
PB: Yes, speech can greatly complement other modalities. However, I don’t see much common ground or shared features across different speech apps. In the GUI approach, for example, the interface was simplified by means of double-clicking, icons, shortcut keys, etc. – and these features work on a huge range of different devices, which all share the same paradigm. New devices like the iPhone, on the other hand, are further extending the GUI by introducing touch to enlarge items or move things around the screen. I don’t see an emerging common paradigm for speech applications, however.
 
RP: Both ETSI and ITU tried to standardize voice commands, but they were not very successful.
 
PB: We must work harder to update perceptions. When people think of speech technology, they generally have in mind bad contact center encounters from the past. Of course, when people call a helpline it’s generally because of a problem of some kind – a broken modem, being overcharged, etc. – so they’re already irritable even before they speak to an operator, virtual or otherwise, and have very low tolerance for any limitations the technology may have.
 
The truth is their experiences were already negative even before they picked up the phone and dialed the number. Unfortunately, such impressions continue to reflect badly on speech technology, and they’re hard to shift.
 
RP: Yes, and the more speech is found on smartphones and sat-navs, for example, and in the entertainment sphere in general, the more we will manage to shift some of those negative perceptions. And speech works very well in that context because it really complements other modalities like touch, small screens, etc. I like to think of speech as a situational ingredient – an excellent means of interaction when used in the right place and at the right time. Interacting by voice might not work so well when you’re in a very noisy restaurant, for example, but it’s ideal for clearing out your inbox when you’re at the wheel.
 

Edited by Stefania Viscusi






