TMCnet - World's Largest Communications and Technology Community




Arthur M. Rosenberg

[April 13, 2004]


By Art Rosenberg

It’s Not Your Father’s IVR! The Big Gorilla Weighs In With Microsoft Speech Server 2004

The big news at the co-located Microsoft Mobile DevCon/VSLive!/AVIOS-SpeechTEK conference last month was Bill Gates’ launch of the Microsoft Speech Server (MSS) platform, Microsoft’s surprisingly low-priced and Web-friendly answer to the limitations of speech interfaces for all forms of interactive, self-service tasks. MSS is based on the long-talked-about SALT (Speech Application Language Tags) language, rather than the VoiceXML standard for integrating text and speech Web application interfaces. Microsoft claims that SALT will simplify the development of interactive applications that share common application code, while supporting the different voice/visual interface requirements of mobile, multi-modal user devices.

Like other disruptive technologies, this announcement will cause enterprise organizations to think twice about migrating from their legacy speech-oriented application tools to accommodate converged application interfaces. Not only do we see this announcement as breaking price barriers for speech-enabled applications in the relatively “greenfields” SMB market, but its practical exploitation of multi-modal SALT will help push the whole market toward multi-modal user interfaces for many mobile online applications.

Interactive Voice Response (IVR) has long been the bastion of telephone-based applications, primarily in support of call center activities. There it serves as a “front-end” that identifies a traditional telephone caller in order to route the call selectively and generate “screen-pop” information for a live agent and, perhaps more importantly, provides self-service applications via a Telephone User Interface (TUI). The notorious TUI IVR speech menus (“press one for…”) left much to be desired in terms of flexibility and time efficiency, and the proprietary platforms, speech cards, and complex design tools made IVR expensive for enterprises to implement and maintain. That’s why it found some degree of success primarily in conjunction with larger enterprise call centers.

When it came to non-customer callers, the limited facilities of voicemail systems were exploited to interact with the caller in three ways: as a simple answering machine for caller voice messaging, as an auto-attendant to redirect calls from the main business number to specific user extensions, and, in some really Mickey Mouse setups, as a group of special mailboxes emulating an application call flow, with voice menus (mailbox greetings) and branching logic to other mailboxes. What voicemail could not do was directly access application databases; that required the power of IVR programming.

Your father’s IVR also had problems with creating the speech prompts and responses, because it originally required laborious pre-recording with voice artists. God forbid a small script change was needed and the original person who did the recording was no longer available! Although having mixed voices is not a terrible thing, it could be “unnatural” and disconcerting to a caller.

It is now widely acknowledged that speech recognition and text-to-speech have become mature and cost-effective enough for practical use in controlling applications and informational content. This applies to both person-to-person “communication applications” (voice calls, messaging) and service applications, where speech is used for application input and/or output to end-user contact devices, such as:

  • Desktop voice-only telephones
  • Handheld wireless phones
  • Multi-modal, handheld devices

But let’s get realistic about the value of speech as an interface medium! Speech interfaces make sense for mobile use, where they replace a large screen and keyboard, and for bite-size pieces of information like messages and information alerts. They’re not useful for scanning documents or digging around databases. The value of speech control interfaces at the desktop, such as with a PC-based softphone, may be somewhat limited, because screen displays are a faster way to interact than speech output and because informational privacy sometimes must be maximized through non-audible input or output. Finally, speech will be almost useless in a really noisy environment.

It is only recently, however, that the benefits of improved speech recognition have been brought to market by leading technology providers, in products such as Avaya’s ready-to-use Speech Access, which can speech-enable the company’s other communication applications software. Microsoft is aiming to exploit those kinds of benefits further with an integrated platform that lets third-party developers speech-enable any kind of online Web application.

Part 2

Copyright © 2004 The Unified-View, All Rights Reserved Worldwide

