It’s Not Your Father’s IVR! The Big Gorilla Weighs In With Microsoft Speech Server 2004
The big news at the co-located Microsoft Mobile DevCon/VSLive!/AVIOS-SpeechTEK
conference last month was Bill Gates’ launch of the Microsoft Speech Server
(MSS) platform, Microsoft’s surprisingly low-priced and Web-friendly answer
to the limitations of speech interfaces for all forms of interactive,
self-service tasks. MSS is based on the long-discussed SALT (Speech
Application Language Tags) specification, rather than the VoiceXML standard for
integrating text and speech Web application interfaces. Microsoft claims
that SALT will simplify the development of interactive applications that
share common application code, while supporting the different voice/visual
interface requirements of mobile, multi-modal user devices.
Like other disruptive technologies, this announcement will cause enterprise
organizations to think twice about migrating from their legacy
speech-oriented application tools to accommodate converged application
interfaces. Not only do we see this announcement as breaking price barriers
for speech-enabled applications in the relatively “greenfield” SMB market,
but its practical exploitation of multi-modal SALT will also help push the whole
market toward multi-modal user interfaces for many mobile online
applications.
YOUR FATHER'S IVR
Interactive Voice Response (IVR) has long been the bastion of
telephone-based applications, primarily to support call center activities
with “front-ends” to identify a traditional telephone caller for selectively
routing the call and generating “screen-pop” information to a live agent,
and, perhaps more importantly, to provide self-service applications via a
Telephone User Interface (TUI). The notorious TUI IVR speech menus (“press
one for…”) left much to be desired in terms of flexibility and time
efficiency, and the proprietary platforms, speech cards, and complex design
tools made IVR an expensive proposition for enterprise application
implementation and ongoing maintenance. That’s why it found some degree of
success primarily in conjunction with larger enterprise call centers.
When it came to non-customer callers, the limited facilities of voicemail
systems were exploited to interact with the caller in three ways: as a simple
answering machine for caller voice messaging, as an auto-attendant to re-direct
calls from the main business number to specific user extensions, and, in some
really Mickey Mouse arrangements, as a group of special mailboxes that emulated
an application call flow with voice menus (mailbox greetings) and branching
logic to other mailboxes. What voicemail could not do was directly access
application databases; that required the power of IVR programming.
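The mailbox trick described above amounts to a small menu tree, and a rough sketch may make that concrete. The following is only an illustration under stated assumptions: the `Mailbox` class, the `MAILBOXES` table, the greetings, and the `run_call` function are all hypothetical names invented here, not any vendor's voicemail API. Note that nothing in it can look up data; each step can only play a fixed greeting and branch, which is exactly the limitation that IVR programming removed.

```python
from dataclasses import dataclass, field


@dataclass
class Mailbox:
    """One voicemail box standing in for a single menu step (hypothetical model)."""
    greeting: str
    # Maps a caller keypress to the id of the next mailbox.
    branches: dict = field(default_factory=dict)


# Hypothetical call flow built purely from mailbox greetings.
MAILBOXES = {
    "main": Mailbox(
        "Press 1 for hours, 2 for directions.",
        {"1": "hours", "2": "directions"},
    ),
    "hours": Mailbox("We are open nine to five, Monday through Friday."),
    "directions": Mailbox("We are at the corner of First and Main."),
}


def run_call(keypresses, start="main"):
    """Return the greetings a caller hears for a sequence of keypresses."""
    heard = []
    current = start
    keys = iter(keypresses)
    while True:
        box = MAILBOXES[current]
        heard.append(box.greeting)
        if not box.branches:
            break  # leaf mailbox: nothing left to choose
        key = next(keys, None)
        if key is None:
            break  # caller hung up without pressing anything further
        # An unrecognized key simply replays the same mailbox greeting.
        current = box.branches.get(key, current)
    return heard


if __name__ == "__main__":
    for line in run_call(["1"]):
        print(line)
```

Every response here is canned text fixed in advance; answering something like "what is my account balance?" would need a database query at call time, which this mailbox-style structure has no place for.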
Your father’s IVR also had problems with creating the speech prompts and
responses, because it originally required laborious pre-recording with voice
artists. God forbid a small script change was needed and the original person
who did the recording was no longer available! Although having mixed voices
is not a terrible thing, it could be “unnatural” and disconcerting to a
caller.
BETTER SPEECH TECHNOLOGIES TO THE RESCUE
It is now widely acknowledged that speech recognition and
text-to-speech have become mature and cost-effective enough for
practical use in controlling applications and informational content. This
applies to both person-to-person “communication applications” (voice calls,
messaging) and service applications, where speech is used for application
input and/or output to end user contact devices, such as:
- Desktop voice-only telephones
- Handheld wireless phones
- Multi-modal, handheld devices
But let’s get realistic about the value of speech as an interface medium!
Speech interfaces make sense for mobile use, where they replace a large screen
and keyboard, and for bite-size pieces of information like messages and
information alerts. They're not useful for scanning documents or digging around
databases. The value of speech control interfaces at the desktop, such as with a
PC-based softphone, may be somewhat limited, both because screen displays are
a faster way to interact than speech output and because informational
privacy sometimes must be maximized through non-audible input or output. Finally,
speech will be almost useless in a really noisy environment.
It is only recently, however, that the
benefits of improved speech recognition have been brought to market by
leading technology providers, in ready-to-use products such as Avaya’s Speech
Access, which can speech-enable Avaya’s other communication applications
software. Microsoft is aiming to exploit those kinds of benefits further
with an integrated platform for speech-enabling any kind of online Web
application built by third-party developers.
Part 2
Copyright © 2004 The
Unified-View, All Rights Reserved Worldwide