
Feature Article
September 2001

Breaking The Sound Barrier: Why VoiceXML Will Revolutionize IVR


VoiceWeb. Voice Portals. VoiceXML. These terms have gone from flights of fancy just a few years ago to hard-charging technology realities today. As VoiceXML continues to move forward, it is poised to enter the marketplace with an audible bang, and it will profoundly affect the interactive voice response (IVR) industry.

Just as HTML revolutionized the Internet and powered the phenomenal growth of the Web, VoiceXML seems positioned to revolutionize IVR by laying the groundwork for the VoiceWeb. And while this revolution holds many promises, pitfalls remain.

The VoiceXML Advantage
VoiceXML removes application dependence on proprietary architectures. Voice applications in the IVR world have traditionally been built on either proprietary scripting languages or proprietary APIs. Such languages and APIs often carry vendor-specific limitations because they lack the broader industry perspective that has shaped VoiceXML throughout its development.

Additionally, as the service provider's corpus of vendor-specific application code grows, so too does the dependency on that vendor. In time, the cost to redevelop applications on a better platform can be prohibitive, restricting the service provider to a less price-competitive or feature-competitive solution.

However, with applications developed in VoiceXML, application portability is greatly enhanced, freeing the service provider to focus more directly on price/performance and OA&M feature comparisons when selecting a platform vendor.

Dependence on proprietary system-level hardware and software technology is reduced. By abstracting underlying interfaces to telephony and voice-processing hardware and software, VoiceXML frees the service provider to select compliant platforms that provide support for multiple options in "best-of-breed" component technologies like voice recognition and text-to-speech.

In a technological environment where fundamental improvements in voice recognition and text-to-speech (among other technologies) continue to be made, it is imperative to the service provider that hardware, software, and even system component vendor changes can be made with maximum transparency.

VoiceXML Makes Transparency Possible
A standardized language fuels competition among development tool vendors. As voice application development shifts to standardized VoiceXML, a marketplace for sophisticated VoiceXML-targeted integrated development environments (IDEs) will broaden.

Standardization of the markup language will ensure that applications developed with an IDE from one vendor interoperate with a VoiceXML platform from another -- ending the rigid lack of choice in the vertically integrated proprietary IVR model. The resulting marketplace competition will bring better IDE choices to the service provider.

Rather than recruiting, training, and retaining developers with narrow proprietary development skills, VoiceXML will empower service providers to select from the broader marketplace skill set of Web developers. As an added benefit, these developers will also be capable of deploying multi-modal (Web and voice) applications, while clients get the best of both worlds -- quicker application development with the same security and reliability.

As the W3C standardizes formal multi-modal markup languages, these same developer resources will be ideally positioned to leverage their existing skills in developing applications using these new languages.

Embracing Web development methodologies means VoiceXML builds from experience. Web infrastructure and development methodologies will be leveraged as VoiceXML-based services are deployed. Load-balancing, caching, middleware, and application servers that support existing Web-based services will become equally useful in supporting voice-based services. In fact, the VoiceXML architecture is well suited to the n-tier development paradigm, meaning real reuse of business logic components between Web and voice applications may finally become a practical reality.
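To make the n-tier idea concrete, here is a minimal sketch in Python of one business-logic function feeding both a Web page and a VoiceXML document. Everything in it is an illustrative assumption rather than anything taken from a real platform: the account data, the function names (get_balance, balance_message, render_html, render_vxml), and the choice of Python itself. The VoiceXML it emits uses only the basic vxml, form, block, and prompt elements.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical in-memory data standing in for a real back-end tier.
_BALANCES = {"1001": 250.75}


def get_balance(account_id: str) -> float:
    """Business-logic tier: shared by the Web and voice front ends."""
    return _BALANCES[account_id]


def balance_message(account_id: str) -> str:
    """Presentation-neutral wording built on the shared business logic."""
    return f"Your balance is {get_balance(account_id):.2f} dollars."


def render_html(account_id: str) -> str:
    """Web front end: wrap the message in a minimal HTML page."""
    body = escape(balance_message(account_id))
    return f"<html><body><p>{body}</p></body></html>"


def render_vxml(account_id: str) -> str:
    """Voice front end: wrap the same message in a minimal VoiceXML form."""
    vxml = ET.Element("vxml", version="1.0")
    form = ET.SubElement(vxml, "form")
    block = ET.SubElement(form, "block")
    prompt = ET.SubElement(block, "prompt")
    prompt.text = balance_message(account_id)
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(render_html("1001"))
    print(render_vxml("1001"))
```

Only the two render functions differ between channels. In a production setting the same split would more likely live in application-server components, but the division of labor would be similar.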

A marketplace for third-party voice application components is thus spawned. Leveraging standardized voice markup, third-party software developers will be able to write powerful and reusable server-side voice-application components for J2EE or .NET environments. Libraries of such components will greatly accelerate application development. Written in object-oriented languages like Java or C#, for example, these components could be easily customized or extended to more quickly create powerful applications.

More Flexible Service Models
Unlike its predecessors, VoiceXML offers a much richer array of application deployment options. Traditional IVR service providers develop, host, and maintain voice applications for their clients. With VoiceXML, however, more options become available. A third party may develop the application. The client may elect to host (via HTTP) and directly control the source code (VoiceXML, middleware code, voice recordings, grammars, etc.), yet contract with a service provider to handle telephony and media services. With applications coded in VoiceXML, a client can much more easily choose to switch from insourcing to outsourcing or to move from one service provider to another. This is flexibility that meets the diverse demands of an increasingly sophisticated client base.
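The client-hosted model described above can be sketched with nothing more than an ordinary HTTP server. The Python example below is a rough illustration under stated assumptions: the path /welcome.vxml, the prompt wording, and the text/xml content type are all hypothetical choices, and fetch_once simply plays the role of a service provider's voice platform retrieving the document over HTTP.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical VoiceXML document that the client hosts and controls.
VXML_DOC = b"""<?xml version="1.0"?>
<vxml version="1.0">
  <form>
    <block>
      <prompt>Thank you for calling.</prompt>
    </block>
  </form>
</vxml>
"""


class VxmlHandler(BaseHTTPRequestHandler):
    """Serves the client's VoiceXML document at an assumed path."""

    def do_GET(self):
        if self.path != "/welcome.vxml":
            self.send_error(404)
            return
        self.send_response(200)
        # Content type is an assumption; platforms may expect another one.
        self.send_header("Content-Type", "text/xml")
        self.send_header("Content-Length", str(len(VXML_DOC)))
        self.end_headers()
        self.wfile.write(VXML_DOC)

    def log_message(self, format, *args):
        pass  # suppress request logging to keep the sketch quiet


def fetch_once() -> bytes:
    """Start the server on a free local port and fetch the document once."""
    server = HTTPServer(("127.0.0.1", 0), VxmlHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/welcome.vxml"
        # Bypass any configured proxy so the request stays local.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(url) as resp:
            return resp.read()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_once().decode("utf-8"))
```

Because the document travels over plain HTTP, the client could point a different service provider at the same URL without touching the application itself, which is the portability argument in miniature.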

VoiceXML also enables development of a VoiceWeb. Although a convincing underlying business model that would support a pervasive VoiceWeb remains elusive, the potential for expanding the demand for voice services on a VoiceWeb is exciting. Whether a VoiceWeb is the ultimate outcome of VoiceXML's continued development and implementation remains to be seen, but the numerous other advantages of VoiceXML are, in any case, convincing.

Understanding The Risks
As with any new technology, there are risks attached to VoiceXML and some very important questions about its present and future remain unanswered. Despite the promise that VoiceXML holds for the IVR industry, the road to eventual success is not without hazards.

Will VoiceXML implementations become balkanized, disabling interoperability? Early implementations of VoiceXML have included many proprietary extensions. To be fair, many of these implementations preceded the VoiceXML 1.0 specification, but if VoiceXML implementations continue with such an "embrace and extend" philosophy, balkanization will occur and a meaningful VoiceWeb will be an improbability.

In reality, support for an entire constellation of markup languages will be necessary to ensure practical interoperability of voice applications among diverse platforms. Just as uneven browser support for HTML, ECMAScript, DHTML, DOM, etc., has been a problem for Web development, so too will uneven VoiceXML platform support for VoiceXML, SGML, SSML, CCML, etc., be for voice development. The industry needs to unite in its demand for adherence to standards. In fact, without standards adherence, VoiceXML will ultimately be a failure.

Gaps in the current VoiceXML 1.0 specification need to be closed. However, since the real impact of VoiceXML likely won't be felt until version 2.0 is implemented and service providers feel more comfortable with the technology, many of the version 1.0 gaps that are fixed in 2.0 won't really be a significant issue. Nevertheless, even with the 2.0 specification in place, more work remains.

Call handling is often cited as a significant deficiency, although efforts like CCML will hopefully remedy this. Asynchronous event handling, however, remains an example of an open gap. As long as such gaps exist, vendors will be tempted to add diverse proprietary extensions that hinder interoperability.

How Will Future Multi-Modal Standards Impact VoiceXML?
The first VoiceXML specification to receive technical approval from the W3C Voice Browser Working Group (DialogML, or VoiceXML 2.0) is expected to be completed soon. Technology, however, is never satisfied with the status quo.

The W3C expects to soon form another working group to specify a multi-modal dialog markup language supporting both visual and verbal user interfaces. This work will be essential as 3G wireless services develop, but questions about how this proposed markup language will fit with VoiceXML are pertinent. For instance, will this new markup language essentially be a superset of VoiceXML and, if so, might VoiceXML gradually be replaced as the "lingua franca" of the VoiceWeb in the end?

What Does It All Mean?
It's clear that VoiceXML still faces significant growing pains on its way to becoming a viable, widespread solution. However, the appeal of VoiceXML's advantages is broad and deep. Our industry has waited a very long time for the many benefits that VoiceXML now promises. We think that our wait has been well rewarded.

James Harvey has been Call Interactive's director of technical operations since 1998. He is responsible for managing the company's software and systems architecture, and heading up the Network and Systems Development group. Call Interactive, a subsidiary of First Data Corp., is a leading provider of automated customer care solutions and a member of the VoiceXML Forum.

