
[March 13, 2001]
Taking Your Business Mobile:
Voice-Enable Web Content
BY GARRY CHINN
Companies have invested billions of dollars in developing Web sites to
deliver content, products and services to customers. However, the Web only
reaches a small percentage of the world. The telephone, on the other hand,
is the most ubiquitous technology available, yet traditional telephony
solutions are costly, inflexible, and difficult to design and deploy.
With the growing demand for businesses to provide immediate and simple
access to information and services, it's no wonder that companies are
looking to take advantage of voice applications, which, thanks to recent
advances, are beginning to be deployed over the Web.
Interactive voice response (IVR) systems, which have been around for
decades, have been practical only for the largest enterprises, which can
commit the time, investment, and resources to build proprietary voice
applications. Only a handful of companies (such as Charles Schwab and
United Airlines) were progressive enough to pioneer new technologies such
as speech-driven IVR applications. As for off-the-shelf voice solutions,
only a few applications, such as auto attendants and voice mail systems,
have gained widespread adoption by businesses and consumers.
The New Voice Web
Consider the standard methods of communication and information access
today -- the telephone and the Internet. The combination of these two
technologies gives companies a brand new resource for connecting with
customers: the voice Web.
The voice Web will extend the Web we know today with a new channel through
which customers can access and retrieve information. By leveraging existing
Web content and application infrastructure, companies can easily build
low-cost, custom voice applications: Web applications written in XML or
HTML can be transformed into telephony voice applications.
As businesses look into adopting voice technology, they must consider
options that will scale easily and cost-effectively as they grow.
The voice Web will make it more economical to deploy voice applications,
allowing small- and medium-sized businesses to build and use state-of-the-art voice
solutions. Large businesses will also benefit; they will be able to
economically deploy more specialized applications targeted at specific
customer segments.
The arrival of enhanced telephony devices such as smart phones and
wireless personal digital assistants (PDAs) will enable "multi-modal"
applications that handle both voice and data in a single session, making
next-generation voice applications more powerful than ever. While voice
won't replace the mouse and keyboard on desktop systems, it is an
essential input modality on handheld devices, making voice applications a
core part of the mobile network.
Next-generation handsets will support simultaneous voice and data
channels, allowing true multi-modal browsing: voice and keypad input
combined with audio and visual output. Multi-modal browsing will simplify
navigation and information retrieval by replacing multiple keystroke
commands with spoken phrases, ultimately increasing the power and
effectiveness of telephony applications.
Voice browsing transforms Web applications into telephony voice
applications. Just as a desktop Web browser renders HTML as a visual
interface on a PC, a voice browser renders HTML or XML as a spoken dialog.
A voice browsing solution exploits the basic architecture of the Web,
allowing content developers to reuse as many existing components as
possible, with little or no modification, to produce a low-cost voice
solution.
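To make the idea concrete, here is a minimal Python sketch, assuming a voice browser that simply treats a page's hyperlinks as menu choices. The names LinkMenuExtractor and html_to_voice_menu are hypothetical; a real voice browser would walk the full DOM, handle forms, and hand the prompt to a text-to-speech engine and the choices to a speech recognizer.

```python
from html.parser import HTMLParser


class LinkMenuExtractor(HTMLParser):
    """Collect (label, href) pairs for every <a href="..."> on a page."""

    def __init__(self):
        super().__init__()
        self.links = []    # (spoken label, target URL) in document order
        self._href = None  # href of the <a> element currently open, if any
        self._text = []    # text fragments seen inside that <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href, self._text = href, []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            label = " ".join("".join(self._text).split())  # collapse whitespace
            if label:
                self.links.append((label, self._href))
            self._href = None


def html_to_voice_menu(html):
    """Turn a page's links into a directed-dialog prompt and a choice table."""
    parser = LinkMenuExtractor()
    parser.feed(html)
    parser.close()
    prompt = " ".join(
        f"For {label}, say {n}." for n, (label, _) in enumerate(parser.links, 1)
    )
    # Accept either the choice number or the link text itself.
    choices = {str(n): href for n, (_, href) in enumerate(parser.links, 1)}
    choices.update({label.lower(): href for label, href in parser.links})
    return prompt, choices


page = ('<ul><li><a href="/quotes">Stock Quotes</a></li>'
        '<li><a href="/news">Market News</a></li></ul>')
prompt, choices = html_to_voice_menu(page)
print(prompt)  # For Stock Quotes, say 1. For Market News, say 2.
```

The same page serves both the desktop and the telephone: nothing in the HTML was written with voice in mind.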
VoiceXML: A Voice Markup Language
VoiceXML is a markup language based on the XML standard that contains the
basic elements for constructing a voice-driven IVR application. VoiceXML
supports the creation of menu-directed or machine-directed dialogs that
guide users through an application with a series of menu prompts. It also
provides basic transactional elements, such as forms whose fields collect
caller input and submit it to a Web server, a key to supporting
telephony-based commerce.
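For comparison, a hand-written VoiceXML menu is itself just an XML document. This Python sketch assembles one with the standard library's xml.etree.ElementTree, using the menu, prompt, enumerate and choice elements of VoiceXML 1.0. The function name build_vxml_menu and the example URLs are hypothetical, and the sketch is only an illustration of the markup, not a complete VoiceXML application.

```python
import xml.etree.ElementTree as ET


def build_vxml_menu(choices):
    """Build a VoiceXML menu from (spoken label, target URL) pairs."""
    vxml = ET.Element("vxml", version="1.0")
    # dtmf="true" lets callers also press 1, 2, ... for the first nine choices.
    menu = ET.SubElement(vxml, "menu", dtmf="true")
    prompt = ET.SubElement(menu, "prompt")
    prompt.text = "Please say one of: "
    ET.SubElement(prompt, "enumerate")  # the platform reads out the choices
    for label, url in choices:
        choice = ET.SubElement(menu, "choice", next=url)
        choice.text = label  # the phrase the recognizer listens for
    return ET.tostring(vxml, encoding="unicode")


print(build_vxml_menu([("Stock Quotes", "/quotes"), ("Market News", "/news")]))
```

Every menu like this has to be written (or generated by server code) separately from the HTML pages that offer the same choices.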
VoiceXML does have limitations, though. First, while VoiceXML is well
suited to simple applications, it does not scale well to more complex
dialogs or transactional functions. Second, while well-designed static
content management applications are easily modified to support VoiceXML
(or any other markup language, for that matter), dynamic content management
applications are built largely with programming code that targets HTML.
Thus, Web applications and Web content will have to be rewritten to
support dynamic VoiceXML applications. For some Web applications, the
dynamic content management functionality may be the most expensive
component to build and maintain. Finally, many Web applications use
client-side JavaScript for more sophisticated transactional capabilities,
such as input validation. The document object model (DOM) of HTML differs
from that of any other markup language. Consequently, client-side
JavaScript developed for HTML applications cannot be directly reused, even
if a VoiceXML platform supports JavaScript.
HTML For The Voice Web
The other, more flexible and cost-effective approach to creating
voice-driven applications is to use the HTML on which these Web
applications are already based. Unlike a desktop browser, an HTML-ready
voice browser uses the DOM representation to generate a dialog
interaction instead of a visual layout. Because HTML is designed for
visual presentation, a good voice experience cannot be generated from HTML
alone. To customize and tune the voice experience, either specialized tags
or a separate voice style language supplements the existing HTML.
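The article does not name a particular voice style language, so the following Python sketch invents a deliberately tiny one: rules kept in a separate dictionary, keyed by the HTML id attribute, that either silence an element or replace its text with a spoken prompt. The rule format, VOICE_STYLE, VoiceRenderer and render_voice are all hypothetical; the point is only that the HTML page itself stays unmodified.

```python
from html.parser import HTMLParser

# Hypothetical voice style sheet, stored apart from the HTML, keyed by id.
VOICE_STYLE = {
    "banner": {"speak": False},  # drop a decorative header from the dialog
    "quote-form": {"prompt": "Which stock symbol would you like?"},
}

# HTML elements that never have an end tag.
VOID_TAGS = frozenset({"area", "base", "br", "col", "embed", "hr", "img",
                       "input", "link", "meta", "param", "source", "wbr"})


class VoiceRenderer(HTMLParser):
    """Produce spoken lines from HTML, applying id-keyed voice style rules."""

    def __init__(self, style):
        super().__init__()
        self.style = style
        self.spoken = []
        self._stack = []  # (tag, muted) for each open non-void element

    def _muted(self):
        return any(muted for _, muted in self._stack)

    def handle_starttag(self, tag, attrs):
        rule = self.style.get(dict(attrs).get("id"), {})
        if "prompt" in rule and not self._muted():
            self.spoken.append(rule["prompt"])
        if tag in VOID_TAGS:
            return  # no end tag will arrive, so nothing to push
        # Mute this subtree if the rule hides it or replaces it with a prompt.
        self._stack.append((tag, rule.get("speak") is False or "prompt" in rule))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag (tolerates unclosed <p>, <li>, ...).
        for i in range(len(self._stack) - 1, -1, -1):
            if self._stack[i][0] == tag:
                del self._stack[i:]
                break

    def handle_data(self, data):
        text = " ".join(data.split())
        if text and not self._muted():
            self.spoken.append(text)


def render_voice(html, style):
    renderer = VoiceRenderer(style)
    renderer.feed(html)
    renderer.close()
    return renderer.spoken


page = ('<div id="banner">Welcome to Example Brokerage!</div>'
        '<p>Quotes are delayed 20 minutes.</p>'
        '<form id="quote-form" action="/quote">Symbol: <input name="symbol"></form>')
print(render_voice(page, VOICE_STYLE))
# ['Quotes are delayed 20 minutes.', 'Which stock symbol would you like?']
```

Changing the voice experience here means editing VOICE_STYLE only; the page, and whatever code generates it, is untouched.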
If the voice browser uses a style language, the content can be kept
separate from the voice presentation, and this separation has a number of
advantages.
Traditional IVR applications rely on machine-directed dialogs to walk a
user through a hierarchical menu. Like VoiceXML voice browsers, HTML voice
browsers can generate such directed dialogs. It is also possible to
overlay mixed-initiative dialogs for navigation without modifying the
underlying HTML content. This kind of navigation lets the user speak more
natural phrases, such as "get me an IBM stock quote," to bypass the
step-by-step menu interaction. These interactions can also be built in
VoiceXML, but the dialogs would not be generated automatically from a DOM,
as they are in an HTML voice browser. With VoiceXML, a content developer
has to program such capability into the system by hand, which makes it
more expensive to build and maintain.
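A mixed-initiative overlay can be pictured as a small set of shortcut patterns checked before the ordinary menu. This Python sketch assumes the speech recognizer has already produced a text transcript (real systems constrain recognition with grammars rather than matching free text with regular expressions); SHORTCUTS, route_utterance and the /quote and /news URLs are hypothetical.

```python
import re

# Hypothetical navigation shortcuts layered over the generated menu dialog.
# Each pattern maps a natural phrase to a URL template; named groups fill slots.
SHORTCUTS = [
    (re.compile(r"\b(?:get|give) me (?:an? )?(?P<symbol>[a-z]{1,5}) stock quote\b",
                re.IGNORECASE),
     "/quote?symbol={symbol}"),
    (re.compile(r"\b(?:market )?news\b", re.IGNORECASE), "/news"),
]


def route_utterance(utterance, menu_choices):
    """Return the URL to visit: a shortcut wins, else fall back to the menu."""
    for pattern, template in SHORTCUTS:
        match = pattern.search(utterance)
        if match:
            slots = {k: v.upper() for k, v in match.groupdict().items() if v}
            return template.format(**slots)
    # Ordinary directed-dialog answer, e.g. "1"; None means reprompt the caller.
    return menu_choices.get(utterance.strip().lower())


print(route_utterance("get me an IBM stock quote", {"1": "/quotes"}))
# /quote?symbol=IBM
```

The menu_choices table is the one a directed dialog already has, so the shortcuts sit on top of it rather than replacing it.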
Another advantage of the style language approach is that it lets the
content developer leverage a greater portion of existing Web assets.
Dynamic content Web applications use programming code to generate HTML,
and an HTML voice browser can use this HTML presentation layer directly,
without rewriting the code that produces it. In practice, content
developers may find that some tuning is required to improve the voice
experience.
Another important benefit of HTML voice browsers is the ability to
support existing client-side JavaScript. Client-side JavaScript in
existing Web applications is written for an HTML DOM. Since an HTML voice
browser uses the original HTML, the original JavaScript is reusable on
platforms with JavaScript support. With a VoiceXML voice browser, the
original client-side JavaScript cannot simply be moved into the VoiceXML
document; it would have to be rewritten so that it no longer references
the HTML DOM. Therefore, content developers looking to leverage a heavy
investment in transactional or dynamic content application development
should consider HTML voice browser platforms.
Platform Components For Enhancement
A voice Web application written in VoiceXML or HTML can be enhanced with
platform components. For even more powerful dialog and transaction
interactions, there are platform technologies originally intended for
traditional IVR development. Such platforms combine libraries with
full-featured programming languages like Java and C++ to build complex
dialogs and transactional capabilities that could not be built with
VoiceXML or HTML alone. For example, the Help capability of a VoiceXML or
HTML voice browser is not as customizable as one built in a programming
language like Java. The price of this power and flexibility is the high
cost of programming full applications. Both VoiceXML and HTML, however,
support embeddable components through the object element. By combining
these technologies, content developers can invoke platform capabilities
where they are needed to enhance VoiceXML or HTML functionality, without
programming an entire application on the platform.
Another way to extend the functionality of a VoiceXML or HTML voice
browser platform is by adding scripts. JavaScript, already widely
supported by desktop browsers, is one popular choice.
Conclusion
The arrival of the voice Web gives businesses an economical option to
build custom voice applications, which have considerable value as a
supplementary business channel. When building voice Web applications, it
is important to carefully consider requirements and choose platforms and
components that will maximize a business' existing investment in Web
applications. By doing so, companies will minimize the implementation and
maintenance costs of voice applications and gain a greater return on
investment.
Garry Chinn is Chief Technology Officer of VocalPoint
Technologies. VocalPoint Technologies provides middleware,
infrastructure and services for businesses to rapidly voice-enable HTML
and XML content, making it possible to access Internet and intranet
applications using natural speech over any phone. Its voice-based browser
allows businesses to build customized voice portals and services by
integrating VocalPoint's proprietary technology into their network
infrastructure, or by utilizing VocalPoint's fully outsourced ASP
(application service provider) solution. Incorporated in 1997, the company
has leveraged its speech technology research expertise to create
attractive voice-based solutions for businesses worldwide.