
Call Center/CRM Management Scope
March 2003


Beyond SALT Versus VoiceXML: Coping With The Wealth Of Standards In Speech And Multimodal Self-Service Applications

By K. W. (Bill) Scholz, Ph.D., Unisys Corp.

Standards serve as the foundation for growth within an industry. As a new technology is spawned and begins to pique the interest of developers and consumers, its initial growth is typically haphazard and devoid of structure. As the technology reaches adolescence, however, its leaders develop standards that guide growth and interoperability, and its haphazard evolution fades. As the technologies enabling speech and multimodal self-service applications mature, many standards have emerged and combined to enable the field to approach mainstream status. The growth of standards is not without its cost, however; because of the complexity of the underlying technologies, the standards documents themselves have grown to span thousands of pages, and as a consequence constitute an overwhelming obstacle to a developer's mastery of the technology.

Furthermore, this past year has seen considerable press devoted to the so-called 'conflict' between the two key standards in our industry: SALT and VoiceXML (VXML). Claims of conflict have deluded some developers into feeling pressure to make premature 'choices' between them, while intimidating others into inactivity as they wait for the industry to choose the 'right' one. In fact, there are over a dozen distinct standards designed to guide the development and execution of speech and multimodal applications, occasionally competing with one another but more frequently operating in harmony to guide distinct components of the application's architecture. 

Deployment Architecture
Figure 1 illustrates the deployment architecture for a speech or multimodal application. The major components in the architecture and their functions are as follows:
Application Server. The central component is the application server, the platform and software responsible for managing the execution of the application. The application server's principal responsibilities include management of the dialog with the end-user and management of the business transaction processor, the application's business functionality.

Business transaction processor. This term describes the software and (optionally) the platform responsible for execution of the business transactions (for example, a travel reservation system, a retail banking database, a regional or national weather repository, or a securities transaction database, to name a few).

Voice gateway. During execution, the application server interchanges information with the voice gateway that is coded in a markup language and is conveyed using the familiar Internet delivery paradigm. The voice gateway includes:
• A markup language interpreter,
• An automatic speech recognizer (ASR),
• A text-to-speech (TTS) generator, and
• A telephone network interface (tele interface). The tele interface mediates the connection through the circuit-switched or packet-switched telephone network to the end user. The network connection will use either a direct digital interface to the circuit-switched network or voice-over-IP (VoIP) through a media gateway to the telephone network.

Voice user interface. This is an end user interface using speech over wireless or wireline telephones.

Graphics user interface. This is an end user interface using desktop PCs, PDAs, cell phones with digital visual displays, or other screen-oriented devices.

Figure 1

Standards
The principal standards and standardized APIs (application program interfaces) that guide the operation and interaction of the components in the architecture are shown in Figure 1, and are listed and described below. The agency responsible for each standard or API is shown in parentheses after the standard's name.

CCXML (W3C). Call Control eXtensible Markup Language is designed to provide telephony call control support for dialog systems. CCXML is intended to serve as an adjunct language for use with a VXML, SALT or other dialog implementation platform.

HTTP (IETF). Hypertext Transfer Protocol is an application-level protocol for distributed, collaborative, hypermedia information systems. It is a generic, stateless protocol which can be used for many tasks beyond its use for hypertext, such as name servers and distributed object management systems, through extension of its request methods, error codes and headers.

H.323 (ITU). H.323 is a standard that specifies the components, protocols and procedures that provide multimedia communication services (real-time audio, video and data communications) over packet networks, including Internet Protocol (IP)-based networks. H.323 is part of a family of recommendations that provide multimedia communication services over a variety of networks.

JDBC (Sun Microsystems). Java Database Connectivity is an API that lets developers access virtually any tabular data source from the Java programming language. It provides cross-DBMS connectivity to a wide range of SQL databases and also provides access to other tabular data sources, such as spreadsheets or flat files.

ODBC (Microsoft). Open Database Connectivity is a widely accepted API for database access. It is based on the Call-Level Interface (CLI) specifications from X/Open and ISO/IEC for database APIs and uses Structured Query Language (SQL) as its database access language.

SALT (SALT Forum; submitted to W3C). Speech Application Language Tags is a platform-independent standard that makes possible multimodal and telephony-enabled access to information, applications and Web services from PCs, telephones, tablet PCs and wireless PDAs (personal digital assistants). The standard extends existing markup languages such as HTML, XHTML and XML.

SIP, RTP, MGCP (IETF). SIP (Session Initiation Protocol) is a signaling protocol for Internet conferencing, telephony, presence, events notification and instant messaging. RTP (Real-time Transport Protocol) is a protocol for the transport of real-time data, including audio and video. MGCP/MEGACO (Media Gateway Control Protocol) addresses the relationship between the media gateway, which converts circuit-switched voice to packet-based traffic, and the media gateway controller (sometimes called a softswitch), which dictates the service logic of that traffic. 

SRGS (W3C). Speech Recognition Grammar Specification defines the syntax for grammar representation intended for use by speech recognizers and other grammar processors so that developers can specify the words and patterns of words to be listened for by a speech recognizer.

SSML (W3C). Speech Synthesis Markup Language is a rich, XML-based markup language for assisting the generation of synthetic speech in Web and other applications. Its essential role is to give authors of synthesizable content a standard way to control aspects of speech output such as pronunciation, volume, pitch, rate, etc., across different synthesis-capable platforms.

SS7/ISUP (ITU). Signaling System 7 is an architecture for performing out-of-band signaling in support of the call-establishment, billing, routing and information-exchange functions of the PSTN (public switched telephone network). It identifies functions to be performed by a signaling-system network and a protocol to enable their performance. ISUP (ISDN User Part) defines the messages and protocol used in the establishment and teardown of voice and data calls over the PSTN, and to manage the trunk network on which they rely.

VoiceXML (W3C). VoiceXML (Voice eXtensible Markup Language) is designed for creating audio dialogs that feature synthesized speech, digitized audio, recognition of spoken and DTMF key input, recording of spoken input, telephony and mixed-initiative conversations. Its major goal is to bring the advantages of Web-based development and content delivery to interactive voice response applications.

WAP / WML (OMA). Wireless Application Protocol is a protocol suite for delivering content to wireless devices; Wireless Markup Language is its XML-based markup language, intended for use in specifying content and user interface for narrowband devices, including cellular phones and pagers.

XHTML (W3C). eXtensible HyperText Markup Language is a family of current and future document types and modules that reproduce, subset and extend HTML 4. The XHTML document types are XML-based and ultimately are designed to work in conjunction with XML-based user agents.

XML (W3C). eXtensible Markup Language is a simple, very flexible text format derived from SGML (Standard Generalized Markup Language). Originally designed to meet the challenges of large-scale electronic publishing, XML is also playing an increasingly important role in the exchange of a wide variety of data on the Web and elsewhere.

X+V (W3C). XHTML + Voice brings spoken interaction to standard Web content by integrating a set of mature Web technologies such as XHTML and XML Events with XML vocabularies developed as part of the W3C Speech Interface Framework. The profile includes voice modules that support speech synthesis, speech dialogs, command and control, speech grammars and the ability to attach voice event handlers.
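
To make the "operating in harmony" point concrete, the following is a minimal sketch of how three of the W3C standards listed above can meet in one document: a VoiceXML form whose prompt uses SSML elements and whose field carries an inline SRGS grammar. The form id, wording and city names are invented for illustration, and the Python helper `describe` is a hypothetical convenience for inspecting the document, not part of any standard or product.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical dialog: the form id, prompt wording and city names are invented.
CITY_DIALOG = """\
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml" xml:lang="en-US">
  <form id="weather">
    <field name="city">
      <!-- SSML elements (break, emphasis) shape the spoken prompt. -->
      <prompt>
        Welcome to the weather line. <break time="300ms"/>
        Which <emphasis>city</emphasis> would you like?
      </prompt>
      <!-- An inline SRGS grammar lists what the recognizer listens for. -->
      <grammar type="application/srgs+xml" version="1.0" mode="voice" root="city">
        <rule id="city" scope="public">
          <one-of>
            <item>Boston</item>
            <item>New York</item>
            <item>San Francisco</item>
          </one-of>
        </rule>
      </grammar>
      <filled>
        <prompt>Getting the forecast for <value expr="city"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""


def describe(vxml_text):
    """Return the field name, flattened prompt text and grammar phrases."""
    ns = {"v": VXML_NS}
    root = ET.fromstring(vxml_text)
    field = root.find("v:form/v:field", ns)
    prompt = field.find("v:prompt", ns)
    # itertext() drops the SSML markup but keeps the words to be spoken.
    prompt_words = " ".join("".join(prompt.itertext()).split())
    grammar = field.find("v:grammar", ns)
    phrases = [item.text.strip() for item in grammar.iter(f"{{{VXML_NS}}}item")]
    return {"field": field.get("name"), "prompt": prompt_words, "phrases": phrases}


if __name__ == "__main__":
    print(describe(CITY_DIALOG))
```

Each standard governs its own part of the document: VoiceXML the dialog flow, SSML the rendering of the prompt, and SRGS the recognizer's vocabulary.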

Application Creation
It is clear that if an application developer were required to attend specifically to the details of every standard during the development process, application creation would become prohibitively complex. Yet it is equally clear that the evolution of standards plays a vital role in facilitating inter-vendor operability and modularization, and has become the lifeblood of growth in our industry. The solution to this problem is found in today's collection of development tool suites and service creation environments. In recent years, these have grown in sophistication to the point that the developer is shielded from the intricacies of standards conformity or enforcement, yet can derive the full benefit of standard conformance. 

The retail shelves are lined with a collection of sophisticated tool suites and SCEs (Service Creation Environments) designed to address these problems. Developers produce speech and multimodal applications using a selected subset of these tools. Figure 2 illustrates how one can combine a carefully selected subset of these tools and packaged application delivery components to shield the developer from the need to explicitly master each of the standards inherent in the architecture. 

Figure 2

Application Development And Deployment Using The 'Right' Tools
The following description summarizes our application development process with special emphasis on the tools and delivery components used in each phase, and how standards are addressed without the need for specific focus on each.

Planning and discovery. The development process starts with 'planning and discovery' where the project management team interviews the customer to analyze the problem in detail to identify the application's purpose and methodology. 

Dialog design and evaluation. A user interface layout tool is used to express the application's methodology as an ordered collection of dialog 'states,' where each state includes a prompt, expected responses to the prompt and lists of actions associated with each response. The same tool manages testing, in which the application's execution is simulated for candidate end users using operator-guided call flow.
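
As a rough illustration of the dialog-state idea, the sketch below models each state as a prompt plus a table of expected responses, and walks the call flow with scripted answers in place of a live caller. It is not the layout tool described here: the class `DialogState`, the function `simulate` and the sample states are hypothetical, and the "action" for each response is simplified to a jump to the next state.

```python
from dataclasses import dataclass, field


@dataclass
class DialogState:
    name: str
    prompt: str
    # Maps each expected response to the next state (a simplified "action").
    transitions: dict[str, str] = field(default_factory=dict)


def simulate(states, start, responses):
    """Walk the call flow with scripted responses; return the prompts played."""
    played = []
    current = start
    pending = iter(responses)
    while True:
        state = states[current]
        played.append(state.prompt)
        if not state.transitions:      # terminal state: nothing left to ask
            break
        answer = next(pending, None)
        if answer is None:             # the scripted caller has hung up
            break
        # An unexpected answer keeps the caller in the same state (reprompt).
        current = state.transitions.get(answer.lower(), current)
    return played


# Hypothetical three-state call flow.
STATES = {
    "greeting": DialogState("greeting", "Would you like weather or flights?",
                            {"weather": "ask_city", "flights": "goodbye"}),
    "ask_city": DialogState("ask_city", "Which city?",
                            {"boston": "goodbye", "new york": "goodbye"}),
    "goodbye": DialogState("goodbye", "Thank you for calling."),
}

if __name__ == "__main__":
    for line in simulate(STATES, "greeting", ["weather", "Chicago", "Boston"]):
        print(line)
```

In this run the unexpected answer "Chicago" causes the city prompt to be played a second time, which is the kind of behavior a simulated call flow is meant to expose before any recognizer is involved.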

Grammar and prompt design. Once dialog design is completed, evaluated and modified as required, the detailed grammars are entered using a feature that employs a spreadsheet metaphor to refine the responses in each dialog state by entering anticipated words and phrases. Additionally, a prompt design tool is used to structure the verbal output for each dialog state to use any mixture of recordings and synthesized speech.
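
The spreadsheet metaphor can be pictured as rows of (dialog state, anticipated phrase) that a tool turns into grammars on the developer's behalf. The sketch below assumes that row layout and emits one SRGS XML grammar per state. The rows, the function `rows_to_grammars` and the choice of one rule per state are illustrative assumptions, not a description of any particular product.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from xml.sax.saxutils import escape, quoteattr

SRGS_NS = "http://www.w3.org/2001/06/grammar"

# Hypothetical spreadsheet rows: (dialog state, anticipated phrase).
ROWS = [
    ("confirm", "yes"),
    ("confirm", "yes please"),
    ("confirm", "no"),
    ("ask_city", "Boston"),
    ("ask_city", "New York"),
]


def rows_to_grammars(rows):
    """Group phrases by dialog state and emit one SRGS XML grammar per state."""
    by_state = defaultdict(list)
    for state, phrase in rows:
        by_state[state].append(phrase)

    grammars = {}
    for state, phrases in by_state.items():
        items = "\n".join(f"      <item>{escape(p)}</item>" for p in phrases)
        grammars[state] = (
            f'<grammar xmlns="{SRGS_NS}" version="1.0" xml:lang="en-US"\n'
            f'         mode="voice" root={quoteattr(state)}>\n'
            f'  <rule id={quoteattr(state)} scope="public">\n'
            f'    <one-of>\n{items}\n    </one-of>\n'
            f'  </rule>\n'
            f'</grammar>'
        )
    return grammars


if __name__ == "__main__":
    grammars = rows_to_grammars(ROWS)
    print(grammars["confirm"])
    # Parsing the output is a quick check that it is well-formed XML.
    ET.fromstring(grammars["confirm"])
```

The developer edits only the rows; the SRGS syntax is produced mechanically, which is the shielding effect the tools are meant to provide.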

Business transaction integration. Integration with the business transaction process is performed by building a 'connector.' The tool supports creation of connectors to databases, legacy mainframe applications, and to any Web-based resource or site. Output from a connector consists of an XML or XHTML stream which is integrated into the code using J2EE conventions. 
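
A connector can be thought of as a small adapter that runs a business transaction and hands the result back as XML. The sketch below shows that shape only. It swaps in Python's built-in `sqlite3` module and an in-memory table for the J2EE and database technologies named in this article, and the table, column names and the functions `make_demo_db` and `weather_connector` are all invented for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_demo_db():
    """Create an in-memory table standing in for a business transaction system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE weather (city TEXT PRIMARY KEY, forecast TEXT, high_f INTEGER)"
    )
    conn.executemany(
        "INSERT INTO weather VALUES (?, ?, ?)",
        [("Boston", "sunny", 71), ("Seattle", "rain", 58)],
    )
    conn.commit()
    return conn


def weather_connector(conn, city):
    """Look up one city and return the result as an XML string."""
    row = conn.execute(
        "SELECT city, forecast, high_f FROM weather WHERE city = ?", (city,)
    ).fetchone()
    result = ET.Element("weather")
    if row is None:
        # Report a miss in-band so the dialog can apologize to the caller.
        result.set("status", "not-found")
        return ET.tostring(result, encoding="unicode")
    result.set("status", "ok")
    for tag, value in zip(("city", "forecast", "high_f"), row):
        ET.SubElement(result, tag).text = str(value)
    return ET.tostring(result, encoding="unicode")


if __name__ == "__main__":
    conn = make_demo_db()
    print(weather_connector(conn, "Boston"))
    print(weather_connector(conn, "Paris"))
```

Because the connector's output is plain XML, the dialog logic that consumes it does not need to know whether the data came from a database, a mainframe application or a Web resource.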

Voice gateway integration. A voice gateway is selected that best meets the customer's needs, and the runtime engine is conditioned to produce the markup language stream (VoiceXML or SALT) appropriate to the selected platform. The voice gateway provider's tools are used to integrate the gateway into customer-specific circuit-switched or packet-switched networks.
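
The idea of conditioning the runtime engine for a target platform can be sketched as a single dialog state rendered into either markup. The function `render_state` below is hypothetical and deliberately minimal: it emits only a prompt and a grammar reference, omits the scripting a real SALT page would need to start the prompt and the recognizer, and assumes the SALT namespace URI shown in the comment, which should be checked against the SALT specification for a given platform.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

VXML_NS = "http://www.w3.org/2001/vxml"
# Assumed namespace URI for SALT 1.0; verify against the gateway's documentation.
SALT_NS = "http://www.saltforum.org/2002/SALT"


def render_state(name, prompt, grammar_src, target):
    """Render one dialog state as VoiceXML or SALT markup (illustrative only)."""
    if target == "voicexml":
        return (
            f'<vxml version="2.0" xmlns="{VXML_NS}">\n'
            f'  <form id={quoteattr(name)}>\n'
            f'    <field name={quoteattr(name)}>\n'
            f'      <prompt>{escape(prompt)}</prompt>\n'
            f'      <grammar src={quoteattr(grammar_src)} type="application/srgs+xml"/>\n'
            f'    </field>\n'
            f'  </form>\n'
            f'</vxml>'
        )
    if target == "salt":
        return (
            f'<html xmlns:salt="{SALT_NS}">\n'
            f'  <body>\n'
            f'    <salt:prompt id={quoteattr(name + "_prompt")}>'
            f'{escape(prompt)}</salt:prompt>\n'
            f'    <salt:listen id={quoteattr(name + "_listen")}>\n'
            f'      <salt:grammar src={quoteattr(grammar_src)}/>\n'
            f'    </salt:listen>\n'
            f'  </body>\n'
            f'</html>'
        )
    raise ValueError(f"unknown target platform: {target!r}")


if __name__ == "__main__":
    for target in ("voicexml", "salt"):
        markup = render_state("city", "Which city?", "city.grxml", target)
        ET.fromstring(markup)  # raises if the markup is not well-formed
        print(markup, end="\n\n")
```

The dialog definition is written once, and the choice between VoiceXML and SALT becomes a deployment parameter rather than a development decision.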

Application testing, tuning and delivery. Tuning and testing are performed using a combination of locally developed tools and tools provided by the speech recognizer and voice gateway vendors. Tuning is followed by piloting, beta testing and phased rollout as dictated by customer contracts.

The past two or three years have seen outstanding growth in the speech application industry, and the start of an expansion into the adjacent multimodal application industry. No single factor is more important in stimulating this growth than the creation of cross-vendor and cross-industry standards. Yet the very abundance of new and maturing standards has led to an increased incentive to hide their arcane complexity in tools to facilitate service creation without the requirement to master details of each relevant standard. Fortunately, service creation tools and deployment platforms have also matured significantly and, because of the very standards they encapsulate, inter-operate to permit cross-vendor life cycle support for speech and multimodal applications. It is the growth of standards that makes this blossoming inter-operability possible, and provides the foundation for our industry to grow to maturity.

At Unisys, Dr. Scholz managed the development of two large scale expert systems. Starting in 1991 as R&D manager, he managed business development for government service contracts. In 1994 he co-founded the NL Speech Solutions business unit and since then has been directing efforts to integrate speech recognition and natural language processing in the creation of Spoken Language Understanding systems. He is a frequent speaker at professional trade shows and was selected as one of the Top Ten Leaders in speech by Speech Technology magazine in 2001. His commitment to standards is demonstrated by his participation as the Unisys representative to the SALT Forum, the VoiceXML Forum, and the W3C Voice Browser Working Group. 
