TMCnet - World's Largest Communications and Technology Community


September 1998


Component Standards Speed Development Of Speech-Enabled Applications

BY STEVE EHRLICH

Call center application developers are looking for something better than automated touch-tone systems to handle the thousands of incoming calls they get every day, but they don’t want to spend significant time learning new technologies and designing applications. Corporations are accelerating the deployment of speech recognition systems to eliminate the interminable prerecorded menus that often frustrate callers. The payoff is better customer service, fewer callers who "zero out" to a live operator, and significant cost savings.

Speech components are the standard that will promote the development of speech applications. With components, developers don’t have to rewrite code every time they extend an application or build a new one. Components also remove much of the complex linguistic guesswork from building a speech recognition system.

SIMPLIFYING SPEECH APPS
The hardware and software technology is now mature enough that corporations can confidently implement these speech recognition applications. With a conversational speech interface, the caller has more control over their transaction. They can barge into the automated system, state what they need, and say it as a natural language query.

Speech recognition applications can also be accessed from any telephone, which is a huge consideration for companies like UPS that are looking to expand into Europe, where rotary phones are still common. Speech rec is also valuable for companies like Charles Schwab, whose clients often call its automated stock quote system from cellular telephones, and want hands-free access while on the road.

Until now, companies have been hampered by the difficulty of developing and deploying the applications that recognize spoken words. Anticipating every possible user response and accounting for the vagaries of human expression has been a larger and more complex task than most companies are equipped to handle. But the industry is rallying around a set of standards that will allow even lay programmers to develop customized speech recognition applications from preset modules. New programming architectures leverage the same concepts that object-oriented programming brings to Internet applications: componentization, reusability, and customization.

SPEECH COMPONENT STRUCTURE
Speech components should contain three elements that help developers build applications more quickly and efficiently: grammars, dialogs, and prompts. Dialogs describe the entire flow of interaction between a user and an application; grammars are lists of the valid things a user might say in a particular portion of a dialog; and prompts are messages played to a user, usually to elicit a response.
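
To make the three elements concrete, here is a minimal sketch of how a component might bundle them. Every name in it (DialogStep, SpeechComponent, account_menu) is hypothetical and chosen only for illustration; it is not any vendor's API, and it assumes the recognizer has already turned the caller's speech into text.

```python
from dataclasses import dataclass, field


@dataclass
class DialogStep:
    """One turn of a dialog: play a prompt, then match the reply to a grammar."""
    name: str
    prompt: str              # message played to the caller
    grammar: dict[str, str]  # valid phrase -> name of the next step


@dataclass
class SpeechComponent:
    """A reusable bundle of dialog steps (hypothetical structure)."""
    start: str
    steps: dict[str, DialogStep] = field(default_factory=dict)

    def add(self, step: DialogStep) -> None:
        self.steps[step.name] = step

    def run(self, utterances: list[str]) -> list[str]:
        """Walk the dialog with already-recognized text; return prompts played."""
        played = []
        current = self.start
        replies = iter(utterances)
        while current in self.steps:
            step = self.steps[current]
            played.append(step.prompt)
            if not step.grammar:
                break  # terminal step: nothing more to ask
            reply = next(replies, None)
            if reply is None:
                break  # caller said nothing further
            # A reply outside the grammar repeats the same step.
            current = step.grammar.get(reply.strip().lower(), current)
        return played


# Example component: a two-choice account menu.
account_menu = SpeechComponent(start="ask_account")
account_menu.add(DialogStep(
    name="ask_account",
    prompt="Which account would you like?",
    grammar={"checking": "checking_balance", "savings": "savings_balance"},
))
account_menu.add(DialogStep("checking_balance", "Reading your checking balance.", {}))
account_menu.add(DialogStep("savings_balance", "Reading your savings balance.", {}))
```

Calling account_menu.run(["um", "checking"]) repeats the menu prompt once, because "um" is not in the grammar, and then moves on to the checking step. The same component could be reused in any application that needs an account menu.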

Experienced programmers are excited by the possibility of using a building-block approach to create software and streamline a sometimes tedious and repetitive development process. Bob Wohlsen, the technical director of voice solutions at Charles Schwab, agrees: "Anything that helps combine the grammar for the recognizer and the dictionary for the recognizer, such as lists of generic information for stock quotes or city or airport names, would be helpful. In my IVR systems, having components would certainly simplify my development process."

For components to truly catch on as a standard for speech recognition development, they need to be platform-independent. They are defined once and then implemented in a number of different environments from that single definition. Portability of the components is made possible through a set of APIs for different IVR platforms and for ActiveX and JavaBeans. These components can then be integrated into applications built using an IVR platform’s application generator or any popular development tool, such as Visual Basic or JBuilder. Support for component standards such as ActiveX and JavaBeans increases the appeal and shortens the learning curve of speech recognition programming for developers at large.
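
The "define once, implement in many environments" idea can be sketched as a thin adapter layer. The adapter classes and the output strings below are invented for illustration; they do not reproduce the syntax of any real IVR platform, ActiveX control, or JavaBean.

```python
from typing import Protocol


class PlatformAdapter(Protocol):
    """What each target environment must supply (hypothetical interface)."""

    def render_prompt(self, text: str) -> str: ...


class IvrScriptAdapter:
    """Pretend IVR scripting target; the PLAY syntax is made up."""

    def render_prompt(self, text: str) -> str:
        return f'PLAY "{text}"'


class ConsoleAdapter:
    """Desktop test harness that just labels the prompt."""

    def render_prompt(self, text: str) -> str:
        return f"[prompt] {text}"


def deploy(prompts: list[str], adapter: PlatformAdapter) -> list[str]:
    """Render one platform-neutral prompt list for a specific environment."""
    return [adapter.render_prompt(p) for p in prompts]
```

The prompt list is written once; only the adapter changes from one platform to the next, which is the property that makes a component portable.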

REAL-WORLD APPLICATIONS
Early adopters of speech recognition systems report stunning returns on their investments. Depending on the volume of calls a company handles, systems often pay for themselves in weeks or months instead of years. The savings in staff time alone are staggering.

Sears, Roebuck and Co.
Since deploying a speech recognition system to route calls in its stores nationwide last spring, Sears, Roebuck and Co. has shifted 3,000 switchboard operators to other positions, gaining millions of dollars in human resources in the process. Plus, Sears’ automated call center can handle four calls at once, whereas human operators could only answer one call at a time (also see "Speech Rec Slashes Sears’ Call Handling Costs").

The company is considering growing the system to handle employee inquiries into payroll, benefits, and company procedures, and is committed to staying on the cutting edge of speech recognition technology.

United Parcel Service
United Parcel Service reported similar savings in human costs. After deploying a package tracking system last November, the company saved so many man-hours that it didn’t have to hire its usual supply of temporary workers for the Christmas holiday. On December 23, 1997, alone, UPS logged 193,000 calls on the system.

Increasingly, we will see corporations deploying speech recognition systems, as more and more call centers in the financial and travel industries seek to cash in on the benefits of this automation. Getting these speech recognition applications to market is becoming easier as the application development tools become more sophisticated, as standards are set for APIs, and as programming tasks are streamlined by using components.

Steve Ehrlich is vice president of marketing for Nuance Communications, one of the largest U.S. suppliers of conversational speech recognition systems. Prior to joining Nuance, Steve spent twelve years at Oracle Corporation. Nuance’s SpeechObjects product is a set of reusable components incorporating design and development standards to significantly speed up development of speech-enabled applications. The company plans to offer training and certification courses for SpeechObjects developers. Sears, UPS, and Charles Schwab are all Nuance customers. For more information, contact the company at 650-847-0000, or visit their Web site at www.nuance.com.


Securing Your Speech-Enabled Apps

As a developer, you spend a lot of time building speech-enabled applications. As an end user, you put a lot of thought into your purchasing decision. But does your consideration extend to the security of your speech recognition system?

Any industry where telephone transactions occur (phone shopping, calling card transactions, home banking, even cellular telephone use) is a potential target of fraud, because it is fairly simple for experienced hackers to obtain personal identification information. When you press a key on a touch-tone keypad, the phone sends a DTMF signal: a fixed pair of audible frequencies that identifies the digit. Anyone listening in on your line can pick up those frequencies and decode the digits. It takes just seconds for your PIN or credit card number to be stolen. More secure methods, beyond PIN codes and passwords, are needed to handle telephone transactions.
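
To see why touch-tone digits offer so little protection, consider this short sketch. The frequency table is the standard DTMF keypad layout; the 8,000-samples-per-second rate, the 50-millisecond tone length, and the function names are assumptions made for the example. It synthesizes the tone for a key and then recovers the key from the audio alone, using the well-known Goertzel algorithm to measure energy at each candidate frequency.

```python
import math

ROWS = (697, 770, 852, 941)       # low-group frequencies in Hz
COLS = (1209, 1336, 1477, 1633)   # high-group frequencies in Hz
KEYS = ("123A", "456B", "789C", "*0#D")
RATE = 8000  # samples per second (assumed telephony rate)


def tone(key: str, duration: float = 0.05) -> list[float]:
    """Synthesize the two summed sine waves a keypad sends for one key."""
    if len(key) != 1:
        raise ValueError("expected a single key")
    for r, row in enumerate(KEYS):
        c = row.find(key)
        if c != -1:
            lo, hi = ROWS[r], COLS[c]
            break
    else:
        raise ValueError(f"not a DTMF key: {key!r}")
    n = int(RATE * duration)
    return [
        math.sin(2 * math.pi * lo * t / RATE) + math.sin(2 * math.pi * hi * t / RATE)
        for t in range(n)
    ]


def goertzel_power(samples: list[float], freq: float) -> float:
    """Signal energy near `freq`, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / RATE)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def decode(samples: list[float]) -> str:
    """Pick the strongest row and column frequency and map them to a key."""
    r = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i]))
    c = max(range(4), key=lambda i: goertzel_power(samples, COLS[i]))
    return KEYS[r][c]
```

Nothing in the signal is secret: the same few lines that let a switch read your keypresses let an eavesdropper read them too.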

BEYOND DTMF
Biometric verification significantly reduces the incidence and consequences of fraud. Simply stated, biometric verification confirms the identity of a person by digitally measuring selected features of some physical characteristic, and comparing those measurements with those filed for the person in a reference database. This database can be stored on a server, a laptop, or even on a smart card carried by the person. Biometric verification can work from voice prints, fingerprints, retinal scans, and more.

The application best suited to telephony, clearly, is voice verification. Voice verification provides both the end user and the business with a level of security unattainable through other methods — because each voice has a distinct pattern, there is no way to steal the code.

The use of voice verification also translates into shorter call time. It takes less time for a caller to state his passphrase than it does to punch in a lengthy account number. Accuracy is also significantly enhanced: While a caller may punch in an incorrect account number or PIN code, the chances of stating a passphrase incorrectly are practically nil.

To use voice verification, each user’s voice must first be enrolled in the system. This simple process involves repeating your spoken password several times into a phone. Once your voice has been captured, the voice print (or "bioprint") is permanently stored. On subsequent calls, the system matches your live bioprint against the stored bioprint.
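
The enroll-then-match cycle can be sketched in a few lines. This is only an illustration under heavy assumptions: each repetition is presumed to have been reduced already to a short list of numeric features, the stored print is a plain average, and the match uses cosine similarity with an arbitrary 0.95 threshold. Commercial systems rely on far richer acoustic models, and none of these function names come from a real product.

```python
import math


def enroll(repetitions: list[list[float]]) -> list[float]:
    """Average several repetitions of the passphrase into one stored print."""
    n = len(repetitions)
    return [sum(column) / n for column in zip(*repetitions)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def verify(stored: list[float], live: list[float], threshold: float = 0.95) -> bool:
    """Accept the caller only if the live print is close enough to the stored one."""
    return cosine(stored, live) >= threshold


# Enrollment: three repetitions of the same (made-up) feature measurements.
stored_print = enroll([[1.0, 2.0, 3.0], [1.2, 1.8, 3.1], [0.9, 2.1, 2.9]])
```

The threshold is the design choice that matters most: set it higher and impostors are rejected more reliably, but legitimate callers with a cold or a noisy line are turned away more often.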

VOICE AND SPEECH REC
Voice verification is easily used in combination with speech recognition, and the system is transparent to the end user. Speech recognition recognizes what an end user is saying, while voice verification authenticates that the person speaking is who he claims to be.

A typical transaction using voice verification in combination with speech recognition is home banking. Say you call your bank to check your account balance. The automated voice attendant would ask you to speak your unique passphrase into the phone. Speaking your passphrase allows your voice to be verified, and only then are you granted access to your accounts. Voice verification provides increased security by verifying that you are who you claim to be and not merely the holder of a four-digit PIN code. Upon verification, the automated attendant would then ask which account you would like to access: checking, savings, and so on. You make your selection by speaking "checking" or "savings" into the phone.

As an added security measure, the system can use "challenge/response." Instead of relying on one fixed passphrase, the system generates a new passphrase each time it is accessed and asks the caller to repeat it, so the caller has no prior knowledge of which phrase will be requested. Because the bank asks for a different passphrase on every call, a tape recording of your voice speaking an earlier passphrase is of no use to an impostor.
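
A challenge/response check might look something like the sketch below. The word list, the three-word phrase length, and the function names are all assumptions made for illustration. The speech recognizer is assumed to supply the text it heard, and the voice verifier is assumed to supply a yes-or-no match.

```python
import secrets

# Small illustrative vocabulary; a real system would use a much larger one.
WORDS = ("amber", "river", "seven", "candle", "harbor", "violet", "north", "maple")


def new_challenge(n_words: int = 3) -> str:
    """Build an unpredictable passphrase for this call only."""
    return " ".join(secrets.choice(WORDS) for _ in range(n_words))


def grant_access(challenge: str, recognized_text: str, voice_matches: bool) -> bool:
    """Require both the right words (speech rec) and the right voice (verification)."""
    same_words = recognized_text.strip().lower().split() == challenge.split()
    return same_words and voice_matches
```

Both conditions must hold. A replayed recording of the right voice fails the word check, and an impostor who repeats the right words fails the voice check.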

GETTING STARTED
Surprisingly, voice verification technology does not require a major capital investment. Plus, there is no specific operator training involved. Once the biometric verification system is operational, it is extremely easy to use, and no additional maintenance is required beyond what would be necessary with any telephone system.

From a hardware and software perspective, voice verification technology can be integrated into virtually any existing platform. The better voice verification systems support a full range of development environments, including C/C++ and Visual Basic.

It is important to consider a biometric authentication vendor whose product offers the flexibility to incorporate additional verification methods down the road. While telephony may currently be your company’s only biometric application, you may want similar technology for physical access control, as well as network and data security (such as remote access, desktop access, and Internet and intranet applications) in the future. Planning now and choosing a vendor that can go beyond just your current telephony needs will put you on the right track to securing your technology investment.

Francis Declercq is president and CEO of Keyware Technologies. Keyware offers enhanced security solutions with the Layered Biometric Verification (LBV™) Security Server. The LBV integrates several biometric technologies (voice, face, fingerprint, etc.) into one security solution. Additionally, the LBV Server has the ability to make intelligent, rule-based decisions that combine the different biometric results and enhance the performance of the biometric authentication. For more information, contact the company at 781-933-1311, or e-mail them at [email protected].

 







Technology Marketing Corporation

2 Trap Falls Road Suite 106, Shelton, CT 06484 USA
Ph: +1-203-852-6800, 800-243-6002
