November/December 2001
Optimized Telephony
Platforms In The Voice Portal Market
BY DOUG PETTY
Finding and maintaining a competitive advantage in today's service
provider markets presents a host of daunting challenges. Price wars have
proven a flop. Customers are more fickle and demanding. And the latest
economic slowdown has made the task of finding new sources of revenue even
more difficult than just one year ago.
Is there help on the way? Pose that question to industry insiders these
days and chances are that many will point to the burgeoning voice portal
market as a good way to reduce customer churn and increase cash flow. For
those who might be skeptical, consider this: Market consultancy The Kelsey
Group forecasts that the U.S. market alone for voice portal services
and technologies will exceed $12 billion by 2005. Ovum,
a consultancy based in the UK, predicts a world market of $26 billion,
while Allied Business Intelligence estimates 56 million mobile voice portal users by
year-end 2005, 250,000 voice sites, and a $50 billion v-commerce market.
Voice portals connect users to applications through a telephone. They
combine best-of-breed applications with the world's most user-friendly
interface (speech recognition), as well as text-to-speech technology
and innovative hardware components. Most currently offer customers access
to Internet content, stockbroker information, e-mails, and airline
reservation systems. Many predict these services will soon be followed by
v-commerce, ticketing, and enterprise applications like voice dialing and
extranet access. Over the short term, the great hope for voice portals among
service providers is churn prevention. They may also provide a way to
increase subscription and per-call fees, improve service usage, and help
add subscribers through new customized services.
VOICE PORTAL PLANNING
Getting a share of this emerging market, however, requires careful
planning and wise technology choices. Advancements in computer telephony
technology will be critical to success within the rapidly evolving voice
portal arena. Service providers as well as application developers will
need to find optimized platforms that maximize voice portal performance by
facilitating a balance among speech recognition, telephony, and computing
resources. This will ensure each resource is used to its full capacity,
while helping decrease speech recognition costs by multiplexing
incoming audio channels across the back-end software modules.
Next-generation architectures will also implement critical features such
as echo cancellation on digital signal processors (DSPs), thereby
enhancing performance by offloading all but automatic speech recognition
(ASR) and text-to-speech (TTS) functionality from the host processor.
The need for this kind of balancing act can be partly explained by the
limitations of today's host CPU architectures. Pentium and RISC
processors have finally gotten fast enough to do the intensive
digitization, probability calculations, and matching once done exclusively
by more expensive DSP resource cards. But even dual Pentium 800 systems
can run only a limited number of active simultaneous ASR or TTS channels.
These systems generally can handle a maximum of only 100 incoming ASR or
TTS channels per chassis.
The most appropriate way for service providers to improve per-chassis
scalability and cost-effectiveness for current voice portal architectures
is therefore to increase DSP horsepower rather than line densities. Each
telephony network card should be equipped with enough processing muscle
and flexibility to perform both incoming and outgoing audio streaming on
all ports simultaneously. The ability to dynamically configure processing
capabilities on each resource card helps reduce hardware costs and overall
system complexity.
Flexible, clean, and low-latency audio streaming capabilities lie at
the core of any optimized voice portal platform. Service providers should
look for systems providing small buffers, G.168 echo cancellation, and
voice activity detection, that is, the ability to accurately and efficiently
stream only relevant voice signals to speech recognition engines.
Buffer sizes play an important role in overall system performance.
Large buffers will result in high latency and consequent noticeable delays
in ASR or TTS response times. Small buffers can help do away with latency
problems. But they also place an added burden on precious host processing
cycles. Balance is therefore needed. Buffers in the six-millisecond (ms)
range required for voice-over-IP (VoIP) applications should be considered
excessively demanding for voice portals. Those over the 100-ms range,
however, will not be able to stream audio quickly enough to speech
recognizers. Optimized platforms at the present time will come equipped
with buffers in the 100-ms range to ensure adequate performance without
placing an undue strain on processor resources.
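
The trade-off can be made concrete with a little arithmetic. The sketch below is illustrative only: it assumes 8 kHz audio at one byte per sample (as in G.711), which is not specified above, and borrows the 100-channel chassis figure mentioned earlier. It counts how many buffers per second the host must service at each buffer size.

```python
# Assumptions (not from the article): 8 kHz sampling, 1 byte per sample (G.711-style).
SAMPLE_RATE_HZ = 8000
BYTES_PER_SAMPLE = 1


def buffer_profile(buffer_ms, channels):
    """Return (bytes per buffer, buffer events per second across all channels)."""
    bytes_per_buffer = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * buffer_ms // 1000
    events_per_second = channels * 1000 / buffer_ms
    return bytes_per_buffer, events_per_second


# Compare a VoIP-style 6 ms buffer with a 100 ms buffer on a 100-channel chassis.
for ms in (6, 100):
    size, events = buffer_profile(ms, channels=100)
    print(f"{ms:>3} ms buffer: {size} bytes, {events:.0f} buffer events/s, "
          f"up to {ms} ms of added latency")
```

Under these assumptions, the 6 ms buffer forces the host to service roughly 16,700 buffers per second, against 1,000 per second at 100 ms. That is the processing burden the small buffer trades for its lower latency.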
An ability to detect and forward only relevant voice signals to
host-based speech recognition engines also improves voice portal
performance while reducing the load placed on CPUs. Voice detection
technologies are currently under development by several telephony and
portal vendors. The most cost-effective may turn out to be an energy-based
detector embedded within play/record DSP applications that can be
dynamically configured on a per-channel basis according to the status of
the recording application. Pre-speech buffers will be critical to the
effectiveness of this technology. That's because words at the beginning
of a speech signal can be omitted if there is a split-second delay in
triggering the speech recognizer engine. What's therefore needed is a
mechanism to continuously buffer voice signals and, when triggered, forward
the buffered segment ahead of the incoming voice signal. At the
start of a speech burst, at least 250 ms of speech preceding its detection
should therefore be passed from a buffer to the host.
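
The pre-speech buffer described above can be sketched in a few lines. This is a minimal illustration, not any vendor's detector: the 10 ms frame length and the energy threshold are assumptions, and a production detector would also need end-of-speech handling, which is omitted here. Only the 250 ms pre-speech figure comes from the text.

```python
from collections import deque

FRAME_MS = 10              # assumed frame length
PRE_SPEECH_MS = 250        # from the article: forward at least 250 ms of pre-speech audio
ENERGY_THRESHOLD = 1000.0  # assumed mean-square energy threshold (hypothetical value)


def frame_energy(frame):
    """Mean-square energy of one frame of PCM samples."""
    return sum(s * s for s in frame) / len(frame)


def gate_speech(frames, threshold=ENERGY_THRESHOLD,
                pre_frames=PRE_SPEECH_MS // FRAME_MS):
    """Yield the frames to forward to the recognizer.

    Quiet frames are held in a ring buffer and never forwarded on their own.
    When the first loud frame arrives, the buffered history is flushed first,
    then every later frame is passed straight through.
    """
    history = deque(maxlen=pre_frames)  # keeps only the newest pre_frames frames
    triggered = False
    for frame in frames:
        if triggered:
            yield frame
        elif frame_energy(frame) >= threshold:
            triggered = True
            yield from history
            yield frame
        else:
            history.append(frame)
```

Because the ring buffer is bounded, the host receives nothing during silence, yet the recognizer still sees the quarter second of audio that preceded the trigger.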
Optimized voice portal platforms will also come equipped with
market-leading G.168 echo cancellation. This is critical for cut-through and
barge-in functionality. Echo in a telephony communications circuit occurs
when a portion of the transmitted signal is reflected back through the
circuit to its point of origin. This can interfere with signal detection.
When a voice prompt is being played, for example, a portion of the prompt
can be reflected back. This reflected signal might interfere with the
signal the speech recognizer is attempting to identify, reducing system
performance. Echo cancellers with a minimum tail length of 12 ms can help
alleviate this problem through a series of predictive algorithms that
subtract echo from the merged source signal.
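
To show what subtracting an echo estimate looks like in practice, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter. NLMS is a common textbook approach chosen here for illustration; it is not prescribed by G.168 and is not necessarily what any particular platform uses. The 8 kHz sample rate, the step size, and the synthetic echo path are assumptions; only the 12 ms tail length comes from the text.

```python
import random

SAMPLE_RATE_HZ = 8000                      # assumed narrowband telephony rate
TAIL_MS = 12                               # tail length cited in the article
TAPS = SAMPLE_RATE_HZ * TAIL_MS // 1000    # number of filter taps covering the tail


def nlms_cancel(far_end, mic, taps=TAPS, mu=0.5, eps=1e-6):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    far_end: samples of the prompt being played toward the caller
    mic:     samples coming back from the line (caller speech plus echo)
    Returns the residual signal that would be passed to the recognizer.
    """
    weights = [0.0] * taps
    window = [0.0] * taps                  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        window.insert(0, x)
        window.pop()
        estimate = sum(w * s for w, s in zip(weights, window))
        error = d - estimate               # what remains after removing the estimate
        norm = sum(s * s for s in window) + eps
        step = mu * error / norm           # normalized step keeps adaptation stable
        weights = [w + step * s for w, s in zip(weights, window)]
        residual.append(error)
    return residual


# Synthetic demonstration: the "prompt" is noise and the echo is a delayed,
# attenuated copy of it (a hypothetical echo path, with no caller speech).
random.seed(0)
prompt = [random.uniform(-1, 1) for _ in range(4000)]
echo = [0.0] * 20 + [0.5 * s for s in prompt[:-20]]
residual = nlms_cancel(prompt, echo)
```

Once the filter has adapted, the residual carries far less energy than the raw echo, which is what lets a recognizer hear the caller over a playing prompt. The echo path must fit within the filter's tail length for this to work.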
Over the next few years, the voice technology behind today's leading
portals could change both Web-based services and services that have little
or nothing to do with the Internet. It could streamline almost every
informational interaction conducted over the phone, from receiving
directory assistance and making reservations to retrieving bank balances
and wrestling with corporate help desks. Such innovative technologies will
ultimately enable service providers to evolve voice portals into a form of
"intelligent dialtone" offering ubiquitous access to any product
or service from any location simply by lifting a telephone handset or
turning on a mobile phone. By deploying optimized
platforms today, service providers can ensure they are well positioned to
take a leading role in this new paradigm.
Doug Petty is vice president of technology at Pika Technologies,
Inc., a leading developer of robust hardware and software platforms and
components that allow developers to build scalable, next-generation voice
applications. Pika is committed to delivering high levels of customer care
and satisfaction. The company serves customers in North America, Europe,
and Asia. For more information, please visit their Web site at www.pikatech.com.
Hosted Speech Recognition
For Customer Service
BY BRUCE POLLOCK AND JONATHAN MCINTOSH
Speech recognition technology is powerful when used in call centers
to handle calls or portions of calls normally handled by operators. The
value proposition for speech-enabling a subset of calls traditionally
handled by operators is strong: companies can save money, improve customer
service, and increase revenues. Per-call savings range from 50 to 70
percent over live agents, depending on factors including the length
and type of call being speech-enabled. Speech recognition can also help
companies increase their self-service throughput rate (e.g., in the IVR
system prior to transferring the call to an agent) by reducing the number
of zero-outs that may be happening in the touch-tone system.
Speech can improve customer service too. First, it reduces hold times
by handling simple inquiries instantly (and thus removing these calls from
the queue), leaving operators to handle the more complex calls (or
portions of calls) more quickly. This helps to cut abandonment rates and
increase revenues where there is a financial transaction involved (e.g., in
a catalog call center).
Second, it allows companies that currently operate on a business-day
basis (e.g., 7:30 am to 6 pm) to be accessible on a 24x7 basis for simple
customer inquiries like bill balances, account/shipment status, etc.
Third, speech makes it easier for mobile phone users and callers using
phones with keypads in the handset to communicate with the company they
are doing business with. Callers don't have to be professional
contortionists to be able to complete their self-service transactions!
THE CASE FOR SPEECH ASPs
Designing, building, tuning, and managing an enterprise call center speech
recognition application that meaningfully achieves cost reduction,
revenue, and service enhancement objectives is a difficult task. Most
companies don't have the in-house speech recognition expertise and
financial resources to do so, which is why it makes sense to rely on a
service provider (outsourcer) with experience in this specialized field.
Experienced speech service providers can get a hosted application
scoped, designed, and operating in about a third of the time of the
average on-premises project, so call centers can enjoy cost savings,
revenue increases, and service improvements much faster than if they
attempt an in-house initiative. If you're considering an in-house
project, don't forget to include the cost of recruiting and training a
speech team and the opportunity cost of tying up your internal IT group;
chances are, you'll save money by going the ASP route.
Another benefit of dealing with a speech ASP is that most of them
charge on an affordable per-minute or per-completed-call basis,
and don't charge the large up-front fees that you'd have to pay in the
form of capital expenditures if you built an on-site speech
system. When you add up hardware, software, integration, tuning, and other
expenses involved in an in-house initiative, you may find that it takes more
money and development time than you anticipated. Moreover, the number of
speech ports you'll need to handle your peak call loads is larger for an
in-house speech system than it is if you deal with an ASP, because ASPs
spread call loads from various customers, giving them higher port
utilization.
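
The port-pooling effect can be illustrated with the standard Erlang B formula from teletraffic engineering. The sketch below is a rough model, not a sizing tool: the traffic figures, the 1 percent blocking target, and the choice of Erlang B itself (which assumes blocked calls are simply dropped) are all assumptions made for illustration.

```python
def erlang_b(traffic_erlangs, ports):
    """Blocking probability for the offered traffic on the given number of ports."""
    b = 1.0
    for n in range(1, ports + 1):
        # Standard recurrence: B(n) = A * B(n-1) / (n + A * B(n-1))
        b = traffic_erlangs * b / (n + traffic_erlangs * b)
    return b


def ports_needed(traffic_erlangs, max_blocking=0.01):
    """Smallest port count that keeps blocking at or below max_blocking."""
    ports = 1
    while erlang_b(traffic_erlangs, ports) > max_blocking:
        ports += 1
    return ports


# Hypothetical scenario: five companies, each offering 10 erlangs at peak.
separate = 5 * ports_needed(10.0)   # each company sizes its own in-house system
pooled = ports_needed(50.0)         # one ASP carries all five loads together

print(f"in-house total: {separate} ports, {50.0 / separate:.2f} erlangs per port")
print(f"pooled at ASP:  {pooled} ports, {50.0 / pooled:.2f} erlangs per port")
```

In this example the pooled system needs fewer ports than the five separate systems combined, so each port carries more traffic. The gain would be larger still if the customers' peaks fell at different times of day, which this simple model does not capture.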
The leading speech ASPs in the market today also have dialog design and
human factors expertise, which are prerequisite skills for
building a speech recognition system that callers will love and use.
Dialog design shapes the interaction between the caller and the speech
recognition system. It's both an art and a science, and if you get this
part of the application right, you're on your way to having a speech
system that has a meaningful cost reduction and service improvement impact.
When choosing a speech ASP, prospective customers should focus on the
company's expertise in areas like dialog design (as mentioned above) and
experience in developing and deploying production speech applications.
Additionally, prospects should assess the ASP's experience in building
robust, reliable host-interface systems and deploying computer telephony
integration (CTI) for call centers, because most, if not all, speech
recognition applications require some type of operator support. Prospects
should also ensure the speech ASP has the capacity and operational
redundancy to handle peak call loads. Ideally, the ASP should be able to
demonstrate how they manage these complexities today for their current
customer base.
Finally, here are a few tips for speech ASPs and prospective speech
clients to help their programs run smoothly and effectively:
- First, establish a speech recognition project committee (to manage
the project day to day), and a smaller call center agents' advisory
group for input on the initial application;
- Second, ensure the speech ASP trains the call center agents on the
functions, features, and benefits of the speech system. Armed with
good information, agents can be allies and proponents of speech in the
call center and help to generate more self-service inquiries; and
- Third, create a feedback loop so that agents can forward caller
feedback to the speech ASP so the speech application(s) can be
continuously improved.
Bruce Pollock handles speech recognition services, and Jonathan
McIntosh is senior vice president at West Interactive. West is a leading
provider of outsourced customer contact solutions that help Fortune 1000
companies acquire, retain, and grow profitable customer relationships.
West's customized solutions incorporate integrated speech recognition,
IVR, Internet, and live outsourced operator services. For information,
visit their Web site at www.west.com.
Eight Things To Consider For Your Voice Portal
BY DAVID FRIEND
The ability of computers to understand human speech has been a goal of computer scientists for over 40 years. In the late '60s, I worked on a project for the U.S. Postal Service that was intended to allow mail sorters to speak zip codes into a computer and have an envelope correctly routed. Unfortunately, the best computers of that era couldn't manage a vocabulary of even 10 spoken words.
Today, inexpensive PCs armed with advanced speech recognition software can identify thousands of words in dozens of languages with sufficient accuracy for a great number of uses. The one billion phones are about to be transformed into the world's most ubiquitous I/O device. Here are eight things you'll need to consider when deploying speech applications for your company:
- Build vs. Buy
Speech apps today are still mostly custom-designed and run on proprietary telephony hardware platforms that grew out of the voice mail and interactive voice response (IVR) industry. In the near future, we'll see a shift to running suites of off-the-shelf speech applications on commodity servers and IP gear. Applications will no longer be tied to a specific hardware platform but will simply be delivered as a CD-ROM. Great new GUI tools are already making the job of building and modifying applications much simpler, reducing development time by 70 to 80 percent over writing code.
- Text-to-Speech Engines
Text-to-speech (TTS) engines let computers talk. The main thing to worry about with TTS is how natural it sounds and what languages it can speak. Unless you're assembling your own speech application platform from scratch, however, your platform vendor will probably make its own decision on which TTS engine to use.
- Automated Speech Recognition
Again, the automatic speech recognition (ASR) engine is just one component of a speech application. With the exception of language support, the three leaders, Nuance, Philips Speech Processing, and SpeechWorks, are similar, and I'd recommend letting the platform vendor choose. Differences are subtle and subjective, so I'm not sure you should worry about it unless you're building your own platform (not recommended!).
- Development Environment
The development environment is something you should definitely worry about. The best GUI development tools can cut 70 percent or more off the time it takes to develop or modify an application. They also insulate your developers from some of the more arcane low-level details of telephony programming. Programmers like to write code, but a mission-critical speech application written in C++ is going to be a nightmare to maintain. Pick the best tool and learn how to use it. Take a cue from the Web world where Web design tools rule.
- Run-time Platform
Run-time Platform refers to the physical equipment that the phone lines plug into. Most of the platforms on the market today are based on specialized telephony switches and DSPs, all developed for traditional TDM-style telephony. But the world is moving inexorably to IP telephony and new platforms are emerging that process speech with standard Pentium-powered computers and IP switches. Costs are much lower, scalability is much greater, architecture is more open, and development tools more robust. The new generation of voice services platforms straddles the phone network and the Internet, and hence these devices are ready to support applications that marry the phone and the Web.
- VoiceXML
VoiceXML is an XML-based markup language that many vendors of automated speech recognition (ASR) engines are promoting for the development of voice applications, particularly those applications that use speech recognition. Most voice applications, however, require features and call flows that go way beyond what VoiceXML can do today, and nearly everyone in the industry uses tools or other languages in addition to VoiceXML. If you are planning to develop your own speech recognition applications, you should understand your vendor's position with respect to supporting VoiceXML, as it will probably be important a couple of years from now.
- Applications
Most companies need to develop their own specific applications, just as they need to develop their own Web sites. However, there are certain horizontal applications that nearly everyone seems to want, such as having e-mail read to them over the phone, access to calendars, task lists, contact lists, and so forth. There is no more point in trying to build one of these applications yourself than there would be in developing your own word processor. If you can buy and modify it, that's the way to go.
- User Interface
Designing a good user interface for a telephony application is not as simple as it would seem. Remember those horrible, cluttered, early attempts at Web screen design? Well, now we have horrible, frustrating telephony interfaces and the skills for designing good ones are still scarce. At a conference recently, I showed an example of a speech interface I stumbled across that made me answer six questions to locate a restaurant when two would have gotten me to the same place in half the time. Good development tools will encourage good design practices, and modifying a well-done commercial application will produce better results than starting from scratch.
CONCLUSION
Just as most of us have grown to prefer ATMs over human tellers, speech applications will increasingly replace humans. Organizations and end users will both benefit with increased satisfaction levels and a surge in efficiency. As a result, you won't have to endure that annoying "Your call is important to us" message that has been plaguing enterprises and consumers for years.
David Friend is chairman and CEO of Sonexis, Inc. Sonexis is a global provider of voice solutions for enterprise customers and service providers, enabling anytime, anywhere transactions and information access. Its products and services reduce the cost, time-to-market, and complexity of developing voice solutions. Sonexis' voice solutions focus on consumer and enterprise voice portals, voice commerce, enhanced self-service, and voice-enabled CRM. Contact Sonexis at www.sonexis.com.