SpeechWorks 6.0 is a tool for enabling speech
recognition (SR) in communications software. Developers create speech
recognition functions using "DialogModules," which they port to their
main C, C++, ActiveX, or Java application. For developers who prefer a
graphical environment, SpeechWorks partners with app-gen makers like
InterVoice-Brite, Aspect, and Artisoft. New features for version 6 include
more modules, better tuning tools, foreign language support, caller
personalization, and scalability.
INSTALLATION
We tested a small-scale version with a two-port Dialogic board. The
computer we used was a relatively new (and recently FDISKed) machine
running Windows NT Server 4.0. Installing the SpeechWorks software was a
no-brainer; everything worked the first time. (Generally, typical systems
using Dialogic Host D/x series or Antares series boards have off-the-shelf
hardware requirements. However, for real installations where the
application must run non-stop, you'll want to be particular about using
an industrial PC with plenty of extra RAM and hard disk space, multiple
boards and redundant components, and perhaps even running on UNIX instead
of Windows NT.) Beyond installing the board(s), the actual SpeechWorks 6
software configuration will vary depending on the application you are
building and your stage of the development process. Those issues are
addressed in the Features and Operational Testing sections.
DOCUMENTATION
The good news is that SpeechWorks 6 includes several books that, as far as
CTI manuals go, are fine works of literature. The bad news is that they're
only supplied in electronic formats. The specifics: there are nine .PDF
files, two Word documents, and a .TXT file. The .PDF topics include
command-line tuning tools, tuning tools/event logs, custom contexts,
installation, platform integration, an introduction to speech
applications, a service creation guide, reference notes, and more release
notes. The two Word documents are redundant copies of the installation
guide and release notes, and the text file is the release notes section
for the tuning tools.
Additionally, there is online help for the tuning tools feature, and
there are several links at the SpeechWorks Web site to white papers and
other resources. However, of all these documents, our favorites are the
introduction to SR guide and the service creation guide. Unlike
most vendors' manuals, which simply explain how to use the product, these
guides are excellent academic discussions of the concepts of speech
recognition. Some parts of these manuals are obviously biased toward
SpeechWorks' own products, but even if you're developing with products
from Nuance, L&H, etc., we would still consider the SpeechWorks guides
to be required reading. A note to SpeechWorks CEO Stuart Patterson: give
your technical writers a raise!
FEATURES
Before programmers start working on their SR applications, they can
configure the development parameters using SpeechWorks' configuration
tool. The tool has options for the integration type (Dialogic host,
Dialogic Antares, or custom), DLL locations (for prompts, recording, and
call control), data capture switches (tuning tools logging and
SMARTRecognizer logging), and data storage (location and megabytes of
space allocated). The highlights of this menu are the data capture switches.
"Capture Tuning Tools Data" logs recognition information, including success
or failure, for every step of a call; the data is viewed and analyzed with
the actual tuning tools program. "Capture SMARTRecognizer Data" is an
option that helps improve recognition accuracy.
The tuning tools feature is, in a word, awesome. Entire calls can be
logged, from system answer to hang-up, and the caller and system actions
are represented visually in a separate GUI from the other SpeechWorks
features. There are seven items to examine for each call, including the
DialogModules, the call summary, the call flow, the entire call actions
transcript, the start times, the call durations, and the recognition
results. A great feature in the show calls option is that you can replay a
recording of the caller's input, in the context of where it happened
during a call (Figure 2). This way, you can compare the input to the
replies that your application expected to hear, and you can better analyze
what went wrong, why it went wrong, and how to make the application
better.
The other six options are just as useful as show calls. The main screen
for the DialogModules summary shows a bar graph of the success or failure
of each module used in your application. Also shown is the number of times
the module is called up, and a color-coded means of analyzing the results
of each occurrence. Selecting any of the modules individually shows even
more detail and color-coded bar charts, including specific information on
the confidence SpeechWorks' recognition engine had in each heard reply.
The call summary screen shows a synopsis of every module and caller reply,
including the call duration, which system port was used, etc.
Call flow shows a flow chart of a call's possible routes, which can
be viewed with any chart cell as the starting point. Call start times
shows another bar graph indicating when calls arrived, in hour-long
blocks of a 24-hour period. The call durations option shows a similar
chart indicating the average length of time calls lasted. "Recognition
results" provides more information on the success of your DialogModules.
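To make the start-time and duration charts concrete, here is a minimal Python sketch of the arithmetic behind them. It is our own illustration, not SpeechWorks code: the function names and the assumption that call data arrives as a list of start times and a list of durations in seconds are hypothetical, and the real tuning tools read their own event logs.

```python
from collections import Counter
from datetime import datetime

# Hypothetical helpers; they do not reflect the SpeechWorks log format or API.

def calls_per_hour(start_times):
    """Count calls in each hour-long block of a 24-hour day.

    start_times: iterable of datetime objects, one per call.
    Returns a list of 24 counts, index 0 = midnight to 1 a.m.
    """
    counts = Counter(t.hour for t in start_times)
    return [counts.get(hour, 0) for hour in range(24)]

def average_duration(durations_seconds):
    """Average call length in seconds; 0.0 when there are no calls."""
    if not durations_seconds:
        return 0.0
    return sum(durations_seconds) / len(durations_seconds)
```

Feeding the 24 counts to any bar-chart routine would give a picture similar to the one the tuning tools display.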
Back to the SMARTRecognizer feature: "SMART" is an acronym for
Self-Modifying Automatic Recognition Tuning Engine. This engine uses
another program called LEARN. Basically, it stakes its accuracy on
predicting what you'll say. For example, imagine an application that lets
callers use the telephone to hear statistics for, and order tickets for,
local sports teams.
If you call into such a system in Los Angeles, the system would know
the players' statistics and ticket ordering information for the Angels
and Dodgers (baseball), Clippers and Lakers (basketball), Kings (hockey),
and Galaxy (soccer). But if you asked for the free-throw percentage of
Kobe Bryant and then asked for ticket information, the SMARTRecognizer
would assume that you're interested in basketball tickets, so of the six
teams' names, it would particularly listen for the Clippers or Lakers.
Conversely, if you called and inquired about baseball tickets and then
asked for player statistics, it would listen more carefully for (weight the
responses in favor of) the 50 players on the Angels and Dodgers.
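The weighting idea in this example can be sketched in a few lines of Python. This is only our illustration of context-based reweighting in general; it is not how the SMARTRecognizer is actually implemented, and the function names, the boost factor of 2.0, and the sample scores are all assumptions of ours.

```python
# Hypothetical sketch; none of these names come from the SpeechWorks API.
TEAM_SPORT = {
    "Angels": "baseball",
    "Dodgers": "baseball",
    "Clippers": "basketball",
    "Lakers": "basketball",
    "Kings": "hockey",
    "Galaxy": "soccer",
}

def rerank(scores, context_sport=None, boost=2.0):
    """Boost candidates that match the sport discussed earlier in the call.

    scores: dict mapping team name to the recognizer's raw score.
    Returns a new dict of scores normalized to sum to 1.
    """
    weighted = {}
    for team, score in scores.items():
        matches = context_sport is not None and TEAM_SPORT.get(team) == context_sport
        weighted[team] = score * (boost if matches else 1.0)
    total = sum(weighted.values())
    if total == 0:
        return weighted  # nothing to normalize
    return {team: value / total for team, value in weighted.items()}

def best_guess(scores, context_sport=None):
    """Return the team with the highest score after reweighting."""
    ranked = rerank(scores, context_sport)
    return max(ranked, key=ranked.get)
```

With raw scores that slightly favor "Dodgers" over "Lakers," passing "basketball" as the context is enough to tip the result toward the Lakers, which is the behavior described above for a caller who has just asked about Kobe Bryant.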
SpeechWorks is the kind of product for which we could not possibly list
every feature. However, some of the remaining highlights of this version
include barge-in capabilities, multilingual support, improved handling of
vocabulary and grammar errors (based on phoneme models), "Hot Insert"
(hot-swappable, in a sense) application tuning, alphanumeric character
pronunciations, and improved echo cancellation, recognition, and rejection
algorithms.
OPERATIONAL TESTING
Two weeks of testing is hardly enough time to make a full-fledged sample
application, but in reality, a few months would be sufficient, especially
when SpeechWorks is used in conjunction with its application generator
partners. The most important factor in developing a good SR-enabled
application is the foresight to know what responses your system will get
and when it will get them. Every time a caller gives input that your
application wasn't expecting to hear, the recognition accuracy suffers.
For building the best possible application, we found that SpeechWorks'
tuning tools are the best in their class, largely because they present their
data in a more easily analyzed fashion than just black-and-white numbers. Of course,
for the ultimate testing of your SR application, you would take the
additional step of stress-testing the program while receiving many calls
during high levels of network congestion.
We also found that even for a novice SR programmer, it is easy to
develop for the DialogModules. Items like the vocabulary editor are very
straightforward, and the documentation of each module is incredibly
detailed. SpeechWorks also provides sample uses of each module, which can
be duplicated or used as templates for your customized solution.
Although it might seem basic, anyone who's been around speech
recognition knows that some of the simplest recognition tasks are actually
the most difficult. Among these tasks are dates, currency, human names,
place names, times, and common words and phrases like "today," "tomorrow,"
"next week," "I don't understand," "yeah," and so on. A
complete list of such utterances cannot exist, but throughout our testing,
we were very impressed with how well the sample SpeechWorks applications
handled these things.
Using SpeechWorks' presidential election demonstration, we made a
simple application of our own, and we tried to anticipate the responses.
Our candidates were Al Gore, Bill Bradley, John McCain, George "Dub-ya,"
and, just to keep it interesting, Daffy Duck. We allowed for margins of
error; for example, people could say "Bush" for George Dub-ya, or they
could accidentally say Donald (instead of Daffy) Duck.
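The margin-of-error idea amounts to mapping several acceptable phrases onto one canonical answer. Here is a minimal Python sketch of that mapping. It is our own illustration and not the DialogModule vocabulary format; the dictionary layout and function names are hypothetical, and the only aliases included are the two mentioned above.

```python
# Hypothetical alias table; the real vocabulary editor uses its own format.
CANDIDATE_ALIASES = {
    "Al Gore": [],
    "Bill Bradley": [],
    "John McCain": [],
    "George Dub-ya": ["Bush"],
    "Daffy Duck": ["Donald Duck"],
}

def normalize(text):
    """Lowercase and collapse whitespace so lookups are forgiving."""
    return " ".join(text.lower().split())

def build_lookup(aliases):
    """Map every canonical name and alias to its canonical candidate."""
    lookup = {}
    for canonical, phrases in aliases.items():
        lookup[normalize(canonical)] = canonical
        for phrase in phrases:
            lookup[normalize(phrase)] = canonical
    return lookup

def resolve(utterance, lookup):
    """Return the canonical candidate, or None if the phrase was not anticipated."""
    return lookup.get(normalize(utterance))

LOOKUP = build_lookup(CANDIDATE_ALIASES)
```

A result of None corresponds to the case described earlier, where the caller says something the application was not expecting; every such case found during tuning is a candidate for a new alias.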
More complicated applications better illustrate SpeechWorks' power.
So, while we do not normally mention a vendor's customers in a product
review, in this case we feel that SpeechWorks' own demonstrations (and
live applications) are items that potential customers should try out. (For
more information on these demonstrations, see www.speechworks.com/demos/index.cfm.)
ROOM FOR IMPROVEMENT
Most of the things we can think of that would improve SpeechWorks 6.0 are
already being planned for versions 6.5 and 7.0. Some of these features are
voice authentication and security, "plug and play" DialogModules, more
and better DialogModule customization, improved natural language abilities
for the text-to-speech engine, and improved options for caller
personalization (which the current version only begins to address).
CONCLUSION
As implied by the very short improvements section above, we feel that
based on the currently available technology, SpeechWorks 6.0 is nearly
perfect. There can always be more scalability, better recognition, and
more competitive pricing, and we anticipate that speech recognition as an
industry (not just this vendor) will face new challenges as
packet-switched voice becomes a more mainstream way of communicating:
what happens if the software can't recognize what you said because half
of your packets arrived late, arrived damaged, or did not arrive at all?
Overall, we are very impressed with this product, based on comparing it
to its competitors and to the current market needs. We consider
touchtone-based systems to be nearly obsolete (we say "nearly"
because there will always be situations that warrant touchtone on the
basis of security, background noise, or failed recognitions). The number of
ways that you can SR-enable a communications product is endless, and we
feel that anyone who invests in SpeechWorks 6.0 will be quite pleased.