SpeechWorks Host 1.1 is a phonetic toolkit for adding natural (continuous) speech
recognition to CTI applications that use Dialogic boards. This small-scale solution is an
entry-level version of the SpeechWorks 4.0 platform from SpeechWorks (formerly ALTech).
Dialogic licenses and resells this host-based version of the product, which can be upgraded
to the full SpeechWorks package. The Host package requires only a 2- or 4-port board and a
Pentium-class CPU; no special DSP hardware is needed. Its best feature is its set of
Dialogue Modules, essentially prepackaged building blocks for the most common speech
recognition tasks. These modules are highly customizable, and developers can assemble them
in three ways: as text code in any C or C++ environment, as building blocks in
Dialogic-specific visual application generators, or as ActiveX modules in more mainstream
CTI application generators. Host also offers a vocabulary of up to 250 words and speaker
independence, which means that no speaker training is required. At about $250 per port, it
is extremely inexpensive. Host 1.1 is surprisingly powerful for an entry-level product,
rivaling some of the large-scale products from SpeechWorks' competitors.
Installation
A Dialogic or Dialogic-compatible board needs to be installed and configured before the
Host software installation can begin. We tested the software on two computers with D21H
and D41H boards, but D41EPCI and D41ESC boards also work. Other requirements include a
Pentium running at 133 MHz for two ports, 166 MHz for four ports or 200 MHz for eight ports;
64 MB of RAM (strongly recommended); 40 MB of hard disk space; Windows NT 4.0; the Windows
small fonts setting; Dialogic System Software and SDK DNA 3.0 for NT; Microsoft Visual C++
4.2 or later; and a SpeechWorks Host enablement disk.
Both the ASR Development Package and the ASR Enablement Package use standard
InstallShield setup programs. Install the Development Package first, reboot your system,
then install the Enablement Package and reboot again, just to be safe. Without the
enablement software, the Host suite works for only 30 days, for evaluation purposes. The
maximum number of boards supported is two, for a total of eight ports.
We had no problems with the installation process. The boards and software installed
smoothly on both PCs, and we see no reason why any developer, even a beginner, would have
any problems either. We wish that all CTI hardware and software installed so seamlessly.
Documentation
Like the product's installation, the documentation for this package was strictly
top-notch. A small release notes manual covered the installation and new features. Missing
from the printed documentation was detailed information on the board installation and
troubleshooting, but the Configuration Manager application, with features like auto-detect
and the ability to control service start-up modes, provided most of the knowledge we
needed for the board installation. The Configuration Manager came with its own online help
file, and the DNA CD includes its own set of online documentation. There isn't much online
documentation for the Host software itself, but there is extensive printed documentation:
a software reference, a Creation Guide and an Integration Guide, all accompanied by
numerous screen shots, technical and beginner's explanations, glossaries, sample
applications and more.
Overall, we were very impressed with the quality of these manuals. Even the SpeechWorks
corporate Web site is handy, containing one of the best technology background sheets
we've seen. The various documents here can lead even a CTI neophyte into sophisticated
speech-recognition development without much pain or suffering. Most of the documentation
was written by Dialogic, not by SpeechWorks, and it's a testament to how much value
improved manuals can add to a product.
Features
As we explained above, SpeechWorks Host makes extensive use of prepackaged, highly
customizable building blocks called Dialogue Modules. Five essential modules are included.
These are: YesNo, ContinuousDigits, AlphaNumeric, VoiceMenu and ItemList.
YesNo does exactly what its name implies: it queries callers for a yes or no
answer to a question, determines the response, takes some other action if it understands
the answer and repeats the query in a clearer context if the answer is not understood.
After several failed attempts, using different prompts and methods to understand the
answer, the module can be programmed to send callers to a live operator. This module, while
complicated in its flowchart form, is actually the simplest module and is nearly 100
percent accurate. (Its complicated flowchart is reason enough to thank SpeechWorks for
creating the modules; the others are vastly more complicated than even YesNo fans could
dream of.)
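The retry-and-escalate behavior described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the SpeechWorks API: the `listen` callback, the yes/no word sets, the prompt texts and the outcome strings are all hypothetical stand-ins for whatever the recognizer and the YesNo module actually expose.

```python
from typing import Callable, Iterable, List, Optional

# Hypothetical outcome labels; not SpeechWorks identifiers.
YES, NO, OPERATOR = "yes", "no", "operator"

# Hypothetical word lists the application treats as yes or no.
YES_WORDS = {"yes", "yeah", "yep", "sure", "correct"}
NO_WORDS = {"no", "nope", "nah", "incorrect"}


def yes_no_dialogue(prompts: List[str],
                    listen: Callable[[str], Optional[str]]) -> str:
    """Ask a yes/no question, re-prompting more clearly after each failure.

    `listen` plays a prompt and returns the recognized word, or None if
    nothing usable was heard. If every prompt fails, the caller is routed
    to a live operator.
    """
    for prompt in prompts:
        word = listen(prompt)
        if word is None:
            continue  # no result: move on to the next, clearer prompt
        word = word.lower()
        if word in YES_WORDS:
            return YES
        if word in NO_WORDS:
            return NO
        # Recognized, but not a yes/no word: also try the next prompt.
    return OPERATOR


def scripted(responses: Iterable[Optional[str]]) -> Callable[[str], Optional[str]]:
    """Simulated caller for testing: returns canned answers in order."""
    answers = iter(responses)
    return lambda prompt: next(answers, None)


PROMPTS = [
    "Is that correct?",
    "Please answer yes or no. Is that correct?",
    "If the information is correct, say yes. Otherwise, say no.",
]

if __name__ == "__main__":
    print(yes_no_dialogue(PROMPTS, scripted(["umm", "Yeah"])))        # yes
    print(yes_no_dialogue(PROMPTS, scripted([None, "what?", None])))  # operator
```

Keeping the prompts in an ordered list makes the "clearer context on each retry" policy a matter of data rather than code, which is roughly the kind of customization the modules are said to offer.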
ContinuousDigits collects numbers. Developers can specify minimum and maximum values and the
number of digits, along with a list of valid digit strings.
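The kinds of constraints ContinuousDigits accepts can be illustrated with a small validation function. This is a hypothetical sketch: the parameter names (`min_len`, `max_len`, `min_value`, `max_value`, `valid`) are invented here. It assumes that length limits count digits and value limits compare the integer value; it does not reproduce the module's real configuration interface.

```python
from typing import Optional, Set


def accept_digits(digits: str,
                  min_len: int = 1,
                  max_len: int = 16,
                  min_value: Optional[int] = None,
                  max_value: Optional[int] = None,
                  valid: Optional[Set[str]] = None) -> bool:
    """Return True if a recognized digit string meets every configured limit."""
    # Reject empty results and anything that is not plain ASCII digits.
    if not digits or any(c not in "0123456789" for c in digits):
        return False
    # Length limits (number of digits spoken).
    if not min_len <= len(digits) <= max_len:
        return False
    # Numeric range limits, compared on the integer value.
    value = int(digits)
    if min_value is not None and value < min_value:
        return False
    if max_value is not None and value > max_value:
        return False
    # Optional list of acceptable strings (e.g., known account numbers).
    if valid is not None and digits not in valid:
        return False
    return True


if __name__ == "__main__":
    print(accept_digits("1234", min_len=4, max_len=4))        # True
    print(accept_digits("123", min_len=4, max_len=4))         # False
    print(accept_digits("0042", min_value=1, max_value=100))  # True
    print(accept_digits("7", valid={"1", "2", "3"}))          # False
```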
AlphaNumeric works just like ContinuousDigits, except that it also collects letters,
symbols and other characters. For example, instead of just saying numbers, callers could
find themselves spelling out an e-mail address character by character.
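To show what "spelling out an e-mail address" means for the application, here is a hypothetical post-processing step that joins recognized tokens into a string. The token set (single letters plus words like "at" and "dot") is an assumption made for illustration; a real AlphaNumeric grammar would define its own vocabulary and would return results through its own interface.

```python
from typing import Dict, List

# Hypothetical mapping from spoken words to characters; a real grammar
# would define its own token set.
SPOKEN: Dict[str, str] = {
    "at": "@", "dot": ".", "dash": "-", "underscore": "_",
    "zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
    "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9",
}


def spell_out(tokens: List[str]) -> str:
    """Join recognized tokens (single letters or spoken words) into one string."""
    chars = []
    for token in tokens:
        t = token.lower()
        if t in SPOKEN:
            chars.append(SPOKEN[t])
        elif len(t) == 1 and t.isalnum():
            chars.append(t)  # a single spoken letter or digit
        else:
            raise ValueError(f"unrecognized token: {token!r}")
    return "".join(chars)


if __name__ == "__main__":
    tokens = ["j", "s", "m", "i", "t", "h", "at",
              "e", "x", "a", "m", "p", "l", "e", "dot", "c", "o", "m"]
    print(spell_out(tokens))  # jsmith@example.com
```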
VoiceMenu returns a caller's selection from a list that it first reads to the
user.
ItemList is like VoiceMenu, but more complicated. It offers more choices, more
options for clarifying ambiguous statements, etc. (Users who later upgrade to the full
SpeechWorks 4.0, or who buy optional modules, get access to blocks like TelephoneNumber,
Currency, Date, Spelling, ZipCode, NaturalNumbers and CustomContext.)
After users master the Dialogue Module theory, they need to learn the vocabulary editor
and tuning tools. We caution you: it would be useful to brush up on your undergraduate
linguistics curriculum, from the phonetic alphabet to Ferdinand de Saussure. The
vocabulary editor includes a graphical phonetic keyboard, which allows developers to enter
any number of pronunciations for commonly mispronounced words, such as surnames, technical
terms and foreign phrases. Fortunately, the user-customizable dictionary already contains
almost 200,000 words in phonetic form, and a phoneme chart is easily accessible. Even
better, the Creation Guide includes an entire chapter and an appendix/tutorial on
developing phonetic words and phrases. It's
easy once you get used to it, but we're not joking about brushing up on your linguistics
education before beginning a mid- to large-scale speech recognition implementation.
It is rare to find a development package so directly linked to academia, though that is no
surprise given that the SpeechWorks engine was built by MIT students with ARPA sponsorship.
Besides the modules and the editor, another valuable SpeechWorks Host feature is the
tuning tool. Used for logging and monitoring call activity, SLEE (service logic execution
environment) files and more, the tuning tool is a valuable way to follow, check and
confirm or diagnose literally every step of a speech recognition application's performance.
The tool was still in beta testing when we received it, but its graphical, numerical and
contextual reporting makes the entire Host package complete. Most often, developers will
use this tool for checking recognition confidence results.
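Checking confidence results usually means choosing a rejection threshold. The sketch below shows the trade-off in the abstract: given logged (confidence, was_correct) pairs, it counts how many results a threshold accepts, how many wrong answers slip through and how many right answers are thrown away. The 0-to-1 confidence scale, the log format and the sample values are all assumptions made up for illustration; they are not the tuning tool's actual output or measured results.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ThresholdReport:
    threshold: float
    accepted: int       # results at or above the threshold
    false_accepts: int  # accepted, but the recognizer was wrong
    false_rejects: int  # rejected, although the recognizer was right


def evaluate_threshold(log: List[Tuple[float, bool]],
                       threshold: float) -> ThresholdReport:
    """Score a rejection threshold against (confidence, was_correct) log entries."""
    accepted = false_accepts = false_rejects = 0
    for confidence, correct in log:
        if confidence >= threshold:
            accepted += 1
            if not correct:
                false_accepts += 1
        elif correct:
            false_rejects += 1
    return ThresholdReport(threshold, accepted, false_accepts, false_rejects)


# Made-up sample entries, for illustration only.
SAMPLE_LOG = [(0.95, True), (0.80, True), (0.62, False), (0.55, True), (0.30, False)]

if __name__ == "__main__":
    for t in (0.5, 0.6, 0.7):
        print(evaluate_threshold(SAMPLE_LOG, t))
```

Raising the threshold trades false accepts for false rejects; where to set it depends on whether a wrong answer or an extra re-prompt costs the caller more.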
Other features of SpeechWorks Host 1.1 include:
- Barge-in ability,
- N-best output/scoring,
- Customizable APIs with sample applications,
- Visual C++ 4.2 support,
- 250-word vocabulary per application.
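N-best output means the recognizer returns several ranked hypotheses with scores rather than a single answer. The following hypothetical sketch shows one common way an application might act on such a list: accept a clear winner, ask the caller to choose among close competitors, or re-prompt when everything scores poorly. The score scale and the `accept`, `reject` and `margin` values are assumptions, not SpeechWorks defaults.

```python
from typing import List, Tuple

Hypothesis = Tuple[str, float]  # (recognized text, score), assumed 0..1 scale


def decide(nbest: List[Hypothesis],
           accept: float = 0.70,
           reject: float = 0.30,
           margin: float = 0.15) -> Tuple[str, List[str]]:
    """Turn an N-best list into an action: accept, confirm or reprompt."""
    if not nbest:
        return ("reprompt", [])
    ranked = sorted(nbest, key=lambda h: h[1], reverse=True)
    best_score = ranked[0][1]
    if best_score < reject:
        return ("reprompt", [])
    # Every hypothesis scoring close to the best one is a plausible answer.
    close = [text for text, score in ranked if best_score - score < margin]
    if best_score >= accept and len(close) == 1:
        return ("accept", close)
    return ("confirm", close)  # ask the caller to choose among these


if __name__ == "__main__":
    print(decide([("Bill Clinton", 0.91), ("Bob Dole", 0.42)]))
    print(decide([("Ross Perot", 0.52), ("Bob Dole", 0.58)]))
```

Checking the margin as well as the absolute score lets an ItemList-style dialogue ask "Bob Dole or Ross Perot?" instead of silently guessing.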
Operational Testing
Making speech recognition technology work involves a lot more than recognizing common user
feedback words and activating software functions based on them. Even the yes/no function
can be complicated, and the complexity grows rapidly as additional choices are added to
prompt menus. Despite that, SpeechWorks Host is easy to use, even for a novice programmer,
because NT integration experience and an understanding of linguistics matter much more here
than programming skill.
A standard sample application is an election demonstration. It's an included
application that polls callers for their voting choice -- Bill Clinton, Bob Dole or Ross
Perot. The application sounds simple, but as users follow the directions for building this
test, the value of the Dialogue Modules becomes clear. The test application lets you add
choices to the poll, so we added Mark McGwire to the list. In phonetics, the spelling might
be "M aa r k M ih c G w ih er." Next, we added alternate possible answers, such as Big Mac,
Mac Attack and Big Red. Each name presents its own set of challenges.
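The poll example boils down to a vocabulary in which several spoken phrases map to one answer, and some phrases carry hand-entered pronunciations. Here is a hypothetical Python sketch of that idea. The phoneme string is the one quoted above. The data structure is not the SpeechWorks vocabulary file format, and the sketch assumes that phrases without a custom pronunciation would fall back on the built-in phonetic dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class VocabularyItem:
    canonical: str      # value handed back to the application
    phrases: List[str]  # everything a caller might say for this item
    # Custom phonetic spellings for phrases the stock dictionary may miss.
    pronunciations: Dict[str, List[str]] = field(default_factory=dict)


def build_lookup(items: List[VocabularyItem]) -> Dict[str, str]:
    """Map each spoken phrase (lower-cased) to its item's canonical value."""
    lookup: Dict[str, str] = {}
    for item in items:
        for phrase in item.phrases:
            key = phrase.lower()
            if lookup.get(key, item.canonical) != item.canonical:
                raise ValueError(f"phrase {phrase!r} is used by two items")
            lookup[key] = item.canonical
    return lookup


BALLOT = [
    VocabularyItem("Bill Clinton", ["Bill Clinton", "Clinton"]),
    VocabularyItem(
        "Mark McGwire",
        ["Mark McGwire", "McGwire", "Big Mac", "Mac Attack", "Big Red"],
        {"Mark McGwire": ["M aa r k M ih c G w ih er"]},
    ),
]

if __name__ == "__main__":
    print(build_lookup(BALLOT)["big red"])  # Mark McGwire
```

Rejecting a phrase that two items share catches the kind of ambiguity that would otherwise show up only as low-confidence results during testing.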
If getting McGwire's name to make sense in phonetics is frustrating, hire someone to
develop speech recognition for you, because it only gets more complicated. Real-world
applications like phone books, banking and brokerage utilities and train schedules involve
far more choices than ours, and real-world answers rarely score the high confidence ranking
that Mark McGwire's or Bill Clinton's name will. Still, as confusing and challenging as
developing item lists and their associated flowcharts can be, the process is much easier
with the ready-made modules. The modules can do things like apply changes automatically and
globally, so even the small and tedious tasks of speech recognition application
development are covered with Host, not just the recognition itself. Most of
all, we like that the time spent making our small test application was used mostly on the
logic itself and the possible user responses rather than on using the software. Good
software lets you focus on the goal rather than getting there, and SpeechWorks Host does
this well.
Room For Improvement
Most of the features that we would like to see improved are addressed with the full
version, which is SpeechWorks 4.0. (The full version of SpeechWorks powers General Magic's
Portico virtual receptionist tool.) That version includes more words per application, more
scalability, more Dialogue Modules, etc. We have only minor criticisms of the Host
package. For example, the vocabulary editor window is difficult to resize, and the various
GUIs are not as intuitive as they could be. We would like to see a tool within the
vocabulary editor that tries to phoneticize unrecognized words automatically, which would
improve recognition. We would also like to see the package bundled with a visual
application generator suite, or at least a powerful flowchart editor. This is the one
missing piece of the speech recognition puzzle: most app-gen software is very good for
developing IVRs and similar software, and some app-gens are adding IP telephony options,
but not many have successfully included speech recognition. Tighter integration is the
key. Finally, we'd like to see the optional inclusion of a "Flow-charting and
Phoneticizing Speech Recognition for Dummies for SpeechWorks Host 1.1" textbook.
There is evidence in the included manuals that such an attempt has been made, but
currently it's too fragmented and assumes too much about developers' knowledge of
linguistics to work well.
Conclusion
We recommend this product to anyone who is willing to learn it. Learning the software is
easy; learning its methods is another story. It's very inexpensive, it's very powerful, it
uses mainstream hardware and it's well documented for a CTI product. We're giving the
Editor's Choice award to SpeechWorks for making a great software package, but the award
comes with a challenge to SpeechWorks and its competitors: there is still plenty of room
to get better. Options for improved integration and more efficient development platforms
do exist, and we predict that they'll be exploited as readily as mainstream CTI app-gens
before the year 2000 (a date SpeechWorks Host is prepared to handle). For small- to
mid-scale speech recognition needs, whether in the call center or elsewhere, this is one
of the better purchases a manager and developer can make together.