Artificial intelligence is not perfect. But it’s pretty darned good. Just consider the AI assistant Duplex that Google demonstrated in May.
Google Duplex aroused a lot of discussion and excitement because it enables human-to-machine interactions so natural that the human may not recognize he or she is actually talking with a virtual assistant. The company is testing this technology this summer; initially it will allow AI assistants to make simple appointments for people.
Check out the first part of this video, showing how Google Duplex enables an AI voice assistant to make a restaurant reservation. Before the reporter revealed that the male voice was the robot, I thought the woman taking the reservation was the AI assistant. Quite amazing.
Now watch the second part of this video. It demonstrates how Google Duplex can adjust even when met with a different result than originally intended.
In this case, the AI assistant calls a salon to make a haircut appointment. The desired time is not available, and the human receptionist offers an alternative. The human then asks to confirm what kind of service the customer is seeking.
Impressively, the Google Duplex bot adjusts to the new information. It is also able to recall that the desired service was a haircut.
In this company blog post, Google engineers explain that Duplex is built around a recurrent neural network that takes in the output of the company’s automatic speech recognition technology, along with audio features, conversation history, parameters of the conversation (like the desired time of an appointment and the service to be rendered), and more.
“We trained our understanding model separately for each task, but leveraged the shared corpus across tasks,” they wrote. “Finally, we used hyperparameter optimization from TFX to further improve the model. We use a combination of a concatenative text to speech (TTS) engine and a synthesis TTS engine (using Tacotron and WaveNet) to control intonation depending on the circumstance.”
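To make that description a bit more concrete, here is a minimal sketch in Python of how a recurrent network can fold several input streams into one running state. It is not Google's code. The feature names (`asr`, `audio`, `params`), the vector sizes, and the random weights are all assumptions chosen for illustration, and the conversation history is represented only by the hidden state carried from one turn to the next.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TinyRecurrentCell:
    """Elman-style recurrent cell: h_t = tanh(W_x x_t + W_h h_{t-1})."""

    def __init__(self, input_size, hidden_size, seed=0):
        rng = random.Random(seed)  # fixed seed keeps the sketch repeatable
        self.w_input = make_matrix(hidden_size, input_size, rng)
        self.w_hidden = make_matrix(hidden_size, hidden_size, rng)
        self.hidden_size = hidden_size

    def initial_state(self):
        return [0.0] * self.hidden_size

    def step(self, features, state):
        """Combine this turn's features with the state from earlier turns."""
        from_input = matvec(self.w_input, features)
        from_state = matvec(self.w_hidden, state)
        return [math.tanh(a + b) for a, b in zip(from_input, from_state)]


def turn_features(asr, audio, params):
    """Concatenate the per-turn inputs into one feature vector (hypothetical layout)."""
    return asr + audio + params


# Hypothetical, hand-made feature vectors for two turns of a call.
turns = [
    turn_features(asr=[1.0, 0.0], audio=[0.2], params=[0.5, 1.0]),
    turn_features(asr=[0.0, 1.0], audio=[0.7], params=[0.5, 1.0]),
]

cell = TinyRecurrentCell(input_size=5, hidden_size=3)
state = cell.initial_state()
for features in turns:
    state = cell.step(features, state)  # state now summarizes the call so far

print(state)
```

The point of the sketch is the loop at the end: each turn's inputs are merged with a state that already reflects earlier turns, which is how a recurrent model can still "remember" the haircut when the receptionist asks about it later in the call.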
Google also introduced speech latency to simulate the patterns of natural speech. It even went as far as employing verbal crutches – like hmm and uh – that many of us use during conversation.
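As a rough illustration of the idea, and not of how Duplex actually does it, the sketch below sometimes prepends a filler word to a reply and picks a short pause before speaking. The function names, the probability, and the timing constants are all made up for this example.

```python
import random

FILLERS = ["hmm", "uh"]  # the two examples mentioned above


def add_filler(reply, rng, probability=0.5):
    """Sometimes prepend a filler word to a reply (hypothetical helper)."""
    if rng.random() < probability:
        filler = rng.choice(FILLERS)
        return f"{filler.capitalize()}... {reply}"
    return reply


def pause_before_reply(reply, rng, base_seconds=0.3, per_word_seconds=0.05):
    """Pick a delay that grows with reply length, plus a little jitter.

    The constants are illustrative assumptions, not measured values.
    """
    words = len(reply.split())
    return base_seconds + per_word_seconds * words + rng.uniform(0.0, 0.2)


rng = random.Random(42)  # fixed seed so the sketch is repeatable
reply = "Do you have anything around noon?"
spoken = add_filler(reply, rng)
delay = pause_before_reply(spoken, rng)
print(f"(wait {delay:.2f}s) {spoken}")
```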
Uh, pretty pretty pretty good. And, hmm, a little scary.