Accents can be difficult to understand, particularly in the fast-paced world of customer service. One company is using technology to tackle the problem in real time during conversations.
Sayso, a company that builds speech applications powered by AI speech transformation technology, is giving developers an API that can change accented English from one accent to another in near real time. The company is using a different approach than speech-to-text technologies, which use natural language processing to attempt to decipher what an individual is saying. Sayso's solution simply takes individual sounds and changes them to make them more understandable. It factors in the way the mouth, tongue and lips shape sounds and how they are relayed through the vocal cords to tackle the obstacles posed by accented speech.
"We don't do anything with words and sentences," said Ganna Tymco, founder and CEO of Sayso. "Instead, we do direct waveform operation: we work with disentangled speech elements. What I mean by that is things like voice, intonation, speech, content, accent, we can work with fillers, like uhms and aahs. And we can alter one component or multiple components at a time, and we can alter it in real time if we want."
Tymco explained that articulatory gestures are essentially groups of sounds and are accent-independent. Sayso's technology chops those sounds into very small chunks, each only milliseconds long, then processes them in real time.
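As a rough illustration of the chunking step Tymco describes, the Python sketch below splits a waveform into overlapping frames a few milliseconds long. It is not Sayso's implementation: the function name `split_into_frames`, the 20 ms frame length, the 10 ms hop and the 16 kHz sample rate are all assumptions chosen for the example, and the accent transformation itself is not shown.

```python
import math


def split_into_frames(samples, sample_rate, frame_ms=20, hop_ms=10):
    """Split a list of audio samples into overlapping, fixed-length frames.

    frame_ms and hop_ms are illustrative defaults, not values from Sayso.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)      # samples between frame starts
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must each span at least one sample")

    frames = []
    # Stop once a full frame no longer fits; trailing samples are dropped.
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames


# One second of a 440 Hz test tone at an assumed 16 kHz sample rate.
sample_rate = 16000
tone = [math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]

frames = split_into_frames(tone, sample_rate)
print(len(frames), "frames of", len(frames[0]), "samples each")
```

A real-time system would handle frames like these as they arrive rather than waiting for the full recording, which is why the chunks need to be so short.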
"We map speech that is of one accent to a different accent," said Tymco. "So we have parallel data, and we teach our system to see how the sound wave for the speaker with an accent would look like versus the speaker who is talking. And then we alter the shape of the sound wave to match it more to the desired accents. The really neat thing about it is that it is universal. So it’s independent of accent.”
Tymco said Sayso began training its systems with Hindi English and U.S. English accent pairs and has since expanded to include Chinese, Spanish and Japanese accents. She added that transcription is part of the company's business strategy, as she has observed that automatic subtitles can be completely inaccurate.
"There's a very strong correlation to how close a founder's accent is to Standard Hollywood English and how good the transcription is," said Tymco. "For someone with a strong Dutch or Indian accent, the transcriptions are far worse. Processing the audio through a Sayso-like filter before trying to run transcription on the audio file may result in far better transcriptions. Our tech is definitely applicable to transcription."
Edited by Luke Bellos