Sayso Develops Technology for Changing Accented English in Near Real Time

By Laura Stotler March 17, 2022

Accents can be difficult to understand, particularly in the fast-paced world of customer service. One company is using technology to tackle the problem in real time during conversations.

Sayso, a company that builds speech applications powered by AI speech transformation technology, is giving developers an API that can change accented English from one accent to another in near real time. The company takes a different approach from speech-to-text technologies, which use natural language processing to try to decipher what an individual is saying. Sayso's solution instead takes individual sounds and changes them to make them more understandable. To tackle the obstacles posed by accented speech, it factors in the way the mouth, tongue and lips shape sounds and how those sounds are relayed through the vocal cords.

"We don't do anything with words and sentences," said Ganna Tymco, founder and CEO of Sayso. "Instead, we do direct waveform operation: we work with disentangled speech elements. What I mean by that is things like voice, intonation, speech, content, accent, we can work with fillers, like uhms and aahs. And we can alter one component or multiple components at a time, and we can alter it in real time if we want."

Tymco explained that articulatory gestures are really just groups of sounds and are accent-independent. Sayso's technology chops those sounds into very small chunks, each only milliseconds long, then applies real-time processing to them.
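For readers curious what millisecond-level chunking looks like in practice, here is a minimal Python sketch. It is not Sayso's implementation, which the company has not published. The function names, the 20-millisecond frame size and the placeholder transform are all assumptions chosen for illustration.

```python
from typing import Callable, Iterator, List

Chunk = List[float]


def frames(samples: Chunk, sample_rate: int, frame_ms: int = 20) -> Iterator[Chunk]:
    """Yield consecutive, non-overlapping chunks of about frame_ms milliseconds.

    The 20 ms default is an illustrative assumption, not a figure from Sayso.
    The final chunk may be shorter than the others.
    """
    frame_len = sample_rate * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame_ms is too small for this sample rate")
    for start in range(0, len(samples), frame_len):
        yield samples[start:start + frame_len]


def process_stream(
    samples: Chunk,
    sample_rate: int,
    transform: Callable[[Chunk], Chunk],
    frame_ms: int = 20,
) -> Chunk:
    """Apply a per-chunk transform and stitch the results back together."""
    output: Chunk = []
    for chunk in frames(samples, sample_rate, frame_ms):
        output.extend(transform(chunk))
    return output


def passthrough(chunk: Chunk) -> Chunk:
    """Hypothetical stand-in for a real accent-conversion model."""
    return list(chunk)


# One second of silence at 16 kHz, split into 20 ms chunks.
audio = [0.0] * 16000
converted = process_stream(audio, 16000, passthrough)
```

Because each chunk is handled on its own, a system built this way only has to wait for a few milliseconds of audio before it can start producing output, which is what makes near-real-time operation plausible.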

"We map speech that is of one accent to a different accent," said Tymco. "So we have parallel data, and we teach our system to see how the sound wave for the speaker with an accent would look like versus the speaker who is talking. And then we alter the shape of the sound wave to match it more to the desired accents. The really neat thing about it is that it is universal. So it's independent of accent."

Tymco said Sayso began training its systems with Hindi English and U.S. English accent pairs, and has since expanded to include Chinese, Spanish and Japanese accents. She added that transcription is part of the company's business strategy, as she has observed that automatic subtitles can be completely inaccurate.

"There's a very strong correlation to how close a founder's accent is to Standard Hollywood English and how good the transcription is," said Tymco. "For someone with a strong Dutch or Indian accent, the transcriptions are far worse; processing the audio through a Sayso-like filter before trying to run transcription on the audio file may result in far better transcriptions. Our tech is definitely applicable to transcription."

Edited by Luke Bellos

TMCnet Contributing Editor
