DeepL Voice is the ‘next generation’ of AI translation
German technology darling DeepL has (finally) launched a voice-to-text service. DeepL Voice converts audio from video or live conversations into text.
DeepL Voice allows users to listen to someone speaking a language they don’t know and have it automatically translated into a language they do understand, in real time. Currently, the new feature works with English, German, Japanese and Korean, as well as Swedish, Dutch, French, Turkish, Polish, Portuguese, Russian, Spanish and Italian.
The launch of DeepL Voice is exciting because it uses the same neural networks as the company’s text-to-text offering, which DeepL claims is the ‘world’s best’ AI translator.
As someone who has just moved to a different country, I’m eager to try a voice-to-text translator that might actually work. None of the ones I have tried so far work in real time: there is a lag that makes them useless, and the translation quality is poor.
To have a face-to-face conversation, you can launch DeepL Voice on a device and place it in front of the other speaker. It displays the conversation on each device so that everyone can easily follow the translations.
DeepL Voice can also be integrated into Microsoft Teams to allow video conferencing across language barriers. The translated captions appear in a sidebar. It remains to be seen whether DeepL Voice will become available on platforms such as Zoom or Google Meet.
This is DeepL’s first offering of this kind, but it won’t be its last. Jarek Kutylowski, DeepL’s CEO and founder, referred to real-time voice translation as the “next frontier” for the company and said that it is a completely different story.
“When you translate speech as it occurs, you have to deal with incomplete input, latency, pronunciation issues and more. All of these can lead to inaccurate translations and a poor user experience,” Kutylowski said.
“We built a solution that takes all of these factors into account and allows businesses to break down language barriers by enabling them to communicate in multiple languages when required.”
The quality of DeepL Voice will likely set it apart from other voice-to-text translation providers.
From an engineering perspective, DeepL’s technological success rests on the architecture of its neural networks, the input provided by human editors, and its training data. Kutylowski believes that DeepL has one further advantage over its competitors: focus.
“Focus is always important,” Kutylowski told TNW. “Translate isn’t the core business at Google; it’s one of 100 side gigs. If you look at LLMs and OpenAI as our competitors, translation is just one part of what they do and their GPUs do a lot of other things. We’re focusing on one area.”
DeepL reached a valuation of $2bn in May after securing new investment of $300mn. It supports 32 languages and has over 100,000 users.