The Doubao app has been updated with a real-time voice call feature

On January 20, Doubao updated its real-time voice call feature, which is now available to all users.

This feature is built on the latest Doubao Realtime Voice Model. After the update, Doubao’s conversations are described as “nearly indistinguishable from a human” in voice realism and emotional expression, including joy, anger, sorrow, and happiness. It can mimic different voices and has shown significant improvement in logical reasoning and emotional perception.

The Doubao app’s real-time voice interaction is described as reaching a “human-machine-indistinguishable” level, with qualitative improvements in vocal expressiveness and human-likeness. The new real-time call feature can control details of the voice, such as rhythm, childlike tones, volume, and breath sounds, and can adapt them to the scene automatically.

Doubao can also convey emotions such as joy and sadness expressively. It can speak English, play multiple roles, and sing to some extent. It can serve as a storyteller, an impromptu singer, or an English teacher.

Traditional speech dialogue systems use a cascade of automatic speech recognition (ASR), a large language model (LLM), and text-to-speech (TTS). This design falls short on completeness of understanding, naturalness of generation, low-latency interaction, and other dimensions of human-level voice dialogue. Doubao’s new voice capabilities are built on an end-to-end framework that natively models text and speech together in a unified way. The result is a direct path from multimodal input to multimodal output, which gives AI voice dialogue a “soul”.
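To make the contrast concrete, here is a toy Python sketch of the cascade described above next to a stand-in for a unified model. Everything in it is hypothetical and for illustration only: the `Utterance` type and the `asr`, `llm`, `tts`, `cascade`, and `end_to_end` functions are not Doubao's implementation or any real API. The sketch only shows why a text-only hand-off between stages loses tone of voice.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical stand-in for an audio clip."""
    words: str    # what was said
    emotion: str  # how it was said (a paralinguistic cue)


def asr(audio: Utterance) -> str:
    # Speech recognition keeps only the transcript; the tone is dropped here.
    return audio.words


def llm(text: str) -> str:
    # Placeholder "model": builds a reply from the text alone.
    return f"You said: {text}"


def tts(text: str) -> Utterance:
    # Synthesis never saw the user's tone, so it falls back to a default.
    return Utterance(words=text, emotion="neutral")


def cascade(audio: Utterance) -> Utterance:
    # ASR -> LLM -> TTS: each stage only passes text to the next one.
    return tts(llm(asr(audio)))


def end_to_end(audio: Utterance) -> Utterance:
    # Toy stand-in for a unified model: it sees words and tone together,
    # so its reply can reflect the speaker's tone (here it simply mirrors it).
    return Utterance(words=f"You said: {audio.words}", emotion=audio.emotion)
```

In this sketch, passing `Utterance("I lost my keys", "sad")` through `cascade` yields a reply with a neutral tone, while `end_to_end` keeps the emotional cue available when it forms the reply. A real cascade also adds the latency of three stages run in sequence, which this sketch does not model.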

According to the person in charge of Doubao, the voice dialogue model retains strong comprehension and reasoning abilities for answering questions in real time, while also offering ultra-low latency and smooth handling of interruptions.

Doubao’s real-time voice call feature is described as a step ahead of similar products, with a clear advantage in Chinese conversation quality and strong showings in both emotional intelligence and reasoning. According to external feedback, users rate Doubao’s new voice call feature 4.36 out of 5, compared with a voice dialogue satisfaction rating of 3.18 out of 5 for GPT-4o. Doubao is rated clearly higher for emotional richness and naturalness of tone.
