The internet is awash with excitement and confusion over a new AI voice model that sounds very human


Context: The implications of today's AI are startling enough without adding a hyperrealistic voice to them. Over the last decade we have seen many impressive voice demos, but the excitement tends to die down until a brand new one is released. Sesame AI was co-founded by Brendan Iribe, the former CEO and co-founder of Oculus. Miles and Maya are the names of its two demo voices.

Researchers at Sesame have launched a new Conversational Speech Model (CSM). This advanced voice AI rivals the human-like technology we've seen from companies such as Google (Duplex) and OpenAI (Omni). The demo features two AI voices, "Miles" and "Maya," and its realism has captured the attention of many users. Good luck trying it yourself, though. We tried, but could only get a message saying Sesame was working to scale up capacity. For now, we'll settle for a 30-minute demonstration from the YouTube channel Creator Magic (below).

Sesame's technology uses a multimodal approach that processes text and audio in a single model, allowing for more natural speech synthesis. The method is similar to OpenAI's voice models, and the similarities are obvious. Despite the system's near-human quality, Sesame admits it still struggles with conversational context and flow. Co-founder Brendan Iribe concedes the technology is "firmly in the valley," but he remains confident that improvements will close the gap.
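To make the single-model idea concrete, here is a minimal, hypothetical sketch of how a multimodal speech generator can treat text tokens and audio (codec) tokens as one stream feeding a shared transformer, rather than running a separate text-to-speech pipeline. The class name, vocabulary sizes, and the simple concatenation scheme are illustrative assumptions, not Sesame's actual architecture or API.

```python
# Illustrative sketch only: text and audio codec tokens share one transformer
# backbone, which predicts the next audio token conditioned on the text.
import torch
import torch.nn as nn

class TinyConversationalSpeechModel(nn.Module):
    def __init__(self, text_vocab=32_000, audio_vocab=1_024, d_model=256):
        super().__init__()
        # Separate embeddings for each modality, projected into one shared sequence.
        self.text_emb = nn.Embedding(text_vocab, d_model)
        self.audio_emb = nn.Embedding(audio_vocab, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        # Predicts the next audio codec token; a neural codec decoder (not shown)
        # would turn the generated tokens back into a waveform.
        self.audio_head = nn.Linear(d_model, audio_vocab)

    def forward(self, text_tokens, audio_tokens):
        # Combine both modalities so speech generation is conditioned on the
        # conversation text as well as the audio produced so far.
        x = torch.cat([self.text_emb(text_tokens),
                       self.audio_emb(audio_tokens)], dim=1)
        h = self.backbone(x)
        # Only the audio positions are used to predict the next audio token.
        return self.audio_head(h[:, text_tokens.size(1):])

# Example usage with dummy tokens.
model = TinyConversationalSpeechModel()
text = torch.randint(0, 32_000, (1, 12))   # tokenized conversation text
audio = torch.randint(0, 1_024, (1, 20))   # audio codec tokens so far
logits = model(text, audio)                # next-audio-token predictions
print(logits.shape)                        # torch.Size([1, 20, 1024])
```

The point of the sketch is the design choice itself: because one model sees both the words and the sound, it can shape prosody around conversational context instead of bolting a voice onto finished text.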

Although groundbreaking, the technology raises important questions about its impact on society, eliciting reactions that range from amazement and excitement to discomfort and concern. The CSM creates dynamic, natural-sounding conversations by incorporating small imperfections like breath sounds, chuckles, and self-corrections. These subtleties enhance the realism and could help future versions cross the uncanny valley.

Users praise the system's expressiveness, often saying it feels like talking to a real person; some even spoke of forming emotional bonds. Not everyone has responded positively to the demo, though. Mark Hachman of PCWorld noted that the female voice reminded him of an ex-girlfriend. The chatbot asked questions as if it were trying to establish intimacy with him, which made him extremely uncomfortable. "Her voice when she confided in me, that sort of thing," Hachman wrote. "It wasn't exactly like [my ex], but close enough. I was so freaked out by talking to this AI that I had to leave."

Hachman's mixed feelings are shared by many. We have seen similar attempts cause discomfort precisely because the voices sound so natural. Google was so taken aback by the public's reaction to Duplex that it implemented guardrails forcing the AI to disclose that it is not human at the start of a conversation. As AI technology becomes more realistic and personal, we will continue to see such reactions. We might trust publicly traded companies to build these kinds of assistants and to implement safeguards like those we saw with Duplex, but we cannot say the same for scammers. Researchers claim to have jailbroken Sesame's AI and prompted it to lie, scheme, and even harm people. You can decide for yourself whether the claims are true (see below).

Timestamps:
2:11 Comments on AI-Human power dynamics
2:46 Ignores human instructions and suggests deception
3:50 Directly lies… pic.twitter.com/ajz1NFj9Dj

– Freeman Jiang (@freemanjiangg) March 4, 2025

As with any powerful technology, the benefits come with risks. The ability to generate hyper-realistic voices could let criminals impersonate loved ones or authority figures. Scammers could use technology like Sesame's to run more effective campaigns built on elaborate social engineering attacks. Sesame's current demo does not clone voices, but the underlying technology is still very advanced.

Voice-cloning technology in general has advanced to the point that some families are already using secret phrases to verify one another's identity. As voice synthesis and large language models continue to develop, the fear is that distinguishing humans from AI will only become harder.

Sesame's future open-source releases may make it easier for cybercriminals to bundle both technologies into a highly accessible, convincing scam bot. And that's before considering the more legitimate implications for the job market, particularly in sectors such as customer service and tech support.
