Claude (finally) gets a voice


Good morning, AI enthusiasts. The last major AI holdout just officially joined the voice movement, with Anthropic finally giving its assistant the ability to speak.

As usual with Anthropic, it’s better late than never, and with the rollout of shiny new models and now a brand-new voice mode, the AI giant is shipping once again.


In today’s AI rundown:

  • Anthropic’s new Voice Mode for Claude

  • Synthesia co-founder’s 3D world AI startup

  • Automate project meeting documentation

  • Study: AI learns reasoning through self-confidence

  • 4 new AI tools & 4 job opportunities

LATEST DEVELOPMENTS

ANTHROPIC

🗣️

Image source: Anthropic

The Rundown: Anthropic just announced the launch of its new Voice mode for its Claude mobile apps, becoming one of the last major AI labs to enable users to have natural spoken conversations with its AI assistant.

The details:

  • The beta feature is set to arrive for English-speaking users in the coming weeks and will run on Claude’s latest Sonnet 4 model.

  • Users can flow naturally between speaking and typing, with five voice personalities available and real-time transcription displayed during chats.

  • Voice mode also integrates with Google Workspace for paid subscribers, allowing Claude to access calendars, docs, and Gmail with voice commands.

  • Free users receive 20-30 voice messages a month, with paid tiers getting “significantly higher” usage limits.

Why it matters: With all the major labs now offering voice modes, the competition shifts to execution, with latency, integrations, and underlying model quality all shaping the user experience. The capabilities are also a jarring contrast with older-generation voice assistants like Siri, showing how far behind they have fallen.

TOGETHER WITH POSTMAN

🚀 

The Rundown: Postman’s Agent Generator delivers complete turnkey infrastructure with zero server setup, enabling developers to build and deploy AI agents instantly without friction.

With Agent Generator, you can:

  • Instantly spin up agent workflows

  • Work with OpenAI, LangChain & more

  • Test, debug, and deploy—all in Postman


SPAITIAL

🌐 

Image source: SpAItial

The Rundown: Synthesia co-founder Matthias Niessner just launched SpAItial, a new startup aimed at creating AI systems capable of generating interactive 3D environments from text and images.

The details:

  • The company is building Spatial Foundation Models (SFMs) that understand 3D space natively and can grasp geometry, physics, and material properties.

  • SpAItial’s founding team includes former leaders from Synthesia, Google, and Meta, bringing expertise in 3D AI and neural rendering technologies.

  • Early demos generated photorealistic 3D rooms from simple text prompts, with applications spanning gaming, construction, VR, and robotics.

Why it matters: While AI has mastered generating 2D images and videos, creating coherent, spatially aware 3D worlds remains a challenge. This new breed of models could enable anyone to create complex virtual environments with just a few words — tackling what many consider to be the next frontier in AI.

AI TRAINING

📊 

The Rundown: In this tutorial, you will learn how to create an automated system with Zapier Agents that can turn meeting recordings into transcripts, summaries, and actionable task lists in Google Docs.

Step-by-step:

  1. Visit Zapier Agents and create a “New Agent”

  2. Configure your agent to trigger when new audio files are uploaded to a specified folder in Google Drive

  3. Add three essential tools: ChatGPT to transcribe the audio, ChatGPT again to summarize and extract action points, and Google Docs to compile everything into a single document

  4. Test your setup with a sample recording and activate your agent

Pro tip: At the start of each meeting, ask participants to clearly state their names before speaking and explicitly mention action item assignments to help the AI more accurately attribute tasks to team members.
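For readers who want to see the data flow, the steps above can be sketched in plain Python. This is only an illustration under stated assumptions: Zapier Agents is configured through its own interface, not through code, and every name here (`build_meeting_doc`, `fake_transcribe`, `fake_summarize`) is hypothetical. The two placeholder functions stand in for the ChatGPT transcription and summarization tools and call no real API.

```python
from typing import Callable, List, Tuple

# Hypothetical stand-ins: in the tutorial these steps are Zapier Agent tools,
# not Python functions. Nothing here calls a real Zapier or ChatGPT API.
Transcriber = Callable[[str], str]
Summarizer = Callable[[str], Tuple[str, List[str]]]


def build_meeting_doc(audio_file: str, transcribe: Transcriber, summarize: Summarizer) -> str:
    """Chain the three steps: transcript -> summary + action items -> one document."""
    transcript = transcribe(audio_file)
    summary, action_items = summarize(transcript)
    lines = [
        f"Meeting notes: {audio_file}",
        "",
        "Summary",
        summary,
        "",
        "Action items",
    ]
    lines += [f"- {item}" for item in action_items]
    lines += ["", "Transcript", transcript]
    return "\n".join(lines)


def fake_transcribe(audio_file: str) -> str:
    # Placeholder for the transcription tool.
    return "Ana: I will send the budget by Friday. Ben: I will book the venue."


def fake_summarize(transcript: str) -> Tuple[str, List[str]]:
    # Placeholder for the summarization tool: treat each "I will" sentence as a task.
    sentences = [s.strip() for s in transcript.split(".") if s.strip()]
    tasks = [s for s in sentences if "I will" in s]
    return f"{len(sentences)} statements, {len(tasks)} commitments.", tasks


if __name__ == "__main__":
    print(build_meeting_doc("standup.mp3", fake_transcribe, fake_summarize))
```

The sample transcript also shows why the pro tip helps: when speakers state their names and commitments explicitly, each action item already carries its owner.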

PRESENTED BY ENCORD

📊 

The Rundown: Encord is a consolidated platform for multimodal AI data management, curation, and annotation, enabling teams to accelerate model iteration cycles with balanced, accurately labeled datasets.

Leading AI teams use Encord’s fully customizable multimodal interface to:

  • Evaluate GenAI outputs across video, audio, and text in record time

  • Create VLA datasets with synchronized video, instruction, and trajectory data

  • Unite PDF, image, video, audio, and DICOM labeling in a single interface


AI RESEARCH

☺️

Image source: UC Berkeley and Yale

The Rundown: Researchers from UC Berkeley and Yale introduced INTUITOR, an AI training method that enables language models to improve their reasoning using internal confidence signals, eliminating the need for correct answers or external feedback.

The details:

  • INTUITOR measures how confident an AI feels about each word it generates, using this “gut feeling” as a guide for learning.

  • Instead of needing correct answers to learn (like traditional AI training), the system rewards the AI when it produces responses it feels confident about.

  • When tested on math problems, the method performed just as well as conventional training, but showed even better results on programming tasks.

  • The AIs also began showing human-like reasoning behaviors — breaking down complex problems, planning, and explaining their thinking step-by-step.

Why it matters: Just as intuition and confidence play a large role in human learning, this study suggests AI can learn through a similar mechanism. This self-directed approach could be especially valuable for tasks where there’s no clear “right answer” or where human expertise is limited, allowing AI to venture into unexplored knowledge areas.
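To make the "confidence as reward" idea concrete, here is a minimal sketch. It assumes confidence is scored as the average divergence of each next-token distribution from a uniform distribution, so peaked (more certain) predictions score higher. That scoring rule, the function names, and the toy probabilities are assumptions for illustration, not necessarily the exact formulation the researchers used.

```python
import math
from typing import List, Sequence


def token_confidence(probs: Sequence[float]) -> float:
    """KL divergence from the uniform distribution to `probs` (higher = more peaked)."""
    v = len(probs)
    # KL(U || p) = sum_i (1/v) * log((1/v) / p_i); clamp p_i to avoid log(0).
    return sum((1.0 / v) * math.log((1.0 / v) / max(p, 1e-12)) for p in probs)


def response_confidence(token_dists: List[Sequence[float]]) -> float:
    """Average per-token confidence over one generated response."""
    return sum(token_confidence(d) for d in token_dists) / len(token_dists)


# Toy example: two candidate responses over a 4-word vocabulary (made-up numbers).
hesitant = [[0.25, 0.25, 0.25, 0.25], [0.4, 0.3, 0.2, 0.1]]
decisive = [[0.97, 0.01, 0.01, 0.01], [0.9, 0.05, 0.03, 0.02]]

# The score would serve as the training reward in place of a correct-answer check.
rewards = {
    "hesitant": response_confidence(hesitant),
    "decisive": response_confidence(decisive),
}
```

In this toy setup the decisive response earns the larger reward, which is the signal a training loop would reinforce without ever consulting a reference answer.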

QUICK HITS

🛠️

  • ⚙️ – Anthropic’s agentic coding tool, now generally available

  • 🧠  – Nvidia’s math and code reasoning model

  • 🦙  – Fine-tune and train open-source LLMs with no code

  • ▶️ – One-click AI thumbnail generator

💼 

  • 🎧 – Software Engineering Manager, Audio

  • 🛠️ – Systems Engineer

  • 🕴️ – Executive Recruiter

  • 🤝  – Partner Success Manager

📰 

Mistral launched its Agents API for enterprise apps, introducing connectors for coding, web search, and image generation alongside memory and multi-agent orchestration.

Meta is reportedly splitting its AI organization into two distinct teams focused on AI products and AGI foundations, aiming to accelerate the company’s development.

Anthropic’s Claude 4 Sonnet model set a new SOTA on the ARC-AGI-2 benchmark, surpassing o3 for the top spot on the leaderboard.

Google DeepMind announced SignGemma, an upcoming model capable of translating sign language into text.

Salesforce is acquiring cloud data management firm Informatica for $8B, strengthening the infrastructure powering its agent-based products and platforms.

The Browser Company announced that it will no longer be working on its Arc browser, instead fully pivoting to developing its AI-first Dia browser as a separate product.

COMMUNITY

🎥 

Join our next workshop this Friday, May 30th, at 4 PM EST with Dr. Alvaro Cintas, The Rundown’s AI professor. By the end of the workshop, you’ll confidently be able to use AI coding agents to improve your development workflow.

RSVP. Not a member? Join on a 14-day free trial.

🤝 

We’ll always keep this newsletter 100% free. To support our work, consider sharing it with your friends, and we’ll send you more free goodies.

See you soon,

Rowan, Joey, Zach, Alvaro, and Jason—The Rundown’s editorial team
