Google DeepMind today announced Gemini Robotics, which will bring Gemini and AI into the physical world, with new models that can “perform a wider range of real-world activities than ever before.”
The goal is to build general-purpose robots, with CEO Sundar Pichai adding how Google has “always thought of robotics as a helpful testing ground for translating AI advances into the physical world.”
“Gemini Robotics” is a vision-language-action (VLA) model built on Gemini 2.0 “with the addition of physical actions as a new output modality for the purpose of directly controlling robots.”
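To make the VLA idea concrete, here is a minimal sketch of the input/output contract such a model implies: vision and language come in, and robot actions come out as another output modality. Everything here (Observation, ActionChunk, VisionLanguageActionModel) is invented for illustration and is not Google's API.

```python
# Hypothetical sketch of a vision-language-action (VLA) interface:
# the same model that handles vision and language also emits
# low-level robot actions as an output modality.
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    rgb_image: bytes   # camera frame from the robot
    instruction: str   # natural-language task description

@dataclass
class ActionChunk:
    joint_deltas: List[float]  # per-joint position deltas for the next step
    gripper: float             # 0.0 = fully open, 1.0 = fully closed

class VisionLanguageActionModel:
    """Stand-in for a VLA model; a real model would run a forward
    pass here instead of returning placeholder values."""
    def predict(self, obs: Observation) -> ActionChunk:
        return ActionChunk(joint_deltas=[0.0] * 7, gripper=0.0)

# Control loop idea: re-plan each step from a fresh observation.
model = VisionLanguageActionModel()
obs = Observation(rgb_image=b"", instruction="pick up the mug by its handle")
action = model.predict(obs)
```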
Going in, Google identifies “three principal qualities” for robotic AI models:
Generality: “able to adapt to different situations”
- Gemini Robotics is “adept at dealing with new objects, diverse instructions, and new environments,” including “tasks it has never seen before in training” by leveraging Gemini’s underlying world understanding.
Interactivity: can understand and respond quickly to instructions or changes in the environment
Dexterity: “can do things that people can do with their fingers and hands, like carefully manipulating objects.”
Google also announced Gemini Robotics-ER (“embodied reasoning”), a vision-language model with enhanced spatial reasoning for “understanding the world in ways needed for robotics,” which allows roboticists to connect it with existing low-level controllers.
When shown a coffee mug, the model can intuit a two-finger grip for picking it up by its handle and a safe path for approaching it. These models are being tested on a variety of robot form factors, including bi-arm and humanoid robots, by trusted testers such as Agile Robots.
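That description suggests a division of labor: the embodied-reasoning model handles perception and spatial planning, while an existing low-level controller executes the motion. Below is a hypothetical sketch of that hookup, using the coffee-mug example; all names (SpatialReasoner, GraspPlan, TrajectoryController) are invented for illustration, not a real API.

```python
# Hypothetical sketch: an embodied-reasoning model proposes a grasp
# pose and approach path, and an existing low-level controller
# executes them.
from dataclasses import dataclass
from typing import List, Tuple

Pose = Tuple[float, float, float]  # simplified x, y, z in meters

@dataclass
class GraspPlan:
    grasp_pose: Pose           # e.g. a two-finger grip on the mug handle
    approach_path: List[Pose]  # safe waypoints toward the grasp

class SpatialReasoner:
    """Stand-in for the reasoning model: scene image and target in,
    grasp plan out. Values below are placeholders, not real inference."""
    def plan_grasp(self, rgb_image: bytes, target: str) -> GraspPlan:
        return GraspPlan(
            grasp_pose=(0.42, -0.10, 0.15),
            approach_path=[(0.42, -0.10, 0.35), (0.42, -0.10, 0.25)],
        )

class TrajectoryController:
    """Stand-in for the low-level controller a roboticist already has."""
    def move_through(self, waypoints: List[Pose]) -> None:
        for wp in waypoints:
            print(f"moving to {wp}")

    def close_gripper(self) -> None:
        print("gripper closed")

# Wire the two together: reasoning on top, motor control below.
reasoner = SpatialReasoner()
controller = TrajectoryController()
plan = reasoner.plan_grasp(rgb_image=b"", target="coffee mug")
controller.move_through(plan.approach_path + [plan.grasp_pose])
controller.close_gripper()
```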