Google is not the first to combine large language models with robots. But the trend is a big one.
On Wednesday, Google made a somewhat surprising announcement: It launched a version of its Gemini AI model that can act not only in the digital realm of chatbots and web search but also in the physical world, via robots.
Gemini Robotics fuses large language models with spatial reasoning, allowing you to tell a robot arm to do things like “put grapes in the glass bowl.” These commands get filtered through the LLM, which identifies your intention from what you say and then breaks it down into commands the robot arm can carry out. Scott Mulligan has written a full account of how it all works.
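To make that pipeline concrete, here is a toy sketch of the idea in Python. It is purely illustrative: the “LLM” step is a stub based on string parsing (a real system like Gemini would prompt a vision-language model with the utterance plus camera input), and every function name and primitive here is invented, not Google’s actual API.

```python
def parse_intent(utterance: str) -> dict:
    """Stand-in for the LLM step: extract the object and destination.

    A real implementation would send the utterance (and images of the
    scene) to a model; here we fake it with naive string parsing.
    """
    words = utterance.lower().rstrip(".").split()
    obj = words[1]                                    # e.g. "grapes"
    dest = " ".join(words[words.index("the") + 1:])   # e.g. "glass bowl"
    return {"action": "place", "object": obj, "destination": dest}


def plan_primitives(intent: dict) -> list[str]:
    """Expand a high-level intent into arm-level primitives (hypothetical)."""
    return [
        f"locate({intent['object']})",
        f"grasp({intent['object']})",
        f"locate({intent['destination']})",
        f"release_over({intent['destination']})",
    ]


for step in plan_primitives(parse_intent("put grapes in the glass bowl")):
    print(step)
```

Running this prints the four invented primitives, from `locate(grapes)` through `release_over(glass bowl)` — a cartoon of the decomposition the article describes, nothing more.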
This might have you wondering whether your home or office will one day be filled with robots you can boss around. More on that soon.
First, where did this come from? Google has not made big waves in the world of robotics so far. Alphabet acquired a few robotics startups over the past decade, but in 2023 it shut down a unit working on robots for practical tasks such as cleaning up trash.
Despite that, the company’s move to bring AI into the physical world via robots follows the exact precedent other companies have set over the past two years (something, I must humbly point out, MIT Technology Review has long anticipated).
In short, two trends are converging from opposite directions: Robotics companies are increasingly leveraging AI, and AI giants are now building robots. OpenAI, which shuttered its robotics team in 2021, started a new effort to build humanoid robots this year. In October, the chip giant Nvidia declared the next wave of artificial intelligence to be “physical AI.” Still, Google’s use of large language models to issue robot instructions is particularly interesting.
It’s not the first to try, though. The robotics startup Figure went viral a year ago with a video showing humans teaching a humanoid robot to put away dishes. Around the same time, Covariant, a startup spun off from OpenAI, built something similar for robotic arms in warehouses. I saw a demo in which you could instruct the robot via images, text, or video to do things like “move the tennis balls from this bin to that one.” Covariant was acquired by Amazon just five months later.
When you see such demos, you can’t help but wonder when these robots will come to our workplaces. What about our homes? If Figure’s plans are any indication, the answer to the first question is: soon. The company announced on Saturday that it is building a high-volume manufacturing facility slated to produce 12,000 humanoid robots per year. But training and testing robots takes time, especially in settings where they’ll be working near humans.
Figure’s rival Agility Robotics, for example, claims it is the only company in the US with paying customers for its humanoids. But the industry’s safety standards for humanoids working alongside people have not yet fully developed, so the company has to keep its robots in separate areas.
Despite recent progress, the home will remain the last frontier. Compared with factory floors, our homes are chaotic and unpredictable, and everyone in them is crammed into relatively close quarters. Even impressive AI models like Gemini Robotics will still need to go through lots of testing both in the real world and in simulation, just like self-driving cars. That testing will likely happen first in warehouses, hotels, and hospitals, where the robots may still get assistance from remote human operators. It will be a long time before they’re allowed to put away our dishes.

This story originally appeared in The Algorithm, our weekly AI newsletter. To get stories like this in your inbox first, sign up here.