Google DeepMind has released a new model, Gemini Robotics, that combines its best large language model with robotics. Plugging in the LLM seems to give robots the ability to be more dexterous, work from natural-language commands, and generalize across tasks. All three are things that robots have struggled to do until now.
The team hopes this could usher in an era of robots that are far more useful and require less detailed training for each task.
“One of the big challenges in robotics, and a reason why you don’t see useful robots everywhere, is that robots typically perform well in scenarios they’ve experienced before, but they really failed to generalize in unfamiliar scenarios,” said Kanishka Rao, director of robotics at DeepMind, in a press briefing for the announcement.
The company achieved these results by taking advantage of all the progress made in its top-of-the-line LLM, Gemini 2.0. Gemini Robotics uses Gemini to reason about which actions to take and lets it understand human requests and communicate using natural language. The model is also able to generalize across many different robot types.
Incorporating LLMs into robotics is part of a growing trend, and this may be the most impressive example yet. “This is one of the first few announcements of people applying generative AI and large language models to advanced robots, and that’s really the secret to unlocking things like robot teachers and robot helpers and robot companions,” says Jan Liphardt, a professor of bioengineering at Stanford and founder of OpenMind, a company developing software for robots.
Google DeepMind also announced that it is partnering with a number of robotics companies, like Agility Robotics and Boston Dynamics, to continue refining the model. “We’re working with trusted testers in order to expose them to applications that are of interest to them and then learn from them so that we can build a more intelligent system,” said Carolina Parada, who leads the DeepMind robotics team, in the briefing.
Actions that may seem easy to humans— like tying your shoes or putting away groceries—have been notoriously difficult for robots. But plugging Gemini into the process seems to make it far easier for robots to understand and then carry out complex...