
Google’s Gemini panicked while playing Pokemon

AI companies are battling to dominate the industry, but they are also battling in Pokemon gyms.

Google and Anthropic are both studying how their latest AI models play early Pokemon games, and the results can be as entertaining as they are enlightening. This time, the findings come from Google DeepMind, which states in a report that Gemini 2.5 Pro panics when its Pokemon are near death. According to the report, this can lead to a “qualitatively noticeable degradation” in the AI’s reasoning ability.

AI benchmarking, or the process of comparing different AI models’ performance, is a dubious science that often gives little context about a model’s actual capabilities. Some researchers believe that studying how AI plays video games can be useful (or at least funny).

In the past few months, two developers who are not affiliated with Google or Anthropic have set up Twitch channels called “Gemini Plays Pokemon” and “Claude Plays Pokemon,” where anyone can watch in real time as an AI navigates a video game that was created for children over 25 years ago.

Each stream displays the AI’s reasoning process, that is, a natural-language translation of how the AI evaluates and responds to a problem, giving us insight into the way these models work.

Although the progress of these AI models is impressive, they are still not very good at Pokemon. Gemini takes hundreds of hours to complete a game that a child can finish in a fraction of the time.

What is interesting is not how long the AI takes to complete a Pokemon game, but how it behaves along the way. The report states that Gemini 2.5 Pro encounters various situations during the playthrough that cause the model to simulate “panic.”

The model’s performance can be affected by this “panic,” as the AI might suddenly stop using some of its tools for a period of time. While AI cannot think or feel emotions, its actions mimic how a person might make poor decisions under stress. It’s a fascinating but unsettling response.

The report states that “this behavior has been observed in enough instances to be noticed by the Twitch chat members.”

Claude also displayed some strange behaviors on its journey through Kanto. In one instance, the AI noticed that when all of its Pokemon ran out of health, the player character would “white out” and return to a Pokemon Center.

When Claude was stuck in the Mt. Moon cave, it incorrectly hypothesized that it would be transported to the Pokemon Center of the next town if it deliberately made all of its Pokemon faint.

But that’s not how the game works. When all of your Pokemon faint, you return to the Pokemon Center that you visited most recently. Viewers watched in horror as the AI, in effect, tried to kill itself in the game.

Despite its shortcomings, there are some ways in which the AI can perform better than human players. As of the release of Gemini 2.5 Pro, the AI can solve puzzles with impressive precision.

With the help of humans, the AI developed agentic tools (prompted instances of Gemini 2.5 Pro that are geared towards specific tasks) to solve the boulder puzzles in the game and find efficient routes for reaching a destination. The report states that Gemini 2.5 Pro can solve some of the complex boulder puzzles required to progress in Victory Road with just a prompt explaining boulder physics and how to verify a valid route.
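To make the idea of "boulder physics" plus route verification concrete, here is a minimal Python sketch. It is not the tool described in the report: the tile legend, the one-tile push rule, the example grid, and the function names `verify_route` and `find_route` are all assumptions made for illustration, and the real game's boulder mechanics are not modeled in full.

```python
from collections import deque

# Hypothetical tile legend (not from the report): '#' wall, '.' floor,
# 'B' boulder, 'P' player start, 'G' destination.
MOVES = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def parse(grid):
    """Split a text grid into walls, boulders, player start, and goal."""
    walls, boulders, player, goal = set(), set(), None, None
    for r, row in enumerate(grid):
        for c, ch in enumerate(row):
            if ch == "#":
                walls.add((r, c))
            elif ch == "B":
                boulders.add((r, c))
            elif ch == "P":
                player = (r, c)
            elif ch == "G":
                goal = (r, c)
    return walls, frozenset(boulders), player, goal


def blocked(pos, walls, size):
    """True if a tile is outside the grid or is a wall."""
    rows, cols = size
    return not (0 <= pos[0] < rows and 0 <= pos[1] < cols) or pos in walls


def step(walls, boulders, player, move, size):
    """Apply one move; return (boulders, player), or None if it is illegal."""
    dr, dc = MOVES[move]
    target = (player[0] + dr, player[1] + dc)
    if blocked(target, walls, size):
        return None
    if target in boulders:
        # Assumed push rule: a boulder moves one tile if the tile behind it is free.
        beyond = (target[0] + dr, target[1] + dc)
        if blocked(beyond, walls, size) or beyond in boulders:
            return None
        boulders = (boulders - {target}) | {beyond}
    return boulders, target


def verify_route(grid, route):
    """Check that every move is legal and that the route ends on the goal."""
    walls, boulders, player, goal = parse(grid)
    size = (len(grid), len(grid[0]))
    for move in route:
        result = step(walls, boulders, player, move, size)
        if result is None:
            return False
        boulders, player = result
    return player == goal


def find_route(grid):
    """Breadth-first search over (boulders, player) states; shortest route or None."""
    walls, boulders, player, goal = parse(grid)
    size = (len(grid), len(grid[0]))
    start = (boulders, player)
    queue, seen = deque([(start, "")]), {start}
    while queue:
        (boulders, player), route = queue.popleft()
        if player == goal:
            return route
        for move in MOVES:
            result = step(walls, boulders, player, move, size)
            if result is not None and result not in seen:
                seen.add(result)
                queue.append((result, route + move))
    return None


# Made-up example: the boulder must be pushed past the junction above 'G'.
GRID = [
    "######",
    "#PB..#",
    "###G##",
    "######",
]

if __name__ == "__main__":
    print(find_route(GRID))           # RRD under the assumed rules
    print(verify_route(GRID, "RRR"))  # False: the third push hits a wall
```

Keeping the verifier separate from the search mirrors the split the paragraph describes: a proposed route, whatever produced it, can be checked cheaply against the stated rules before it is acted on.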

Since Gemini 2.5 Pro created these tools largely on its own, Google believes that the current model may be able to create them without human intervention. Who knows, Gemini may end up creating a “don’t panic” module for itself.

Amanda Silberling, senior writer for TechCrunch, covers the intersection of culture and technology. She has written for publications such as Polygon, MTV and the Kenyon Review. She co-hosts Wow If True, an internet culture podcast, with science fiction writer Isabel J. Kim. She worked as a museum educator, film festival coordinator, and grassroots organizer before joining TechCrunch. She holds a B.A.

Send tips through Signal, an encrypted messaging app, to (929) 593 0227. For anything else, email amanda@techcrunch.com.

www.aiobserver.co
