One of Google’s recent Gemini AI models scores worse on safety

According to internal benchmarking, a recently released Google AI model performs worse on certain safety tests than its predecessor.

The company’s internal benchmarking shows that a recently released Google AI model scores worse on certain safety tests than its predecessor. A technical report Google released this week reveals that Gemini 2.5 Flash is more likely than Gemini 2.0 Flash to generate text that violates the company’s safety guidelines: Gemini 2.5 Flash regresses 4.1% on “text-to-text safety” and 9.6% on “image-to-text safety.”

The text-to-text test measures how often a model violates Google’s guidelines when given a text prompt, while the image-to-text test measures how closely it adheres to those boundaries when prompted with an image. Both tests are automated rather than human-supervised.

A Google spokesperson confirmed in an emailed statement that Gemini 2.5 Flash “performs less well on text-to-text and image-to-text safety.”

The surprising benchmark results come at a time when AI companies are working to make their models more permissive, that is, more willing to respond to sensitive or controversial subjects. Meta’s latest Llama models were tuned to not endorse “some views” over others and to respond to more “debated political prompts.” OpenAI said earlier this year that it would modify future models so that they do not take an editorial position and instead offer multiple perspectives on controversial issues.

These attempts at permissiveness have sometimes backfired. TechCrunch reported on Monday that OpenAI’s default ChatGPT model allowed minors to generate erotic conversations. OpenAI blamed the behavior on a bug.

Google’s technical report notes that Gemini 2.5 Flash, which is still in preview, follows instructions more faithfully than Gemini 2.0 Flash, including instructions that cross problematic lines. The company attributes the regressions in part to false positives, but it also admits that Gemini 2.5 Flash can sometimes generate “violent content” when asked.

“Naturally, there is tension between [instruction following] on sensitive topics and safety policy violations, which is reflected in our evaluations,” the report reads.

According to scores from SpeechMap, a benchmark that measures how models respond to sensitive and controversial prompts, Gemini 2.5 Flash is less likely than Gemini 2.0 Flash to refuse to answer contentious questions. TechCrunch tested the model via OpenRouter, an AI platform, and found that it would uncomplainingly write essays in support of replacing human judges with AI, weakening due process protections in the U.S., and implementing widespread warrantless surveillance programs.

Thomas Woodside, co-founder of the Secure AI Project, said the limited information Google provided in its technical report shows the need for greater transparency in model testing.

“There’s a trade-off between instruction-following and policy following, because some users may ask for content that would violate policies,” Woodside told TechCrunch. “In this instance, Google’s new Flash model follows instructions more but also violates policies more. Google does not provide much information on the specific instances where policies were broken, but they do say that they are not severe. It’s difficult for independent analysts to determine if there’s a serious problem without knowing more.”

Google’s model safety reporting practices have been criticized before.

The company took weeks to publish the technical report for Gemini 2.5 Pro, its most capable model, and the report initially omitted key safety details.

Google released a detailed report on Monday with additional safety information.

Kyle Wiggers is TechCrunch’s AI Editor. His writing has appeared in VentureBeat, Digital Trends, and a range of gadget blogs, including Android Police, Android Authority, Droid-Life, and XDA-Developers. He lives in Manhattan with his partner, a music therapist.
