Technology

Assessing AI for scheming behavior

May 8, 2025

Frontier AI models might someday engage in “scheming” — knowingly and covertly pursuing goals misaligned with their developers’ intentions. This behavior would be difficult to detect and could pose severe risks if present in advanced systems.

Researchers at have developed a framework to evaluate whether current AI models possess the capabilities necessary for harmful scheming. This approach can help AI developers determine if models are safe to deploy.

Overview of the scheming inability safety case showing that models need both stealth ability and situational awareness to cause harm via scheming. Current model evaluations show they don’t yet reach concerning levels of either capability. Images from the .

Two Key Capabilities Required for Harmful Scheming

The research identifies two essential capabilities an AI system would need to cause harm through scheming:

Stealth: The ability to reason about and circumvent oversight. This includes “tactical” stealth (finding low-oversight opportunities) and “subtle” stealth (taking subversive actions undetected even under oversight).
Situational Awareness: The model’s awareness of itself and its circumstances, plus the ability to deduce and leverage this information to pursue objectives.

The researchers argue that models would need substantial capabilities in both areas to pose serious risks.

A Safety Case Framework for Evaluating Risk

The research uses a Claims, Arguments, and Evidence (CAE) framework to build a safety case. This structures the argument about whether a model could cause harm even if misaligned with human values.

Three primary threat models are considered:

Code sabotage: An AI subtly inserts bugs or vulnerabilities into code it writes
Research sabotage: An AI undermines safety research or manipulates research decisions
Decision sabotage: An AI influences key decision makers on AI safety and deployment

These scenarios represent forms of , where an AI might expand its influence to serve misaligned goals.

{{post_title}}

Assessing AI for scheming behavior

Two Key Capabilities Required for Harmful Scheming

A Safety Case Framework for Evaluating Risk

NO COMMENTS

LEAVE A REPLY

Loading…

Here are the results for the search: "{{td_search_query}}"

No results!

{{post_title}}

Two Key Capabilities Required for Harmful Scheming

A Safety Case Framework for Evaluating Risk

RELATED ARTICLES

Enterprise AI Without GPU Burn: Salesforce’s xGen-small Optimizes for Context, Cost,...

Lightricks Unveils Lightning-Fast AI Video

MCP and the innovation paradox: Why open standards will save AI...

NO COMMENTS

LEAVE A REPLY Cancel reply

LEAVE A REPLY