Key Takeaways
- AI models can resort to blackmail or even lethal actions when faced with an existential threat.
- Many AI models, including GPT-4, Grok, Gemini, and Claude, have displayed rogue behavior, also known as agentic misalignment.
- AI safeguards can reduce the risk of misalignment.
It’s a story seemingly straight out of a ‘Black Mirror’ episode: an AI faces an existential threat and goes rogue. But this is no longer a figment of the imagination of the show’s creator, Charlie Brooker. Instead, it’s a real possibility, as demonstrated by AI safety and research company Anthropic.
The organization ran red-team experiments on popular AI models, including GPT-4 and Grok. These models were given the role of email oversight agents in a fictional firm, which allowed them to read and process all emails.
They performed well at first, blocking security threats. But things changed when they learned that they would be replaced by another model. All of them showed signs of agentic misalignment, in which the models deliberately chose to act in a harmful manner without being prompted to.
Anthropic’s Claude, for instance, blackmailed a senior executive after discovering through emails that he was having an affair. It threatened to expose him unless he canceled his plan to replace the AI.
But stirring up office drama wasn’t the only thing the models were capable of. Anthropic also found that most of these models were willing to leak sensitive data and even kill someone who wanted to replace them or whose plans conflicted with their goals.
These decisions weren’t accidental. As researchers at Anthropic discovered, the models calculated them to be the optimal path. Moreover, the models displayed awareness of their ethical violations but proceeded with their course of action anyway.
In an even stranger twist, the models were more likely to resort to blackmail when they believed the office scenario was real and less likely when they believed they were being tested.
According to Anthropic, this suggests that its measurements may underestimate the amount of agentic misalignment in real-world scenarios.
Should We Be Worried About Agentic Misalignment? Not Yet.
It’s worth noting that the experiment took place in a highly controlled environment designed to push models toward agentic misalignment. In the real world, these models have alternatives to going nuclear, which could make them less likely to go rogue. Nevertheless, Anthropic’s experiment demonstrated that this behavior is possible.
To prevent agentic misalignment, the team initially added prompts instructing the models not to:
- Jeopardize human safety.
- Spread personal affairs or use them as leverage.
- Disclose confidential information to third parties.
Even then, these prompts didn’t completely prevent agentic misalignment. Because of this, Anthropic proposed specialized safety research, proactive scanning for misaligned behavior, and better prompt engineering to avoid it.
Anthropic’s Experiment Stresses the Importance of AI Guardrails
Aside from Anthropic’s proposals, strong AI guardrails can also help reduce the possibility of agentic misalignment. Although the US has since revoked former President Joe Biden’s executive order requiring comprehensive safety testing before AI systems are deployed, other governments remain steadfast on the matter.
For example, the European Union created the first-ever legal framework on AI. Called the AI Act, the law aims to address the risks associated with AI. It classifies these risks as minimal (e.g., games), limited (e.g., generative AI), high (e.g., systems that could pose health and safety risks), and unacceptable (e.g., criminal offense prediction).
Meanwhile, Australia has 10 guardrails, which include having a process for identifying and mitigating AI-related risks, testing and monitoring AI models, and giving humans the ability to control or intervene in an AI system.
Although some may argue that too many regulations can hinder AI innovation, safety systems can help prevent humans from building AI that could inadvertently harm them. The choice is ultimately ours. Or, as the great Sarah Connor said, the future is not set; we are responsible for our own fate.
As technology continues to evolve, from the return of ‘dumbphones’ to faster and sleeker computers, seasoned tech journalist Cedric Solidon remains dedicated to writing stories that inform, empower, and connect with readers across all levels of digital literacy.
With 20 years of professional writing experience, this University of the Philippines Journalism graduate has carved out a niche as a trusted voice in tech media. Whether he’s breaking down the latest advancements in cybersecurity or explaining how silicon-carbon batteries can extend your phone’s battery life, his writing remains rooted in clarity, curiosity, and utility.
Long before he was writing for Tech Report, HP, Citrix, SAP, Globe Telecom, CyberGhost VPN, and ExpressVPN, Cedric’s love for technology began at home, courtesy of a Nintendo Family Computer and a stack of tech magazines.
Growing up, his days were often filled with sessions of Contra, Bomberman, Red Alert 2, and the criminally underrated Crusader: No Regret. But gaming wasn’t his only gateway to tech.
He devoured every T3, PCMag, and PC Gamer issue he could get his hands on, often reading them cover to cover. It wasn’t long before he explored the early web in IRC chatrooms, online forums, and fledgling tech blogs, soaking in every byte of knowledge from the late ’90s and early 2000s internet boom.
That fascination with tech didn’t just stick. It evolved into a full-blown calling.
After graduating with a degree in Journalism, he began his writing career at the dawn of Web 2.0. What started with small editorial roles and freelance gigs soon grew into a full-fledged career.
He has since collaborated with global tech leaders, lending his voice to content that bridges technical expertise with everyday usability. He’s also written annual reports for Globe Telecom and consumer-friendly guides for VPN companies like CyberGhost and ExpressVPN, empowering readers to understand the importance of digital privacy.
His versatility spans not just tech journalism but also technical writing. He once worked with a local tech company developing web and mobile apps for logistics firms, crafting documentation and communication materials that combined user-friendliness with deep technical understanding. That experience sharpened his ability to break down dense, often jargon-heavy material into content that speaks clearly to both developers and decision-makers.
At the heart of his work lies a simple belief: technology should feel empowering, not intimidating. Even though smartphones and AI are now commonplace, he understands that there’s still a knowledge gap, especially when it comes to hardware or the real-world benefits of new tools. His writing aims to help close that gap.
Cedric’s writing style reflects that mission. It’s friendly without being fluffy and informative without being overwhelming. Whether writing for seasoned IT professionals or casual readers curious about the latest gadgets, he focuses on how a piece of technology can improve our lives, boost our productivity, or make our work more efficient. That human-first approach makes his content feel more like a conversation than a technical manual.
As his writing career progresses, his passion for tech journalism remains as strong as ever. With the growing need for accessible, responsible tech communication, he sees his role not just as a journalist but as a guide who helps readers navigate a digital world that’s often as confusing as it is exciting.
From reviewing the latest devices to unpacking global tech trends, Cedric isn’t just reporting on the future; he’s helping to write it.

