The Time Sam Altman Requested a Countersurveillance Audit of OpenAI

Dario Amodei’s AI safety group was becoming increasingly concerned about some of Sam Altman’s behaviors. Several of them were shocked to learn how many promises Altman had made to Microsoft in exchange for its investment. The terms of the agreement did not match what they had understood from Altman. They worried that if real safety issues were ever found in OpenAI’s models, these commitments would make it much harder, if not impossible, to prevent those models from being deployed. Amodei’s group began to doubt Altman’s honesty.

“We are all pragmatic people,” a member of the group said. “We’re raising money; we’re doing commercial stuff. It might seem reasonable to someone who does a lot of deal-making, like Sam, to say, ‘Let’s make a trade, let’s trade something, we’ll trade the next thing.’ But to someone like me, it feels like we’re committing ourselves to an uncomfortable situation.”

All of this played out against a backdrop of growing paranoia about different issues within the company. The AI safety group was troubled by what it saw as mounting evidence that powerful, misaligned systems could lead to disastrous outcomes. One bizarre experience in particular had left several of them nervous. In 2019, a few of them had begun the AI safety work Amodei wanted, using a model trained after GPT-2 with roughly twice as many parameters. They tested reinforcement learning from human feedback (RLHF) as a way to guide the model toward cheerful and positive content and away from anything offensive.

Late one night, a researcher made an update that included a single typo in his code before letting the RLHF process run overnight. The typo was an important one: a minus sign flipped to a plus sign, which made the RLHF work in reverse, pushing GPT-2 to generate more offensive content instead of less. By the next morning the typo had wreaked havoc, and GPT-2 was completing every prompt with sexually explicit and lewd language. It was funny, and also concerning. After identifying the error, the researcher pushed a fix to OpenAI’s codebase with the comment: Let’s avoid making a utility minimizer.
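To make the sign-flip failure concrete, here is a minimal, purely illustrative sketch in Python. This is not OpenAI’s code; the function names and toy scoring are invented for the example. It only shows the general principle: a single flipped sign turns a process that reinforces the highest-scoring output into one that reinforces the lowest-scoring output, i.e. a utility minimizer.

```python
# Illustrative sketch only (not OpenAI's code). A toy "reward model" and a
# selection step stand in for RLHF to show how one flipped sign reverses
# the objective being optimized.

def preference_score(text: str) -> float:
    """Toy stand-in for a learned reward model: higher = more pleasant."""
    return -text.lower().count("offensive")

def pick_reinforced_sample(candidates: list[str], sign: float = 1.0) -> str:
    """Choose the candidate that the (signed) reward prefers.

    sign=+1.0 reinforces the highest-reward output (the intended behavior).
    sign=-1.0 is the typo'd version: it reinforces the lowest-reward
    output, i.e. it behaves as a utility minimizer.
    """
    return max(candidates, key=lambda t: sign * preference_score(t))

samples = ["a cheerful reply", "an offensive, offensive rant", "a neutral note"]

print("intended:", pick_reinforced_sample(samples, sign=+1.0))  # cheerful reply
print("typo'd:  ", pick_reinforced_sample(samples, sign=-1.0))  # offensive rant
```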

Many employees were also concerned about what would happen if other companies learned OpenAI’s secret. They would tell one another that the secret to how their technology worked could be written on a single grain of rice, a reference to scaling. They also worried about powerful capabilities falling into the hands of bad actors. Leadership tapped into this fear, frequently invoking the threat of China, Russia, and North Korea and stressing that AGI needed to remain in the hands of an American organization. This annoyed some employees who weren’t American. Over lunch, they would ask, Why did it have to be a US organization? remembers a former employee. Why not one from Europe? Why not from China?

Amid these heady discussions about the long-term implications of AI research, many employees returned often to Altman’s early analogies between OpenAI and the Manhattan Project. Was OpenAI really building the equivalent of a nuclear weapon? It was a strange contrast to the plucky, idealistic culture the company had built as a largely academic organization. On Fridays, after a long workweek, employees would unwind with music and wine, soothed by a rotating cast of colleagues playing the office piano late into the evening.

The shift in gravity unnerved some people, heightening their anxiety over random and unrelated events. Once, a journalist tailgated an employee through the gated parking lot to get into the building. Another time, an employee discovered an unaccounted-for USB stick, prompting worry that it contained malware, a common attack vector, and was an attempted cyber breach. After it was examined on a computer that had been completely disconnected from the internet, the USB stick turned out to be nothing. At least twice, Amodei wrote important strategy documents on a similarly air-gapped machine connected directly to a printer, to ensure that only physical copies were distributed. He worried that state actors would steal OpenAI’s secret and build their own powerful AI models for malicious purposes. “No one was prepared for this level of responsibility,” one employee recalls. It kept people awake at night.

Altman had his own paranoia about people leaking information. He was privately concerned about the staff of Neuralink, with whom OpenAI still shared an office, a concern that grew after Elon Musk’s departure. Altman also worried about Musk himself, who maintained an extensive security apparatus, including personal drivers and bodyguards. Well aware of the difference in their capabilities, Altman quietly ordered an electronic countersurveillance audit to scan the office for any bugs Musk might have left behind to spy on OpenAI. In his vision document, Altman wrote: “We must be responsible for a positive outcome for the entire world.” And: “Conversely, if a government builds AGI before we do and misuses it, we will also have failed at our mission; we almost certainly must make rapid technological progress in order for our mission to succeed.”

Karen Hao writes in the author’s notes at the beginning of her book that she “reached out to all the key figures and businesses described in this book in order to seek interviews and comments.” Hao tried to contact Elon Musk, but he did not respond.


Excerpt from Empire of AI: Dreams and Nightmares in Sam Altman’s OpenAI by Karen Hao. Published by arrangement with Penguin Press. Copyright © 2025 by Karen Hao.

