OpenAI’s quest for AI to do anything for you
Hunter Lightman joined OpenAI as a researcher in 2022 and watched his colleagues launch ChatGPT, one of the fastest-growing products ever. Meanwhile, Lightman quietly worked with a team teaching OpenAI's models to solve high-school math competition problems.
Today that team, known as MathGen, is considered instrumental to OpenAI's industry-leading effort to create AI reasoning models: the core technology behind AI agents that can perform tasks on a computer the way a human would.
Lightman, who led MathGen's early efforts, told TechCrunch that he and his team were working to improve the models' mathematical reasoning, which at the time wasn't great.
OpenAI models are not perfect today – the company’s newest AI systems still hallucinate, and its agents struggle with complex tasks.
However, its state-of-the-art models have improved significantly on mathematical reasoning. One of OpenAI's models won a gold medal at the International Mathematical Olympiad, a competition for the world's brightest high-school math students. OpenAI believes these reasoning abilities will translate to other subjects and eventually power the general-purpose agents the company has long aspired to build.
ChatGPT was a happy accident: a low-key preview of research that became a viral consumer product. OpenAI's agents, by contrast, are the result of a deliberate, years-long effort inside the company.
OpenAI CEO Sam Altman said at the company's 2023 developer conference that you will eventually be able to ask a computer for what you need, and it will do the task for you. In the AI field, these capabilities are commonly referred to as agents.
Whether agents will live up to Altman's vision remains to be seen. But OpenAI shocked the world when it released its first AI reasoning model, o1, in fall 2024. Less than a year later, the 21 researchers behind that breakthrough have become the most sought-after talent in Silicon Valley.
Mark Zuckerberg recruited five of the o1 researchers to work on Meta’s new superintelligence-focused unit, offering some compensation packages north of $100 million. Shengjia Zhou, one of the five, was recently appointed chief scientist of Meta Superintelligence Labs.
The renaissance of reinforcement learning
OpenAI's reasoning models and agents are built on a machine-learning training technique called reinforcement learning (RL), which gives an AI model feedback on whether its choices in a simulated environment were correct.
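The core loop of RL can be illustrated with a toy example that has nothing to do with OpenAI's actual setup: an agent on a short one-dimensional chain of states gets a reward only when it reaches a goal, and tabular Q-learning (a classic RL algorithm) turns that sparse feedback into a policy. Every name and number here is illustrative.

```python
import random

GOAL, N_STATES, ACTIONS = 4, 5, (1, -1)  # move right (+1) or left (-1)

def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning on a tiny chain environment."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            # Explore occasionally; otherwise exploit current estimates.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[(state, a)])
            nxt = min(max(state + action, 0), N_STATES - 1)
            reward = 1.0 if nxt == GOAL else 0.0  # feedback: correct or not
            best_next = max(q[(nxt, a)] for a in ACTIONS)
            q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]
print(policy)  # the learned policy steps right toward the goal: [1, 1, 1, 1]
```

The same principle — reward a good outcome, penalize a bad one, and let the system adjust — scales up to training large models, though the machinery is vastly more complex.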
RL has been around for decades. In 2016, a few months after OpenAI's founding in late 2015, AlphaGo, an AI system created by Google DeepMind using RL, gained worldwide attention after beating a world champion at the board game Go.
Around that time, one of OpenAI’s first employees, Andrej Karpathy, began pondering how to leverage RL to create an AI agent that could use a computer. But it would take years for OpenAI to develop the necessary models and training techniques.
In 2018, OpenAI pioneered its first large language model in the GPT series, pretrained on massive amounts of internet data using large clusters of GPUs. GPT models excelled at text processing, eventually leading to ChatGPT, but struggled with basic math.
It took until 2023 for OpenAI to achieve a breakthrough, initially dubbed "Q*" and then "Strawberry," by combining LLMs, RL, and a technique called "test-time computation." The latter gave models extra time and computing power to plan and work through problems, verifying their steps, before producing an answer.
This allowed OpenAI to introduce a new approach called “chain-of-thought” (CoT), which improved AI’s performance on math questions the models hadn’t seen before.
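The intuition behind test-time compute and chain-of-thought can be sketched without any real model. The toy "solver" below stands in for an LLM sampling a reasoning trace (it is an assumption for illustration, not OpenAI's method): spend extra compute sampling several traces, check each written step, and take a majority vote over the verified answers — a strategy the literature calls self-consistency.

```python
import random
from collections import Counter

def noisy_chain_of_thought(x, y, rng):
    """Stand-in 'model': writes out two steps, sometimes slipping on one."""
    step1 = x * y
    if rng.random() < 0.2:                # simulate a reasoning slip
        step1 += rng.choice([-1, 1])
    step2 = step1 + x                     # second reasoning step
    return [f"{x}*{y} = {step1}", f"{step1}+{x} = {step2}"], step2

def answer_with_test_time_compute(x, y, samples=15, seed=0):
    rng = random.Random(seed)
    answers = []
    for _ in range(samples):              # extra compute: many sampled traces
        trace, ans = noisy_chain_of_thought(x, y, rng)
        # Verification pass: re-check each written step before trusting it.
        if trace[0] == f"{x}*{y} = {x * y}":
            answers.append(ans)
    # Majority vote over the verified traces ("self-consistency").
    return Counter(answers).most_common(1)[0][0]

print(answer_with_test_time_compute(6, 7))  # 6*7 + 6 = 48
```

Spending more samples and verifying intermediate steps is exactly the trade the article describes: more compute at answer time in exchange for fewer mistakes.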
“I could see the model starting to reason,” said OpenAI researcher Ahmed El-Kishky. “It would notice mistakes and backtrack, it would get frustrated. It really felt like reading the thoughts of a person.”
Though individually these techniques weren’t novel, OpenAI uniquely combined them to create Strawberry, which directly led to the development of o1. OpenAI quickly identified that the planning and fact-checking abilities of AI reasoning models could be useful to power AI agents.
“We had solved a problem that I had been banging my head against for a couple of years,” said Lightman. “It was one of the most exciting moments of my research career.”
Scaling reasoning
With AI reasoning models, OpenAI identified two new axes for improving AI models: using more computing power during post-training, and giving models additional time and processing power when answering a query.
Lightman said that OpenAI, as a business, cares about the company's future and how it will scale.
Two sources told TechCrunch that, shortly after the 2023 Strawberry breakthrough was made, OpenAI formed an “Agents team” led by OpenAI researcher Daniel Selsam in order to further develop this new paradigm. OpenAI did not initially distinguish between reasoning models and agents, despite the name of the team. The company wanted AI systems that could complete complex tasks.
The work of Selsam's Agents team became part of a larger project to develop the o1 reasoning model, whose leaders included OpenAI co-founder Ilya Sutskever, chief research officer Mark Chen, and chief scientist Jakub Pachocki.
OpenAI would have to divert precious resources — mainly talent and GPUs — to create o1. Throughout OpenAI’s history, researchers have had to negotiate with company leaders to obtain resources; demonstrating breakthroughs was a surefire way to secure them.
"One of the core components of OpenAI is that everything in research is bottom up," said Lightman. "When we showed the evidence [for o1], the company was like, 'This makes sense, let's push on it.'"
Some former employees say that the startup’s mission to develop AGI was the key factor in achieving breakthroughs around AI reasoning models. By focusing on developing the smartest-possible AI models, rather than products, OpenAI was able to prioritize o1 above other efforts. That type of large investment in ideas wasn’t always possible at competing AI labs.
The decision to try new training methods proved prescient. By late 2024, several leading AI labs started seeing diminishing returns on models created through traditional pretraining scaling. Today, much of the AI field’s momentum comes from advances in reasoning models.
What does it mean for an AI to "reason"?
The goal of AI research, in many ways, is to recreate human intelligence with computers. Since the launch of o1, ChatGPT has gained more human-sounding features, such as "thinking," "reasoning," and "planning."
El-Kishky reacted cautiously when asked whether OpenAI models are truly reasoning. He said he views the concept from the perspective of computer science.
"We're teaching the model to efficiently use compute to get an accurate answer," he said. If you define reasoning that way, El-Kishky added, then yes, the model is reasoning.
Lightman focuses on the models' results rather than on the means by which they achieve them, or on their relationship to human brains.
“If the model is doing hard things, then it is doing whatever necessary approximation of reasoning it needs in order to do that,” said Lightman. “We can call it reasoning, because it looks like these reasoning traces, but it’s all just a proxy for trying to make AI tools that are really powerful and useful to a lot of people.”
OpenAI's researchers note that people may disagree with their nomenclature or definitions of reasoning, and critics have certainly emerged, but they argue this matters less than their models' capabilities. Other AI researchers tend to agree. In a blog post, Nathan Lambert, a researcher at the nonprofit AI2, compares AI reasoning models to airplanes: both are man-made systems inspired by nature (human reasoning and bird flight, respectively) that work through entirely different mechanisms. That doesn't make them any less useful, or any less capable of achieving similar results.
In a recent paper, a group of AI researchers from OpenAI, Anthropic, and Google DeepMind agreed that AI reasoning models are not well understood and that more research is needed. It may be premature to claim confidently what is happening inside them.
The next frontier: AI agents for subjective tasks
AI agents are currently best suited to well-defined, verifiable domains such as coding. OpenAI's Codex agent aims to help software developers offload simple coding tasks, and Anthropic's models power popular AI coding tools such as Cursor and Claude Code. These are among the first AI agents people are willing to pay for.
But general-purpose AI agents such as OpenAI's ChatGPT Agent and Perplexity's Comet struggle to automate many of the subjective, complex tasks people want done. I've found that these tools take longer to use than doing the task myself, and they make silly errors when I try them on online shopping or finding a long-term parking spot.
Agents, of course, are early systems that will improve. But researchers must first figure out how to better train the underlying models to perform tasks that are subjective.
When asked about the limitations agents face on subjective tasks, Lightman said it was a data issue. "Some of the research that I'm excited about is figuring out how to train on tasks that are less verifiable. We have some ideas on how to accomplish these tasks."
Noam Brown, an OpenAI researcher who helped create o1 and worked on the IMO model, told TechCrunch that OpenAI has a new general-purpose RL technique that lets it teach AI models skills that aren't easily verifiable. He said this is how OpenAI built the model that won a gold medal at the IMO.
OpenAI's IMO model was a newer system that spawns multiple AI agents, which simultaneously explore several ideas before selecting the best answer. Google and xAI have recently released state-of-the-art models using this technique.
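The "explore in parallel, then pick the best" pattern can be sketched in a few lines. This is a hypothetical illustration, not the IMO system: each candidate "agent" independently refines a guess at the square root of 2 (using Newton's method as a stand-in for reasoning), and a scoring function selects the strongest result.

```python
from concurrent.futures import ThreadPoolExecutor

# Each starting guess plays the role of one agent's initial idea.
CANDIDATE_GUESSES = [1.2, 1.4, 1.41, 1.5, 1.45]

def agent(guess, steps=3):
    """One 'agent' refines its idea independently (Newton's method here)."""
    x = guess
    for _ in range(steps):
        x = 0.5 * (x + 2 / x)
    return x

def score(x):
    """Closer to satisfying x*x == 2 is better."""
    return -abs(x * x - 2)

# Explore all candidate ideas simultaneously, then select the best answer.
with ThreadPoolExecutor() as pool:
    proposals = list(pool.map(agent, CANDIDATE_GUESSES))

best = max(proposals, key=score)
print(round(best, 6))  # → 1.414214
```

The design choice worth noting is the split between generation and selection: exploration can be cheap and parallel as long as there is a reliable way to score the finished candidates.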
"I believe these models will become better at math, and I think they'll also get better in other reasoning areas," Brown said. "The progress has been incredibly fast. I don't think it will slow down."
These techniques could help OpenAI's models become more efficient, and those gains could show up in the company's GPT-5 model. OpenAI wants to establish its dominance with GPT-5 by offering the best AI model for developers and consumers.
The company also wants to simplify its products. El-Kishky said OpenAI wants AI agents that intuitively understand what users want, without requiring specific settings. He says OpenAI wants to build AI systems that know when to use certain tools and how long to reason.
These ideas paint a picture of an ultimate version of ChatGPT: an agent that can do anything on the internet for you and understands how you want it done. That's a very different product from what ChatGPT is today, but the company's research is clearly heading in this direction.
OpenAI led the AI industry in years past, but it now faces a number of worthy competitors. The question is not just whether OpenAI can deliver its agentic future, but whether it can do so before Google, Anthropic, or Meta.

