Anthropic overtakes OpenAI: Claude Opus 4 codes non-stop for seven hours, sets a record SWE-bench score and reshapes enterprise AI

May 22, 2025, 9:45 AM

Credit: VentureBeat, created with Midjourney

Anthropic released Claude Opus 4 and Claude Sonnet 4 (https://www.anthropic.com/claude) today, raising the bar dramatically for what AI can achieve without human intervention.

The company’s flagship Opus 4 model maintained focus on a complex open-source refactoring project for almost seven hours during testing at Rakuten, a breakthrough that transforms AI into a real collaborator capable of tackling day-long projects.

The marathon performance represents a quantum leap over the previous AI models’ attention spans, which were measured in minutes. The implications of this are huge: AI systems now have the ability to handle complex software engineering tasks from conception to completion while maintaining context and focus for an entire workday.

Anthropic claims Claude Opus 4 achieved a score of 72.5% on SWE-bench, a rigorous benchmark for software engineering, outperforming OpenAI’s GPT-4.1, which scored 54.6% at its launch in April. The result makes Anthropic a formidable competitor in the increasingly crowded AI market.

Comparative benchmarks show Claude 4 models (left) outperforming competitors across coding and reasoning tasks, with Claude Opus 4 achieving a 72.5% score on the critical SWE-bench test. (Credit: Anthropic)

The AI industry has shifted dramatically toward reasoning models in 2025. These systems think through problems before responding, emulating human-like thinking processes rather than simply matching patterns against training data. OpenAI began this shift with its “o” series in December, followed by Google’s Gemini 2.5 Pro and its experimental “Deep Think” capability. DeepSeek’s R1 model captured unexpected market share due to its exceptional problem-solving abilities at a competitive price point.

The pivot signals a fundamental change in the way people use AI. Poe’s Spring 2025 AI Model Usage Trends report shows that reasoning model usage increased fivefold in only four months, from 2% of all AI interactions to 10%. Users increasingly see AI as a partner in solving complex problems, rather than just a simple question-answering system.

The share of reasoning messages surged in early 2025 as new AI models captured user interest. (Credit: Poe)

Claude’s new models distinguish themselves by integrating tool use directly into their reasoning process. This simultaneous research-and-reasoning approach more closely resembles human cognition than previous systems, which gathered information before starting analysis. The ability to pause, seek data and incorporate new findings while reasoning creates a more effective and natural problem-solving process.

Dual-mode architecture balances depth with speed

Anthropic’s hybrid approach addresses a persistent friction in AI user experience. Both Claude 4 models provide near-instant answers for simple queries and extended reasoning for complex problems, eliminating the frustrating delays that older reasoning models imposed even on simple questions.

The dual-mode functionality maintains the snappy interaction users expect, while unlocking deeper analytic capabilities when needed. The system dynamically allocates resources to thinking based on the complexity and difficulty of the task. This strikes a balance that previous reasoning models failed at.

Memory persistence is another breakthrough. Claude 4 models can extract key information from documents, create summary files and, if given the appropriate permissions, maintain this knowledge across sessions. This capability addresses the “amnesia problem” that has limited AI’s usefulness for long-running projects, where context must be preserved over days or even weeks.

The technical implementation works similarly to how humans develop knowledge management systems. The AI automatically organizes information into structured formats that are optimized for future retrieval. This approach allows Claude to develop a more refined understanding of complex domains over extended interaction periods.
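
The summary-file pattern described here can be approximated in a few lines of Python. This is a minimal sketch under stated assumptions, not Anthropic’s implementation: the file name `project_memory.json` and the helpers `save_memory`, `load_memory` and `run_demo` are hypothetical, and a plain JSON file stands in for whatever storage an agent has been given permission to use.

```python
import json
import tempfile
from pathlib import Path


def load_memory(path: Path) -> dict[str, str]:
    """Return stored facts, or an empty dict if nothing has been saved yet."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))


def save_memory(path: Path, facts: dict[str, str]) -> None:
    """Merge new facts into the memory file, creating it if needed."""
    existing = load_memory(path)
    existing.update(facts)  # newer facts overwrite older ones with the same key
    path.write_text(json.dumps(existing, indent=2, sort_keys=True), encoding="utf-8")


def run_demo() -> dict[str, str]:
    """Simulate two sessions that share one memory file (hypothetical data)."""
    with tempfile.TemporaryDirectory() as tmp:
        memory_file = Path(tmp) / "project_memory.json"  # hypothetical file name
        # Session 1: record key facts extracted from project documents.
        save_memory(memory_file, {"build_tool": "make", "test_command": "make test"})
        # Session 2: a later session merges a new note into the earlier ones.
        save_memory(memory_file, {"open_issue": "flaky integration test"})
        return load_memory(memory_file)


if __name__ == "__main__":
    print(run_demo())
```

The point of the sketch is only that context survives between sessions because it lives in a file rather than in the model’s working context; how Claude decides what to record is not described in the announcement.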

Anthropic’s announcement underscores the increasing pace of competition within advanced AI. Five weeks after OpenAI launched its GPT-4.1 family, Anthropic has delivered models that match or beat it in key metrics. Google updated its Gemini 2.5 lineup last month, and Meta released its Llama models with multimodal capabilities and a 10-million-token context window.

Each lab has developed distinct strengths in this increasingly specialized market. OpenAI leads in general reasoning, Google excels at tool integration, and Anthropic is now the leader in professional coding and multimodal understanding.

The strategic implications for enterprise customers are significant. Decision-making is becoming more complex for organizations, which must choose the AI systems best suited to specific use cases. This fragmentation is a boon for sophisticated customers who can take advantage of specialized AI strengths, but a challenge for companies looking for simple, unified solutions.

Anthropic has expanded Claude’s integration into development workflows with the release of Claude Code. The system now supports background tasks via GitHub Actions and integrates natively with VS Code and JetBrains environments, displaying proposed code edits in developers’ files.

GitHub’s decision to use Claude Sonnet 4 for a new coding assistant in GitHub Copilot is a significant market validation. This partnership with Microsoft’s development platform shows that large technology companies are diversifying their AI partnerships rather than relying on a single provider.

Anthropic has complemented its model releases with new API capabilities: a code-execution tool, an MCP connector, a Files API and prompt caching for up to one hour. These features allow developers to build more sophisticated AI agents that can persist through complex workflows, which is essential for enterprise adoption.

Transparency challenges as models grow more sophisticated

Anthropic’s April research report, “Reasoning models don’t always say what they think,” revealed concerning patterns in the way these systems communicate their thought processes. The study found that Claude 3.7 Sonnet mentioned the crucial hints it had used to solve problems only 25% of the time, raising serious questions about the transparency of AI reasoning.

The research highlights a growing problem: as models get more capable, they become more opaque. The seven-hour autonomous coding demonstration that showcases Claude Opus 4’s endurance also demonstrates just how difficult it is for humans to audit such extended reasoning chains.

The industry faces a paradox in which increasing capability leads to decreasing transparency. Resolving this tension will require new approaches to AI supervision that balance performance and explainability, a challenge Anthropic has acknowledged but has not yet fully solved.

A future of sustained AI cooperation is taking shape

Claude Opus 4’s seven-hour self-directed work session provides a glimpse into AI’s role in the future of knowledge work. As models improve their memory and focus, they become more like collaborators, capable of complex, sustained work with minimal supervision.

This progression points to a fundamental shift in the way organizations structure knowledge work. Tasks that used to require constant human attention can now be delegated to AI systems that maintain context and focus over hours, or even days. The economic and organizational impact will be significant, especially in domains such as software development, where labor costs are high and talent shortages persist.

As Claude 4 blurs the line between human and machine intelligence, we face a new reality at work. The challenge is no longer to ask whether AI can match human skills, but to adapt to a world where our most productive colleagues may be digital instead of human.
