xAI launches Grok-4-Fast: Unified Reasoning and Non-Reasoning Model with 2M-Token Context and Trained End-to-End with Tool-Use Reinforcement Learning (RL)

xAI has unveiled Grok-4-Fast, an advanced, cost-efficient evolution of Grok-4 that integrates both “reasoning” and “non-reasoning” functionalities within a single, unified weight framework. This design allows dynamic behavior modulation through system prompts, optimizing performance for high-demand tasks such as search, coding, and question answering. Notably, Grok-4-Fast supports an expansive 2 million token context window and incorporates native reinforcement learning (RL) for tool usage, enabling it to autonomously decide when to browse the internet, run code, or invoke external tools.

Innovative Unified Architecture

Earlier Grok models separated long-form reasoning from brief, non-reasoning responses by deploying distinct models for each. Grok-4-Fast revolutionizes this approach by consolidating these capabilities into a single weight space. This unification significantly reduces latency and token consumption by steering the model’s output through system prompts rather than switching between models. Such an architecture is particularly advantageous for real-time applications like interactive coding assistants, search engines, and conversational agents, where minimizing delay and cost is critical.
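The prompt-driven steering described above can be sketched as two request payloads that target the same weights and differ only in the system prompt. The prompt wording here is hypothetical (xAI actually exposes the behaviors as separate SKUs); the sketch only illustrates the unified-weights idea.

```python
# Sketch: one set of weights, two behaviors, selected purely by the
# system prompt. Prompt text below is illustrative, not xAI's.

def build_request(task: str, reasoning: bool) -> dict:
    """Build a chat-completion payload for the single unified model."""
    system = (
        "Think step by step before answering."  # hypothetical reasoning prompt
        if reasoning
        else "Answer directly and concisely."   # hypothetical fast-path prompt
    )
    return {
        "model": "grok-4-fast",  # same weights serve both behaviors
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": task},
        ],
    }

slow = build_request("Prove the AM-GM inequality.", reasoning=True)
fast = build_request("Capital of France?", reasoning=False)
# Identical model field, different steering: no model swap, so no
# extra routing latency between "reasoning" and "non-reasoning" turns.
assert slow["model"] == fast["model"]
```

Because the model never changes between turns, a conversational agent can flip between deep reasoning and quick replies without reloading or re-routing anything.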

Enhanced Search and Autonomous Agent Capabilities

Trained end-to-end with tool-use reinforcement learning, Grok-4-Fast excels in search-oriented agent benchmarks. It achieves impressive results including 44.9% on BrowseComp, 95.0% on SimpleQA, and 66.0% on Reka Research, with even stronger performance on Chinese-language datasets such as BrowseComp-zh at 51.2%. In private evaluations on LMArena, the search-optimized variant (codenamed “menlo”) secured the top spot in the Search Arena with an Elo rating of 1163, while the text-focused variant (“tahoe”) ranked eighth in the Text Arena, closely matching the performance of Grok-4-0709.
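The agentic loop that tool-use RL optimizes can be sketched with a stubbed model: at each step the model decides whether to invoke a tool (here, a mock browser) or emit a final answer. The tool names and decision policy below are illustrative stand-ins, not xAI's API.

```python
# Minimal sketch of a tool-use agent loop: the (stubbed) model decides
# per step whether to browse or answer. Everything here is a mock.

def stub_model(question: str, observations: list[str]) -> dict:
    """Stand-in for the model: browse once, then answer."""
    if not observations:
        return {"action": "browse", "query": question}
    return {"action": "answer", "text": f"Based on {len(observations)} result(s): ..."}

def stub_browse(query: str) -> str:
    """Stand-in for the browsing tool."""
    return f"<search results for {query!r}>"

def run_agent(question: str, max_steps: int = 5) -> str:
    observations: list[str] = []
    for _ in range(max_steps):
        step = stub_model(question, observations)
        if step["action"] == "answer":
            return step["text"]
        observations.append(stub_browse(step["query"]))
    return "(step budget exhausted)"

print(run_agent("Which benchmark does BrowseComp-zh localize?"))
```

End-to-end RL trains the "when to call a tool" decision jointly with answer quality, which is what benchmarks like BrowseComp measure.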

Superior Efficiency and Competitive Performance

Across both internal and public benchmarks, Grok-4-Fast delivers state-of-the-art accuracy while significantly reducing token usage. Reported pass@1 scores include 92.0% on AIME 2025 (without tools), 93.3% on HMMT 2025 (without tools), 85.7% on GPQA Diamond, and 80.0% on LiveCodeBench (Jan-May). These results closely rival Grok-4’s performance but with approximately 40% fewer “thinking” tokens consumed on average. xAI describes this as an increase in “intelligence density,” highlighting a nearly 98% cost reduction to achieve equivalent benchmark outcomes when factoring in the lower token count and updated per-token pricing.
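The ~98% figure follows from compounding the two savings named above. As a back-of-envelope check (Grok-4's roughly $15 per million output tokens is an assumption not stated in this article; the 40% token reduction and Grok-4-Fast's $0.50 rate are):

```python
# Back-of-envelope check of the ~98% cost-reduction claim.
# ASSUMPTION: Grok-4 output pricing of ~$15 per million tokens; the
# article itself only quotes Grok-4-Fast's rates.

grok4_price_per_mtok = 15.00   # assumed Grok-4 output rate, $/M tokens
fast_price_per_mtok = 0.50     # Grok-4-Fast output rate (<128K), $/M tokens
token_ratio = 0.60             # ~40% fewer "thinking" tokens

relative_cost = token_ratio * (fast_price_per_mtok / grok4_price_per_mtok)
savings = 1 - relative_cost
print(f"relative cost: {relative_cost:.2%}, savings: {savings:.2%}")
# 0.60 * (0.50 / 15.00) = 0.02, i.e. roughly 98% cheaper for
# comparable benchmark results under these assumptions.
```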

Availability and Pricing Structure

Grok-4-Fast is now broadly accessible to all users through Grok’s Fast and Auto modes on both web and mobile platforms. Auto mode intelligently selects Grok-4-Fast for complex queries, balancing speed and quality, and for the first time free-tier users gain access to xAI’s latest model generation. Developers can choose between two SKUs, grok-4-fast-reasoning and grok-4-fast-non-reasoning, each supporting the full 2 million token context window. API pricing is tiered as follows: $0.20 per million input tokens for contexts under 128K tokens, $0.40 per million input tokens above 128K, $0.50 per million output tokens below 128K, $1.00 per million output tokens beyond 128K, and a discounted $0.05 per million cached input tokens.
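The tiered rates above translate into a simple per-request cost estimate. One simplifying assumption in this sketch: the tier is chosen by whether the input exceeds 128K tokens, and cached input is billed at the flat $0.05/M rate regardless of tier.

```python
# Estimate a single request's cost from the tiered rates quoted above.
# ASSUMPTION: tier selection keys off input length alone, and the
# cached-input rate is flat; the article does not spell out either rule.

def estimate_cost(input_tokens: int, output_tokens: int,
                  cached_input_tokens: int = 0) -> float:
    """Return the estimated USD cost for one Grok-4-Fast API call."""
    long_context = input_tokens > 128_000
    input_rate = 0.40 if long_context else 0.20   # $/M input tokens
    output_rate = 1.00 if long_context else 0.50  # $/M output tokens
    cached_rate = 0.05                            # $/M cached input tokens
    fresh_tokens = input_tokens - cached_input_tokens
    return (fresh_tokens * input_rate
            + cached_input_tokens * cached_rate
            + output_tokens * output_rate) / 1_000_000

# A 50K-token prompt (10K of it cached) producing a 2K-token answer:
cost = estimate_cost(50_000, 2_000, cached_input_tokens=10_000)
print(f"${cost:.4f}")  # → $0.0095
```

At these rates, even prompt-heavy agent workloads stay in fractions of a cent per call until the context crosses the 128K threshold.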

Key Technical Highlights

  • Single unified model with massive context: Grok-4-Fast merges reasoning and non-reasoning tasks into one model, featuring a 2 million token context window across both available SKUs.
  • Cost-effective pricing model: Competitive API rates start at $0.20 per million input tokens and $0.50 per million output tokens, with further discounts for cached inputs and higher rates only applying beyond 128K tokens.
  • Efficiency breakthroughs: Approximately 40% reduction in “thinking” token consumption compared to Grok-4, translating to nearly 98% lower costs for matching benchmark performance.
  • Robust benchmark results: High pass@1 accuracy on leading tests including AIME 2025 (92.0%), HMMT 2025 (93.3%), GPQA Diamond (85.7%), and LiveCodeBench (80.0%).
  • Optimized for agentic workflows: Post-training with tool-use RL equips the model for advanced browsing and search tasks, supported by documented search-agent metrics and live billing integration.

Conclusion

Grok-4-Fast represents a significant leap forward by consolidating Grok-4’s capabilities into a single, prompt-driven model with a 2 million token context window and integrated tool-use reinforcement learning. Its design prioritizes efficiency and cost-effectiveness, making it well suited to high-throughput search and agent applications. Early performance indicators, including top rankings in competitive arenas and substantial token savings, support xAI’s claims of maintaining accuracy while reducing latency and operational expenses.
