xAI has unveiled Grok-4-Fast, an optimized and cost-effective evolution of Grok-4 that integrates both “reasoning” and “non-reasoning” functionalities within a single model architecture. This innovation allows dynamic control over behavior through system prompts, supporting applications such as high-volume search, coding assistance, and question answering. Notably, Grok-4-Fast boasts an expansive 2 million token context window and is trained with native reinforcement learning (RL) for tool usage, enabling it to autonomously decide when to browse the internet, run code, or invoke external tools.
Innovative Unified Architecture
Earlier Grok models separated long-form reasoning tasks from brief, non-reasoning outputs by deploying distinct models. Grok-4-Fast breaks this mold by employing a single unified weight space that seamlessly handles both response types. This design significantly reduces latency and token consumption by steering the model’s behavior through system prompts rather than switching between different models. Such efficiency is crucial for real-time environments like interactive coding platforms, search engines, and AI assistants, where delays and costs from model switching can be prohibitive.
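In practice, this means a client switches behavior by changing the request rather than the deployed model. A minimal sketch of that pattern, assuming an OpenAI-style chat completions payload (the endpoint shape and the system-prompt wording are illustrative assumptions; only the SKU names `grok-4-fast-reasoning` and `grok-4-fast-non-reasoning` come from the announcement):

```python
# Sketch: select Grok-4-Fast behavior per request instead of per deployment.
# Payload shape assumes an OpenAI-compatible chat completions API; the steering
# strings below are illustrative, not official prompts.

def build_request(prompt: str, reasoning: bool) -> dict:
    """Build one chat payload; behavior is chosen by SKU plus system prompt."""
    model = "grok-4-fast-reasoning" if reasoning else "grok-4-fast-non-reasoning"
    system = (
        "Think step by step before answering."      # illustrative steering text
        if reasoning
        else "Answer briefly without extended reasoning."
    )
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ],
    }

# The same client code path serves both response styles:
fast = build_request("Summarize this changelog.", reasoning=False)
deep = build_request("Prove the bound in section 3.", reasoning=True)
```

Because both SKUs share one weight space, the switch costs a string comparison on the client side rather than a cold-started second model on the server side.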
Enhanced Search and Autonomous Agent Capabilities
Trained end-to-end with tool-use reinforcement learning, Grok-4-Fast demonstrates superior performance on search-oriented agent benchmarks. It achieves impressive scores such as 44.9% on BrowseComp, 95.0% on SimpleQA, and 66.0% on Reka Research, with even stronger results on Chinese-language benchmarks like BrowseComp-zh at 51.2%. In private evaluations on LMArena, the search-optimized variant (codenamed “menlo”) secured the top spot in the Search Arena with an Elo rating of 1163, while the text-focused variant (“tahoe”) ranked eighth in the Text Arena, closely matching the performance of Grok-4-0709.
Superior Efficiency and Performance Metrics
Across both internal and public testing suites, Grok-4-Fast delivers state-of-the-art results while significantly reducing token usage. Reported pass@1 accuracy rates include 92.0% on AIME 2025 (without tools), 93.3% on HMMT 2025 (no tools), 85.7% on GPQA Diamond, and 80.0% on LiveCodeBench (January to May). These figures approach or match Grok-4’s performance but with approximately 40% fewer “thinking” tokens consumed on average. xAI describes this as an increase in “intelligence density,” translating to an estimated 98% cost reduction to achieve equivalent benchmark outcomes when factoring in the lower token count and updated per-token pricing.
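The 98% figure follows from compounding the token savings with the lower per-token price. A back-of-the-envelope check, assuming a Grok-4 output rate of $15 per million tokens (that Grok-4 rate is an assumption here; the ~40% token saving and the $0.50 Grok-4-Fast rate are the reported figures):

```python
# Rough check of the claimed ~98% cost reduction for equivalent benchmark results.
GROK4_OUTPUT_PRICE = 15.00   # $/M output tokens -- assumed Grok-4 list price
FAST_OUTPUT_PRICE = 0.50     # $/M output tokens under 128K context (reported)
TOKEN_SAVING = 0.40          # ~40% fewer "thinking" tokens (reported)

# Cost to produce an equivalent answer, relative to Grok-4:
relative_cost = (1 - TOKEN_SAVING) * (FAST_OUTPUT_PRICE / GROK4_OUTPUT_PRICE)
reduction_pct = (1 - relative_cost) * 100
print(f"{reduction_pct:.0f}% cheaper")  # → 98% cheaper
```

Under these assumptions the two effects multiply: 0.6 of the tokens at one-thirtieth the price leaves 2% of the original cost.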
Availability and Pricing Structure
Grok-4-Fast is now broadly accessible to all users through Grok’s Fast and Auto modes on both web and mobile platforms. The Auto mode intelligently selects Grok-4-Fast for complex queries, optimizing response speed without compromising quality. For the first time, free-tier users gain access to xAI’s latest model generation. Developers can choose between two SKUs, grok-4-fast-reasoning and grok-4-fast-non-reasoning, each supporting the full 2 million token context window. Pricing for the xAI API is tiered as follows: $0.20 per million input tokens for contexts under 128K tokens, $0.40 per million input tokens for contexts above 128K, $0.50 per million output tokens under 128K, $1.00 per million output tokens over 128K, and a discounted $0.05 per million cached input tokens.
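Because the rates are tiered on context size, a cost estimate needs a branch at the 128K threshold. A small calculator using the listed rates (the exact tier semantics, e.g. whether a request above 128K is repriced entirely at the higher rates and how cached tokens count toward the threshold, are assumptions here):

```python
# Estimate one request's cost from the listed Grok-4-Fast API rates.
# Assumption: a request whose total input exceeds 128K tokens is billed
# entirely at the >128K rates; cached input tokens use the flat cached rate.
M = 1_000_000
TIER_LIMIT = 128_000

def request_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Dollar cost of a single request under the published tiered rates."""
    over = input_tokens + cached_tokens > TIER_LIMIT  # assumed tier test
    in_rate = 0.40 if over else 0.20    # $/M input tokens
    out_rate = 1.00 if over else 0.50   # $/M output tokens
    cached_rate = 0.05                  # $/M cached input tokens (flat)
    return (input_tokens * in_rate
            + output_tokens * out_rate
            + cached_tokens * cached_rate) / M

# A 50K-input / 2K-output request stays in the lower tier:
print(f"${request_cost(50_000, 2_000):.4f}")  # → $0.0110
```

Even a long-context request is cheap in absolute terms at these rates, which is the point of the "intelligence density" framing: the tier jump doubles the rate, not the order of magnitude.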
Key Technical Highlights
- Single Model with Massive Context: Grok-4-Fast merges reasoning and non-reasoning tasks into one model with a 2 million token context window, controlled via prompt steering.
- Cost-Effective Pricing: Competitive API rates starting at $0.20 per million input tokens and $0.50 per million output tokens, with discounts for cached inputs and higher rates applying only beyond 128K tokens.
- Efficiency Gains: Approximately 40% reduction in “thinking” token consumption compared to Grok-4, resulting in nearly 98% lower costs to match Grok-4’s benchmark performance.
- Robust Benchmark Performance: High pass@1 scores including 92.0% on AIME 2025, 93.3% on HMMT 2025, 85.7% on GPQA Diamond, and 80.0% on LiveCodeBench.
- Optimized for Agentic and Search Tasks: Trained end-to-end with tool-use RL, Grok-4-Fast excels in browsing and search workflows, backed by documented search-agent benchmark results and live billing integration in the API.
Conclusion
Grok-4-Fast represents a significant leap forward by consolidating Grok-4’s capabilities into a single, prompt-driven model with an unprecedented 2 million token context window and integrated tool-use reinforcement learning. Its design prioritizes efficiency, reducing latency and operational costs while maintaining top-tier accuracy. Early public benchmarks and competitive rankings in search and text arenas validate xAI’s claims of delivering Grok-4-level performance with substantially fewer computational resources, making it a compelling choice for high-throughput search and AI agent applications.

