OpenAGI Foundation Launches Lux: A Foundation Computer Use Model that Tops Online Mind2Web with OSGym At Scale

How can repetitive, manual clicking across browsers and desktop environments be turned into a scalable, dependable automated system that operates a computer on your behalf? Enter Lux, a computer use model that reflects the shift of such agents from experimental prototypes to robust infrastructure. Developed by the OpenAGI Foundation team, Lux is a foundation model designed to control real desktops and browsers. It achieves an 83.6% success rate on the Online Mind2Web benchmark, which covers more than 300 real-world web tasks. That result places Lux well ahead of Google Gemini CUA (69.0%), OpenAI Operator (61.3%), and Anthropic Claude Sonnet 4 (61.0%).

[Figure: Lux computer use model interface]

Understanding Lux: Beyond a Simple Chatbot

Unlike a typical chat model with a browser plugin attached, Lux is a full computer use model. It takes a natural language instruction, visually analyzes the screen, and issues low-level actions such as mouse clicks, keystrokes, and scrolls. Because it operates directly on the rendered user interface rather than on application-specific APIs, Lux can work across browsers, text editors, spreadsheets, email clients, and a wide range of other desktop applications.
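The observe-and-act cycle described above can be sketched as a simple loop. This is a minimal illustration under stated assumptions: the action classes (`Click`, `TypeText`, `Scroll`, `Done`) and the `run_agent` function are hypothetical names, not the OpenAGI SDK's API, and the `policy` callable stands in for the model that maps an instruction plus a screenshot to the next action.

```python
from dataclasses import dataclass
from typing import Callable, List, Union


# Hypothetical action schema for illustration only (not the OpenAGI SDK's).
@dataclass
class Click:
    x: int
    y: int


@dataclass
class TypeText:
    text: str


@dataclass
class Scroll:
    dy: int


@dataclass
class Done:
    pass


Action = Union[Click, TypeText, Scroll, Done]


def run_agent(
    instruction: str,
    capture_screen: Callable[[], bytes],
    policy: Callable[[str, bytes, List[Action]], Action],
    execute: Callable[[Action], None],
    max_steps: int = 100,
) -> List[Action]:
    """Observe the rendered UI, ask the policy for one action, perform it, repeat."""
    history: List[Action] = []
    for _ in range(max_steps):
        screenshot = capture_screen()                      # observe the screen
        action = policy(instruction, screenshot, history)  # model picks the next action
        history.append(action)
        if isinstance(action, Done):                       # policy signals completion
            break
        execute(action)                                    # click, type, or scroll
    return history


# Usage with a scripted stand-in for the model.
scripted = iter([Click(120, 40), TypeText("quarterly report"), Scroll(-300), Done()])
performed: List[Action] = []
trace = run_agent(
    "Search for the quarterly report",
    capture_screen=lambda: b"",
    policy=lambda instr, shot, hist: next(scripted),
    execute=performed.append,
)
print(trace)
```

The point of the structure is that nothing in the loop depends on a particular application: the only inputs are pixels and the only outputs are generic input events.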

Developers can access Lux through the OpenAGI SDK and API console. The model targets workloads such as software quality assurance, in-depth research, social media management, e-commerce operations, and large-scale data entry. In these scenarios, Lux must carry out sequences of dozens or even hundreds of UI interactions while staying aligned with the original natural language instruction.

[Figure: Lux executing multi-step tasks]

Three Distinct Operation Modes Tailored to Your Needs

Lux offers three operational modes that balance speed, autonomy, and precision control, catering to different task complexities:

  • Actor Mode: Designed for rapid execution, this mode completes each step in approximately one second. It excels at well-defined tasks such as completing forms, generating reports from dashboards, or extracting specific data fields. Think of it as a high-speed macro engine with natural language understanding.
  • Thinker Mode: Suited for ambiguous or multi-step objectives, this mode breaks down broad instructions into manageable subtasks before executing them. Use cases include navigating multi-page research, managing extensive email triage, or exploring analytics dashboards where the exact navigation path isn’t predetermined.
  • Tasker Mode: Prioritizing determinism, this mode executes a precise Python-defined sequence of steps, retrying until successful completion or encountering a critical failure. This approach allows teams to maintain control over task flows, error handling, and safeguards within their own codebase while delegating UI interactions to Lux.

Together, Tasker, Actor, and Thinker cover procedural workflows, rapid task execution, and complex problem-solving, respectively.
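Tasker mode's pattern of running a fixed, code-defined sequence and retrying each step until it succeeds or hits a critical failure can be illustrated in plain Python. This is a hypothetical sketch, not OpenAGI's Tasker interface: `Step`, `run_task`, and `CriticalFailure` are illustrative names, and each step's `run` callable is where a real integration would hand the UI work to Lux.

```python
from dataclasses import dataclass
from typing import Callable, List


class CriticalFailure(Exception):
    """Hypothetical error signalling that retrying a step cannot help."""


@dataclass
class Step:
    name: str
    run: Callable[[], bool]  # returns True if the step's UI work succeeded
    max_attempts: int = 3


def run_task(steps: List[Step]) -> List[str]:
    """Run steps in order, retrying each until it succeeds.

    A step that raises CriticalFailure, or that fails on every attempt,
    aborts the whole task.
    """
    log: List[str] = []
    for step in steps:
        for attempt in range(1, step.max_attempts + 1):
            ok = step.run()  # a CriticalFailure raised here propagates immediately
            log.append(f"{step.name}: attempt {attempt} {'ok' if ok else 'failed'}")
            if ok:
                break
        else:  # loop finished without a successful attempt
            raise CriticalFailure(f"{step.name} failed after {step.max_attempts} attempts")
    return log


# Usage: the second step fails once, then succeeds on retry.
calls = {"submit": 0}


def flaky_submit() -> bool:
    calls["submit"] += 1
    return calls["submit"] >= 2


log = run_task([Step("open form", lambda: True), Step("submit form", flaky_submit)])
print("\n".join(log))
```

Keeping the sequence and retry policy in ordinary code like this is what lets a team own error handling and safeguards while only the individual UI interactions are delegated.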

Performance Metrics, Efficiency, and Cost Considerations

Lux’s performance on the Online Mind2Web benchmark is notable, achieving an 83.6% success rate across more than 300 real-world web tasks. This outperforms Google Gemini CUA (69.0%), OpenAI Operator (61.3%), and Anthropic Claude Sonnet 4 (61.0%). Such a benchmark serves as a practical indicator of an agent’s ability to navigate and manipulate browsers and web applications effectively.
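To put the reported numbers in perspective, the short calculation below converts them into percentage-point gaps and failure rates. The scores are the ones quoted in this article; the helper names are illustrative.

```python
# Online Mind2Web success rates (%) as quoted in this article.
SCORES = {
    "Lux": 83.6,
    "Google Gemini CUA": 69.0,
    "OpenAI Operator": 61.3,
    "Anthropic Claude Sonnet 4": 61.0,
}


def gaps_vs(leader: str, scores: dict) -> dict:
    """Percentage-point lead of `leader` over every other agent."""
    return {
        name: round(scores[leader] - score, 1)
        for name, score in scores.items()
        if name != leader
    }


def failure_rates(scores: dict) -> dict:
    """Share of tasks each agent did not complete, in percent."""
    return {name: round(100.0 - score, 1) for name, score in scores.items()}


print(gaps_vs("Lux", SCORES))
print(failure_rates(SCORES))
```

Read this way, Lux leads by 14.6 to 22.6 percentage points, and it fails on 16.4% of tasks versus 38.7% for OpenAI Operator, less than half of Operator's failure rate on this benchmark.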

From an engineering perspective, latency and operating cost are critical. Lux completes an individual step in roughly one second, compared with about three seconds per step for OpenAI Operator under similar conditions, and its per-token cost is roughly one-tenth of Operator's. For agents that execute hundreds of actions per session, these savings add up and largely determine whether a production deployment is sustainable.
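A back-of-the-envelope model shows how the per-step and per-token figures add up over a long session. It is a sketch under explicit assumptions: the reported 1 s and 3 s step latencies apply uniformly, both agents use the same hypothetical number of tokens per step, and costs are in relative units with Lux's per-token price set to 1 and Operator's to 10. None of these are published pricing figures.

```python
from dataclasses import dataclass


@dataclass
class AgentProfile:
    name: str
    seconds_per_step: float     # reported per-step latency
    relative_token_cost: float  # per-token cost in relative units (Lux = 1)


def session_minutes(agent: AgentProfile, steps: int) -> float:
    """Wall-clock time for a session, ignoring any overhead outside the steps."""
    return steps * agent.seconds_per_step / 60


def session_cost(agent: AgentProfile, steps: int, tokens_per_step: int) -> float:
    """Relative cost, assuming every step consumes the same number of tokens."""
    return agent.relative_token_cost * steps * tokens_per_step


LUX = AgentProfile("Lux", seconds_per_step=1.0, relative_token_cost=1.0)
OPERATOR = AgentProfile("OpenAI Operator", seconds_per_step=3.0, relative_token_cost=10.0)

STEPS = 300             # a long session of "hundreds of actions"
TOKENS_PER_STEP = 2000  # hypothetical; real usage depends on screenshots and prompts

for agent in (LUX, OPERATOR):
    print(f"{agent.name}: {session_minutes(agent, STEPS):.1f} min, "
          f"{session_cost(agent, STEPS, TOKENS_PER_STEP):,.0f} cost units")
```

Under these assumptions, a 300-step session takes about 5 minutes with Lux versus 15 with Operator, and the cost ratio simply mirrors the 10x per-token difference; if the two models consumed different numbers of tokens per step, that ratio would shift accordingly.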

Innovative Training Approach: Agentic Active Pre-training and the Role of OSGym

Lux is trained with a methodology the team calls Agentic Active Pre-training. Conventional language models passively absorb vast amounts of internet text; Lux instead learns by actively interacting with digital environments and refining its behavior over large volumes of live interaction. According to the team, the approach differs from traditional reinforcement learning in that it emphasizes autonomous exploration and understanding rather than manually crafted reward signals.

Central to this training paradigm is OSGym, an open-source data engine capable of running over 1,000 full operating system replicas simultaneously. Licensed under MIT for both research and commercial use, OSGym supports complex tasks spanning office suites, browsers, development tools, and multi-application workflows. It can generate upwards of 1,400 multi-turn interaction trajectories per minute at minimal cost per replica, providing a scalable platform for training and evaluating computer use agents like Lux.
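The data-engine idea, many isolated replicas each producing multi-turn trajectories in parallel, can be illustrated with a toy sketch. This is not OSGym's API: `ToyReplica`, `Trajectory`, `run_replica`, and `collect` are hypothetical, each "replica" is an in-process object rather than a full operating system, and a thread pool stands in for whatever orchestration a real engine uses. The placeholder policy just cycles through actions; during training, a model would choose them.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from typing import List, Tuple

ACTIONS = ["click", "type", "scroll"]


@dataclass
class Trajectory:
    replica_id: int
    task: str
    steps: List[Tuple[str, str]] = field(default_factory=list)  # (observation, action)


class ToyReplica:
    """Stand-in for one isolated OS replica (a real one would be a VM or container)."""

    def __init__(self, replica_id: int):
        self.replica_id = replica_id

    def rollout(self, task: str, max_turns: int) -> Trajectory:
        traj = Trajectory(self.replica_id, task)
        for turn in range(max_turns):
            observation = f"{task}/screen-{turn}"  # placeholder for a screenshot
            action = ACTIONS[turn % len(ACTIONS)]  # placeholder policy
            traj.steps.append((observation, action))
        return traj


def run_replica(replica_id: int, tasks: List[str], max_turns: int) -> List[Trajectory]:
    """Each worker owns one replica and runs its share of tasks sequentially."""
    replica = ToyReplica(replica_id)
    return [replica.rollout(task, max_turns) for task in tasks]


def collect(tasks: List[str], num_replicas: int, max_turns: int = 5) -> List[Trajectory]:
    # Round-robin split: replica i gets tasks i, i + n, i + 2n, ...
    batches = [tasks[i::num_replicas] for i in range(num_replicas)]
    with ThreadPoolExecutor(max_workers=num_replicas) as pool:
        per_replica = pool.map(run_replica, range(num_replicas), batches,
                               [max_turns] * num_replicas)
        return [traj for batch in per_replica for traj in batch]


trajectories = collect([f"task-{n}" for n in range(10)], num_replicas=4)
print(len(trajectories), "trajectories from",
      len({t.replica_id for t in trajectories}), "replicas")

# Scale check: the reported 1,400 trajectories per minute, if sustained for a full day.
print(f"{1_400 * 60 * 24:,} trajectories per day")
```

Giving each worker exclusive ownership of one replica mirrors the isolation a real engine needs, since a full OS instance can only run one interactive task at a time.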

Summary of Key Insights

  1. Lux is a foundation computer use model that controls full desktops and browsers and achieves an 83.6% success rate on the Online Mind2Web benchmark, ahead of Google Gemini CUA, OpenAI Operator, and Anthropic Claude Sonnet 4.
  2. It offers three operational modes (Actor, Thinker, and Tasker) that range from rapid UI macros to complex multi-step task execution and deterministic scripted workflows.
  3. Lux runs at about one second per step and costs roughly one-tenth as much per token as OpenAI Operator, which makes it well suited to long-running automated tasks.
  4. The model is trained through Agentic Active Pre-training, learning by interacting with environments rather than solely processing static text, enhancing its ability to translate screen observations into precise actions.
  5. OSGym, the open-source engine behind Lux, enables large-scale parallel training with full OS replicas, facilitating practical development and benchmarking of advanced computer use agents.
