
Thinking Machines Launches Tinker: A Low-Level Training API that Abstracts Distributed LLM Fine-Tuning without Hiding the Knobs

Thinking Machines has introduced Tinker, a Python API that lets researchers and developers write training loops locally while delegating execution to managed distributed GPU clusters. It is aimed at users who want direct control over data handling, objective functions, and optimization steps, while the service automates scheduling, fault tolerance, and multi-node coordination. Tinker is currently in private beta behind a waitlist; it is free to start, with a usage-based pricing model planned.

What Exactly Is Tinker?

Unlike high-level training abstractions, Tinker provides fundamental building blocks for model training rather than simplified “train()” functions. Its core API includes commands like forward_backward, optim_step, save_state, and sample, granting users direct oversight of gradient calculations, optimizer updates, checkpointing, and inference within custom training loops. A typical usage scenario involves initializing a LoRA training client on a base model such as Llama-3.2-1B, iteratively invoking forward_backward and optim_step, saving the model state, and then switching to a sampling client for evaluation or exporting the fine-tuned weights.
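The loop described above can be sketched as follows. The method names (forward_backward, optim_step, save_state, sample) and the base-model identifier come from the article; the client class, argument shapes, and return values are assumptions, illustrated with a toy local mock rather than the real Tinker SDK:

```python
# Toy stand-in for a Tinker-style LoRA training client. Only the control
# flow mirrors the article; the "model" is a single scalar weight fit by
# least squares, so the whole loop runs locally with no cluster.

class MockTrainingClient:
    def __init__(self, base_model: str, lora_rank: int = 16):
        self.base_model = base_model   # e.g. "Llama-3.2-1B" (from the article)
        self.lora_rank = lora_rank     # hypothetical knob
        self.w = 0.0                   # toy trainable parameter
        self._grad = 0.0

    def forward_backward(self, batch):
        # batch: list of (x, y) pairs; mean-squared-error objective.
        n = len(batch)
        self._grad = sum(2 * (self.w * x - y) * x for x, y in batch) / n
        return sum((self.w * x - y) ** 2 for x, y in batch) / n  # loss

    def optim_step(self, lr: float = 0.1):
        self.w -= lr * self._grad      # plain SGD update

    def save_state(self):
        return {"base_model": self.base_model, "w": self.w}

    def sample(self, x):
        return self.w * x              # "inference" with current weights


client = MockTrainingClient("Llama-3.2-1B")
for step in range(200):
    loss = client.forward_backward([(1.0, 2.0), (2.0, 4.0)])
    client.optim_step()
state = client.save_state()            # checkpoint, then evaluate via sample()
```

In the real API, forward_backward would run distributed forward and backward passes on the managed cluster; the mock substitutes a one-parameter objective purely to keep the loop structure visible.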

Core Advantages and Functionalities

  • Support for Open-Source Models: Tinker facilitates fine-tuning of popular open-weight models including the Llama series and Qwen family, extending to large-scale mixture-of-experts architectures like Qwen3-235B-A22B.
  • LoRA-Centric Fine-Tuning: The platform uses Low-Rank Adaptation (LoRA) rather than full-parameter fine-tuning. According to the company's published analysis, a properly configured LoRA setup can match full fine-tuning performance in many practical scenarios, particularly in reinforcement learning contexts.
  • Interoperable Outputs: Users can download trained adapter weights for deployment outside the Tinker environment, enabling integration with preferred inference frameworks or service providers.
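
The low-rank idea behind LoRA can be illustrated in plain NumPy. The shapes and the alpha/r scaling below follow the standard LoRA formulation from the literature; none of this is Tinker-specific code:

```python
import numpy as np

# LoRA sketch: instead of updating a full weight matrix W (d_out x d_in),
# train two low-rank factors A (r x d_in) and B (d_out x r), so the
# effective weight becomes W + (alpha / r) * B @ A.

d_in, d_out, r, alpha = 64, 32, 8, 16
rng = np.random.default_rng(0)

W = rng.normal(size=(d_out, d_in))      # frozen base weight
A = rng.normal(size=(r, d_in)) * 0.01   # trainable, small random init
B = np.zeros((d_out, r))                # trainable, zero init

def lora_forward(x):
    return W @ x + (alpha / r) * (B @ (A @ x))

x = rng.normal(size=d_in)
# With B initialized to zero, the adapted model matches the base model exactly,
# so training starts from the pretrained behavior.
assert np.allclose(lora_forward(x), W @ x)

# Trainable parameters per adapted matrix: r*(d_in + d_out) vs d_in*d_out.
print(r * (d_in + d_out), "trainable vs", d_in * d_out, "full")
```

The zero-initialized B factor is why downloading just the adapter weights (the "interoperable outputs" above) is cheap: the adapter is a small delta applied on top of an unchanged base model.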

Supported Models and Infrastructure

Tinker is positioned as a managed post-training solution for open-weight models, from compact LLMs to large mixture-of-experts systems such as Qwen3-235B-A22B. Model switching is streamlined: update the model identifier string and rerun the training loop. Behind the scenes, workloads run on Thinking Machines' GPU clusters, and the LoRA methodology lets multiple training jobs share resources efficiently, reducing overhead and improving cluster utilization.

Tinker Cookbook: Ready-Made Training and Fine-Tuning Recipes

To minimize repetitive coding while maintaining a minimalist core API, the team has released an open-source cookbook under the Apache-2.0 license. This repository offers pre-built reference loops for supervised learning and reinforcement learning, along with comprehensive examples covering RLHF workflows (including three-stage supervised fine-tuning, reward modeling, and policy reinforcement learning), mathematical reasoning reward functions, tool-use and retrieval-augmented tasks, prompt distillation, and multi-agent environments. Additional utilities include LoRA hyperparameter calculators and integrations with evaluation tools such as InspectAI.
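The parameter-budget arithmetic behind such a calculator is straightforward. The helper below is a hypothetical illustration of the standard LoRA parameter count, not the Cookbook's actual utility, and the Llama-3.2-1B-like configuration in the example is illustrative only:

```python
def lora_trainable_params(hidden_size: int, num_layers: int, rank: int,
                          matrices_per_layer: int = 4) -> int:
    """Standard LoRA count (hypothetical helper): each adapted square matrix
    of size hidden_size x hidden_size gains two factors, A (rank x hidden)
    and B (hidden x rank), i.e. 2 * rank * hidden_size parameters."""
    per_matrix = 2 * rank * hidden_size
    return per_matrix * matrices_per_layer * num_layers

# Example: hidden_size=2048, 16 layers, rank 32, adapting four attention
# projections per layer.
print(lora_trainable_params(2048, 16, 32))  # 2*32*2048 * 4 * 16 = 8,388,608
```

Against a multi-billion-parameter base model, a few million trainable adapter parameters is what makes the shared-cluster, LoRA-first economics described above workable.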

Current Adopters and Access Details

Early adopters of Tinker include research teams at Princeton (Gödel prover work), Stanford (chemistry projects in the Rotskoff group), UC Berkeley (the SkyRL project, covering asynchronous off-policy multi-agent and tool-use reinforcement learning), and Redwood Research (applying RL on Qwen3-32B for control tasks).

At present, Tinker remains in private beta with a waitlist for new users. The platform is free to use initially, with plans to introduce a usage-based pricing scheme soon. Interested organizations are encouraged to reach out directly for onboarding support.

Insights and Evaluation

Tinker’s approach of exposing granular primitives like forward_backward, optim_step, save_state, and sample rather than a monolithic train() function is commendable. This design preserves user control over objective formulation, reward shaping, and evaluation strategies, while offloading the complexities of distributed execution to managed infrastructure. The emphasis on LoRA fine-tuning is a practical choice, balancing cost efficiency and turnaround time. Their internal research suggests that, with proper configuration, LoRA can rival full fine-tuning performance, especially in reinforcement learning tasks. However, for rigorous experimentation, features like transparent logging, deterministic random seeds, and detailed per-step telemetry would be essential to ensure reproducibility and monitor training drift.

The Cookbook’s reference implementations for RLHF and supervised learning provide valuable starting points, but the platform’s ultimate value will depend on its stability under heavy workloads, checkpoint portability, and robust data governance mechanisms, including privacy safeguards and audit trails.

Overall, Tinker’s open and flexible API offers a compelling alternative to closed, black-box training systems. By enabling explicit control over training loops for open-weight LLMs while managing distributed execution behind the scenes, it lowers barriers for researchers and practitioners to innovate and iterate rapidly on custom fine-tuning workflows.
