
Anyscale and NovaSky Team Release SkyRL tx v0.1.0: Bringing a Tinker-Compatible Reinforcement Learning (RL) Engine to Local GPU Clusters


How can AI development teams implement Tinker-style reinforcement learning on large language models (LLMs) using their own infrastructure through a single, unified platform? The collaboration between Anyscale and NovaSky (UC Berkeley) introduces SkyRL tx, a solution that empowers developers to deploy a Tinker-compatible training and inference engine locally, while maintaining the streamlined API familiar from the managed Tinker service.

Understanding the Tinker API

Tinker, developed by Thinking Machines, offers a minimalist yet powerful training API centered on four fundamental operations: forward_backward executes both forward and backward passes to accumulate gradients; optim_step updates model parameters using these gradients; sample generates tokens for interaction, evaluation, or reinforcement learning actions; and save_state creates checkpoints to enable training resumption.

Rather than providing a high-level fine-tuning abstraction, Tinker exposes these low-level primitives, allowing users to craft custom supervised or reinforcement learning loops in standard Python. Meanwhile, the service manages GPU scheduling and distributed execution seamlessly.
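To make the division of labor concrete, the sketch below shows the shape of a custom training loop built on the four primitives. The client class here is a toy stand-in operating on a single scalar parameter, not the actual Tinker SDK; only the control flow (accumulate gradients, step the optimizer, sample, checkpoint) mirrors what users write.

```python
# Toy stand-in for a Tinker-style client: the four primitives act on a single
# scalar "model" so the loop is runnable end to end. Signatures are illustrative.
class ToyTinkerClient:
    def __init__(self, lr=0.1):
        self.w = 0.0          # "model parameters"
        self.grad = 0.0       # accumulated gradient
        self.lr = lr
        self.checkpoints = {}

    def forward_backward(self, target):
        """Forward + backward on loss (w - target)^2; accumulate the gradient."""
        loss = (self.w - target) ** 2
        self.grad += 2 * (self.w - target)
        return loss

    def optim_step(self):
        """Apply the accumulated gradient to the parameters, then reset it."""
        self.w -= self.lr * self.grad
        self.grad = 0.0

    def sample(self):
        """Stand-in for token sampling: just report the current parameter."""
        return self.w

    def save_state(self, name):
        """Checkpoint the current parameters under a name."""
        self.checkpoints[name] = self.w


client = ToyTinkerClient()
for step in range(50):
    client.forward_backward(target=3.0)   # accumulate gradients
    client.optim_step()                   # update parameters
    if step % 10 == 0:
        client.save_state(f"step-{step}")

print(round(client.sample(), 3))  # converges toward 3.0
```

The key property the article describes is visible here: the service exposes only these low-level calls, and the user owns the loop structure, so supervised, RL, or custom objectives differ only in how the loop computes its losses and rewards.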

Introducing SkyRL tx: Bringing Tinker to Your Infrastructure

SkyRL tx is designed to replicate the Tinker API experience on local hardware, offering an open-source backend that eliminates dependency on hosted environments. This initial v0.1.0 release delivers comprehensive reinforcement learning support and significantly accelerates sampling performance, making it a robust tool for teams seeking full control over their RL workflows.

Positioning SkyRL tx Within the SkyRL Ecosystem

SkyRL is a comprehensive reinforcement learning framework tailored for large language models, comprising components such as skyrl-agent for managing long-horizon agents, skyrl-train for training orchestration, and skyrl-gym which provides environments for tasks like mathematics, coding, search, and SQL query execution.

Within this suite, skyrl-tx serves as an experimental, cross-platform library that exposes a local REST API mimicking Tinker’s interface. It acts as the critical system layer that bridges reinforcement learning algorithms, environment interactions, and training logic with physical GPU resources.

Architecture Overview: A Dual-Purpose Inference and Training Engine

SkyRL tx’s architecture is engineered as an inference engine capable of performing backward passes for training. It consists of four key components:

  1. REST API: Handles incoming requests from multiple users, providing a unified interface.
  2. Database: Maintains metadata on models, checkpoints, requests, and job queues. The current implementation uses SQLite but supports other SQL databases like PostgreSQL.
  3. Engine: Manages scheduling and batching of requests across users. Each engine instance serves a single base model and supports multiple LoRA (Low-Rank Adaptation) adapters.
  4. Worker: Executes forward and backward passes, holding model definitions and optimizer states. Future versions aim to support multi-node sharding through multiple workers.
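The engine/worker split described above can be sketched as a small queueing loop: the engine batches pending requests for one shared base model, tagging each request with its LoRA adapter, and hands the batch to a worker. The class names and scheduling policy below are illustrative assumptions, not the real skyrl-tx internals.

```python
from collections import deque

class Worker:
    """Illustrative worker: executes one batch of requests."""
    def run_batch(self, batch):
        # A real worker would run forward (and backward) passes on GPUs;
        # here we just record which LoRA adapter served each request.
        return [(req_id, f"served-by-{adapter}") for req_id, adapter in batch]

class Engine:
    """Illustrative engine: one base model, many adapters, batched scheduling."""
    def __init__(self, base_model, max_batch=8):
        self.base_model = base_model
        self.max_batch = max_batch
        self.queue = deque()
        self.worker = Worker()

    def submit(self, req_id, lora_adapter):
        self.queue.append((req_id, lora_adapter))

    def step(self):
        # Pull up to max_batch requests, possibly from different users and
        # different LoRA adapters, and execute them as one batch.
        n = min(self.max_batch, len(self.queue))
        batch = [self.queue.popleft() for _ in range(n)]
        return self.worker.run_batch(batch)

engine = Engine("Qwen/Qwen3-4B")
engine.submit("req-1", "adapter-a")
engine.submit("req-2", "adapter-b")
print(engine.step())
# [('req-1', 'served-by-adapter-a'), ('req-2', 'served-by-adapter-b')]
```

Batching requests from different adapters against one base model is what lets a single engine instance serve several concurrent experiments, per the architecture described above.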

What’s New in Version 0.1.0?

The v0.1.0 update focuses on enhancing reinforcement learning capabilities and boosting performance. Key improvements include:

  • Accelerated sampling through just-in-time (JIT) compilation, efficient batching, and sharding within the engine.
  • Support for customizable sampling parameters per request, including unique seeds and stop tokens, facilitating diverse experiments on a shared base model.
  • Stabilized reinforcement learning loops that now operate reliably through the engine.
  • Implementation of gradient checkpointing and micro-batching techniques to optimize memory usage and throughput.
  • Expanded database backend support with PostgreSQL alongside SQLite.
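Of the improvements above, micro-batching is the easiest to illustrate: a large logical batch is split into micro-batches, gradients are accumulated across them, and a single optimizer step is applied at the end. The scalar "model" and numbers below are illustrative only; skyrl-tx applies the same idea to real model gradients.

```python
# Micro-batching sketch: accumulate gradients over micro-batches, then take
# one optimizer step per logical batch. Loss is mean((w - x)^2) over the data.
def train_step(data, micro_batch_size, w, lr=0.01):
    grad = 0.0
    for i in range(0, len(data), micro_batch_size):
        micro = data[i:i + micro_batch_size]
        # "forward_backward" on one micro-batch: add its gradient contribution
        grad += sum(2 * (w - x) for x in micro) / len(data)
    return w - lr * grad  # "optim_step" once per logical batch

w = 0.0
data = [1.0, 2.0, 3.0, 4.0]
for _ in range(500):
    w = train_step(data, micro_batch_size=2, w=w)
print(round(w, 2))  # approaches the data mean, 2.5
```

Because the gradient is mathematically identical to processing the whole batch at once, micro-batching trades a little extra compute overhead for a much smaller peak memory footprint.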

Executing End-to-End Reinforcement Learning on an 8-GPU H100 Cluster

The official release includes a detailed example demonstrating how to run reinforcement learning workflows on a cluster equipped with eight NVIDIA H100 GPUs.

To get started, users clone the SkyRL repository and launch the engine within the skyrl-tx directory using the following command:

uv run --extra gpu --extra tinker -m tx.tinker.api \
  --base-model Qwen/Qwen3-4B \
  --max-lora-adapters 3 \
  --max-lora-rank 1 \
  --tensor-parallel-size 8 \
  --train-micro-batch-size 8 > out.log

Next, users clone the Tinker Cookbook from the Thinking Machines team, navigate to the tinker_cookbook/recipes folder, and launch the reinforcement learning loop with:

export TINKER_API_KEY=dummy
export WANDB_API_KEY=<your_key>
uv run --with wandb --with tinker rl_loop.py \
  base_url=http://localhost:8000 \
  model_name="Qwen/Qwen3-4B" \
  lora_rank=1 \
  max_length=1024 \
  save_every=100

This process generates a reward curve, validating that the reinforcement learning loop functions correctly through the local SkyRL tx backend.

Summary of Key Insights

  • SkyRL tx v0.1.0 delivers a local, Tinker-compatible engine that unifies training and inference for post-training large language models.
  • The platform exposes core Tinker primitives (forward_backward, optim_step, sample, and save_state) via a RESTful API, while internally managing batching, LoRA adapters, and device allocation.
  • Its modular architecture separates concerns into an API server, SQL database, scheduling engine, and worker processes, each dedicated to handling specific aspects of model training and inference.
  • Version 0.1.0 introduces full reinforcement learning support, faster JIT-compiled and sharded sampling, per-request sampling customization, gradient checkpointing, micro-batching, and PostgreSQL compatibility.

Final Thoughts

SkyRL tx v0.1.0 is a significant step for teams that want to run Tinker-style reinforcement learning on their own hardware clusters while preserving a consistent API experience. By treating the system as an inference engine that can also perform backward passes, it simplifies the software stack. The addition of LoRA support, gradient checkpointing, micro-batching, and PostgreSQL integration marks a substantial upgrade in system capabilities. Overall, this release turns Tinker compatibility into a practical, deployable local reinforcement learning backend for large language models.
