OpenAI’s o4-mini reasoning model can now be fine-tuned by your company with reinforcement learning


Credit: VentureBeat, made using Midjourney



OpenAI announced today, via a developer-focused account on the social network X, that third-party software developers can now access reinforcement fine-tuning (RFT) for its new o4-mini language reasoning model. This allows them to customize the model around their enterprise’s unique products, internal terminology, goals, personnel, and processes.

The capability lets developers take the publicly available model and tweak it to their needs using OpenAI’s platform dashboard.

They can then deploy it using OpenAI’s API, another part of its platform for developers, and connect it with their internal employee computers and databases.

The idea is that an employee or leader at the company could then use the model through a custom internal chatbot or custom OpenAI GPT to retrieve private, proprietary company knowledge, answer specific questions about company products and policies, or generate new communications in the company’s voice. All of this becomes easier with the RFT version of the model.
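For a concrete picture of that deployment step, here is a minimal sketch in Python of querying a fine-tuned snapshot through OpenAI’s chat completions API. The model ID, organization name, and prompts are placeholders invented for illustration; a real RFT job produces its own snapshot ID.

    # Hypothetical example: an internal tool calling a fine-tuned o4-mini snapshot.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    response = client.chat.completions.create(
        # Placeholder snapshot ID; the real one is returned by the fine-tuning job.
        model="ft:o4-mini:example-org::placeholder-job-id",
        messages=[
            {"role": "system", "content": "Answer using our internal product terminology."},
            {"role": "user", "content": "Summarize our refund policy for enterprise customers."},
        ],
    )
    print(response.choices[0].message.content)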

One cautionary note, however: research has shown that fine-tuned models may be more susceptible to jailbreaks and hallucinations, so proceed with caution! This launch expands OpenAI’s model optimization offerings beyond supervised fine-tuning (SFT) to include more flexible control over complex, domain-specific tasks.

OpenAI also announced that its GPT-4.1 nano model, the company’s fastest and most affordable offering to date, now supports supervised fine-tuning.

How can reinforcement fine-tuning (RFT) help organizations and businesses?

RFT creates a new version of OpenAI’s o4-mini reasoning model that is automatically adapted to the user’s or their enterprise/organization’s goals.

This is done by implementing a feedback loop in the training process, which developers at large enterprises (or independent developers working on their own) can now initiate relatively easily, quickly, and affordably through OpenAI’s online developer platform.

Rather than training on a series of questions with fixed correct answers, as traditional supervised learning does, RFT uses a grader model to score the model’s candidate responses during training.

The training algorithm then adjusts model weights so that high-scoring outputs become more likely. This structure lets customers align models with nuanced objectives such as an enterprise’s “house style” of communication and terminology, safety rules, factual accuracy, or internal policy compliance.
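To make the grading idea concrete, below is a minimal sketch of a code-based grader that rewards outputs matching a reference answer stored with each dataset item. The function signature and field names (sample, item, output_text, reference_answer) are illustrative assumptions, not OpenAI’s exact grader contract, which is defined in its RFT documentation.

    # Illustrative code-based grader: return a score in [0, 1], higher is better.
    # Field names below are assumptions made for this example.
    def grade(sample: dict, item: dict) -> float:
        predicted = sample["output_text"].strip().lower()
        reference = item["reference_answer"].strip().lower()

        if predicted == reference:
            return 1.0   # exact match earns full reward
        if reference in predicted:
            return 0.5   # partial credit if the answer appears inside a longer reply
        return 0.0       # no reward otherwise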

To do RFT, users must:

  1. Define a grading function, or use OpenAI’s model-based graders, and upload a dataset that includes prompts and validation splits.
  2. Configure a training job via the API or the fine-tuning dashboard (a minimal sketch of the API route follows this list).
  3. Monitor progress, review checkpoints, and iterate on the data or grading logic.
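As a rough sketch of how steps 1 and 2 could look through the API, the snippet below uses the openai Python SDK’s file-upload and fine-tuning endpoints. The shape of the reinforcement method and grader payload is an assumption loosely modeled on OpenAI’s fine-tuning API; check the current RFT documentation for the exact schema before relying on it.

    # Rough sketch: launching an RFT job with the openai Python SDK.
    from openai import OpenAI

    client = OpenAI()

    # Upload JSONL datasets of prompts plus the reference fields the grader reads
    # (filenames here are placeholders).
    train_file = client.files.create(file=open("rft_train.jsonl", "rb"), purpose="fine-tune")
    val_file = client.files.create(file=open("rft_val.jsonl", "rb"), purpose="fine-tune")

    job = client.fine_tuning.jobs.create(
        model="o4-mini",
        training_file=train_file.id,
        validation_file=val_file.id,
        method={
            "type": "reinforcement",          # assumed method type for RFT
            "reinforcement": {
                "grader": {                   # assumed grader configuration shape
                    "type": "string_check",
                    "name": "exact_match",
                    "input": "{{sample.output_text}}",
                    "reference": "{{item.reference_answer}}",
                    "operation": "eq",
                },
            },
        },
    )
    print(job.id, job.status)

    # Step 3: monitor progress via the API (or the dashboard).
    print(client.fine_tuning.jobs.retrieve(job.id).status)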

RFT currently supports only o-series reasoning models and is available for o4-mini.

Early enterprise use cases

On its platform, OpenAI highlighted a number of early customers that have adopted RFT across diverse industries.

  • Accordance AI fine-tuned a model for tax analysis tasks using RFT, achieving a 39% improvement in accuracy and outperforming all leading models on tax reasoning benchmarks.
  • Ambience Healthcare applied RFT to ICD-10 code assignment, boosting model performance by 12 percentage points over physician baselines on a gold-panel dataset.
  • Harvey applied RFT to legal document analysis, improving citation extraction F1 score by 20% and matching GPT-4o accuracy while achieving faster inference.
  • Runloop improved models for generating Stripe API code, using syntax-aware graders and AST validation logic to achieve a 12% gain.
  • Milo applied RFT to scheduling tasks, increasing correctness in high-complexity situations by 25 points.
  • SafetyKit used RFT to enforce nuanced content moderation policies and increased model F1 from 86% to 90% in production.
  • ChipStack, Thomson Reuters, and other partners also demonstrated performance gains in structured data generation, legal comparison tasks, and verification workflows.

These cases shared common characteristics: clear task definitions, structured output formats, and reliable evaluation criteria, all essential for effective reinforcement fine-tuning.

RFT is available now to verified organizations. To help improve future models, OpenAI is offering a 50% discount to teams that share their training datasets with OpenAI. Interested developers can get started with OpenAI’s RFT documentation and dashboard.

Pricing and billing structure

Unlike supervised or preference fine-tuning, which are billed per token, RFT is billed based on time spent actively training. Specifically:

  • $100 per hour of core training time (wall-clock time during model rollouts, grading, updates and validation).
  • Time is prorated by the second, rounded to two decimal places (so 1.8 hours of training would cost the customer $180).
  • Charges apply only to work that modifies the model. Queues, safety checks, and idle setup phases are not billed.
  • If the user employs OpenAI models as graders (e.g., GPT-4.1), the inference tokens consumed during grading are billed separately at OpenAI’s standard API rates. Otherwise, the company can use outside models, including open source ones, as graders.

Here is an example cost breakdown:

Scenario                                        Billable time   Cost
4 hours of training                             4 hours         $400
1.75 hours of training (prorated)               1.75 hours      $175
2 hours of training + 1 hour lost to a failure  2 hours         $200
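As a quick sanity check on the arithmetic above (a flat $100 per core-training hour, prorated by the second, with unbillable time excluded), a few lines of Python reproduce the table’s figures.

    # Prorated RFT billing: $100 per hour of billable core training time.
    HOURLY_RATE = 100.0

    def rft_training_cost(billable_hours: float) -> float:
        """Cost in dollars for the billable portion of an RFT run."""
        return round(billable_hours * HOURLY_RATE, 2)

    print(rft_training_cost(4.0))    # 400.0
    print(rft_training_cost(1.75))   # 175.0
    print(rft_training_cost(2.0))    # 200.0 (the hour lost to failure is not billed)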

This pricing model provides transparency and rewards efficient job design. To control costs, OpenAI encourages teams to:

  • Use lightweight or efficient graders where possible.
  • Avoid overly frequent validation unless necessary.
  • Start with smaller datasets or shorter runs to calibrate expectations.
  • Monitor training with API or dashboard tools and pause as needed.

OpenAI uses a billing method called “captured forward progress,” meaning users are only billed for model training steps that were successfully completed and retained.

So should your organization invest in RFTing a custom version of OpenAI’s o4-mini or not?

Reinforcement fine-tuning introduces a more expressive and controllable method for adapting language models to real-world use cases.

With support for structured outputs, code-based and model-based graders, and full API control, RFT enables a new level of customization in model deployment. OpenAI’s rollout emphasizes thoughtful task design and robust evaluation as keys to success.

Developers interested in exploring this method can access documentation and examples via OpenAI’s fine-tuning dashboard.

For organizations with clearly defined problems and verifiable answers, RFT offers a compelling way to align models with operational or compliance goals — without building RL infrastructure from scratch.
