
Databricks has a trick that lets AI models improve themselves


Databricks, a company that helps large businesses build custom artificial-intelligence models, has developed a machine-learning trick that can boost an AI model’s performance without the need for cleanly labeled data. Jonathan Frankle, chief AI scientist at Databricks, spent the past year talking with customers about the main challenges they face in getting AI to work reliably. The problem, Frankle says, is dirty data.

“Everyone has some data and an idea of what they would like to do,” Frankle says. But the lack of clean data can make it difficult to fine-tune a model to perform a particular task. “Nobody comes with nice, clean data for fine-tuning that you can stick in a prompt or [application programming interface]” for a model.

Databricks’ technique could eventually allow companies to deploy their own agents to perform tasks, without data quality standing in the way.

The technique offers a rare glimpse at some of the tricks engineers use to improve the capabilities of advanced AI models when good data is hard to find. It combines reinforcement learning, a way for AI models to improve through practice, with “synthetic,” or AI-generated, training data.

The latest models from OpenAI and DeepSeek rely heavily on both synthetic training data and reinforcement learning. WIRED reported that Nvidia plans to acquire Gretel, a company that specializes in synthetic training data. “We’re all navigating in this space,” Frankle says.

Databricks’ approach exploits the fact that, given enough tries, even a weak model can score well on a given task. Researchers call this method of boosting a model’s performance “best-of-N.” Databricks trained a model to predict which best-of-N results human testers would prefer, based on examples. The Databricks reward model, or DBRM, can then be used to improve other models’ performance without the need for additional labeled data.
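
As a rough sketch only, and not Databricks’ published code, best-of-N selection with a learned reward model can be expressed in a few lines of Python. The generate and score callables below are hypothetical stand-ins for an LLM sampling call and a reward model such as the DBRM.

    def best_of_n(prompt, generate, score, n=8):
        # Sample n candidate answers from the model, then keep the one
        # the reward model rates highest.
        candidates = [generate(prompt) for _ in range(n)]
        return max(candidates, key=lambda c: score(prompt, c))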

The DBRM is used to select the best outputs from a given model. That produces synthetic training data for further fine-tuning the model, so that it delivers a better output the first time. Databricks calls its new approach Test-time Adaptive Optimization, or TAO. Frankle says the method uses a relatively lightweight reinforcement-learning technique to bake the benefits of the best-of-N approach into the model itself.
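
Building on the sketch above, the TAO-style loop described here might look something like the following. This is an illustration of the idea under stated assumptions, not Databricks’ implementation; model.generate and model.fine_tune are hypothetical placeholders rather than real Databricks APIs.

    def build_tuning_set(prompts, generate, score, n=8):
        # The reward model's preferred answer for each prompt becomes a
        # synthetic training example.
        return [{"prompt": p, "response": best_of_n(p, generate, score, n)}
                for p in prompts]

    def tao_step(model, prompts, score, n=8):
        # Fine-tune on the selected outputs so the model gives a good answer
        # on the first try; the article describes this step as a relatively
        # lightweight reinforcement-learning update.
        data = build_tuning_set(prompts, model.generate, score, n)
        return model.fine_tune(data)  # hypothetical fine-tuning API

The key point of the design is that the reward model stands in for human labelers when picking winners, which is what lets the approach work without clean, hand-labeled fine-tuning data.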

Frankle adds that Databricks’ research shows the TAO method improves as it is scaled up to larger, more capable models. Combining synthetic data and reinforcement learning to improve language models is not a new technique, but it is technically challenging to pull off.

Databricks has been unusually transparent about how it develops AI because it wants customers to know that it can create powerful custom models. The company previously revealed to WIRED how it developed DBRX, a cutting-edge open source large language model (LLM), from scratch.

Without properly labeled and carefully curated data, it is difficult to fine-tune an LLM to perform specific tasks more efficiently, such as analyzing health records or financial reports to find patterns or identify issues. Many companies hope to use LLMs to automate tasks with so-called agents.

An agent in finance, for example, could analyze a company’s key performance figures, then generate a report and automatically send it to different analysts. In health insurance, an agent could help direct customers to information about a relevant drug or condition.

Databricks evaluated the TAO approach using FinanceBench, a benchmark that measures how well language models answer financial questions. On this benchmark, Llama 3.1 8B, one of Meta’s smaller free AI models, scored 68.4 percent, compared with 82.1 percent for OpenAI’s proprietary GPT-4o and o3-mini models. Using the TAO technique, Databricks got Llama 3.1 8B to score 82.8 percent on FinanceBench, higher than OpenAI’s models.

Christopher Amato is a computer scientist who works on reinforcement learning at Northeastern University. “The general concept is very promising,” he says. “I agree that the lack of good training data is a major problem.”

Amato says many companies are searching for ways to train AI models using synthetic data and reinforcement learning. The TAO method is “very promising,” he says, because it allows for much more scalable data labeling and should yield better performance over time as models and their labels improve. He cautions, however, that reinforcement learning can sometimes behave in unpredictable ways, so it needs to be used with care.

Frankle says Databricks is using the TAO technique to boost the performance of its customers’ AI models and to build their first agents. One customer, which makes a health-tracking app, found that TAO allowed it to deploy a model that had not previously been reliable enough. “You want [the app] medically accurate,” he says. “This is a difficult problem.”
