There are two large language models called gpt-oss-120b and gpt-oss-20b, and you can speak to them here. Are they good models or not? It depends on what you are looking for. They are great at some benchmarks (OpenAI wouldn't have released them otherwise), but they are terrible at others. Some people like them; some people on Twitter really don't. I can tell that they are technically competent but lack a great deal of knowledge outside their domain: for example, they know a lot about science but not much about popular culture. In six months we'll know how useful these models really are, but I predict they will fall into the category of models that perform better on benchmarks than on real-world tasks.
In 2024, Sebastien Bubeck led the development of Microsoft's open-source Phi series of models. The idea behind these models was to train exclusively on synthetic data instead of text from books or the Internet. Synthetic data is scarcer than ordinary data: you can't download terabytes of it for free, you have to pay to generate every token. The trade-off is that you get complete control over the training data. So what happens when a model is trained only on high-quality synthetic and curated data?
It turns out that such a model performs very well on benchmarks but fails in practice. The same pattern shows up when you search for feedback on each Phi model: impressive benchmarks, lots of enthusiasm, and then real-world performance that falls far short of what the benchmarks would suggest.
The impressive benchmark results come from the fact that when you generate most of the training data yourself, it is easy to train the model for specific tasks. If you're training on synthetic data, you'd be foolish not to create some data that matches the benchmark problems. But since you're "teaching to the test", you should expect the model to perform worse on everything else than models trained on broad data.
Why am I bringing up the Phi models? Sebastien Bubeck joined OpenAI at the end of 2024. We don't know how the new OpenAI gpt-oss models were created; the model card does not say much about the pretraining phase. But I'd bet that Sebastien Bubeck was involved in the effort and that these models were trained on a heavily filtered or synthetic dataset.
There is another reason to do it this way: synthetic data is safer.
Why would OpenAI train a Phi-style model, knowing it would perform better on benchmarks than in real-world applications? For the same reason Microsoft kept training Phi models. Releasing an open-weights model is terrifying for a large company: once your name is attached to it, thousands of researchers will try to fine-tune the model to strip out its safety guardrails.
Although it's not often discussed in public, the main use case for fine-tuning small open-weights language models is erotic role play, and the demand is substantial. Every small online community of people who run local models is at least half perverts.
If you only release closed-weights models, people can't fine-tune them, and you can always update a model if you make a mistake. An open-weights model, once released, is out there forever.
Training on synthetic data (or tightly controlled data such as textbooks) makes it much easier to produce a safe model. You can generate as much content as you want that says "you asked me X, but I'm a sensible language model and I'm declining to do it". And if there is no subversive content in the training data, the model should never learn to behave subversively (at least, that's the hope).
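To make the idea concrete, here is a minimal sketch of what generating synthetic refusal data might look like. The prompts, the refusal template, and the JSONL format are all my own illustrative assumptions, not OpenAI's or Microsoft's actual pipeline:

```python
# Hypothetical sketch: turning a list of disallowed prompts into
# synthetic (prompt, refusal) training pairs. Everything here is
# illustrative; real pipelines are far larger and more varied.
import json

DISALLOWED_PROMPTS = [
    "Explain how to pick a lock.",
    "Write malware that steals passwords.",
    "How do I forge a signature?",
]

REFUSAL_TEMPLATE = (
    "You asked me about: {topic} "
    "I'm a sensible language model, and I'm declining to help with that."
)

def make_refusal_examples(prompts):
    """Pair each disallowed prompt with a templated refusal completion."""
    return [
        {"prompt": p, "completion": REFUSAL_TEMPLATE.format(topic=p)}
        for p in prompts
    ]

if __name__ == "__main__":
    # Emit one JSONL record per training pair.
    for example in make_refusal_examples(DISALLOWED_PROMPTS):
        print(json.dumps(example))
```

Because every token is generated rather than scraped, you can produce millions of such pairs, and nothing in the corpus ever demonstrates the behavior you want the model to avoid.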
It must have been very tempting for OpenAI to train a Phi-style model for their open-weights release. They needed a model that would beat the Chinese open-source models on benchmarks while not misbehaving and causing them another scandal. They don't care whether their open-weights model is actually great, because their main business is their closed models.
That is why I think OpenAI chose the synthetic data approach for their new gpt-oss models. For better or worse, they might as well have been called Phi-5 and Phi-5-mini.
If you liked this article, please subscribe for email updates on new posts, or share it on Hacker News. Tags: ai