This results in faster code generation with performance that rivals the top open-source models. Here’s how the system works.
The nerdy bits.
Here’s a list of concepts (oversimplified, for the sake of efficiency) that we need to understand before moving on.
Autoregression
Most LLMs are autoregressive. When you ask them a question, they process the whole prompt, predict the first token of the answer, then feed that token back in along with the prompt to predict the next one. This lets them generate text the way most of us read and write: left to right, top to bottom.
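The loop shape is the important part: one forward pass per token, with each output fed back in as input. Here is a minimal sketch, where the "model" is a hypothetical stand-in that just recites a fixed phrase:

```python
# Toy autoregressive decoder: each new token is predicted from
# everything generated so far, strictly left to right.
# predict_next() is a made-up stand-in for a real LLM forward pass.

TARGET = list("hello")

def predict_next(context):
    """Stand-in for a model forward pass: given the tokens so far,
    return the next token of the fixed phrase."""
    return TARGET[len(context)]

def generate(max_tokens=5):
    context = []
    for _ in range(max_tokens):
        token = predict_next(context)   # one forward pass per token
        context.append(token)           # feed the output back in
    return "".join(context)

print(generate())  # five tokens required five sequential passes
```

Because each step depends on the previous one, the passes cannot run in parallel, which is exactly the bottleneck diffusion models attack later in this article.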
Temperature
LLMs expose a temperature setting that controls the randomness of the output. When predicting the next token, the model assigns a probability to every option. A lower temperature makes it more likely to choose the most probable token, while a higher temperature lets it pick less likely ones.
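Concretely, temperature divides the model's raw scores (logits) before they are turned into probabilities. A small sketch with invented logits shows the effect:

```python
import math

# Temperature-scaled softmax over a toy next-token distribution.
# The logits below are invented for illustration only.

def softmax(logits, temperature):
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [3.0, 1.5, 0.2]  # e.g. scores for "print", "return", "banana"

cold = softmax(logits, temperature=0.2)
hot = softmax(logits, temperature=1.2)

# At low temperature the top token dominates;
# at high temperature the distribution flattens out.
print(round(cold[0], 3), round(hot[0], 3))
```

Sampling from `cold` almost always returns the top token; sampling from `hot` regularly picks the alternatives.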
Diffusion
Diffusion models are an alternative to autoregressive models, and they have mostly been used by image models such as Stable Diffusion. The model starts with a noisy, fuzzy image and iteratively removes noise while keeping the user's request in mind, steering the result closer and closer to the desired image with each step.
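The overall loop can be caricatured in a few lines. Real diffusion models learn the denoising step with a neural network; this sketch fakes it with simple interpolation toward a made-up target, just to show the iterative refine-everything-at-once shape:

```python
import random

# Toy "denoising" loop in the spirit of image diffusion:
# start from pure noise and, over several steps, nudge every
# value toward a target signal simultaneously. The target and
# the interpolation rule are illustrative stand-ins, not how a
# trained model actually denoises.

random.seed(0)
target = [0.2, 0.8, 0.5, 0.1]          # the "clean image"
x = [random.random() for _ in target]   # pure noise

for step in range(10):
    # each iteration removes half of the remaining noise, everywhere
    x = [xi + 0.5 * (ti - xi) for xi, ti in zip(x, target)]

error = max(abs(xi - ti) for xi, ti in zip(x, target))
print(error)
```

Note that every position is updated in every step, unlike the one-token-at-a-time autoregressive loop above.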
Are you still with us? Great!
Recently, some large language models have borrowed the diffusion architecture for text generation, and the results are pretty promising.
Why tell you all this? Because now you can see why diffusion-based text models can be faster than autoregressive ones: instead of predicting one token at a time, they can iteratively refine the entire output in parallel. This is particularly useful in programming, where global structure matters more than linear token-by-token prediction.
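For text, "refining in parallel" usually means masked diffusion: every position starts as a mask token, and each step fills in whichever positions the model is most confident about. A minimal sketch, with hard-coded per-position confidence scores standing in for a model's predictions:

```python
# Sketch of masked-diffusion text generation: all positions start
# masked and several are unmasked per step, rather than strictly
# left to right. The target tokens and confidence scores below
# are invented for illustration; a real model predicts both.

MASK = "_"
target = ["def", "add", "(", "a", ",", "b", ")", ":"]
confidence = [0.9, 0.4, 0.8, 0.6, 0.7, 0.5, 0.8, 0.9]

tokens = [MASK] * len(target)
steps = 0
while MASK in tokens:
    steps += 1
    # unmask every position whose confidence clears a threshold,
    # lowering the bar each round so generation always finishes
    threshold = 0.9 - 0.2 * steps
    for i, tok in enumerate(tokens):
        if tok == MASK and confidence[i] >= threshold:
            tokens[i] = target[i]

print(steps, " ".join(tokens))
```

Here eight tokens materialize in three refinement steps; an autoregressive decoder would have needed eight sequential passes.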
Phew! We made it. So Apple released a new model?
Yes. They released a model called DiffuCoder-7B-cpGRPO, which builds on a paper called DiffuCoder: Understanding and Improving Masked Diffusion Models, released last month.
The paper describes a model that takes a diffusion-first approach to code generation, but with an interesting twist:
When the sampling temperature is increased from the default 0.2 up to 1.2, DiffuCoder becomes more flexible in its token generation, no longer having to follow strict left-to-right constraints.
In other words, by adjusting the temperature, DiffuCoder can behave more (or less) like an autoregressive model. In essence, higher temperatures allow it to generate tokens in any order, while lower temperatures keep it closer to strict left-to-right decoding.
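One way to picture this (a loose sketch, not DiffuCoder's actual sampler): let temperature govern not just which token is chosen, but which position gets unmasked next. The scores below are invented, with the leftmost position scored highest, so low temperature collapses to near left-to-right decoding:

```python
import math
import random

# Hypothetical illustration: sample which masked position to fill
# next from temperature-scaled confidence scores. Leftmost positions
# get the highest (made-up) scores, so cold sampling stays close to
# left-to-right order while hot sampling roams freely.

def pick_position(scores, temperature, rng):
    scaled = [s / temperature for s in scores]
    m = max(scaled)
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(scores)), weights=weights)[0]

scores = [3.0, 2.0, 1.0, 0.5]  # leftmost position most confident

rng = random.Random(42)
cold_picks = {pick_position(scores, 0.2, rng) for _ in range(50)}
hot_picks = {pick_position(scores, 1.2, rng) for _ in range(50)}

# Cold sampling concentrates on position 0 (left to right);
# hot sampling spreads picks across many positions.
print(sorted(cold_picks), sorted(hot_picks))
```

The mechanism is the same temperature-scaled softmax from earlier, just applied to positions instead of vocabulary tokens.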
With an extra training step called coupled-GRPO, it learned to generate higher-quality code with fewer passes. The result? A 4.4% boost on coding benchmarks.
Built on an open-source LLM from Alibaba
What’s more interesting is that Apple’s model was built on top of Qwen2.5-7B, an open-source foundation model from Alibaba. Alibaba had already fine-tuned it for better code generation and released it as Qwen2.5-Coder-7B; Apple then took that model and made its own adjustments.
First, they swapped in a diffusion-based decoder, as described in the DiffuCoder paper, then adjusted the model to follow instructions. After that, they trained another version on more than 20,000 carefully curated coding examples.
There is certainly still room for improvement. DiffuCoder performed better than many other diffusion-based coding models (and that was before the 4.4% boost from DiffuCoder-7B-cpGRPO), but it still didn’t reach the level of GPT-4 or Gemini Diffusion.
It remains to be seen if (or when) these gains will translate into actual products and features for users and developers.

