Researchers find that retraining only small parts of AI models can cut costs and prevent forgetting

Organizations frequently encounter a challenge when adapting large language models (LLMs) to specific tasks: enhancing model performance often comes at the cost of losing previously acquired capabilities. This phenomenon, commonly observed after fine-tuning, results in models “forgetting” how to execute certain functions they had mastered earlier.

Understanding and Addressing Knowledge Degradation in LLMs

Recent investigations from the University of Illinois Urbana-Champaign introduce an innovative retraining strategy designed to prevent this so-called “catastrophic forgetting,” where models seemingly lose prior knowledge. Their study centers on two multimodal LLMs, LLaVA and Qwen 2.5-VL, both of which generate responses based on image inputs.

The proposed technique advocates for selective retraining of specific model components rather than the entire architecture, significantly reducing computational expenses. The researchers argue that what appears as memory loss is actually a shift in output bias caused by changes in task distribution.

Given that training a new large multimodal model can demand millions of dollars, weeks of processing time, and produce substantial carbon emissions, optimizing update methods for existing models is critical. The team emphasizes exploring fine-tuning approaches that maintain learned knowledge while minimizing shifts in model outputs.

Investigating the Roots of Catastrophic Forgetting

To validate the presence and causes of catastrophic forgetting, the researchers designed a series of target tasks for the models to perform. After fine-tuning, they assessed whether the models exhibited significant performance drops. Interestingly, the models demonstrated partial recovery of lost skills over time.

For example, after training on a counting task, the models initially showed decreased accuracy on a separate benchmark (PathVQA), which involves specialized visual question answering. However, performance on this task rebounded, suggesting that the forgetting effect was not permanent.

Further experiments involved tuning only specific layers, such as the self-attention projection (SA Proj) or the multi-layer perceptron (MLP) components, rather than the entire model. Remarkably, adjusting just the self-attention layers enabled effective learning of new tasks without compromising performance on previously learned ones, even after sequential training on multiple tasks.
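In practice, this kind of selective tuning amounts to freezing every parameter except the targeted submodules before training. The following is a minimal PyTorch sketch of the idea, using a toy `TransformerEncoderLayer` as a stand-in for a full LLM block; the `"self_attn"` name match follows PyTorch's built-in layer naming, and real LLaVA or Qwen checkpoints would use their own parameter names (e.g. `self_attn.q_proj`), so the matching rule is an assumption, not the paper's exact recipe.

```python
import torch.nn as nn

def freeze_all_but_self_attention(model: nn.Module) -> int:
    """Freeze every parameter except those inside self-attention
    projections; return the number of trainable parameters."""
    trainable = 0
    for name, param in model.named_parameters():
        # Assumed naming convention: any parameter whose path contains
        # "self_attn" belongs to the attention projections.
        param.requires_grad = "self_attn" in name
        if param.requires_grad:
            trainable += param.numel()
    return trainable

# Toy stand-in for one transformer block; the paper's models are far larger.
layer = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
n_trainable = freeze_all_but_self_attention(layer)
```

An optimizer built afterwards would then only receive the parameters with `requires_grad=True`, so the MLP and normalization weights stay exactly as they were before fine-tuning.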

These observations led to the conclusion that the apparent forgetting is actually a bias in the model’s output distribution caused by shifts in the types of tasks it encounters, rather than true erasure of knowledge.
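If forgetting is really an output-distribution shift rather than erased knowledge, it can be probed by comparing a model's next-token distributions before and after fine-tuning. Below is a hypothetical sketch of such a probe using KL divergence over logits; this is an illustrative metric of my own choosing, not the measurement reported in the study.

```python
import torch
import torch.nn.functional as F

def output_shift(logits_before: torch.Tensor,
                 logits_after: torch.Tensor) -> torch.Tensor:
    """KL divergence between next-token distributions before and after
    fine-tuning: a simple probe for output-distribution bias."""
    p = F.log_softmax(logits_before, dim=-1)  # reference distribution
    q = F.log_softmax(logits_after, dim=-1)   # shifted distribution
    # KL(p || q): how far the fine-tuned outputs drift from the originals.
    return F.kl_div(q, p, log_target=True, reduction="batchmean")
```

A shift of zero means the tuned model answers unrelated prompts exactly as before; a large value flags the kind of bias the researchers describe, even when the underlying knowledge is intact.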

Targeted Fine-Tuning: A Cost-Effective Solution

The key insight from this research is that fine-tuning narrow parts of the model, particularly the MLP’s up/gating projections while freezing the down projections, can achieve comparable learning outcomes to full MLP tuning but with minimal forgetting. This approach helps maintain output stability and reduces the risk of the model disproportionately generating numeric tokens, which was linked to accuracy drops in other tasks.
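The split described above can be sketched concretely. The toy module below mirrors the up/gate/down projection layout used in LLaMA-family gated MLPs (the `up_proj`/`gate_proj`/`down_proj` names are an assumption borrowed from that family, not necessarily the paper's models), and then marks only the up and gating projections as trainable while the down projection stays frozen.

```python
import torch
import torch.nn as nn

class GatedMLP(nn.Module):
    """Minimal SwiGLU-style MLP block: a toy stand-in for the
    feed-forward sublayer of a transformer."""
    def __init__(self, d_model: int = 32, d_hidden: int = 64):
        super().__init__()
        self.up_proj = nn.Linear(d_model, d_hidden, bias=False)
        self.gate_proj = nn.Linear(d_model, d_hidden, bias=False)
        self.down_proj = nn.Linear(d_hidden, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(
            torch.nn.functional.silu(self.gate_proj(x)) * self.up_proj(x)
        )

mlp = GatedMLP()
# Tune only the up/gating projections; freezing the down projection
# keeps the block's output mapping fixed, which is what limits drift.
for name, param in mlp.named_parameters():
    param.requires_grad = name.startswith(("up_proj", "gate_proj"))
```

Because the frozen down projection is the last map back into the residual stream, the block can still learn new feature combinations while its contribution to the output distribution stays anchored.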

By concentrating retraining efforts on select model segments, enterprises can significantly lower computational demands and better manage output consistency. This method offers a more streamlined and reproducible fine-tuning process, making it attractive for practical deployment.

While the study’s experiments were limited to two vision-language models due to resource constraints, the principles uncovered are likely applicable across a broader range of LLMs and modalities, including text-only or audio-visual models.

Implications for Future Model Adaptation

As the demand for customized AI solutions grows, strategies that mitigate catastrophic forgetting without extensive retraining will become increasingly valuable. This research highlights the importance of understanding internal model dynamics and leveraging targeted tuning to preserve prior knowledge while integrating new capabilities.

With the rapid evolution of LLMs and their expanding use cases, adopting efficient fine-tuning techniques not only reduces operational costs but also supports sustainable AI development by minimizing environmental impact.
