
Google’s ‘Nested Learning’ paradigm could solve AI’s memory and continual learning problem


Google researchers have introduced a new AI framework designed to address a critical shortcoming in contemporary large language models (LLMs): their inability to update or expand their knowledge base after the initial training phase. The approach, termed Nested Learning, reconceptualizes model training as a hierarchy of interlinked optimization problems rather than a single, linear process. The team suggests that this multi-tiered optimization can unlock more sophisticated learning algorithms, enhancing both in-context learning and memory retention.

To validate their theory, the researchers developed a new model named Hope, built upon the Nested Learning framework. Early testing reveals that Hope excels in language modeling, continual learning, and reasoning over extended contexts, indicating a promising path toward AI systems that dynamically adapt to evolving real-world scenarios.

Understanding the Memory Constraints of Large Language Models

Large language models revolutionized machine learning by reducing the need for meticulous feature engineering and domain-specific expertise. By training on massive datasets, these models autonomously learn complex representations. However, simply increasing model size or depth has not resolved persistent issues such as generalizing to unseen data, continual task learning, and avoiding suboptimal training outcomes.

These challenges spurred the development of foundational architectures that underpin today’s LLMs, marking a shift from narrowly focused models to versatile systems exhibiting emergent abilities through scaling. Despite these advances, a fundamental limitation persists: once training concludes, LLMs remain static, unable to incorporate new knowledge or skills from ongoing interactions.

Currently, the only adaptable aspect of an LLM is its in-context learning, which allows it to respond based on information provided within the immediate prompt. This is akin to a person who can recall recent conversations but cannot form lasting memories. The model’s knowledge is confined to what was encoded during pre-training and the transient context window. When the conversation exceeds this window, earlier information is irretrievably lost.

This limitation arises because transformer-based LLMs lack mechanisms for “online” learning, in which new information is integrated into the model’s long-term parameters. The weights in the feed-forward layers remain fixed post-training, preventing the model from permanently acquiring new knowledge or skills from its interactions. Consequently, any learning that occurs within the context window vanishes once the conversation moves beyond that scope.
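The fixed-weights-plus-sliding-window behavior described above can be sketched in a few lines. This is a toy illustration with a hypothetical window size measured in conversation turns, not any real model's tokenizer or attention mechanism:

```python
# Toy sketch: a frozen model can only attend to the most recent turns
# that fit in its context window; anything older is simply gone.

CONTEXT_WINDOW = 4  # hypothetical window size, in turns

def visible_context(history, window=CONTEXT_WINDOW):
    """Return the turns the model can still 'see'; the rest are lost."""
    return history[-window:]

history = ["turn-1", "turn-2", "turn-3", "turn-4", "turn-5", "turn-6"]
print(visible_context(history))
# ['turn-3', 'turn-4', 'turn-5', 'turn-6'] -- turn-1 and turn-2 are gone
```

Because the model's weights never change, nothing that scrolls out of this window survives in any form.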

Nested Learning: A Multi-Level Optimization Framework

Nested Learning introduces a paradigm inspired by the brain’s ability to process information across multiple abstraction levels and timescales. Instead of viewing a machine learning model as a single continuous process, Nested Learning treats it as a system of interconnected learning problems optimized concurrently but at varying speeds. This contrasts with traditional approaches that separate model architecture from the optimization algorithm.

Within this framework, training is conceptualized as building an “associative memory”: the capacity to link and retrieve related information. The model learns to associate each data point with a local error signal reflecting how unexpected that input is. Even core components such as the attention mechanism in transformers can be interpreted as associative memory modules that map relationships between tokens. By assigning distinct update frequencies to different components, these nested optimization tasks form hierarchical levels, which constitute the essence of Nested Learning.
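The multi-timescale idea can be sketched with a minimal example. The two-level split, the toy objective, and the update interval below are our own illustrative choices, not equations from the paper: two parameter groups are optimized concurrently, one updated every step and one only periodically.

```python
# Minimal sketch of nested, multi-timescale optimization: a "fast" level
# updates on every step, a "slow" level consolidates only every few steps.

SLOW_PERIOD = 4  # hypothetical update interval for the slow level

def grad(fast, slow, target):
    # Gradient of the toy loss (fast + slow - target)^2 w.r.t. either level
    return 2 * (fast + slow - target)

fast, slow = 0.0, 0.0
target, lr = 10.0, 0.05
for step in range(1, 101):
    g = grad(fast, slow, target)
    fast -= lr * g                 # fast level: updated every step
    if step % SLOW_PERIOD == 0:    # slow level: consolidates less often
        slow -= lr * g
print(round(fast + slow, 2))       # the two levels jointly approach 10.0
```

The fast level absorbs most of the immediate error, while the slow level accumulates a more gradual, stable share of it, loosely mirroring how nested levels can learn at different speeds.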

Hope: Advancing Continual Learning with a Hierarchical Memory System

Applying these principles, the researchers created Hope, an architecture embodying Nested Learning. Hope evolves from a prior model designed to mitigate transformer memory limitations but extends it by introducing a “Continuum Memory System” (CMS). Unlike its predecessor, which operated with only two memory update speeds, Hope’s CMS supports an unlimited number of memory banks, each updating at its own pace.

This design allows rapid memory banks to process immediate inputs, while slower banks consolidate abstract knowledge over extended periods. The model effectively self-optimizes its memory through a recursive loop, theoretically enabling infinite layers of learning and memory consolidation.
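A rough sketch of the multi-bank idea can make this concrete. The bank count, update periods, and the running-average "consolidation" below are our illustrative choices, not the CMS's actual mechanics:

```python
# Toy sketch of a continuum of memory banks: each bank summarizes the same
# input stream but refreshes its long-term state at its own pace.

class MemoryBank:
    def __init__(self, period):
        self.period = period   # consolidate every `period` steps
        self.state = 0.0       # long-term summary of inputs seen so far
        self.buffer = []       # inputs accumulated since the last update

    def observe(self, x, step):
        self.buffer.append(x)
        if step % self.period == 0:
            # Blend the buffered evidence into the long-term state
            self.state = 0.5 * self.state + 0.5 * (sum(self.buffer) / len(self.buffer))
            self.buffer.clear()

banks = [MemoryBank(period=p) for p in (1, 4, 16)]  # fast -> slow
for step in range(1, 33):
    x = float(step % 5)        # toy input stream
    for bank in banks:
        bank.observe(x, step)
# The period-1 bank tracks each input; the period-16 bank has consolidated
# only twice, changing slowly and smoothly.
```

The fast bank reacts to every new input, while slower banks see only averaged evidence at long intervals, echoing the split between processing immediate inputs and consolidating abstract knowledge.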

Hope’s performance on diverse language modeling and reasoning benchmarks is impressive. It achieves lower perplexity (a key metric where lower values indicate better prediction and coherence) and higher accuracy than both standard transformers and advanced recurrent models. Notably, Hope excels in long-context “Needle-In-Haystack” tasks, where it must identify and use specific information buried within extensive text, demonstrating the CMS’s efficiency in managing lengthy sequences.
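The evaluation idea behind such tasks is simple to illustrate. The snippet below is our own toy setup with a stand-in keyword retriever, not the benchmark's actual harness or Hope's mechanism: a single fact is buried in long filler text, and the system is scored on recovering it.

```python
# Toy "Needle-In-Haystack" setup: hide one fact in long filler text,
# then check whether it can be retrieved.

import random

def build_haystack(needle, filler_sentences=1000, seed=0):
    rng = random.Random(seed)
    sentences = ["The sky was grey that day."] * filler_sentences
    sentences.insert(rng.randrange(len(sentences)), needle)
    return " ".join(sentences)

def retrieve(haystack, query_word):
    """Stand-in retriever: return the first sentence containing the query word."""
    for sentence in haystack.split(". "):
        if query_word in sentence:
            return sentence
    return None

needle = "The secret code is 4217."
haystack = build_haystack(needle)
print(retrieve(haystack, "secret"))
```

A real benchmark run replaces the keyword lookup with the model itself, answering a question whose evidence sits at an arbitrary depth in the context.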

Hope’s approach aligns with other recent efforts to enhance AI memory and reasoning. For example, Sapient Intelligence’s Hierarchical Reasoning Model (HRM) employs a layered architecture to improve reasoning efficiency, while Samsung’s Tiny Recursive Model (TRM) refines HRM’s design to boost performance and efficiency.

Despite its promise, Nested Learning faces hurdles similar to those encountered by other advanced architectures. Current AI hardware and software ecosystems are heavily optimized for conventional deep learning and transformer models, making widespread adoption of Nested Learning challenging. Nevertheless, if successfully integrated, this paradigm could revolutionize LLMs by enabling continuous learning-a critical feature for enterprise applications where data, environments, and user demands evolve rapidly.
