Google breaks through the bottleneck of AI forgetting! "Nested learning" allows models to continuously evolve like the human brain

Recently, Google published a paper titled "Nested Learning: The Illusion of Deep Learning Architectures" at NeurIPS 2025, proposing a novel "nested learning" paradigm that fundamentally rethinks how AI learns. This breakthrough may mark a crucial step towards AI "truly evolving like the brain."

01 Solving the AI "Catastrophic Forgetting" Problem

Current large language models perform impressively, but their knowledge accumulation has a clear ceiling. Their knowledge is either limited to pre-training data or constrained by a limited context window, unable to continuously learn new skills without forgetting old knowledge through neuroplasticity like the human brain.

The most direct solution—continuously updating model parameters with new data—often leads to "catastrophic forgetting," where the model's performance on old tasks deteriorates significantly after learning new tasks.

Traditionally, researchers have mitigated this problem through two approaches: improving model architecture or optimizing training algorithms; however, these two methods have been viewed as separate parts, and this fragmented perspective has hindered the establishment of a unified and efficient learning system.

Google's nested learning paradigm breaks this conventional thinking, viewing model architecture and optimization algorithms as a unified, nested system, addressing the forgetting problem at a more fundamental level.

02 Nested Learning: Enabling AI to Learn at Multiple Scales Like the Human Brain

The core idea of nested learning is quite simple: a complex machine learning model is essentially a set of nested or parallel optimization problems, each with its own context flow and update frequency. This is similar to the human brain's learning mechanism. Our instantaneous memory of immediate information updates extremely quickly, short-term memory for exam preparation updates at a slightly slower pace, while long-term knowledge that constitutes our worldview updates very slowly. Nested learning applies this insight to AI, introducing the key concept: update frequency. Each component in the model, whether it's the weight parameters or the momentum term in the optimizer, has its own unique update frequency. The slowest-updating component is responsible for accumulating long-term, stable, and abstract knowledge into the parameters, forming the model's "worldview." This multi-rate learning system allows the model to flexibly absorb new information without interfering with core knowledge.

03 Two Major Technological Innovations: Deep Optimizer and Continuous Memory System

Based on the nested learning paradigm, the Google research team proposed two core technological improvements, providing a practical path for theory.

Deep Optimizer: Enabling Learning Capabilities in the Optimization Process

Nested learning treats the optimizer itself as a learnable "associative memory module." Traditional optimizers rely on simple dot product similarity, failing to consider the complex relationships between different data samples. By adjusting the optimization objective to a more standardized loss metric, a new momentum formula can be derived, making the optimizer more robust to noisy data. This "deep optimizer" can more intelligently guide the entire model's learning process.

Continuous Memory System: Achieving Dynamic Memory Management

In the traditional Transformer, the sequence model acts as short-term memory, preserving immediate context; the feedforward network acts as long-term memory, storing pre-trained knowledge. Nested learning extends this concept to a continuous memory system, where memory is viewed as a spectrum composed of a series of modules, each updated at a specific frequency, creating a richer and more efficient memory system for continuous learning. This design enables the model to dynamically allocate and manage knowledge across memory modules at different time scales based on the importance and stability of information, fundamentally solving the problem of catastrophic forgetting.

04 Hope Validation Model: Significantly Outperforming Existing Architectures

To validate nested learning theory, Google developed the Hope validation model—a self-modifying recurrent network based on the Titans architecture, integrating a continuum memory system.

Experimental results show that Hope exhibits lower perplexity and higher accuracy in language modeling and commonsense reasoning tasks, outperforming modern recurrent models and the standard Transformer.

Especially in the long-context "needle-in-a-haystack" task, Hope demonstrates superior memory management capabilities, proving that CMS provides a more efficient method for processing extended information sequences.

These results not only validate the effectiveness of the nested learning paradigm but also demonstrate its enormous potential in handling complex tasks.

05 Implications of Nested Learning for AI Development

The continuous learning capability brought by nested learning enables AI systems to quickly adapt to the localized needs of different markets without forgetting core knowledge. For example, an AI customer service system that performs well in the European and American markets can quickly learn the more polite and conversational communication styles of the East Asian market through continuous learning.

From a technical perspective, the nested learning paradigm aligns perfectly with the "scenario-centric" development path of domestic AI companies. Whether it's the overseas returnees in Guizhou's big data industry attempting to combine international technology with local needs, or the development trend of AI companies in various fields delving into vertical industries, nested learning provides strong technical support.

For domestic AI hardware companies, the efficient and continuous learning capabilities under the nested learning framework help better balance global standards and localized needs when exporting products, creating truly globally competitive products in areas such as smart homes and consumer electronics.

From academic research to industrial application, nested learning may take several years. However, it is certain that AI is transforming from a "static knowledge base" that can only passively reflect the statistical patterns of training data into an organism capable of continuous self-improvement and adaptation to the environment.

Google's breakthrough in nested learning is not only a technological milestone but also a significant step towards the development of AI towards general artificial intelligence. As this paradigm matures, application areas requiring lifelong learning, such as autonomous driving, personalized medical assistants, and adaptive education systems, will usher in entirely new possibilities.

References:

https://openreview.net/forum?id=nbMeRvNb7A;

https://research.google/blog/introducing-nested-learning-a-new-ml-paradigm-for-continual-learning/