Beyond Next-Token Prediction? Meta’s Novel Architectures Spark Debate on the Future of Large Language Models

A pair of groundbreaking research initiatives from Meta AI in late 2024 is challenging the fundamental “next-token prediction” paradigm that underpins most of today’s large language models (LLMs). The introduction of the BLT (Byte Latent Transformer) architecture, which eliminates the need for tokenizers and shows significant potential for multimodal alignment and fusion, coincided with the unveiling of the Large Concept Model (LCM). The LCM takes a more radical step, discarding tokens altogether and aiming to bridge the gap between symbolic and connectionist AI by reasoning and generating directly in a semantic “concept” space. These developments have ignited discussion within the AI community, with many suggesting they could mark a new era for LLM design.

The research from Meta explores the latent space of models, seeking to revolutionize their internal representations and facilitate reasoning processes more aligned with human cognition. This exploration stems from the observation that current LLMs, both open and closed source, lack an explicit hierarchical structure for processing and generating information at an abstract level, independent of specific languages or modalities.

The prevailing “next-token prediction” approach in traditional LLMs gained traction largely because it is relatively easy to engineer and demonstrably effective in practice. Computers need text in a discrete numerical form, and tokens are the simplest, most direct way to convert it into vectors that can be manipulated mathematically. Ilya Sutskever, in a conversation with Jensen Huang, previously suggested that predicting the next word forces a model to grasp the underlying real-world processes and emotions behind the text, leading to the formation of a “world model.”

However, critics argue that using a discrete symbolic system to capture the continuous and complex nature of human thought is inherently flawed, as humans do not think in tokens. Human problem-solving and long-form content creation often involve a hierarchical approach, starting with a high-level plan of the overall structure before gradually adding details. For instance, when preparing a speech, individuals typically outline core arguments and the flow, rather than pre-selecting every word. Similarly, writing a paper involves creating a framework with chapters that are then progressively elaborated upon. Humans can also recognize and remember the relationships between different parts of a lengthy document at an abstract level.

Meta’s LCM directly addresses this by enabling models to learn and reason at an abstract conceptual level. Instead of tokens, both the input and output of the LCM are “concepts.” This approach has demonstrated superior zero-shot cross-lingual generalization capabilities compared to other LLMs of similar size, generating significant excitement within the industry.

Yuchen Jin, CTO of Hyperbolic, commented on social media that he is increasingly convinced tokenization will disappear, with LCM replacing “next-token prediction” with “next-concept prediction.” He intuitively believes LCM may excel in reasoning and multimodal tasks. The LCM has also sparked considerable discussion among Reddit users, who view it as a potential new paradigm for AI cognition and eagerly anticipate the synergistic effects of combining LCM with Meta’s other initiatives like BLT, JEPA, and Coconut.

How Does LCM Learn Abstract Reasoning Without Predicting the Next Token?

The core idea behind LCM is to perform language modeling at a higher level of abstraction, adopting a “concept-centric” paradigm. LCM operates with two defined levels of abstraction: subword tokens and concepts. A “concept” is defined as a language and modality-agnostic abstract entity representing a higher-level idea or action, typically corresponding to a sentence in a text document or an equivalent spoken utterance. In essence, LCM learns “concepts” directly, using a transformer to convert sentences into sequences of concept vectors instead of token sequences for training.

To train on these higher-level abstract representations, LCM uses SONAR, a multilingual and multimodal sentence-embedding model Meta developed previously, as a translation layer. SONAR converts token sequences into concept vectors (and back), so LCM’s inputs and outputs are concept vectors and the model learns higher-level semantic relationships directly. With SONAR acting as a frozen bridge between tokens and concepts, the researchers explored three architectures capable of processing these “concept” units: Base-LCM, Diffusion-based LCM, and Quantized LCM.
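To make the data flow concrete, here is a minimal sketch of the token-to-concept round trip. It is illustrative only: `SonarLikeEncoder`, `SonarLikeDecoder`, and the naive sentence splitter are placeholders standing in for a frozen SONAR-style encoder/decoder, not Meta’s actual API.

```python
# Hypothetical sketch of the token -> concept -> token round trip.
from typing import List
import torch

def split_into_sentences(text: str) -> List[str]:
    # Naive placeholder segmentation; real pipelines use a proper sentence splitter.
    return [s.strip() + "." for s in text.split(".") if s.strip()]

class SonarLikeEncoder:
    def __call__(self, sentences: List[str]) -> torch.Tensor:
        # Returns one fixed-size embedding ("concept") per sentence (dummy values here).
        return torch.randn(len(sentences), 1024)

class SonarLikeDecoder:
    def __call__(self, concepts: torch.Tensor) -> List[str]:
        # Maps each concept vector back to a sentence (dummy output here).
        return ["<decoded sentence>" for _ in range(concepts.shape[0])]

encoder, decoder = SonarLikeEncoder(), SonarLikeDecoder()
sentences = split_into_sentences("Plan the talk. Draft the outline. Fill in details.")
concepts = encoder(sentences)        # (num_sentences, d) concept sequence fed to the LCM
# ... the LCM would predict the next concept vector(s) here ...
generated_text = decoder(concepts)   # concepts decoded back into text
```

The LCM itself only ever sees the middle representation: sequences of concept vectors, never subword tokens.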

Base-LCM, the foundational architecture, employs a standard decoder-only Transformer to predict the next concept (sentence embedding) in the embedding space, trained by directly minimizing a Mean Squared Error (MSE) loss that regresses the target sentence embedding. Small PreNet and PostNet layers normalize the incoming SONAR embeddings and map the model’s output back into the SONAR embedding space. The workflow is to segment the input into sentences, encode each sentence into a concept vector with SONAR, have the LCM process this concept sequence and generate a new one, and finally decode the generated concepts back into a subword token sequence with SONAR. While structurally clear and relatively stable to train, this approach risks information loss, since all semantic information must pass through the intermediate concept vectors.
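A toy version of the Base-LCM objective, assuming a generic causal Transformer over concept vectors (the `BaseLCMSketch` class, its dimensions, and layer counts are illustrative and not taken from the paper), might look like this:

```python
import torch
import torch.nn as nn

class BaseLCMSketch(nn.Module):
    """Toy next-concept regressor: PreNet -> causal Transformer -> PostNet."""
    def __init__(self, concept_dim=1024, model_dim=512, layers=4, heads=8):
        super().__init__()
        self.pre_net = nn.Linear(concept_dim, model_dim)    # map concept space -> model space
        block = nn.TransformerEncoderLayer(model_dim, heads, batch_first=True)
        self.backbone = nn.TransformerEncoder(block, layers)
        self.post_net = nn.Linear(model_dim, concept_dim)   # map back to concept space

    def forward(self, concepts):                            # (batch, seq, concept_dim)
        causal = nn.Transformer.generate_square_subsequent_mask(concepts.shape[1])
        hidden = self.backbone(self.pre_net(concepts), mask=causal)
        return self.post_net(hidden)                        # predicted next concepts

model = BaseLCMSketch()
concepts = torch.randn(2, 16, 1024)                  # frozen sentence embeddings (stand-ins)
pred = model(concepts[:, :-1])                        # predict concept t+1 from concepts 1..t
loss = nn.functional.mse_loss(pred, concepts[:, 1:])  # MSE regression in embedding space
loss.backward()
```

The key point is that the supervision signal is a regression loss in embedding space rather than a cross-entropy over a token vocabulary.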

Quantized LCM addresses continuous data generation by discretizing it. This architecture uses Residual Vector Quantization (RVQ) to quantize the concept layer provided by SONAR and then models the discrete units. By using discrete representations, Quantized LCM can reduce computational complexity and offers advantages in processing long sequences. However, mapping continuous embeddings to discrete codebook units can potentially lead to information loss or distortion, impacting accuracy.
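For intuition, a bare-bones residual vector quantizer is sketched below; real RVQ codebooks are learned rather than random, but the encode/decode logic is the same in spirit: each stage quantizes the residual left over by the previous stage.

```python
import torch

def rvq_encode(vector, codebooks):
    """Quantize `vector` with a stack of codebooks, stage by stage."""
    residual, indices = vector, []
    for codebook in codebooks:                               # codebook: (codebook_size, dim)
        dists = torch.cdist(residual.unsqueeze(0), codebook).squeeze(0)
        idx = dists.argmin()                                 # nearest codebook entry
        indices.append(idx.item())
        residual = residual - codebook[idx]                  # pass the remainder onward
    return indices

def rvq_decode(indices, codebooks):
    # Reconstruction is the sum of the selected entries from every stage.
    return sum(cb[i] for cb, i in zip(codebooks, indices))

dim, stages, size = 1024, 4, 256
codebooks = [torch.randn(size, dim) for _ in range(stages)]  # stand-ins for learned codebooks
concept = torch.randn(dim)                                   # a SONAR-style sentence embedding
codes = rvq_encode(concept, codebooks)                       # discrete units the LCM can model
approx = rvq_decode(codes, codebooks)                        # lossy reconstruction of the concept
```

The gap between `concept` and `approx` is exactly the quantization error the paragraph above warns about.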

Diffusion-based LCM casts generation as an autoregressive process that produces concepts one after another within a document, using a diffusion model to generate each continuous sentence embedding. Two main variants were explored:

  • One-Tower Diffusion LCM: A single Transformer backbone is tasked with predicting clean sentence embeddings given noisy inputs. It can be trained efficiently by interleaving clean and noisy embeddings in the input sequence.
  • Two-Tower Diffusion LCM: This separates the encoding of the context from the diffusion of the next embedding. The first model (contextualizer) causally encodes context vectors, while the second model (denoiser) predicts clean sentence embeddings through iterative denoising.

Of the two, the Two-Tower Diffusion LCM’s separation of concerns handles long contexts more efficiently and lets the denoiser exploit contextual information through cross-attention, and it showed the stronger performance on abstractive summarization and long-context reasoning tasks.
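A highly simplified Two-Tower sketch is shown below, built from generic PyTorch Transformer blocks and omitting the timestep conditioning and noise schedule a real diffusion model requires; it only illustrates the structural idea that a denoiser cross-attends to a causally encoded context while iteratively refining a noisy next-concept embedding.

```python
import torch
import torch.nn as nn

class TwoTowerSketch(nn.Module):
    """Toy two-tower layout: causal contextualizer + cross-attending denoiser."""
    def __init__(self, dim=1024, heads=8):
        super().__init__()
        ctx_layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.contextualizer = nn.TransformerEncoder(ctx_layer, num_layers=2)
        dec_layer = nn.TransformerDecoderLayer(dim, heads, batch_first=True)
        self.denoiser = nn.TransformerDecoder(dec_layer, num_layers=2)

    def forward(self, context_concepts, noisy_next):
        mask = nn.Transformer.generate_square_subsequent_mask(context_concepts.shape[1])
        ctx = self.contextualizer(context_concepts, mask=mask)  # causally encoded context
        # The denoiser cross-attends to the context at every refinement step.
        return self.denoiser(noisy_next, memory=ctx)

model = TwoTowerSketch()
context = torch.randn(1, 8, 1024)        # previous sentence embeddings
x = torch.randn(1, 1, 1024)              # start the next concept from pure noise
with torch.no_grad():
    for step in range(10):               # toy fixed-step refinement loop
        x = model(context, x)            # each pass nudges x toward a clean concept
```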

What Future Possibilities Does LCM Unlock?

Meta’s Chief AI Scientist and FAIR Director, Yann LeCun, described LCM in a December interview as the blueprint for the next generation of AI systems. LeCun envisions a future where goal-driven AI systems possess emotions and world models, with LCM being a crucial component in realizing this vision.

LCM’s mechanism of encoding entire sentences or paragraphs into high-dimensional vectors and directly learning and outputting concepts enables AI models to think and reason at a higher level of abstraction, similar to humans, thereby unlocking more complex tasks.

Alongside LCM, Meta also released BLT and Coconut, both of which explore the latent space. BLT eliminates the tokenizer by grouping raw bytes into dynamically sized patches, which lets different modalities be represented as bytes and gives language models a more flexible view of their input. Coconut (Chain of Continuous Thought) lets a model carry out its chain of reasoning directly in a continuous latent space, feeding hidden states back as input rather than decoding every intermediate thought into tokens.
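As a rough intuition for BLT-style dynamic patching, the sketch below groups bytes into variable-length patches using a simple entropy threshold; in the actual architecture the per-byte entropies come from a small byte-level language model and the boundary rule is more involved, so treat this as an illustration of the idea only.

```python
from typing import List

def patch_bytes(data: bytes, entropies: List[float], threshold: float = 2.0) -> List[bytes]:
    """Group a byte stream into variable-length patches, opening a new patch
    whenever the (precomputed) next-byte entropy spikes above a threshold."""
    patches, current = [], bytearray()
    for byte, entropy in zip(data, entropies):
        if entropy > threshold and current:   # hard-to-predict byte -> start a new patch
            patches.append(bytes(current))
            current = bytearray()
        current.append(byte)
    if current:
        patches.append(bytes(current))
    return patches

text = "Hello, world!".encode("utf-8")
# Made-up entropy values; in BLT these would come from a small byte-level LM.
fake_entropies = [0.5, 0.3, 0.2, 0.2, 0.1, 0.4, 3.1, 0.6, 0.2, 0.3, 0.2, 0.1, 2.8]
print(patch_bytes(text, fake_entropies))      # -> [b'Hello,', b' world', b'!']
```

Predictable stretches of bytes collapse into long patches while surprising bytes get fine-grained treatment, which is what lets the model spend compute where the input is hardest.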

Meta’s series of innovations in latent space has sparked a significant debate within the AI community regarding the potential synergies between LCM, BLT, Coconut, and Meta’s previously introduced JEPA (Joint Embedding Predictive Architecture).

An analysis on Substack suggests that the BLT architecture could serve as a scalable encoder and decoder within the LCM framework. Yuchen Jin echoed this sentiment, noting that while LCM’s current implementation relies on SONAR, which still uses token-level processing to build the sentence embedding space, he is eager to see the outcome of an LCM+BLT combination. Reddit users have speculated about future robots conceptualizing daily tasks with LCM, reasoning through them with Coconut, and adapting to real-world changes via JEPA.

These developments from Meta signal a potential paradigm shift in how large language models are designed and trained, moving beyond the established “next-token prediction” approach towards more abstract and human-like reasoning capabilities. The AI community will be closely watching the further development and integration of these novel architectures.

The paper Large Concept Models: Language Modeling in a Sentence Representation Space is available on arXiv.
