Large reasoning models almost certainly can think


    Recently, there has been considerable debate surrounding the notion that large reasoning models (LRMs) lack genuine thinking capabilities. This skepticism largely stems from a study released by Apple, which contends that LRMs merely engage in pattern recognition rather than true reasoning. Their argument hinges on the observation that LRMs struggle to execute long step-by-step algorithms, such as the move sequences of puzzles like the Tower of Hanoi, as problem size increases.

    However, this line of reasoning is fundamentally flawed. For example, if you asked a person familiar with the Tower of Hanoi algorithm to solve a version with twenty discs, they would almost certainly fail: a correct solution requires over a million moves. By that logic, one would have to conclude that humans are incapable of thinking, which is clearly absurd. At most, then, this rebuttal shows that there is no definitive proof that LRMs cannot think; it does not confirm that they can.
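    The scale of that task can be made concrete with the standard recursive algorithm (a minimal sketch; the peg labels are purely illustrative):

```python
def hanoi(n, src, aux, dst, moves):
    """Standard recursive Tower of Hanoi solution, recording every move."""
    if n == 0:
        return
    hanoi(n - 1, src, dst, aux, moves)  # park the n-1 smaller discs on the spare peg
    moves.append((src, dst))            # move the largest disc to its destination
    hanoi(n - 1, aux, src, dst, moves)  # restack the smaller discs on top of it

moves = []
hanoi(20, "A", "B", "C", moves)
print(len(moves))  # 1048575 moves, i.e. 2**20 - 1
```

    The algorithm itself fits in a few lines; what defeats an unaided human is flawlessly executing its 2^20 − 1 = 1,048,575 steps.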

    In this discussion, I will argue more assertively: LRMs almost certainly possess the capacity to think. While there remains a possibility that future research might challenge this view, the evidence and reasoning presented here strongly support this conclusion.

    Defining Thought: What Does It Mean to Think?

    Before assessing whether LRMs can think, it is essential to clarify what “thinking” entails, particularly in the context of problem-solving. We must first ensure that our definition aligns with human cognitive processes.

    1. Structuring the Problem (Role of Frontal and Parietal Lobes)

    When humans contemplate a problem, the prefrontal cortex is heavily involved. This brain region manages working memory, attention, and executive functions, enabling individuals to hold the problem in mind, decompose it into manageable parts, and establish objectives. Meanwhile, the parietal cortex encodes symbolic structures, which are crucial for mathematical and puzzle-related reasoning.

    2. Mental Simulation: Inner Dialogue and Visualization

    Thinking also involves two key components: an internal auditory loop that facilitates self-talk, akin to an inner voice, and visual imagery that allows mental manipulation of objects. For instance, spatial reasoning skills evolved to help humans navigate their environment effectively. The auditory aspect is linked to language centers such as Broca’s area and the auditory cortex, while the visual component is governed by the visual cortex and parietal regions.

    3. Pattern Recognition and Memory Retrieval (Hippocampus and Temporal Lobes)

    These cognitive functions rely on stored knowledge and past experiences:

    • The hippocampus retrieves relevant memories and factual information.
    • The temporal lobes contribute semantic understanding, including meanings, rules, and categories.

    This mirrors how neural networks depend on their training data to process tasks.

    4. Error Detection and Self-Monitoring (Anterior Cingulate Cortex)

    The anterior cingulate cortex (ACC) plays a critical role in monitoring for mistakes, conflicts, or impasses. It helps identify contradictions or dead ends during problem-solving, a process grounded in pattern recognition from prior experience.

    5. Insight and Reframing (Default Mode Network and Right Hemisphere)

    When stuck, the brain often shifts into a more relaxed, introspective state known as the default mode network. This mental mode allows for stepping back and sometimes experiencing sudden insights or “aha” moments. This phenomenon is comparable to how certain models, like DeepSeek-R1, were trained to perform chain-of-thought (CoT) reasoning without explicit CoT examples in their training data.

    Unlike biological brains, LRMs typically do not update their parameters during inference based on real-time feedback. However, models such as DeepSeek-R1 show that reasoning behavior can be learned through reinforcement on problem-solving attempts during training, and at inference time the growing chain of thought effectively updates the model’s internal state as reasoning progresses.

    Parallels Between Chain-of-Thought Reasoning and Human Cognition

    While LRMs lack some human faculties, such as rich visual reasoning, they do exhibit behaviors analogous to human thought processes. For example, most people solve spatial problems by mentally visualizing them, but some individuals with a condition called aphantasia cannot form mental images at all. They nonetheless think effectively, often excelling in symbolic reasoning and mathematics and compensating for the lack of visual imagination. This suggests that LRMs might similarly compensate for limitations in certain cognitive domains.

    Abstracting the human thought process, we identify three core components:

    1. Pattern matching for recalling experiences, representing problems, and evaluating reasoning steps.
    2. Working memory to hold intermediate steps during problem-solving.
    3. Backtracking to abandon unproductive lines of thought and explore alternatives.
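    As a toy illustration of the third component, here is backtracking in its classic algorithmic form: a depth-first search that abandons any partial solution that cannot work. (The subset-sum task and the numbers are hypothetical, chosen only for the example.)

```python
def subset_sum(nums, target):
    """Find a subset of positive ints summing to target via DFS with backtracking."""
    def dfs(i, total, chosen):
        if total == target:
            return chosen
        if i == len(nums) or total > target:
            return None                           # dead end: backtrack
        with_item = dfs(i + 1, total + nums[i], chosen + [nums[i]])
        if with_item is not None:
            return with_item
        return dfs(i + 1, total, chosen)          # retry without nums[i]
    return dfs(0, 0, [])

print(subset_sum([3, 9, 8, 4, 5, 7], 15))  # one valid subset, e.g. [3, 8, 4]
```

    The pruning step relies on all numbers being positive; the point is the control flow, not the task: an unproductive branch is discarded and an alternative is explored, exactly the behavior described above.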

    In LRMs, pattern matching is central. Training enables these models to internalize both world knowledge and the patterns necessary to manipulate that knowledge effectively. Working memory corresponds to the model’s attention layers, where all input, intermediate reasoning steps, and partial outputs must fit simultaneously. The model’s parameters encode learned knowledge and processing strategies.
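    The analogy between attention and working memory can be sketched minimally with plain scaled dot-product attention in NumPy (shapes and values are illustrative, not a real model):

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: each query position reads from
    every key/value position currently held in the context window."""
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the context
    return weights @ V

rng = np.random.default_rng(0)
ctx = rng.normal(size=(6, 4))   # 6 tokens already in the "working memory"
out = attention(ctx, ctx, ctx)  # each token attends over the full context
print(out.shape)                # (6, 4): one updated vector per token
```

    The key property for the working-memory analogy is that everything in the window, input, intermediate reasoning, and partial output, is simultaneously addressable at every step.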

    Chain-of-thought reasoning in LRMs closely resembles human inner speech, in which we verbalize our thoughts to ourselves. Moreover, evidence shows that LRMs can backtrack when a reasoning path proves unfruitful. For instance, when Apple’s researchers tested LRMs on larger puzzle instances, the models recognized the limitations of their working memory and sought more efficient shortcuts, demonstrating adaptive problem-solving rather than blind pattern following.

    Why Would a Next-Token Predictor Develop Thinking Abilities?

    Critics often dismiss LRMs as mere “glorified auto-completes” that predict the next word without genuine understanding. This perspective is misleading. Next-token prediction is, in fact, a highly general and powerful form of knowledge representation.

    Representing knowledge requires a symbolic system or language. Formal languages like first-order predicate logic are precise but limited in expressiveness. For example, they cannot represent properties of predicates themselves. Higher-order logics extend this capacity but still fall short of capturing abstract or imprecise concepts.

    Natural language, by contrast, is extraordinarily expressive. It can describe any concept at any level of detail or abstraction, including meta-concepts about language itself. This makes natural language an ideal medium for representing complex knowledge.

    Although natural language’s richness complicates processing, machines can learn to interpret it through training on vast datasets. A next-token prediction model estimates the probability distribution of the next word given the preceding context. To do this accurately, the model must internally encode extensive world knowledge.

    For example, to complete the sentence “The highest mountain peak in the world is Mount …,” the model must “know” that “Everest” is the correct continuation. When tasked with solving puzzles or computations, the model generates intermediate reasoning steps (CoT tokens) to maintain logical coherence.
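    A deliberately tiny stand-in makes the point: even a bigram counter must “store” the Everest fact in its statistics before it can predict the next token. (The two-sentence corpus below is fabricated purely for illustration.)

```python
from collections import Counter, defaultdict

corpus = ("the highest mountain peak in the world is mount everest . "
          "the longest river in the world is the nile .").split()

# Count bigrams: a minimal stand-in for "knowledge" extracted from text.
counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1

def predict_next(word):
    """Return the most probable next token given the previous one."""
    total = sum(counts[word].values())
    dist = {w: c / total for w, c in counts[word].items()}
    return max(dist, key=dist.get)

print(predict_next("mount"))  # everest
```

    A real LRM conditions on the entire context rather than one preceding word, but the principle scales: the better the next-token predictions, the more world knowledge the parameters must encode.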

    Thus, despite generating one token at a time, the model uses its context as a working memory of the reasoning so far, which keeps subsequent tokens on track. Humans similarly predict upcoming words during speech and inner dialogue. A perfect next-token predictor, one that always produced correct answers, would in theory possess near-omniscient knowledge, though such perfection is unattainable in practice.

    Nevertheless, a parameterized model capable of learning from data and reinforcement can develop genuine reasoning abilities.

    Does LRM Output Reflect Genuine Thought?

    The ultimate measure of thinking is a system’s ability to solve novel problems requiring reasoning. If a model can answer previously unseen questions that demand logical inference, it demonstrates at least a form of reasoning.

    Proprietary LRMs have shown impressive performance on various reasoning benchmarks. To ensure transparency and fairness, we focus here on open-source models, which have also achieved notable success on logic-based tasks.

    While these models often lag behind expert human performance, it is important to recognize that human baselines typically come from individuals trained specifically on these benchmarks. In many cases, LRMs outperform average untrained humans, underscoring their reasoning capabilities.

    Final Thoughts

    Considering their benchmark achievements, the striking parallels between chain-of-thought reasoning and human cognition, and the theoretical argument that accurate next-token prediction over natural language demands broad internal world knowledge, it is reasonable to conclude that LRMs almost certainly possess the ability to think.
