
From Response to Query: The Power of Reverse Thinking in Language Models


Recent advancements in large language models (LLMs) have primarily focused on enhancing their capacity to predict text in a forward, time-linear manner. However, emerging research suggests that enabling LLMs to critique and refine their own outputs retrospectively can significantly improve their performance. While effective, existing methods rely on the advanced reasoning and instruction-following abilities inherent to high-capacity LLMs. Moreover, these approaches often involve sequential processing of generated responses, resulting in considerable increases in inference time.

In a new paper, Time-Reversal Provides Unsupervised Feedback to LLMs, a research team from Google DeepMind and the Indian Institute of Science proposes Time Reversed Language Models (TRLMs), a framework that allows LLMs to reason in reverse—scoring and generating content in a direction opposite to the traditional forward approach. Unlike conventional LLMs, which predict responses given queries, TRLMs predict or evaluate queries given responses, thereby providing unsupervised feedback at inference time.

The researchers present two key variants of TRLMs. The first, called TRLM-Fo (“Forward-based”), repurposes existing forward-trained LLMs to operate in a reverse manner. This is achieved by using prompts like “Generate a question that would result in the following answer:” to guide the model’s behavior. The second variant, TRLM-Ba (“Backward”), takes a more fundamental approach: it pre-trains LLMs from scratch on token-reversed text. Instead of learning in the conventional forward direction, these models learn to predict tokens in reverse, which gives them a more natural capacity for backward reasoning.
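To make the two variants concrete, here is a minimal Python sketch of the ideas involved. The prompt template follows the wording quoted above; the `tokenizer` argument and the training pipeline it would feed are hypothetical stand-ins, not the paper’s actual code or APIs.

```python
# Illustrative sketch of the two TRLM variants; all names are hypothetical.

def trlm_fo_prompt(response: str) -> str:
    """TRLM-Fo: reuse a forward-trained, instruction-tuned LLM by prompting it
    to work backwards from a given response to a plausible query."""
    return (
        "Generate a question that would result in the following answer:\n"
        f"{response}\n"
        "Question:"
    )


def trlm_ba_training_tokens(tokenizer, text: str) -> list:
    """TRLM-Ba: pre-train from scratch on token-reversed sequences, so that
    ordinary next-token prediction learns the response-to-query direction."""
    tokens = tokenizer.encode(text)  # hypothetical tokenizer interface
    return tokens[::-1]              # reverse token order before standard LM training
```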

The study’s findings reveal that TRLMs deliver meaningful unsupervised feedback that can enhance the performance of pre-trained, fine-tuned, and instruction-tuned models. Applications of TRLMs span a variety of downstream tasks, including reranking responses for open-ended long-form question answering, citation generation, and information retrieval. Crucially, the researchers demonstrate that the reverse-scoring capability of TRLMs—where the model scores a query based on a response—is instrumental in achieving these gains. Additionally, models trained using the TRLM-Ba approach generally outperform their TRLM-Fo counterparts, underscoring the value of native backward pre-training.

Empirical results highlight the effectiveness of TRLMs in real-world applications. On the widely used AlpacaEval Leaderboard, TRLMs achieve up to a 5% improvement over a strong baseline that relies on self log-perplexity scores for best-of-N reranking. Notably, TRLMs outperform the conventional approach of forward scoring (query → response) in crucial tasks such as citation generation and passage retrieval.
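The best-of-N reranking setup can be illustrated with a short sketch: sample N candidate responses from a forward model, then keep the one whose reverse score—the log-probability of the query given the response—is highest. The `reverse_logprob` callable below is a hypothetical wrapper around a TRLM, not an API released with the paper.

```python
def rerank_best_of_n(query: str, candidates: list, reverse_logprob) -> str:
    """Best-of-N reranking sketch: return the candidate response whose reverse
    score, log P(query | response) under a TRLM, is highest.
    `reverse_logprob` is a hypothetical scoring callable."""
    scored = [(reverse_logprob(query=query, response=c), c) for c in candidates]
    best_score, best_response = max(scored, key=lambda pair: pair[0])
    return best_response
```

The self log-perplexity baseline mentioned above would instead score each candidate with the forward model’s own log-probability of the response; the reverse direction is what distinguishes the TRLM approach.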

Beyond reranking and retrieval, the researchers leverage the generative abilities of TRLMs to strengthen the input safety filters of LLMs. By generating potential queries from known responses, TRLMs help identify unsafe inputs more effectively. This approach led to a dramatic reduction in the false negative rate on the JailbreakBench leaderboard, a benchmark for assessing LLM safety. Importantly, the improvement was achieved without a significant increase in the false positive rate, showcasing the method’s robustness against adversarial inputs.
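A rough sketch of how such a filter augmentation could look: infer several plausible queries from a response with a TRLM, then flag the interaction if any inferred query trips the existing input filter. Both callables below are hypothetical stand-ins, assumed for illustration only.

```python
def augmented_input_filter(response: str, trlm_generate, input_filter, k: int = 8) -> bool:
    """Safety-filter sketch: infer k plausible queries that could have produced
    `response`, and flag the interaction if any inferred query looks unsafe.
    `trlm_generate` samples a query from a TRLM given a response;
    `input_filter` is an existing input-safety classifier returning True for unsafe."""
    candidate_queries = [trlm_generate(response) for _ in range(k)]
    return any(input_filter(q) for q in candidate_queries)
```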

In summary, Time Reversed Language Models (TRLMs) offer a paradigm shift in how LLMs generate, rank, and evaluate content. By enabling reverse reasoning and scoring, TRLMs introduce a novel form of unsupervised feedback that can boost the performance of both existing and newly trained models. Their effectiveness in reranking, retrieval, and safety filtering positions them as a promising addition to the LLM toolkit, paving the way for faster and more efficient language model deployments.

The paper Time-Reversal Provides Unsupervised Feedback to LLMs is available on arXiv.


Author: Hecate He | Editor: Chain Zhang


