News

Building a Context-Folding LLM Agent for Long-Horizon Reasoning with Memory Compression and Tool Use

October 16, 2025

Building a Context-Folding LLM Agent for Efficient Complex Task Management

This guide walks you through creating a Context-Folding Large Language Model (LLM) Agent designed to tackle extensive, multifaceted tasks by smartly handling limited context windows. The agent decomposes a large problem into manageable subtasks, applies reasoning or calculations as necessary, and then compresses each completed segment into brief summaries. This approach retains critical information while keeping the working memory compact and efficient.

Setting Up the Environment with a Lightweight Hugging Face Model

We start by configuring our environment and loading a compact Hugging Face model locally. This setup ensures smooth operation on platforms like Google Colab without relying on external APIs, enabling offline text generation and processing.

import os, sys, subprocess
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

MODEL_NAME = os.environ.get("CF_MODEL", "google/flan-t5-small")

# Install dependencies if missing
try:
    import transformers
except ImportError:
    subprocess.run([sys.executable, "-m", "pip", "install", "-q", "transformers", "accelerate", "sentencepiece"], check=True)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
llm = pipeline("text2text-generation", model=model, tokenizer=tokenizer, device_map="auto")

def generate_text(prompt: str, max_tokens=160, temperature=0.0) -> str:
    result = llm(prompt, max_new_tokens=max_tokens, do_sample=temperature > 0.0, temperature=temperature)[0]["generated_text"]
    return result.strip()

Implementing a Basic Calculator and Dynamic Folding Memory

To support arithmetic operations within subtasks, we define a safe calculator using Python’s abstract syntax trees. Alongside, we create a folding memory system that actively compresses older context into concise summaries, maintaining a manageable active memory size without losing essential details.

import ast
import operator as op

# Supported arithmetic operations
OPS = {
    ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul, ast.Div: op.truediv,
    ast.Pow: op.pow, ast.USub: op.neg, ast.FloorDiv: op.floordiv, ast.Mod: op.mod
}

def evaluate_expression(node):
    if isinstance(node, ast.Num):
        return node.n
    elif isinstance(node, ast.UnaryOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate_expression(node.operand))
    elif isinstance(node, ast.BinOp) and type(node.op) in OPS:
        return OPS[type(node.op)](evaluate_expression(node.left), evaluate_expression(node.right))
    else:
        raise ValueError("Unsupported or unsafe expression")

def calculate(expr: str):
    parsed = ast.parse(expr, mode='eval').body
    return evaluate_expression(parsed)

class FoldingMemory:
    def __init__(self, max_chars=800):
        self.active = []
        self.folds = []
        self.max_chars = max_chars

    def add(self, text: str):
        self.active.append(text.strip())
        while len(self.get_active_text()) > self.max_chars and len(self.active) > 1:
            removed = self.active.pop(0)
            summary = f"- Folded: {removed[:120]}..."
            self.folds.append(summary)

    def fold_in(self, summary: str):
        self.folds.append(summary.strip())

    def get_active_text(self) -> str:
        return "n".join(self.active)

    def get_folded_text(self) -> str:
        return "n".join(self.folds)

    def snapshot(self) -> dict:
        return {
            "active_chars": len(self.get_active_text()),
            "fold_count": len(self.folds)
        }

Crafting Prompt Templates for Task Decomposition and Summarization

We develop structured prompt templates that instruct the agent to break down tasks into subtasks, solve them efficiently, and summarize results. These templates ensure clear, stepwise communication between the agent’s reasoning process and the language model’s outputs.

SUBTASK_DECOMPOSITION_PROMPT = """
You are a skilled planner. Break down the following task into 2 to 4 clear subtasks.
List each subtask as a bullet point starting with '- ', ordered by priority.
Task: "{task}"
"""

SUBTASK_SOLVER_PROMPT = """
You are a focused problem solver aiming for minimal steps.
If a calculation is required, respond with one line: 'CALC(expression)'.
Otherwise, reply with 'ANSWER: <final result>'.
Think briefly and avoid unnecessary dialogue.

Task: {task}
Subtask: {subtask}
Context notes (folded summaries):
{notes}

Please respond with either CALC(...) or ANSWER: ...
"""

SUBTASK_SUMMARY_PROMPT = """
Summarize the subtask result in up to 3 bullet points, totaling no more than 50 tokens.
Subtask: {name}
Steps taken:
{trace}
Final result: {final}

Return only bullet points starting with '- '.
"""

FINAL_SYNTHESIS_PROMPT = """
You are an expert agent. Create a concise, coherent final solution using ONLY:
- The original task description
- Folded summaries below

Avoid repeating intermediate steps. Be clear and actionable.

Task: {task}
Folded summaries:
{folds}

Final answer:
"""

def extract_bullets(text: str) -> list:
    return [line[2:].strip() for line in text.splitlines() if line.strip().startswith("- ")]

Executing Subtasks and Folding Results into Memory

The agent’s core functionality involves running each subtask, interpreting the model’s output, performing calculations when prompted, and summarizing the results. These summaries are then folded back into memory, enabling iterative reasoning without losing track of previous context.

import re
import time
from typing import List, Tuple, Dict

def execute_subtask(task: str, subtask: str, memory: FoldingMemory, max_iterations=3) -> Tuple[str, str, List[str]]:
    notes = memory.get_folded_text() or "(none)"
    trace = []
    final_answer = ""

    for _ in range(max_iterations):
        prompt = SUBTASK_SOLVER_PROMPT.format(task=task, subtask=subtask, notes=notes)
        response = generate_text(prompt, max_tokens=96)
        trace.append(response)

        calc_match = re.search(r"CALC((.+?))", response)
        if calc_match:
            try:
                result = calculate(calc_match.group(1))
                trace.append(f"TOOL: CALC -> {result}")
                follow_up = generate_text(prompt + f"nTool result: {result}nNow provide 'ANSWER: ...' only.", max_tokens=64)
                trace.append(follow_up)
                if follow_up.strip().startswith("ANSWER:"):
                    final_answer = follow_up.split("ANSWER:", 1)[1].strip()
                    break
            except Exception as e:
                trace.append(f"TOOL: CALC ERROR -> {e}")
        elif response.strip().startswith("ANSWER:"):
            final_answer = response.split("ANSWER:", 1)[1].strip()
            break

    if not final_answer:
        final_answer = "No definitive answer; partial reasoning:n" + "n".join(trace[-2:])

    summary = generate_text(SUBTASK_SUMMARY_PROMPT.format(name=subtask, trace="n".join(trace), final=final_answer), max_tokens=80)
    summary_bullets = "n".join(extract_bullets(summary)[:3]) or f"- {subtask}: {final_answer[:60]}..."
    return final_answer, summary_bullets, trace

class ContextFoldingAgent:
    def __init__(self, max_active_chars=800):
        self.memory = FoldingMemory(max_chars=max_active_chars)
        self.metrics = {"subtasks": 0, "tool_calls": 0, "chars_saved_estimate": 0}

    def decompose_task(self, task: str) -> List[str]:
        plan = generate_text(SUBTASK_DECOMPOSITION_PROMPT.format(task=task), max_tokens=96)
        subtasks = extract_bullets(plan)
        return subtasks[:4] if subtasks else ["Main solution"]

    def run(self, task: str) -> Dict:
        start_time = time.time()
        self.memory.add(f"TASK: {task}")
        subtasks = self.decompose_task(task)
        self.metrics["subtasks"] = len(subtasks)
        folded_summaries = []

        for subtask in subtasks:
            self.memory.add(f"SUBTASK: {subtask}")
            final, summary, trace = execute_subtask(task, subtask, self.memory)
            self.memory.fold_in(summary)
            folded_summaries.append(f"- {subtask}: {final}")
            self.memory.add(f"SUBTASK_DONE: {subtask}")

        final_answer = generate_text(FINAL_SYNTHESIS_PROMPT.format(task=task, folds=self.memory.get_folded_text()), max_tokens=200)
        end_time = time.time()

        return {
            "task": task,
            "final_answer": final_answer.strip(),
            "folded_summaries": self.memory.get_folded_text(),
            "active_context_length": len(self.memory.get_active_text()),
            "subtask_results": folded_summaries,
            "runtime_seconds": round(end_time - start_time, 2)
        }

Demonstrating the Agent with Practical Examples

To illustrate the agent’s capabilities, we test it on sample tasks that require planning and budget calculations. These examples showcase how the agent decomposes, reasons, and synthesizes results while efficiently managing context.

DEMO_TASKS = [
    "Create a 3-day study plan for machine learning including daily exercise and simple meal prep; allocate specific time blocks.",
    "Calculate a project budget with three items: laptop $799.99, course $149.50, snacks $23.75; add 8% tax and 5% contingency; provide a concise recommendation paragraph."
]

import json

def pretty_print(data):
    return json.dumps(data, indent=2, ensure_ascii=False)

if __name__ == "__main__":
    agent = ContextFoldingAgent(max_active_chars=700)
    for idx, task in enumerate(DEMO_TASKS, 1):
        print("=" * 70)
        print(f"DEMO #{idx}: {task}")
        result = agent.run(task)
        print("n--- Folded Summaries ---n" + (result["folded_summaries"] or "(none)"))
        print("n--- Final Answer ---n" + result["final_answer"])
        print("n--- Diagnostics ---")
        diagnostics = {
            "active_context_length": result["active_context_length"],
            "runtime_seconds": result["runtime_seconds"],
            "number_of_subtasks": len(agent.decompose_task(task))
        }
        print(pretty_print(diagnostics))

Summary: Efficient Long-Horizon Reasoning via Context Folding

This tutorial demonstrates how context folding empowers an LLM agent to perform extended reasoning without overwhelming memory constraints. By systematically decomposing tasks, executing subtasks with tool support, and compressing knowledge into summaries, the agent mimics intelligent workflow management. This method combines decomposition, tool integration, and context compression to build a lightweight yet scalable reasoning system capable of handling complex, multi-step problems effectively.

Loading…

Here are the results for the search: "{{td_search_query}}"

No results!

{{post_title}}

Building a Context-Folding LLM Agent for Efficient Complex Task Management

Setting Up the Environment with a Lightweight Hugging Face Model

Implementing a Basic Calculator and Dynamic Folding Memory

Crafting Prompt Templates for Task Decomposition and Summarization

Executing Subtasks and Folding Results into Memory

Demonstrating the Agent with Practical Examples

Summary: Efficient Long-Horizon Reasoning via Context Folding

RELATED ARTICLES

The AI lab revolving door spins ever faster

A Coding Guide to Build a Procedural Memory Agent That Learns,...

Mistral AI Ships Devstral 2 Coding Models And Mistral Vibe CLI...