Can algorithms plus limited LLM calls solve complex tasks better?
Explores whether decomposing tasks into step-specific prompts within algorithmic control flow—rather than asking the LLM to manage full state—overcomes context window and reasoning limits while improving task performance.
LLM Programs embed an LLM within an algorithm rather than asking the LLM to be the algorithm. The critical design choice: instead of the LLM maintaining the current state of the program (its context), the LLM is presented with only a step-specific prompt and context for each step. A conventional computer program (e.g., in Python) handles control flow, parsing of outputs, and augmentation of prompts for succeeding steps.
This is distinct from both Chain-of-Thought (where the LLM manages state through its token stream) and agentic frameworks (where the LLM decides what to do next). In LLM Programs, the algorithm structure is external and explicit, not learned or generated:
- LLM handles: isolated subproblems where its pattern-matching and generation capabilities excel
- Program handles: control flow, state management, output parsing, context filtering
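The division of labor above can be sketched in a few lines. This is a minimal, illustrative sketch, not a reference implementation: `run_llm_program`, `toy_llm`, and the prompt wording are all invented for this example, and `toy_llm` is a deterministic stand-in for a real model API.

```python
from typing import Callable

def run_llm_program(chunks: list[str], llm: Callable[[str], str]) -> str:
    """A minimal LLM Program: the Python code owns control flow and state;
    each llm() call sees only a small, step-specific prompt."""
    # Map step: each chunk is an isolated subproblem; no chunk sees the others.
    notes = [llm(f"Extract the key fact: {c}") for c in chunks]
    # The program, not the model, aggregates outputs and builds the next prompt.
    context = "; ".join(notes)
    # Reduce step: the final call sees only the distilled notes, not the full
    # original text, so it stays within a small context window.
    return llm(f"Answer using only these facts: {context}")

def toy_llm(prompt: str) -> str:
    # Stand-in model so the sketch runs without an API: echo the payload.
    return prompt.split(": ", 1)[1]
```

Swapping `toy_llm` for a real model call changes nothing structurally: the program's control flow, parsing, and prompt construction stay fixed.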
The key benefit is information hiding. By concealing information irrelevant to the current step, each LLM call focuses on an isolated subproblem whose results feed future calls. This addresses two fundamental limitations:
- Capability limits: Complex tasks that are currently too difficult because they require coordinating multiple reasoning steps
- Architectural constraints: The finite context window restricts processing to what fits within it
The approach recognizes the LLM as a limited general agent and avoids further training. Instead, the expected behavior is recursively deconstructed into simpler steps the LLM can perform to a sufficient degree.
This connects to Can modular cognitive tools boost LLM reasoning without training? — both decompose reasoning into modular operations. But LLM Programs are more structured: the control flow is predetermined by the algorithm, whereas cognitive tools are flexibly invoked. It also extends Does separating planning from execution improve reasoning accuracy? — the program IS the decomposer, and each LLM call IS the solver, with clean separation enforced by architecture rather than training.
Decomposed Prompting as the software library formalization: Decomposed Prompting (Khot et al., 2022) makes the software library analogy explicit. The decomposer defines a top-level program using interfaces to simpler sub-task functions. Sub-task handlers serve as "modular, debuggable, and upgradable implementations" — if a particular handler underperforms, it can be debugged in isolation, replaced with an alternative prompt or even a symbolic system (e.g., Elasticsearch), and plugged back in. This is more general than least-to-most prompting: it supports recursive decomposition, non-linear structures, and mixed neural-symbolic pipelines. The key architectural insight is that sub-task handlers are shared across tasks, creating a reusable prompt library — the closest existing analog to how software engineers build with functions. Source: Prompts Prompting.
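The handler-library idea can be made concrete with the concatenation task used as a running example in the Decomposed Prompting paper (take the first letter of each word). The handler names and registry below are illustrative, not from the paper's code; the point is that each handler is an isolated, swappable implementation, and a symbolic function can stand in for an LLM-prompted one without touching the top-level program.

```python
# Sub-task handlers: each is modular, debuggable, and replaceable.
def split_handler(text: str) -> list[str]:
    return text.split()            # symbolic handler; no LLM needed

def first_letter_handler(word: str) -> str:
    return word[0]                 # could be swapped for an LLM prompt

def concat_handler(letters: list[str]) -> str:
    return "".join(letters)

# The shared handler library: reusable across top-level programs.
HANDLERS = {
    "split": split_handler,
    "first": first_letter_handler,
    "concat": concat_handler,
}

def decompose_and_run(question: str) -> str:
    # Top-level "decomposer program": concatenate the first letters.
    words = HANDLERS["split"](question)
    letters = [HANDLERS["first"](w) for w in words]
    return HANDLERS["concat"](letters)
```

If `first_letter_handler` underperformed as an LLM prompt, it could be debugged or replaced in isolation and plugged back into `HANDLERS` with no change to `decompose_and_run`.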
Source: Novel Architectures
Related concepts in this collection:
- Can modular cognitive tools boost LLM reasoning without training? Asks whether structuring reasoning as discrete, sandboxed tool calls elicits stronger problem-solving than monolithic prompting, and whether this approach can match specialized reasoning models. Relation: LLM Programs are the more structured variant; the algorithm determines when and how tools are called.
- Does separating planning from execution improve reasoning accuracy? Explores whether modularizing decomposition and solution into separate models prevents interference and boosts performance compared to monolithic approaches. Relation: LLM Programs enforce this separation architecturally: program = decomposer, LLM = solver.
- Can reasoning and tool execution run in parallel? Standard LLM tool use halts for each response, creating redundant prompts and sequential delays; do alternative architectures that separate reasoning from tool observation actually eliminate these costs? Relation: LLM Programs achieve this by design: each step gets only relevant context.
- Can extreme task decomposition enable reliable execution at million-step scale? Asks whether breaking tasks into maximally atomic subtasks with voting-based error correction can solve the fundamental reliability problem in long-horizon tasks, challenging whether better models or better decomposition is the path to high-reliability AI systems. Relation: MAKER takes the LLM Programs principle to its extreme: maximal decomposition with error correction.
- Can we automatically optimize both prompts and agent coordination? Explores whether language agents can be represented as computational graphs whose structure and content adapt automatically; current agent systems require hand-engineered orchestration, so automatic optimization could unlock more capable multi-agent systems. Relation: LLM Programs are computational graphs with fixed topology; the optimizable-graphs framework generalizes this by letting edge optimization discover the program structure rather than predefining it.
Original note title: LLM programs decompose complex tasks into step-specific prompts within algorithmic control flow — hiding irrelevant context per step