Can code become the operational substrate for agent reasoning?
Explores whether code, beyond being an LLM output, functions as the primary medium through which agents reason, act, observe, and verify progress in complex tasks.
Most discussion of LLMs and code treats code as a product: the model writes a function, solves a competition problem, or patches a repository, and the code is the deliverable. The "code as agent harness" framing inverts this. In agentic systems, code is increasingly the operational substrate rather than the output — the medium through which an agent reasons (program-aided reasoning externalizes intermediate computation into executable form), acts (robotic and embodied agents run generated programs as policies), models its environment (codebases, execution traces, and tests represent state and dynamics), and verifies (runtime feedback confirms or refutes progress). What makes code uniquely suited to this role is that it is simultaneously executable, inspectable, and stateful: it can be run, read, and carried forward across steps.
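The three properties named above can be made concrete with a small sketch. This is an illustration only, not the cited paper's design: the `Harness` class, its `step` method, and the `run` helper are hypothetical names, and the hand-written program strings stand in for what a model would generate. Each step runs a program (executable), records its source and outcome (inspectable), and keeps one namespace across steps (stateful), with a caller-supplied check acting as the verifier.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Optional


@dataclass
class Harness:
    """Hypothetical minimal harness; an illustrative assumption, not the paper's API."""
    state: dict = field(default_factory=dict)  # stateful: namespace carried across steps
    trace: list = field(default_factory=list)  # inspectable: each program and its outcome

    def step(self, program: str, check: Callable[[dict], bool]) -> bool:
        """Run one generated program, then verify progress with `check`."""
        try:
            # Executable: the program is run, not just read.
            # Real systems would sandbox this; exec on untrusted code is unsafe.
            exec(program, self.state)
            ok, error = check(self.state), None
        except Exception as exc:
            # Runtime feedback: the failure itself becomes an observation.
            ok, error = False, repr(exc)
        self.trace.append({"program": program, "ok": ok, "error": error})
        return ok


def run(proposals: Iterable[str], check: Callable[[dict], bool],
        harness: Optional[Harness] = None) -> Harness:
    """Feed candidate programs to the harness until the check passes."""
    harness = harness or Harness()
    for program in proposals:
        if harness.step(program, check):
            break
    return harness


# Stand-ins for model output: a setup step, a buggy attempt, and a fix.
proposals = [
    "prices = [3, 4, 5]",
    "total = sum(prices) / 0",
    "total = sum(prices)",
]
result = run(proposals, check=lambda s: s.get("total") == 12)
```

In this run the second program raises an exception that is recorded in `result.trace`, and the third succeeds only because `prices` from the first step is still in `result.state`. In a real agent the recorded error would be fed back to the model to produce the next proposal, which this sketch leaves out.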
This reframing connects threads that otherwise look separate: tool use, planning, memory, and verification all become facets of a single code-centered execution loop. The counterpoint is that not all agent reasoning reduces to code. Natural-language deliberation and learned policies do real work that no program captures, and forcing everything into code can produce a leaky abstraction. But where verification matters, code's executability gives agents a ground truth that prose lacks. The payoff is a unified lens for agent infrastructure: design the code substrate well and reasoning, action, and verification improve together.
— "Code as Agent Harness: Toward Executable, Verifiable, and Stateful Agent Systems", https://arxiv.org/abs/2605.18747
Related concepts in this collection
- Should LLMs handle abstraction only in optimization?
  What if LLMs worked exclusively on translating problems to formal constraints, while deterministic solvers handled the numeric work? Explores whether this division of labor could overcome LLM failures in iterative computation.
  both treat emitting executable code as the locus of reliable reasoning rather than as a final answer
- Can structured reasoning replace code execution for RL rewards?
  Can semi-formal templates enable execution-free code verification reliable enough to train RL agents without running code? This matters because execution is expensive and slow in agent training loops.
  explores the inspectable side of code as a reasoning medium even without execution
Original note title
code is not only llm output but an executable inspectable stateful medium through which agents reason act and verify