Agentic Systems and Planning · Reasoning and Knowledge

Can structured templates make code reasoning more reliable than free-form thinking?

Unstructured chain-of-thought reasoning lets models skip cases and make unsupported claims. This note explores whether semi-formal templates that require explicit premises, evidence traces, and checks of alternatives can prevent these failure modes.

Note · 2026-05-18 · sourced from Tool Computer Use

Unstructured chain-of-thought lets the model reason freely. It also lets the model reason badly: skip cases, make unsupported claims, guess at behavior from function names, or conclude from incomplete analysis. Agentic Code Reasoning introduces a structured alternative for code-reasoning tasks: semi-formal reasoning, where agents fill in templates that require explicit evidence for each claim.

The templates act as certificates. The agent must state premises (what is assumed), trace relevant code paths (which functions are examined, where they are defined), provide evidence for semantic properties (not "this returns X" but "this returns X because line N does Y"), and check alternative hypotheses (could this behave differently than I'm assuming?). The structure prevents the model from concluding without showing its work.
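The certificate idea can be sketched as a Python data structure with a completeness check that refuses to accept a conclusion until every section is filled. The field names, the `path:<line>` evidence format, and the `missing_sections` check are hypothetical illustrations of the four sections described above, not the actual template from Agentic Code Reasoning.

```python
from dataclasses import dataclass, field


@dataclass
class Evidence:
    claim: str     # the semantic property being asserted
    location: str  # where it is grounded, e.g. "path/to/file.py:<line>"
    reason: str    # what the code at that location does to support the claim


@dataclass
class Certificate:
    premises: list[str] = field(default_factory=list)      # what is assumed
    traced_paths: list[str] = field(default_factory=list)  # functions examined, where defined
    evidence: list[Evidence] = field(default_factory=list)
    alternatives_checked: list[str] = field(default_factory=list)
    conclusion: str = ""


def missing_sections(cert: Certificate) -> list[str]:
    """Return why the certificate is incomplete; an empty list means it passes."""
    problems = []
    if not cert.premises:
        problems.append("no premises stated")
    if not cert.traced_paths:
        problems.append("no code paths traced")
    if not cert.evidence:
        problems.append("no evidence for semantic claims")
    for ev in cert.evidence:
        # A claim without a location and a reason is exactly the
        # "this returns X" style of assertion the template forbids.
        if not ev.location or not ev.reason:
            problems.append(f"unsupported claim: {ev.claim!r}")
    if not cert.alternatives_checked:
        problems.append("no alternative hypotheses checked")
    if not cert.conclusion:
        problems.append("no conclusion")
    return problems


# A bare verdict with no supporting work is rejected.
bare = Certificate(conclusion="The two patches are equivalent")
print(missing_sections(bare))

# A filled-in certificate passes; "<line>" is a placeholder, not a real line number.
full = Certificate(
    premises=["Patch 1 and Patch 2 are evaluated on the same inputs"],
    traced_paths=["format -> module-level function in dateformat.py"],
    evidence=[Evidence(
        claim="format here is not the Python builtin",
        location="dateformat.py:<line>",
        reason="a module-level function named format expects a datetime, not an int",
    )],
    alternatives_checked=["format is builtins.format: ruled out by the trace"],
    conclusion="The patches are not equivalent",
)
print(missing_sections(full))
```

The check is purely structural: it cannot tell whether the cited evidence is true, only that the agent committed to something checkable before concluding.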

The motivating example illustrates the difference. On a real Django patch-equivalence task (django-13670), standard reasoning incorrectly concluded that two patches were equivalent — the model assumed format() was Python's builtin. Semi-formal analysis required the agent to trace format to its definition, where it found that format is shadowed by a module-level function in Django's dateformat.py that expects a datetime object, not an integer. Patch 1 raises an AttributeError; Patch 2 succeeds. The patches are not equivalent. Free-form reasoning missed the shadowing. Template-required tracing caught it.
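To make the shadowing failure concrete, here is a toy Python module with the same shape: a module-level `format` that expects a date-like object shadows the builtin for every function in that module. The functions `year_patch_1` and `year_patch_2` and their bodies are hypothetical; this is not Django's code and not the actual django-13670 patches. Running it confirms what the template's tracing step finds without execution: inside the module, `format` resolves to the module-level function, not `builtins.format`.

```python
import builtins


def format(value, format_string):
    """Toy module-level function that shadows builtins.format in this module.

    It expects a date-like object with a strftime() method, not an int.
    """
    return value.strftime(format_string)


def year_patch_1(year: int) -> str:
    # Reads like the builtin call format(year, "04d"), but name lookup
    # finds the module-level format() above, which calls int.strftime.
    return format(year, "04d")


def year_patch_2(year: int) -> str:
    # %-formatting never touches the shadowed name.
    return "%04d" % year


def outcome(fn, arg):
    """Return ("ok", result) or ("error", exception class name)."""
    try:
        return ("ok", fn(arg))
    except Exception as exc:
        return ("error", type(exc).__name__)


# The step the template forces: resolve the name before reasoning about it.
print(format is builtins.format)   # False: the builtin is shadowed
print(outcome(year_patch_1, 476))  # ('error', 'AttributeError')
print(outcome(year_patch_2, 476))  # ('ok', '0476')
```

Python resolves a bare name through local, enclosing, module-global, and finally builtin scopes, so a module-level definition wins over the builtin. A reasoner that infers behavior from the name alone skips exactly that lookup.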

Empirically, semi-formal reasoning raises patch-equivalence accuracy from 78% to 88% on curated examples and reaches 93% on real-world agent-generated patches, with similar improvements on fault localization and code question answering. The templates do more than polish reasoning: they block specific failure modes, such as inferring behavior from function names or analyzing a single case when several exist.

The deeper architectural move is that completeness scaffolding can substitute for execution. Code reasoning without code execution has historically been unreliable. Structured templates make it reliable enough to serve as an RL reward signal, which opens execution-free reward design as a new direction for training code agents.
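One way execution-free reward could be wired up, as a hedged sketch: a judge model compares a candidate patch with a reference patch using a semi-formal template, and its verdict becomes the reward without running any tests. The `Verdict` type, the `judge` callable interface, and the rule of zeroing out verdicts whose certificate is incomplete are assumptions for illustration; `stub_judge` is a toy stand-in for a real model.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    equivalent: bool            # judge's conclusion about the two patches
    certificate_complete: bool  # did the judge fill every template section?


def execution_free_reward(candidate: str, reference: str,
                          judge: Callable[[str, str], Verdict]) -> float:
    """Reward a candidate patch from a template-based verdict, with no test run.

    Verdicts backed by an incomplete certificate are treated as unreliable
    and earn 0.0 (an assumed policy, not one taken from the source).
    """
    verdict = judge(candidate, reference)
    if not verdict.certificate_complete:
        return 0.0
    return 1.0 if verdict.equivalent else 0.0


def stub_judge(candidate: str, reference: str) -> Verdict:
    # Toy stand-in for a model: patches that differ only in whitespace
    # count as equivalent. A real judge would reason over the code.
    same = candidate.split() == reference.split()
    return Verdict(equivalent=same, certificate_complete=True)


print(execution_free_reward("return  x", "return x", stub_judge))  # 1.0
print(execution_free_reward("return y", "return x", stub_judge))   # 0.0
```

Returning 0.0 for incomplete certificates is one design choice; a trainer could instead drop such samples from the batch so that judge failures are not counted as negative feedback for the policy.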

Original note title

semi-formal reasoning templates act as completeness certificates — force agents to state premises trace paths and derive conclusions explicitly