Can reasoning stay grounded without external feedback loops?
Explores whether language models can maintain accurate reasoning through their own internal chains of thought, or whether they need real-world feedback to avoid hallucination and error propagation.
Pure chain-of-thought reasoning is a closed black box: the model uses its own internal representations to generate each reasoning step, with no external correction mechanism. When an early step hallucinates or drifts, subsequent steps build on the error; error propagation is the structural consequence of having no feedback loop to reality.
ReAct addresses this by interleaving two kinds of operations:
- Reasoning traces: Verbal thoughts that track progress, adjust plans, handle exceptions, and identify when external information is needed
- Actions: Queries to external sources (Wikipedia API, interactive environments) that inject real-world grounding into the reasoning context
The interleaving is tightly coupled: reasoning identifies what information is needed, the action retrieves it, and reasoning then interprets the result and updates the plan. This is not reasoning first and acting second; it is continuous mutual conditioning, where each reasoning step can trigger an action and each action result reshapes the next reasoning step.
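To make the loop concrete, here is a minimal Python sketch of the interleaving. It is an illustration under assumptions, not the paper's implementation: `llm` and `wikipedia_search` are hypothetical stubs, and the `Thought:` / `Search[...]` / `Finish[...]` markup is a simplified version of ReAct's prompt format.

```python
import re

def llm(prompt: str) -> str:
    """Placeholder for a language-model completion call (assumed, not a real API)."""
    raise NotImplementedError

def wikipedia_search(query: str) -> str:
    """Placeholder for a Wikipedia API lookup (assumed, not a real API)."""
    raise NotImplementedError

def react(question: str, max_steps: int = 8) -> str:
    # The transcript accumulates Thought/Action/Observation triples, so each
    # new reasoning step conditions on every prior external observation.
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        # Reasoning trace: the model verbalizes progress and decides whether
        # it needs external information before committing to an answer.
        thought = llm(transcript + "Thought:")
        transcript += f"Thought:{thought}\n"
        # Action: a requested lookup grounds the chain in retrieved facts
        # rather than model-internal associations.
        search = re.search(r"Search\[(.+?)\]", thought)
        if search:
            transcript += f"Observation: {wikipedia_search(search.group(1))}\n"
            continue
        finish = re.search(r"Finish\[(.+?)\]", thought)
        if finish:
            return finish.group(1)
    return transcript  # step budget exhausted without a Finish action
```

The key property is that the observation is appended to the transcript before the next reasoning step is generated, so a wrong or missing fact is visible to the model immediately rather than compounding silently.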
Empirical results: on knowledge-intensive QA (HotpotQA, FEVER), where pure CoT hallucinates and propagates errors, ReAct's Wikipedia API interaction allows real-time fact-checking and error correction. On interactive decision-making benchmarks (ALFWorld, WebShop), ReAct outperforms imitation and reinforcement learning methods by 34% and 10% absolute success rate respectively, using only one or two in-context examples.
The mechanism: human "inner speech" plays this role; verbal reasoning supports working memory, tracks state, and handles exceptions. ReAct externalizes this to allow fact-grounding of the content of reasoning, not just structural organization of its steps.
This is the foundational architectural pattern that subsequent designs either extend (ReWOO separating planning from execution) or abstract from (CoA using abstract placeholders instead of waiting for real responses). Understanding what ReAct prevents (error propagation from ungrounded chains) explains why architectural evolution moved toward earlier separation of planning from execution.
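For contrast with the loop above, here is a minimal sketch of the ReWOO-style alternative, reusing `re` and the hypothetical `llm` and `wikipedia_search` stubs from the previous block. The placeholder syntax (`#E1`, `#E2`) and the single plan/work/solve split are simplifying assumptions, not the paper's exact format.

```python
def rewoo(question: str) -> str:
    # Planner: one LLM call lays out every step up front, naming future
    # tool results by abstract placeholder instead of waiting for them.
    plan = llm(
        f"Question: {question}\n"
        "Write numbered steps of the form: Search[query] -> #E1"
    )
    # Worker: execute the planned searches, splicing earlier evidence
    # into later queries wherever the plan referenced a placeholder.
    evidence: dict[str, str] = {}
    for query, tag in re.findall(r"Search\[(.+?)\] -> (#E\d+)", plan):
        for known_tag, text in evidence.items():
            query = query.replace(known_tag, text)
        evidence[tag] = wikipedia_search(query)
    # Solver: a single final call sees the plan plus all evidence at once;
    # the planner is never re-invoked between tool calls.
    return llm(f"{plan}\nEvidence: {evidence}\nAnswer: {question}")
```

The design trade-off is visible in the structure: ReWOO saves the per-step planner calls that ReAct pays for, but gives up ReAct's ability to revise the plan after each observation.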
Source: Reasoning Architectures
Related concepts in this collection
- Do language models actually use their reasoning steps?
  Chain-of-thought reasoning looks valid on the surface, but does each step genuinely influence the model's final answer, or are the reasoning chains decorative? This matters for trusting AI explanations.
  Relation: ReAct's external grounding provides a mechanism for causal necessity: steps that retrieve wrong facts produce wrong answers, creating a cleaner causal chain.
- Can reasoning and tool execution run in parallel?
  Standard LLM tool use halts generation for each tool response, creating redundant prompts and sequential delays. Do alternative architectures that separate reasoning from tool observation actually eliminate these costs?
  Relation: ReWOO is the architectural evolution beyond ReAct's sequential interleaving.
- When should retrieval happen during model generation?
  Explores whether retrieval should occur continuously, at fixed intervals, or only when the model signals uncertainty. Standard RAG retrieves once; long-form generation requires dynamic triggering based on confidence signals.
  Relation: FLARE, as the next generation of this pattern, extends ReAct's insight that retrieval should be uncertainty-gated rather than fixed-interval.
- Why do language models ignore information in their context?
  Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem.
  Relation: ReAct's external actions counteract parametric association override by injecting fresh grounding.
Original note title: interleaved reasoning and action prevents hallucination by grounding reasoning traces in external world feedback rather than model-internal associations