Why does partial observability require interaction instead of better reasoning?

This explores why, when an agent can't see the whole problem at once (partial observability), the fix is to act in the environment and gather information — not to think harder about what it already has.

The distinction is subtle: under partial observability, the missing piece is not a reasoning gap you can close by thinking longer. It is information the agent does not yet possess, and no amount of internal deliberation manufactures data that was never in the context. The cleanest statement of this in the corpus is the finding that test-time *interaction* scaling is a separate axis from chain-of-thought scaling (see "Does agent interaction time scale separately from reasoning depth?"). Reasoning depth lets a model squeeze more out of what it has; interaction (taking environment steps, exploring, backtracking, replanning) lets it go and get what it doesn't have. The note is explicit that interaction scaling *dominates* precisely on tasks with partial observability, which is the heart of your question.
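To make the two axes concrete, here is a minimal sketch in Python. It is a toy illustration, not taken from any of the notes: `HiddenNumberEnv`, `reason_only`, and `interact` are hypothetical names, and the "thinking steps" are a deliberately crude stand-in for chain-of-thought depth. The point it tries to show is that re-processing known facts leaves the set of possible answers unchanged, while each environment query shrinks it.

```python
class HiddenNumberEnv:
    """Toy partially observable environment: the secret never enters the agent's context."""

    def __init__(self, secret: int, low: int = 0, high: int = 127):
        self.low, self.high = low, high
        self._secret = secret

    def is_at_most(self, guess: int) -> bool:
        """One interaction step: a single yes/no observation about the secret."""
        return self._secret <= guess


def reason_only(low: int, high: int, thinking_steps: int) -> set:
    """Deliberate over what is already known, without acting."""
    candidates = set(range(low, high + 1))
    for _ in range(thinking_steps):
        # Re-deriving consequences of known facts cannot exclude any candidate.
        candidates = {c for c in candidates if low <= c <= high}
    return candidates


def interact(env: HiddenNumberEnv, interaction_steps: int) -> set:
    """Spend steps on observations instead: each query halves the candidate range."""
    low, high = env.low, env.high
    for _ in range(interaction_steps):
        if low == high:
            break
        mid = (low + high) // 2
        if env.is_at_most(mid):
            high = mid
        else:
            low = mid + 1
    return set(range(low, high + 1))


env = HiddenNumberEnv(secret=42)
print(len(reason_only(env.low, env.high, thinking_steps=1000)))  # 128 candidates remain
print(interact(env, interaction_steps=7))                        # {42}
```

In this toy, a thousand reasoning steps leave all 128 candidates standing, while seven interaction steps isolate the secret. Real agents are far messier, but the asymmetry is the same one the note describes.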

Why can't better reasoning substitute? Two notes give the structural reason. First, there is a proven floor: longer reasoning chains dampen but never eliminate sensitivity to imperfect input (see "Can longer reasoning chains eliminate model sensitivity to input noise?"). If the starting information is incomplete or noisy, more reasoning steps reduce the damage but cannot zero it out; the limit is set by the input, not by the depth of thought. Second, the social-simulation work shows what happens when models are quietly handed full observability: they look competent only because the setup let them skip the grounding work (see "Why do LLMs fail when simulating agents with private information?"). The moment agents hold genuinely private information, performance collapses, revealing that the 'reasoning' was riding on hidden access rather than inferring the unseen. That is a direct demonstration that you cannot reason your way past information you don't have.

The multi-agent coordination findings sharpen this further: agents fail not from weak reasoning but from how they handle information in the network, for example by agreeing on a strategy too late, adopting strategies without informing neighbors, or accepting incoming information without verifying it (see "Why do multi-agent systems fail to coordinate at scale?"). The bottleneck is the *exchange* of information across a partially observed network, which is an interaction problem, not a cognition problem. Interestingly, the corpus also has the inverse design move: when you *do* have full information, you can strip history out and reason memorylessly without losing coherence (see "Can reasoning systems forget history without losing coherence?"). That is the tell: memoryless reasoning works only when each state is fully specified. Under partial observability the missing context lives precisely in the history and the environment, so you have to interact to recover it.
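The claim that memoryless reasoning needs fully specified states can be illustrated with a small Python sketch. This is an assumed toy setup (a two-step maze where a cue is visible only at the first step), not something drawn from the Atom of Thoughts note; `run_episode`, `memoryless_policy`, and `history_policy` are hypothetical names.

```python
def run_episode(policy, cue: str) -> bool:
    """The cue is observable only at step 0; the choice is made at step 1."""
    observations = [f"cue:{cue}", "junction"]
    history = []
    action = None
    for obs in observations:
        history.append(obs)
        action = policy(obs, history)
    # The episode succeeds if the final action matches the earlier cue.
    return action == cue


def memoryless_policy(obs: str, history: list) -> str:
    # Uses only the current observation; "junction" looks identical in both episodes,
    # so the policy is forced to give the same answer either way.
    return "left"


def history_policy(obs: str, history: list) -> str:
    # Recovers the missing context from an earlier observation.
    for past in history:
        if past.startswith("cue:"):
            return past.split(":", 1)[1]
    return "left"


for cue in ("left", "right"):
    print(cue, run_episode(memoryless_policy, cue), run_episode(history_policy, cue))
```

Any policy that depends only on the current observation must pick the same action at the junction in both episodes, so it succeeds in at most one of them. The policy that consults its history succeeds in both, because the information it needs was only ever available through an earlier step.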

The lateral payoff: the same instinct shows up in how systems are architected. LLM Programs deliberately *hide* step-irrelevant context and feed each call only what it needs (see "Can algorithms control LLM reasoning better than LLMs alone?"), and decoupling reasoning from tool observations treats the act of fetching an observation as a first-class step, separate from thinking about it (see "Can reasoning and tool execution be truly decoupled?"). Both designs implicitly accept that information acquisition and reasoning are different operations, which is exactly why, when the world is only partially visible, you reach for the acquisition operation. What you didn't know you wanted to know: 'reason longer' and 'go look' aren't two settings of the same dial. They are orthogonal axes, and partial observability is the regime where only one of them moves the needle.
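The separation between planning and fetching observations can be sketched as a plan written with placeholders that are filled in only when tools actually run. The Python below is a loose illustration in the spirit of planning-before-execution, not the ReWOO or Chain-of-Abstraction implementation: the tool names, the `#E1` style slot names, and the stock figures are all made up for the example.

```python
# Hypothetical tools standing in for real environment calls (toy data, not real figures).
STOCK = {"warehouse_a": 12, "warehouse_b": 5}

TOOLS = {
    "lookup_stock": lambda site: STOCK[site],
    "add": lambda a, b: a + b,
}

# The plan is written before any observation exists.
# Slots such as "#E1" are placeholders for evidence that is not yet known.
PLAN = [
    ("#E1", "lookup_stock", ["warehouse_a"]),
    ("#E2", "lookup_stock", ["warehouse_b"]),
    ("#E3", "add", ["#E1", "#E2"]),
]


def execute(plan, tools):
    """Acquisition step: run each tool and fill the placeholders with observations."""
    evidence = {}
    for slot, tool_name, args in plan:
        # Replace any placeholder argument with the observation gathered earlier.
        resolved = [evidence.get(arg, arg) for arg in args]
        evidence[slot] = tools[tool_name](*resolved)
    return evidence


print(execute(PLAN, TOOLS)["#E3"])  # 17
```

The plan can be as elaborate as you like, but the value of `#E3` does not exist until `execute` touches the environment. That is the design choice both notes point to: reasoning decides what to look for, and a separate operation goes and gets it.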

Sources (7 notes)

Does agent interaction time scale separately from reasoning depth?

Test-time interaction—increasing environment steps—enables exploration, backtracking, and replanning that per-step reasoning cannot achieve. Curriculum-based RL on rollout length produces SOTA web agents, showing interaction scaling dominates on tasks with partial observability.

Can longer reasoning chains eliminate model sensitivity to input noise?

Lipschitz continuity analysis proves that while additional reasoning steps reduce perturbation propagation, a non-zero robustness floor exists structurally. Sensitivity decreases with stronger embedding and hidden state norms but never reaches zero.

Why do LLMs fail when simulating agents with private information?

Research shows LLMs perform well when one model controls all interlocutors but fail systematically when agents possess private information. This reveals that apparent social competence relies on grounding work that models skip in omniscient settings.

Why do multi-agent systems fail to coordinate at scale?

AgentsNet benchmark shows agents fail to coordinate strategies either by agreeing too late or adopting strategies without informing neighbors. Agents accept neighbor information without verification, enabling error propagation while remaining capable of detecting direct conflicts.

Can reasoning systems forget history without losing coherence?

Atom of Thoughts decomposes problems into DAGs and contracts them iteratively, ensuring each state depends only on the current problem—not prior steps. This memoryless approach eliminates historical baggage that bloats reasoning while maintaining answer equivalence.

Can algorithms control LLM reasoning better than LLMs alone?

LLM Programs embed LLMs within explicit algorithms that manage control flow and state, presenting only step-specific context to each LLM call. This information hiding addresses capability and context window limits while treating complex reasoning as modular, debuggable sub-tasks.

Can reasoning and tool execution be truly decoupled?

ReWOO and Chain-of-Abstraction both decouple reasoning from tool responses through different mechanisms—planning-before-execution and abstract placeholders respectively—eliminating quadratic prompt growth and sequential latency while maintaining reasoning quality.
