Can dialogue format help models reason more diversely?
Explores whether structuring internal reasoning as multi-agent dialogue rather than monologue can improve strategy diversity and coherency across different problem types, using the Compound-QA benchmark.
Current reasoning models such as OpenAI's o1 and DeepSeek-R1 use monologue-style reasoning within a think block: a single continuous chain of internal text. DialogueReason identifies two systematic weaknesses in this approach:
Low diversity: models persistently apply a fixed strategy across diverse problems. When problems call for different approaches (e.g., breadth-first enumeration for a combinatorial problem, depth-first derivation for a geometric proof), monologue reasoning recycles the same strategy.
Low coherency: frequent, unmotivated shifts of attention within a single reasoning path, with repetitive hesitations ("Wait...") and unnecessary switches between ideas. The reasoning becomes fragmented, hard to interpret, and often ineffective, swinging between overcommitting to one strategy and neglecting alternatives.
The Compound-QA task makes this visible: concatenating multiple independently solvable problems into a single prompt forces the model to demonstrate both diverse strategies and maintained coherency. Monologue reasoning fails at exactly this combination.
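To make the setup concrete, here is a minimal sketch of how such a compound prompt could be assembled. The function name, instruction wording, and question delimiter are illustrative assumptions, not the benchmark's actual format:

```python
# A minimal sketch of Compound-QA-style prompt construction, assuming a
# simple numbered-question layout; the real benchmark's delimiters and
# instructions may differ.
def build_compound_prompt(problems: list[str]) -> str:
    header = (
        "Answer each of the following independent questions. "
        "Solve them one at a time and label each answer.\n\n"
    )
    body = "\n\n".join(f"Question {i + 1}: {p}" for i, p in enumerate(problems))
    return header + body

prompt = build_compound_prompt([
    "How many ways can 5 distinct books be arranged on a shelf?",
    "Prove that the base angles of an isosceles triangle are equal.",
])
```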
DialogueReason proposes dialogue-based internal reasoning structured along three dimensions (a data-model sketch follows the list):
- Agent dimension: multiple reasoning agents with designated characters, objectives, and interests
- Environment dimension: recording task progression, introducing events, maintaining task control
- Interaction dimension: agent-to-agent (conflict resolution, negotiation, supplementation) and agent-to-environment (requirements and feedback)
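One way to picture the three dimensions is as a small data model. This sketch is purely illustrative; the class and field names are hypothetical, not taken from the paper:

```python
# Hypothetical data model for the three dimensions; all names below are
# illustrative assumptions, not the paper's implementation.
from dataclasses import dataclass, field

@dataclass
class Agent:
    """Agent dimension: a reasoning persona with a character and objective."""
    name: str
    expertise: str
    objective: str

@dataclass
class Scene:
    """Environment dimension: records task progression and events."""
    setting: str                  # e.g. "Quantum Café"
    agents: list[Agent]
    event_log: list[str] = field(default_factory=list)

def interact(scene: Scene, speaker: Agent, utterance: str) -> None:
    """Interaction dimension: an agent-to-agent or agent-to-environment
    move, appended to the environment's record of task progression."""
    scene.event_log.append(f"{speaker.name} ({speaker.expertise}): {utterance}")

cafe = Scene("Quantum Café", agents=[
    Agent("Ada", "combinatorics", "enumerate arrangements systematically"),
    Agent("Boris", "geometry", "check each claim against a construction"),
])
interact(cafe, cafe.agents[0], "Let's enumerate the cases breadth-first.")
```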
The mechanism is scene-switching: the model sets up a dedicated scene for each question ("Quantum Café"), introduces characters with distinct expertise, and resolves the question through their dialogue. When transitioning to the next question, it constructs a new environment ("Theoretical Physics Hall") with different characters. This prevents cross-problem interference while maintaining per-problem coherency.
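A sketch of what a scene-switching instruction might look like follows; the prompt text is an assumption for illustration, not the paper's actual prompt:

```python
# Illustrative system prompt eliciting scene-switching dialogue reasoning.
# The wording is an assumption about how this behavior could be prompted;
# it is not the paper's exact prompt.
SCENE_SWITCHING_INSTRUCTIONS = """\
Inside your think block, reason about each question as a short dialogue:
1. Open a NEW scene whose setting fits the question
   (e.g. "Quantum Café" for a physics puzzle).
2. Introduce characters with distinct expertise, objectives, and interests.
3. Resolve the question through their dialogue: conflict resolution,
   negotiation, and supplementation of each other's ideas.
4. Before the next question, close the scene and open a fresh one with
   new characters, so earlier dialogue does not carry over.
"""
```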
This is distinct from multi-agent debate systems, which use SEPARATE models. DialogueReason is a SINGLE model that reasons in dialogue format: the diversity comes from internal role differentiation, not from aggregating multiple independent models. It also differs from the approach in Why does parallel reasoning outperform single chain thinking?: DialogueReason achieves a related advantage through a different mechanism, not multiple parallel chains but structured internal dialogue that naturally explores multiple strategies.
The connection to reasoning format effects is direct: as argued in Does training data format shape reasoning strategy more than domain?, having the model reason in dialogue format activates different reasoning strategies than monologue format. The format IS the intervention.
Related concepts in this collection
- Why does parallel reasoning outperform single chain thinking?
Does dividing a fixed token budget across multiple independent reasoning paths beat spending it all on one long chain? This explores how breadth and diversity in reasoning compare to depth.
DialogueReason achieves diversity through internal dialogue rather than external parallelism
- Does a model improve by arguing with itself?
When models revise their own reasoning in response to self-generated criticism, do they converge on better answers or worse ones? And how does that compare to challenge from other models?
DialogueReason addresses the single-model limitation via internal multi-agent simulation
- Does training data format shape reasoning strategy more than domain?
What explains why models trained on multiple-choice data reason differently than those trained on free-form text? The research isolates format and domain effects to measure which one matters more.
Dialogue format shapes reasoning strategy just as multiple-choice vs. free-form format does
- Can reasoning topologies be formally classified as graph types?
This explores whether Chain of Thought, Tree of Thought, and Graph of Thought represent distinct formal graph structures with different computational properties. Understanding this matters because the topology itself determines what reasoning strategies are possible.
DialogueReason adds dialogue as a distinct reasoning topology
- When does debate actually improve reasoning accuracy?
Multi-agent debate shows promise for reasoning tasks, but under what conditions does it help versus hurt? The research explores whether debate amplifies errors when evidence verification is missing.
DialogueReason achieves the diversity benefits of multi-agent debate within a SINGLE model through internal dialogue, avoiding the persuasion-over-truth risk of actual multi-agent debate. Its scene-switching mechanism prevents cross-problem interference while maintaining per-problem diversity, a structural advantage over multi-instance debate, where rhetorical framing can override evidence
- Why do multi-agent LLM systems converge without real debate?
When multiple AI agents reason together, do they genuinely deliberate or just accommodate each other's views? Research into clinical reasoning systems reveals how often agents reach agreement without substantive disagreement.
DialogueReason's internal agent differentiation may avoid the social-accommodation dynamic that drives silent agreement in true multi-agent systems: its "agents" share one model's parameters rather than being separate model instances prone to accommodating each other
Original note title
dialogue-based reasoning outperforms monologue reasoning on diversity and coherency by structuring internal thought as multi-agent interaction within defined scenes