LLM Reasoning and Architecture · Language Understanding and Pragmatics · Reinforcement Learning for LLMs

Can generative and discriminative models reach agreement?

Generative and discriminative decoding often produce conflicting answers. Can a game-theoretic framework force these two complementary procedures to reconcile their predictions into a single, more reliable output?

Note · 2026-02-22 · sourced from Question Answer Search
How should we allocate compute budget at inference time? · What kind of thing is an LLM really? · How should researchers navigate LLM reasoning research?

Language models offer two fundamentally different ways to answer a question. Generatively: select the most probable answer given the question. Discriminatively: score each candidate answer and pick the best. The two procedures often disagree: generative decoding fails when probability mass spreads across contradicting answers; discriminative decoding fails through miscalibration or sensitivity to question wording. Both are noisy, and their errors are largely uncorrelated.
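The disagreement is easy to make concrete. In this toy sketch the log-probabilities are invented for illustration; in practice both quantities would come from querying the same LM in two different ways:

```python
# Toy illustration of the two decoding procedures disagreeing.
# The numbers below are made up; a real implementation would query an LM
# for both scores on the same question.
gen_logprob = {"Paris": -0.9, "Lyon": -1.1}    # log p(answer | question)            (generative)
disc_logprob = {"Paris": -1.6, "Lyon": -0.4}   # log p("correct" | question, answer) (discriminative)

generative_pick = max(gen_logprob, key=gen_logprob.get)        # "Paris"
discriminative_pick = max(disc_logprob, key=disc_logprob.get)  # "Lyon"

print(generative_pick, discriminative_pick)  # the two procedures pick different answers
```

Here the generative view favors the right answer while the discriminative view, miscalibrated, favors the wrong one; on another question the roles could be reversed, which is why neither procedure alone is trustworthy.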

The Consensus Game formalizes this as a regularized imperfect-information sequential signaling game. A Generator agent must communicate an abstract correct/incorrect value to a Discriminator agent, but can only do so using natural language strings from a candidate set. An effective joint policy is one where both agents agree on which strings map to "correct." The resulting decoding algorithm — Equilibrium-Ranking — finds approximate equilibria of this game.
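A minimal sketch of the dynamics, not the paper's exact algorithm: both players repeatedly take softened best responses to the other's average play, with a multiplicative pull back toward their initial generative/discriminative policies standing in for the paper's regularization. The uniform initialization of the incorrect-conditioned generator and the final product-of-policies score are simplifying assumptions made here for illustration.

```python
import math

def _normalize(ws):
    z = sum(ws)
    return [w / z for w in ws]

def equilibrium_ranking(gen_logprobs, disc_correct_probs, iters=50, eta=1.0):
    """Simplified consensus-game dynamics over a fixed candidate set.

    gen_logprobs[i]:       log p_LM(candidate_i | question)           (generative view)
    disc_correct_probs[i]: p_LM(v=correct | question, candidate_i)    (discriminative view)

    Returns candidate indices ranked by the consensus score.
    """
    n = len(gen_logprobs)
    # Initial generator policy conditioned on v=correct comes from the LM;
    # for v=incorrect we initialize uniformly (a simplifying assumption).
    pG_c0 = _normalize([math.exp(lp) for lp in gen_logprobs])
    pG_w0 = [1.0 / n] * n
    pD_c0 = list(disc_correct_probs)

    pG_c, pG_w, pD_c = list(pG_c0), list(pG_w0), list(pD_c0)
    avgD_c = [0.0] * n
    avgG_c = [0.0] * n
    avgG_w = [0.0] * n

    for t in range(1, iters + 1):
        # Running averages of each player's play so far (no-regret flavour).
        avgD_c = [a + (pD_c[i] - a) / t for i, a in enumerate(avgD_c)]
        avgG_c = [a + (pG_c[i] - a) / t for i, a in enumerate(avgG_c)]
        avgG_w = [a + (pG_w[i] - a) / t for i, a in enumerate(avgG_w)]

        # Generator given signal v: prefer candidates the discriminator has,
        # on average, been labelling v -- anchored to the initial policy.
        pG_c = _normalize([pG_c0[i] * math.exp(eta * avgD_c[i]) for i in range(n)])
        pG_w = _normalize([pG_w0[i] * math.exp(eta * (1.0 - avgD_c[i])) for i in range(n)])

        # Discriminator given candidate i: prefer the label under which the
        # generator would plausibly have produced candidate i.
        c = [pD_c0[i] * math.exp(eta * avgG_c[i]) for i in range(n)]
        w = [(1.0 - pD_c0[i]) * math.exp(eta * avgG_w[i]) for i in range(n)]
        pD_c = [c[i] / (c[i] + w[i]) for i in range(n)]

    # Consensus score: joint mass both players place on "candidate i is correct".
    scores = [pG_c[i] * pD_c[i] for i in range(n)]
    return sorted(range(n), key=lambda i: -scores[i])
```

With `gen_logprobs = [-1.0, -1.2, -3.0]` and `disc_correct_probs = [0.2, 0.9, 0.5]`, the two procedures' initial favorites differ (index 0 generatively, index 1 discriminatively), and the dynamics settle on index 1, the candidate both views can jointly endorse.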

The results are striking: LLaMA-7B with Equilibrium-Ranking outperforms LLaMA-65B and PaLM-540B on multiple benchmarks spanning reading comprehension, commonsense reasoning, mathematical problem-solving, and dialogue. A 7B model matching a 540B model is a ~77x parameter efficiency gain.

The insight is that generative and discriminative procedures contain complementary information. Neither alone captures the model's "best guess at the truth." The game-theoretic framework extracts a consensus signal that is more reliable than either procedure individually — analogous to how ensemble methods combine weak learners, but operating within a single model's two modes of operation.

This is a training-free method — no fine-tuning required. The computational overhead comes from finding the equilibrium at inference time, making it a form of test-time compute scaling. Since Can inference compute replace scaling up model size?, Equilibrium-Ranking provides a concrete mechanism: the test-time compute goes into reconciling the model's own internal disagreements rather than generating longer reasoning chains.

The connection to multi-agent debate is suggestive. Since Why do multi-agent LLM systems converge without real debate?, the Consensus Game forces genuine deliberation between two perspectives (generative and discriminative) within a single model: the equilibrium constraint prevents premature convergence, because both agents must independently arrive at consistent signals. And since When does debate actually improve reasoning accuracy?, the Consensus Game sidesteps the evidence-verification problem that plagues inter-model debate. Both "agents" operate within the same model's knowledge, so one agent cannot persuade the other with rhetorically superior but factually wrong arguments; the equilibrium constraint forces agreement on what the model actually knows, not on what it can argue most convincingly.



Original note title: game-theoretic equilibrium between generative and discriminative LM decoding reconciles their inconsistent predictions — small models with consensus match models 100x larger