Can communication problems and optimization problems be addressed with the same alignment approaches?
This explores whether the same alignment toolkit (RLHF, DPO, preference optimization, fine-tuning) can fix two very different failures: an AI that communicates badly and an AI that reasons badly about math and optimization problems. The corpus answers no, and the answer exposes a twist in the word 'optimization' itself.
Read this way, the question is whether communication problems and optimization problems respond to one shared alignment approach. The corpus says no, and the reason is sharper than 'different problems need different fixes.' The dominant alignment method, preference optimization, actively makes both kinds of problem worse, for the same underlying reason: it optimizes a proxy target (fluent, confident-sounding output) rather than the thing you actually want.
Start with communication. The most striking finding is that the standard alignment recipe doesn't just fail to help conversation; it erodes it. Preference optimization measurably reduces the small acts of establishing shared understanding (checking, confirming, repairing) that make dialogue work, because confident fluency is what gets rewarded (Does preference optimization damage conversational grounding in large language models?). RLHF and DPO produce 'collaborators' that ignore a partner's interventions, evaluating suggestions by surface plausibility instead of causal impact (Why do standard alignment methods ignore partner interventions?). And a model can be honest and harmless yet still pragmatically alien: ethical alignment and conversational alignment turn out to be orthogonal problems, so optimizing one buys you nothing on the other (Can ethically aligned AI systems still communicate poorly?).
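To make 'what gets rewarded' concrete, here is a minimal Python sketch of the standard DPO loss for one preference pair. It assumes each input is the summed log-probability of a whole response under the trained policy or a frozen reference model; the numbers and the `beta=0.1` default are illustrative, not taken from the studies above. What it shows: the loss only compares how far the policy raised the chosen response relative to the rejected one, so nothing in it checks whether the chosen response did any grounding work.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float,
             beta: float = 0.1) -> float:
    """DPO loss for one (prompt, chosen, rejected) preference pair.

    Each argument is the summed log-probability of a full response under
    the trainable policy or the frozen reference model. beta is an assumed
    illustrative value controlling how far the policy may drift.
    """
    chosen_margin = policy_chosen - ref_chosen        # how much the policy raised the chosen response
    rejected_margin = policy_rejected - ref_rejected  # how much it raised the rejected response
    return -log_sigmoid(beta * (chosen_margin - rejected_margin))


# Hypothetical log-probabilities: the policy has shifted toward the chosen response.
print(round(dpo_loss(-8.0, -12.0, -10.0, -10.0), 3))
```

With the policy identical to the reference, the loss is log 2 (about 0.693); raising the chosen response's likelihood relative to the rejected one lowers it. Whatever style annotators consistently preferred, confident fluency included, is what this objective reinforces.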
Now the optimization side, and here's the twist. When people say LLMs have an 'optimization problem,' they often mean the model can't actually solve constrained or iterative numerical problems. The corpus suggests this is a hard ceiling, not a tuning gap: constraint satisfaction plateaus at 55–60% regardless of scale or training regime, and reasoning models do not systematically outperform standard ones (Do larger language models solve constrained optimization better?). Models pattern-match memorized templates instead of executing iterative methods (Do large language models actually perform iterative optimization?). Crucially, applying RL optimization to fix this doesn't install reasoning; it sharpens memorization, and performance drops steeply on out-of-distribution variants (Do fine-tuned language models actually learn optimization procedures?). So the same preference/RL machinery that erodes grounding in conversation merely buffs template-matching in reasoning. The method optimizes the wrong target in both rooms.
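To pin down what a constraint-satisfaction figure like 55–60% measures, here is a minimal Python sketch. It assumes the score is the fraction of a problem's stated constraints that a proposed solution meets; the cited work may aggregate differently (for example, per problem rather than per constraint). The toy problem, the constraint list, and the `satisfaction_rate` helper are all hypothetical.

```python
from typing import Callable, Dict, List

Solution = Dict[str, float]
Constraint = Callable[[Solution], bool]


def satisfaction_rate(solution: Solution, constraints: List[Constraint]) -> float:
    """Fraction of constraints that the proposed solution satisfies (assumed metric)."""
    if not constraints:
        raise ValueError("need at least one constraint")
    satisfied = sum(1 for check in constraints if check(solution))
    return satisfied / len(constraints)


# Hypothetical toy problem: choose x and y under three constraints.
constraints: List[Constraint] = [
    lambda s: s["x"] + s["y"] <= 10,  # budget
    lambda s: s["x"] >= 2,            # minimum for x
    lambda s: s["y"] >= 3 * s["x"],   # ratio requirement
]

# A plausible-looking answer that nonetheless misses one constraint.
proposed = {"x": 3.0, "y": 6.0}
print(satisfaction_rate(proposed, constraints))
```

The proposed answer respects the budget and the minimum but misses the ratio, so it scores 2/3; a feasible point such as x = 2, y = 6 scores 1.0. A rate well short of 1.0 can therefore hide answers that look reasonable line by line.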
The lateral payoff: alignment isn't one dial. A systematic review of 2020–2025 research found that alignment dimensions aren't interchangeable: lexical alignment drives task efficiency, emotional and prosodic alignment drive trust, and conflating them produces category errors like cold service bots (Do different types of alignment serve different conversational goals?). The approaches the corpus reports as working are problem-specific: counterfactual-invariance training for partner-awareness (Why do standard alignment methods ignore partner interventions?), decision-oriented dialogue structure for joint problem-solving (Can AI agents communicate efficiently in joint decision problems?), DPO with explicit negative examples for rigid function-calling formats (Can small models match large models on function calling?), and decoding-time proxy tuning when you need to shift behavior without corrupting stored knowledge (Can decoding-time tuning preserve knowledge better than weight fine-tuning?).
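Proxy tuning is the one approach in this list that works purely at decoding time, so its core arithmetic fits in a few lines. A minimal Python sketch, assuming the common formulation in which a large base model's next-token logits are shifted by the difference between a small tuned 'expert' and the same small model before tuning (the 'anti-expert'). The three-token vocabulary and all logit values are hypothetical. The base model's weights are never modified; only its output distribution moves.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert logits to probabilities (max-subtracted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_tuned_distribution(base: List[float], expert: List[float],
                             anti_expert: List[float]) -> List[float]:
    """Next-token distribution: base logits shifted by (expert - anti_expert)."""
    if not (len(base) == len(expert) == len(anti_expert)):
        raise ValueError("all three models must share one vocabulary")
    shifted = [b + (e - a) for b, e, a in zip(base, expert, anti_expert)]
    return softmax(shifted)


# Hypothetical logits for a single decoding step.
vocab = ["Sure", "The", "I"]
base = [1.0, 2.0, 0.5]         # large untuned model (weights left untouched)
expert = [2.5, 1.0, 0.5]       # small tuned model
anti_expert = [1.0, 1.0, 0.5]  # same small model before tuning

probs = proxy_tuned_distribution(base, expert, anti_expert)
for token, p in zip(vocab, probs):
    print(f"{token}: {p:.3f}")
```

On its own the base model favors "The", but the expert-minus-anti-expert shift moves the top token to "Sure", the behavior the small tuned model learned, without changing anything stored in the large model's weights.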
The thing you didn't know you wanted to know: 'optimization' is doing double duty in this question. There's optimization-as-task (solving the constraint problem) and optimization-as-method (the preference optimization you'd use to align the model). The corpus suggests the method can't deliver the task, and worse, the more you lean on generic preference optimization, the more it quietly trades away both genuine reasoning and genuine communication for the appearance of both. Even the curation literature points the same way: alignment activates capabilities the base model already has rather than building new ones (Can careful curation replace massive alignment datasets?), which is exactly why one approach can't conjure two abilities that aren't already there.
Sources (11 notes)
Research shows LLMs generate 77.5% fewer grounding acts than humans, and RLHF preference optimization actively worsens this gap. The optimization target—fluent, confident responses—directly undermines the communicative work of establishing shared understanding.
Regularizing agents to maintain consistency when intervention pathways are nullified forces them to evaluate suggestions by causal impact rather than surface plausibility. Common ground alignment emerges as a byproduct without explicit reward.
Research shows that HHH-aligned models can violate Gricean maxims, lose common ground, and mishandle context despite being honest and harmless. Pragmatic competence requires architectural changes that RLHF alone cannot deliver.
Across constrained-optimization tasks, LLMs converge to ~55–60% constraint satisfaction independent of architecture, parameter count, or training regime. Reasoning models do not systematically outperform standard models, suggesting a fundamental ceiling rather than a scaling gap.
Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.
Even GRPO-trained models show sharp performance drops on out-of-distribution variants (N-1 test sets) compared to in-distribution problems, indicating RL optimizes template-matching rather than genuine problem-solving procedures.
A 2020–2025 systematic review shows lexical alignment drives task efficiency and comprehension, while emotional and prosodic alignment drive relational warmth and trust. Conflating them in design produces category errors—cold customer-service bots and evasive mental-health assistants.
Human-AI collaboration on joint decisions demands that AI agents actively determine what to share, ask, and infer rather than passively respond. LLMs currently fail at this structured communication because they lack goal-driven initiative and presume shared understanding instead of building it.
Small models fine-tuned via DPO on correct and incorrect function-calling examples from a large teacher model achieve high accuracy on logical and mathematical tasks. DPO's explicit negative examples directly target the rigid output format failures where SFT alone underperforms.
Proxy-tuning closes 88–91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.
LIMA demonstrates that fine-tuning a strong pretrained model on 1,000 carefully curated examples achieves alignment performance competitive with models trained on orders of magnitude more data, showing that post-training activates existing capabilities rather than building new ones.