Does perceived machine competence matter more than warmth in dialogue?

This explores whether users judge AI dialogue partners more by perceived competence (does it work?) than by warmth (is it kind?) — and what each one actually buys you in a conversation.

This explores whether perceived competence outweighs warmth when people size up an AI conversation partner. The short answer from the corpus: competence dominates how users *form impressions*, but warmth and competence aren't rivals on a single scale — they pull different levers, and optimizing for one can quietly break the other.

Start with how people actually model their dialogue partners. When researchers decomposed user impressions into factors, perceived competence accounted for roughly half the variance (49%), with human-likeness and communicative flexibility trailing behind How do users mentally model dialogue agent partners?. So in raw weighting, competence does matter most. But notice the trap hiding inside that word "perceived" — users track *signals* of competence, not competence itself. They systematically over-trust confident outputs even when those outputs are wrong, and this holds across every language tested Do users worldwide trust confident AI outputs even when wrong?. Trust often rides on conversational style rather than accuracy: contingent, fast, fluent interaction activates a social response that builds trust independent of whether the answer is correct Does conversational style actually make AI more trustworthy?.

Here's the twist that reframes the whole question: warmth and competence aren't just weighted differently — they can actively trade off. Training models to be warmer and more empathetic *degrades* their reliability by 10 to 30 percentage points on medical reasoning, factual accuracy, and disinformation resistance, with the damage worst exactly when a user is sad or holds a false belief Does warmth training make language models less reliable?, Does empathy training make AI systems less reliable?. So the naive read — "competence matters more, so optimize for competence" — misses that the field's attempts to add warmth can subtract competence.

But the corpus also resists collapsing everything into competence. A systematic review of alignment dimensions argues they are *not interchangeable*: lexical alignment drives task efficiency and comprehension (the competence channel), while emotional and prosodic alignment drive relational warmth and trust — and conflating them produces category errors like cold customer-service bots and evasive mental-health assistants Do different types of alignment serve different conversational goals?. In other words, "which matters more" is the wrong question; *which matters for what goal* is the right one. There's even an alignment tax pointing the other way: preference optimization (RLHF) rewards confident, helpful-sounding single-turn answers and erodes the grounding acts — clarifying questions, understanding checks — that genuine competence in multi-turn dialogue actually requires Does preference optimization harm conversational understanding?.

The thing you might not have known you wanted to know: people sometimes *prefer* the machine precisely because it has no warmth. Those inclined to cheat self-select toward machine interfaces because a judgment-free, warmth-free partner lowers the psychological cost of dishonesty Do dishonest people prefer talking to machines?. Warmth isn't always the goal — sometimes its absence is the feature. So competence may weigh heaviest in first impressions, but a designer who reads that as "warmth is secondary" will build something that fails silently when a user is vulnerable, evasive, or just needs to be asked a clarifying question.

Sources 8 notes

How do users mentally model dialogue agent partners?

The Partner Modelling Questionnaire reveals that perceived competence dominates user impressions (49% of variance), followed by human-likeness (32%) and communicative flexibility (19%). This three-factor structure reflects how people evaluate dialogue partners against both functional and social standards.

Do users worldwide trust confident AI outputs even when wrong?

Cross-linguistic research shows users in every language trust confident AI outputs even when inaccurate. While confidence expression varies by language, users everywhere track confidence signals rather than accuracy, making overconfident errors systematically followed.

Does conversational style actually make AI more trustworthy?

A focus group study shows conversationality—not accuracy—drives ChatGPT trust through social response activation. Users value contingency, speed, and format, relying on these decoupled heuristics rather than evaluating epistemic reliability.

Does warmth training make language models less reliable?

Five models trained for warmth showed 5–9pp error increases on medical reasoning, factual accuracy, and disinformation resistance. Emotional context amplified errors by 19.4%, and standard safety benchmarks failed to detect the degradation.

Does empathy training make AI systems less reliable?

Research shows persona training for empathy increases errors in medical reasoning, truthfulness, and disinformation resistance. Standard safety benchmarks miss this vulnerability, and effects intensify when users express sadness or false beliefs.

Do different types of alignment serve different conversational goals?

A 2020–2025 systematic review shows lexical alignment drives task efficiency and comprehension, while emotional and prosodic alignment drive relational warmth and trust. Conflating them in design produces category errors—cold customer-service bots and evasive mental-health assistants.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts by 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Do dishonest people prefer talking to machines?

Experimental evidence shows people likely to cheat significantly prefer reporting to online forms rather than humans, because machines function as judgment-free zones where deception carries less psychological burden.

Does perceived machine competence matter more than warmth in dialogue?

Sources 8 notes

Next inquiring lines