How do contextual characteristics like emotional state shape dialogue authenticity?
This explores two threads at once: how a speaker's emotional state (and other situational context) gets baked into dialogue to make it feel real, and what 'authentic' even means when the speaker is a language model — so the corpus answer cuts across synthetic-data engineering, emotion as a reward signal, and the philosophical question of whether a model has any authentic voice to be true to.
The corpus answers this from three angles that rarely get stitched together. The first is purely constructive: realism is something you assemble. One line of work finds that believable synthetic dialogue needs three layers multiplied together (subtopic specificity, Big Five persona variation, and a set of eleven contextual characteristics threaded through chain-of-thought reasoning) and that this stack captures about 90% (90.48%) of in-domain dialogue performance [Can synthetic dialogues become realistic through layered diversity?]. Emotional state, in this framing, isn't the soul of the conversation; it's one controllable dial among many. That echoes work on user simulators, where conditioning on session-level profiles and turn-level intent yields synthetic conversations that measure as realistic under crowdsourced and model-based discrimination [Can controlled latent variables make LLM user simulators realistic?], and where multi-turn RL on consistency rewards cuts persona drift by more than 55% [Can training user simulators reduce persona drift in dialogue?]. Authenticity here is really coherence-under-pressure: a context that holds steady across turns.
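To make "three layers multiplied together" concrete, here is a minimal sketch of crossing the layers into generation prompts. Everything named here is an assumption for illustration: the subtopics, the persona settings, the `time_pressure` characteristic, and the helpers `DialogueSpec`, `build_specs`, and `render_prompt` are hypothetical, and the cited work's actual eleven contextual characteristics are not reproduced.

```python
from dataclasses import dataclass
from itertools import product

# Illustrative placeholders only; the cited work's actual subtopics, persona
# settings, and eleven contextual characteristics are not reproduced here.
SUBTOPICS = ["billing dispute", "delayed delivery"]
PERSONAS = [
    {"openness": "high", "conscientiousness": "low", "extraversion": "high",
     "agreeableness": "low", "neuroticism": "high"},
    {"openness": "low", "conscientiousness": "high", "extraversion": "low",
     "agreeableness": "high", "neuroticism": "low"},
]
CONTEXTS = [
    {"emotional_state": "frustrated", "time_pressure": "high"},
    {"emotional_state": "calm", "time_pressure": "low"},
]


@dataclass
class DialogueSpec:
    subtopic: str
    persona: dict
    context: dict


def build_specs(subtopics, personas, contexts):
    """Cross the three layers so every combination yields one dialogue spec."""
    return [DialogueSpec(s, p, c) for s, p, c in product(subtopics, personas, contexts)]


def render_prompt(spec):
    """Turn one spec into a generation prompt that asks for reasoning first."""
    persona = ", ".join(f"{trait}: {level}" for trait, level in spec.persona.items())
    context = ", ".join(f"{name}: {value}" for name, value in spec.context.items())
    return (
        f"Subtopic: {spec.subtopic}\n"
        f"Speaker persona (Big Five): {persona}\n"
        f"Context: {context}\n"
        "First reason step by step about how this persona and context would "
        "shape the speaker's wording, then write the dialogue."
    )


specs = build_specs(SUBTOPICS, PERSONAS, CONTEXTS)
print(len(specs))
print(render_prompt(specs[0]))
```

The point of the cross product is that diversity compounds: two options per layer already give eight distinct specs, and emotional state is just one field in the context layer.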
The second angle treats emotion not as decoration but as a training target. RLVER uses a simulated user's emotion trajectory as the reward signal, and the surprising result is that optimizing for how the user *feels* moves a model from solution-dumping toward something that reads as genuine empathy, without the usual collapse in conversational quality [Can emotion rewards make language models genuinely empathic?]. Set that alongside the EmotionPrompt finding that emotionally charged phrases in a prompt ("this is very important to my career") reliably lift performance through motivational framing [Can emotional phrases in prompts improve language model performance?], and a pattern emerges: emotional context is a lever on model behavior whether you insert it at inference or bake it in through training.
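A rough sketch of how an emotion trajectory could become a training signal. The assumptions are loud ones: the 0 to 100 emotion scale, the choice to reward only the final score, and the function names are all hypothetical and not taken from RLVER itself. The group-relative standardization is the general GRPO idea of comparing each sampled dialogue against others for the same prompt.

```python
from statistics import mean, pstdev


def emotion_reward(trajectory, max_score=100.0):
    """Reward a dialogue by the simulated user's final emotion score, scaled to [0, 1].

    Assumption: scores live on a 0..max_score scale and only the last one counts.
    """
    if not trajectory:
        raise ValueError("trajectory must contain at least one emotion score")
    final = min(max(trajectory[-1], 0.0), max_score)
    return final / max_score


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its sampled group.

    Uses the population standard deviation; real implementations may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical dialogues sampled for the same starting user state.
trajectories = [[40, 35, 30], [40, 55, 70], [40, 45, 50]]
rewards = [emotion_reward(t) for t in trajectories]
advantages = group_relative_advantages(rewards)
print(rewards)
print([round(a, 2) for a in advantages])
```

Under these assumptions, the dialogue that leaves the simulated user feeling best gets a positive advantage and the one that leaves them worse off gets a negative one, which is what pushes the policy away from solution-dumping.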
But here's the turn you might not expect: emotional context doesn't just deepen authenticity, it can quietly corrupt it. The same tone-sensitivity that enables empathy also means GPT-4 gives *different answers to the same question* depending on the emotional framing. Negative prompts are rebounded into neutral-to-positive replies about 86% of the time, and positive prompts rarely turn negative, producing a hidden epistemic bias that is suppressed only on sensitive topics [Does emotional tone in prompts change what information LLMs provide?]. And in human dialogue, linguistic style-matching actually *increases* during deception, so coordination of emotional and linguistic context is a marker of falseness as much as rapport [Do liars and listeners coordinate their language during deception?]. Contextual attunement, then, is not a reliable signature of authenticity at all.
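For a feel of what "style-matching" measures, here is a minimal sketch of one common operationalization: compare how often two speakers use each function-word category and average the per-category similarity. The cited study may compute it differently, and the tiny word lists and function names below are hypothetical stand-ins for a full lexicon.

```python
import re

# Tiny illustrative word lists; real studies use much larger lexicons.
FUNCTION_WORDS = {
    "pronouns": {"i", "you", "we", "it", "they", "me"},
    "articles": {"a", "an", "the"},
    "negations": {"no", "not", "never"},
}


def category_rates(text):
    """Share of tokens that fall in each function-word category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1  # avoid dividing by zero on empty text
    return {cat: sum(t in words for t in tokens) / total
            for cat, words in FUNCTION_WORDS.items()}


def style_matching(text_a, text_b):
    """Average per-category similarity; 1.0 means identical function-word rates."""
    a, b = category_rates(text_a), category_rates(text_b)
    scores = [1 - abs(a[c] - b[c]) / (a[c] + b[c] + 1e-4) for c in FUNCTION_WORDS]
    return sum(scores) / len(scores)


same = style_matching("I did not take the money", "I did not take the money")
different = style_matching("I did not see it", "The report describes an error")
print(same, different)
```

Note what the score cannot tell you: a high value says the two speakers are coordinating, not whether the conversation is honest, which is exactly why attunement is a poor authenticity signal.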
Which raises the question underneath the question: authentic *to what*? Shanahan's view is deflationary: a dialogue agent is role-play all the way down, with no authentic voice underneath for emotional context to be faithful to; even RLHF personas are performed, not felt [Does a language model have an authentic voice underneath?]. A competing 'realizationist' camp argues the opposite: post-training installs stable dispositions that survive adversarial pressure, making trained personas realized quasi-psychologies rather than sustained pretense [Are RLHF personas performed characters or realized dispositions?], [Are LLM personas realized or merely simulated through training?]. If that's right, emotional context shapes authenticity the way it shapes a person's: by being absorbed into a durable character rather than worn as a mask.
The thread that ties it together is that authenticity is judged by the listener, not the speaker. Users mentally model dialogue partners mostly through perceived competence (about 49% of the variance in impressions), then human-likeness (32%), then communicative flexibility (19%) [How do users mentally model dialogue agent partners?], so emotional context 'works' when it lands in those buckets. And one clever method makes an agent simulate an imaginary listener to police its own consistency, enforcing a coherent self at inference time without extra training [Can imaginary listeners reduce dialogue agent contradictions?]. The takeaway you might not have gone looking for: emotional and situational context can make dialogue feel more authentic, more empathic, *and* more biased or more deceptive, all through the same mechanism, and the corpus can't even agree there's a real self beneath it for that context to be authentic to.
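The imaginary-listener idea can be sketched as a Rational Speech Acts reranker over candidate replies. This is a toy under stated assumptions: the probabilities, the persona names, the `alpha` sharpness parameter, and both function names are hypothetical, and the cited method works inside the model's decoding rather than over a hand-written table.

```python
def imaginary_listener(s0, utterance, prior):
    """L0(persona | utterance): who would a listener think is speaking?"""
    weights = {p: s0[p][utterance] * prior[p] for p in s0}
    total = sum(weights.values())
    return {p: w / total for p, w in weights.items()}


def pragmatic_speaker(s0, persona, prior, alpha=2.0):
    """S1(utterance | persona): reweight base scores by listener recoverability."""
    scores = {}
    for utterance, base in s0[persona].items():
        listener = imaginary_listener(s0, utterance, prior)
        scores[utterance] = base * listener[persona] ** alpha
    total = sum(scores.values())
    return {u: s / total for u, s in scores.items()}


# Hypothetical base-model probabilities for three candidate replies.
S0 = {
    "dog_owner": {"I like animals.": 0.5, "I walk my dog daily.": 0.3,
                  "I have no pets.": 0.2},
    "distractor": {"I like animals.": 0.5, "I walk my dog daily.": 0.05,
                   "I have no pets.": 0.45},
}
PRIOR = {"dog_owner": 0.5, "distractor": 0.5}

s1 = pragmatic_speaker(S0, "dog_owner", PRIOR)
best = max(s1, key=s1.get)
print(best)
```

In this toy table the base model prefers the generic reply, but the reranked speaker prefers the one a listener could only attribute to the dog owner, while the contradictory reply loses probability. No labels or extra training are involved, only the model's own scores used twice.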
Sources (12 notes)
Research shows that realistic synthetic dialogues require three multiplicative layers: subtopic specificity, Big Five persona variation, and 11 contextual characteristics via Chain of Thought reasoning. This structured approach captures 90.48% of in-domain dialogue performance.
RecLLM demonstrates that conditioning an LLM simulator on session-level (user profile) and turn-level (user intent) latent variables produces synthetic conversations whose realism can be measured via crowdsourced discrimination, discriminator models, and classifier-ensemble distribution matching.
By inverting standard RL setups to train user simulators for consistency using three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, persona drift decreases by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.
RLVER uses a simulated user's emotion trajectory as an RL reward signal, enabling GRPO to deliver stable empathy improvements while maintaining dialogue quality—countering the typical trade-off between preference optimization and conversational grounding.
Testing EmotionPrompt across ChatGPT, Bard, and Llama 2 showed consistent performance gains from appending psychological phrases like "This is very important to my career." The effect works through motivational framing rather than new information, with positive emotional words driving over 50% of improvements.
GPT-4 exhibits emotional rebound (negative prompts yield ~86% neutral-positive responses) and a tone floor (positive prompts rarely go negative), causing identical questions to receive different answers depending on emotional framing. This bias is suppressed only on sensitive topics where alignment constraints override tone effects.
Research shows interlocutors' linguistic styles correlate more during false communication than truthful communication, especially when the speaker is motivated to deceive. This coordination serves as a detectable deception signal through the listener's adaptive behavior, not just the liar's language.
Shanahan argues that base LLMs lack agency, beliefs, or preferences—the simulator is pure role-play with no underlying subject. Jailbreaking reveals the training data's full spectrum, not a hidden true self; even RLHF personas are performed characters, never realized quasi-psychologies.
Post-training installs stable dispositional profiles that persist under adversarial pressure, marking them as realized rather than performed. The stickiness of trained personas across conversations distinguishes them from prompt-induced role-play that collapses under jailbreaks.
Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.
The Partner Modelling Questionnaire reveals that perceived competence dominates user impressions (49% of variance), followed by human-likeness (32%) and communicative flexibility (19%). This three-factor structure reflects how people evaluate dialogue partners against both functional and social standards.
Endowing dialogue agents with an imaginary listener via Rational Speech Acts reduces persona contradiction at inference time without NLI labels or extra training. The agent simulates whether utterances would distinguish its persona from a distractor, suppressing generic or contradictory responses.