Conversational AI Systems · Psychology and Social Cognition

Can conversations themselves personalize without user profiles?

Can a conversational AI learn about user traits and adapt in real time by rewarding itself for asking insightful questions, rather than relying on pre-collected profiles or historical data?

Note · 2026-02-23 · sourced from Assistants Personalization

Most LLM personalization requires something before the conversation starts — a user profile, historical interactions, preference embeddings, or calibration queries. The curiosity reward approach inverts this: the conversation itself is the personalization mechanism.

The key idea: augment standard RLHF with an auxiliary reward that measures how much each turn improves the model's belief about the user's latent type. The agent is rewarded for reducing its uncertainty about who it's talking to. This creates an intrinsic drive to ask insightful questions, make context-sensitive probes, and adapt responses based on inferred traits — rather than passively responding to stated preferences.
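
One way to operationalize "reducing uncertainty about who it's talking to" is to track a categorical belief over candidate user types and reward the drop in its entropy after each turn. The sketch below assumes exactly that; the note frames the signal as improvement in user-type prediction, so the entropy formulation and all names here are illustrative rather than the actual implementation.

```python
import math

def belief_entropy(probs):
    """Shannon entropy (in nats) of a categorical belief over user types."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def curiosity_reward(belief_before, belief_after):
    """Intrinsic reward for one turn: the drop in the agent's uncertainty
    about the user's latent type after observing the user's reply.
    Positive when the turn was informative, near zero when it learned nothing."""
    return belief_entropy(belief_before) - belief_entropy(belief_after)

# Example: a probing question sharpens a belief over four candidate user types.
before = [0.25, 0.25, 0.25, 0.25]   # uniform: the agent knows nothing yet
after = [0.70, 0.10, 0.10, 0.10]    # the reply points strongly to one type
print(curiosity_reward(before, after))  # ~0.45 nats of uncertainty removed
```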

The architecture separates two reward channels:

  1. End-of-conversation sparse reward — standard RLHF signal for overall conversation quality
  2. Turn-based intrinsic reward — improvement in user type prediction accuracy after each action
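
A minimal sketch of how these two channels might be merged into a single per-turn training signal; the weighting term (`intrinsic_weight`) and the shaping scheme are assumptions, not details given in the source.

```python
def shaped_turn_rewards(intrinsic_rewards, final_reward, intrinsic_weight=0.1):
    """Merge the dual signal into one reward per turn.

    intrinsic_rewards: dense per-turn gains in user-type prediction.
    final_reward: sparse end-of-conversation RLHF quality score.
    Every turn earns its curiosity bonus; only the last turn also
    receives the conversation-level reward.
    """
    rewards = [intrinsic_weight * r for r in intrinsic_rewards]
    rewards[-1] += final_reward
    return rewards

# Example: three turns, where the second was the most informative probe.
print(shaped_turn_rewards([0.45, 0.80, 0.05], final_reward=1.0))
# -> [0.045, 0.08, 1.005]
```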

This dual signal forces a balance between helpfulness and inquisitiveness. Without the curiosity reward, models default to passive helpfulness (see Why can't conversational AI agents take the initiative?). With it, models learn to strategically gather information about users.

The approach was tested in two domains: education (inferring learning style to adapt teaching) and fitness (inferring lifestyle attributes to personalize exercise recommendations). The simulation used 20 user attributes, of which 5 were decision-relevant and 15 were background — emulating real-world complexity where most user characteristics are irrelevant noise.
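
For concreteness, a toy version of such a simulated user; the attribute names are invented for illustration (the note does not list the actual attributes used in the study).

```python
import random

# Hypothetical attribute names for the fitness domain.
DECISION_RELEVANT = ["fitness_level", "injury_history", "available_equipment",
                     "weekly_time_budget", "primary_goal"]
BACKGROUND = [f"background_trait_{i}" for i in range(15)]  # irrelevant noise

def sample_simulated_user(seed=None):
    """Draw one simulated user: 20 attributes total, only 5 of which
    should actually influence a good recommendation."""
    rng = random.Random(seed)
    return {attr: rng.choice(["low", "medium", "high"])
            for attr in DECISION_RELEVANT + BACKGROUND}

user = sample_simulated_user(seed=0)
print(len(user))  # 20 attributes; the agent must discover which 5 matter
```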

The distinction from prior work is sharp. PReF (reward factorization) requires 10 pre-conversation preference queries. PLUS (text-based summaries) requires historical interaction data. P-RLHF requires user-specific feedback data. The curiosity reward requires nothing — personalization emerges from the conversation dynamics.

This connects to Can AI agents learn when they have something worth saying? — both use intrinsic motivation, but for different purposes. Inner Thoughts drives general social proactivity (10 heuristics from cognitive psychology). Curiosity reward drives personalization-specific proactivity (reducing uncertainty about user type). Together they suggest that intrinsic motivation is a general mechanism for making AI conversationally active, with specific reward signals shaping what the activity targets.

The implication for open-ended dialogue is significant: when there's no clear task, engagement itself becomes the objective. Curiosity-driven agents that encourage users to share naturally may be more enjoyable than those that wait to be asked — and the sharing simultaneously enables better personalization.


Source: Assistants Personalization
