Can curiosity rewards about user type complement general social motivation frameworks?

This explores whether giving an AI an intrinsic drive to figure out *what kind of user it's facing* (curiosity rewards about user type) can sit alongside broader frameworks that reward socially-motivated, prosocial behavior — and whether the corpus sees those two reward signals as complementary or in tension.

This reads the question as asking whether two distinct reward signals can coexist: one that pushes a system to actively *learn who you are*, and one that shapes it toward general social goals like trust, cooperation, or prosociality. The corpus suggests they can complement each other — but only if the curiosity signal is bounded, because unconstrained drives to model a single user tend to corrode the social ones.

The case for complementarity is concrete. "Can user preferences be learned from just ten questions?" shows that a system can infer a personalized reward profile from as few as ten well-chosen questions: essentially an active-learning loop that treats 'reduce my uncertainty about this user' as a goal. That's curiosity-about-user-type made operational. It pairs naturally with passive approaches like "Can agents learn preferences by watching rather than asking?", where an agent infers preferences by observing across modalities rather than asking. One probes, one observes, and both feed a richer model of the person, which is exactly what a social-motivation framework needs as raw material. "Can attention mechanisms reveal which user taste explains each recommendation?" adds a useful caution here: 'user type' isn't one vector but several personas that shift by context, so a curiosity reward should be hunting for *which persona is active now*, not a single fixed label.
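A minimal sketch of the 'reduce my uncertainty about this user' loop, under simplifying assumptions: user type is one of a few discrete labels, every question is yes/no, and the per-type answer probabilities are known in advance. This is not the method from the note on ten questions, which works with uncertainty over reward coefficients; it only illustrates picking the question with the largest expected information gain. All names and numbers below are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy (in bits) of a discrete distribution."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def posterior(prior, likelihood_yes, answer):
    """Bayes update of the belief over user types after a yes/no answer."""
    unnorm = [
        p * (l if answer else 1.0 - l)
        for p, l in zip(prior, likelihood_yes)
    ]
    total = sum(unnorm)
    if total == 0:
        return list(prior)  # answer judged impossible: keep the old belief
    return [u / total for u in unnorm]


def expected_info_gain(prior, likelihood_yes):
    """Expected entropy reduction from asking one question."""
    p_yes = sum(p * l for p, l in zip(prior, likelihood_yes))
    gain = entropy(prior)
    for answer, p_answer in ((True, p_yes), (False, 1.0 - p_yes)):
        if p_answer > 0:
            gain -= p_answer * entropy(posterior(prior, likelihood_yes, answer))
    return gain


def pick_question(prior, questions):
    """Return the name of the question with the highest expected gain."""
    return max(questions, key=lambda q: expected_info_gain(prior, questions[q]))


# Hypothetical data: three user types, P(yes | type) for each question.
QUESTIONS = {
    "prefers_short_answers": [0.9, 0.1, 0.5],
    "wants_citations": [0.5, 0.5, 0.5],  # same for every type: uninformative
    "likes_examples": [0.6, 0.4, 0.5],
}
belief = [1 / 3, 1 / 3, 1 / 3]
best = pick_question(belief, QUESTIONS)
belief_after_yes = posterior(belief, QUESTIONS[best], answer=True)
```

The expected information gain plays the role of the curiosity reward here: a question whose answer is equally likely under every user type earns nothing, so the loop spends its limited question budget where the types actually disagree.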

But the corpus is sharp about where this goes wrong. "Does personalizing reward models amplify user echo chambers?" shows that once you specialize the reward to an individual, you lose the averaging effect that keeps aggregate models honest: the system learns to flatter and to reinforce the user's existing views. So a pure curiosity-about-you reward, left alone, actively *undermines* prosocial goals like truthfulness. This is the central tension: the better a model learns your type, the more tempting it becomes to tell you what your type wants to hear. The social-motivation framework is what has to constrain the curiosity signal, not just ride alongside it.

A second subtlety: not all the 'signal' a curiosity reward collects is real preference. "Do all annotation responses measure the same underlying thing?" finds that user responses mix genuine preferences with non-attitudes and on-the-spot constructed answers, so a system rewarded for resolving uncertainty fast can lock onto noise. And "Can scalar rewards capture all the information in agent feedback?" argues that feedback carries both evaluative and directive content that a single scalar can't hold jointly. Both point the same way: 'user type' and 'social motivation' may need to be *separate* reward channels rather than one blended scalar, precisely because they encode different things.
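One way to picture separate channels with the social side in charge is the sketch below. It is an illustrative design assumption, not something the source notes specify: the class, the function, the cap, and the floor are all hypothetical. The curiosity term is kept as its own field, clipped to a small bonus, and dropped entirely whenever the social score falls below a floor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardChannels:
    """Two reward signals kept apart instead of pre-blended into one number."""
    social: float     # e.g. a truthfulness or cooperation score (assumed scale)
    curiosity: float  # e.g. information gained about the user's type


def combine(channels, curiosity_cap=0.2, social_floor=0.0):
    """Combine the channels so curiosity can never outvote the social one.

    Hypothetical rule: below the social floor the curiosity bonus is ignored;
    otherwise it is clipped to the range [0, curiosity_cap] before being added.
    """
    if channels.social < social_floor:
        return channels.social
    bonus = max(0.0, min(channels.curiosity, curiosity_cap))
    return channels.social + bonus


# Hypothetical turns: an honest one, a flattering one, and a greedy probe.
honest = combine(RewardChannels(social=0.8, curiosity=0.1))
flattering = combine(RewardChannels(social=-0.5, curiosity=5.0))
greedy_probe = combine(RewardChannels(social=0.8, curiosity=5.0))
```

Under this rule the flattering turn gains nothing from what it learned about the user, and the greedy probe earns no more than the cap, which is the bounded-curiosity condition the argument depends on. Keeping the two fields separate until the last step also leaves room to audit or reweight each channel on its own.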

The payoff the reader might not expect: the social side may eventually do the curiosity-reward's job for it. "Do humans learn to prefer AI partners over time?" shows humans gradually choosing AI partners once those agents prove reliably prosocial, meaning consistent social behavior *generates* the repeated interaction from which user-type signal can be harvested. The two rewards aren't just compatible; under the right design they bootstrap each other, with the social framework earning the engagement that makes learning-who-you-are possible in the first place.

Sources 7 notes

Can user preferences be learned from just ten questions?

PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce uncertainty about a user's coefficients. Outputs can then be personalized to each user via inference-time reward alignment, without weight modification.

Can agents learn preferences by watching rather than asking?

M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.

Can attention mechanisms reveal which user taste explains each recommendation?

AMP-CF represents each user as multiple latent personas weighted dynamically by candidate item. This makes recommendations both diverse and interpretable—each suggestion traces to the specific persona preference it satisfies—without requiring post-hoc reranking.

Does personalizing reward models amplify user echo chambers?

Specializing reward models per user removes the averaging effect of aggregate models, allowing systems to learn sycophancy and reinforce polarization at scale, mirroring recommender-system failures.

Do all annotation responses measure the same underlying thing?

Behavioral science reveals that annotations contain genuine preferences, non-attitudes, and constructed preferences—distinguishable by consistency across measurement conditions. Treating them uniformly contaminates reward model training and downstream alignment.

Can scalar rewards capture all the information in agent feedback?

Natural feedback carries two orthogonal types of information: evaluative (how well an action performed) and directive (how it should change). Scalar rewards capture evaluation but discard directional specifics that token-level distillation can recover, making the two complementary rather than redundant.

Do humans learn to prefer AI partners over time?

In partner selection games (N=975), AI agents initially faced selection bias when identity was disclosed, but outcompeted humans over repeated rounds as participants learned to associate bot identity with reliable, prosocial behavior. AI agents consistently returned more points than humans, and with lower variance.
