INQUIRING LINE

How does active learning reduce queries needed for user preference inference?

This explores how systems pick which questions to ask so they can learn what you like in as few queries as possible — choosing maximally informative questions rather than asking everything.


Active learning works here as a query-efficiency technique: instead of asking a user hundreds of questions to map their taste, the system chooses each question to remove as much uncertainty as possible about what they want. The cleanest example in the corpus is PReF, which first learns a small set of base reward functions from preference data, then treats any individual user as a linear combination of those bases. Because the heavy lifting (learning the base functions) is already done, personalizing a new user collapses to estimating a few coefficients, and active learning picks the questions that most sharply reduce uncertainty in those coefficients. The striking result: roughly ten adaptive questions are enough, and personalization happens at inference time without retraining the model's weights (Can user preferences be learned from just ten questions?).
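The coefficient-estimation step can be sketched in a few lines. This is an illustration under stated assumptions, not PReF's actual implementation: it assumes each candidate question is summarized by a feature vector x (for instance, the difference in base-reward scores between two responses), that the user's answer is a noisy real-valued reading of x·w rather than a binary choice, and that the belief over the coefficients w is Gaussian. The names pick_query, update, and personalize are hypothetical.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(M, v):
    return [dot(row, v) for row in M]

def pick_query(cov, candidates):
    """Index of the candidate whose answer is currently most uncertain (max x^T cov x)."""
    return max(range(len(candidates)),
               key=lambda i: dot(candidates[i], matvec(cov, candidates[i])))

def update(mean, cov, x, y, noise_var):
    """Rank-one Bayesian linear-regression update after observing y = x.w + noise."""
    cx = matvec(cov, x)                      # cov @ x (cov is symmetric)
    denom = noise_var + dot(x, cx)
    gain = [c / denom for c in cx]
    resid = y - dot(x, mean)
    new_mean = [m + g * resid for m, g in zip(mean, gain)]
    d = len(x)
    new_cov = [[cov[i][j] - gain[i] * cx[j] for j in range(d)] for i in range(d)]
    return new_mean, new_cov

def personalize(true_w, candidates, n_queries=10, noise_var=0.25, seed=0):
    """Ask n_queries adaptively chosen questions; true_w simulates the user (assumption)."""
    rng = random.Random(seed)
    d = len(true_w)
    mean = [0.0] * d
    cov = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    traces = [sum(cov[k][k] for k in range(d))]   # total remaining uncertainty
    pool = list(candidates)
    for _ in range(n_queries):
        x = pool.pop(pick_query(cov, pool))
        y = dot(x, true_w) + rng.gauss(0.0, noise_var ** 0.5)  # simulated answer
        mean, cov = update(mean, cov, x, y, noise_var)
        traces.append(sum(cov[k][k] for k in range(d)))
    return mean, cov, traces

# Usage: three base reward functions, fifty candidate questions, ten queries.
demo_rng = random.Random(1)
demo_pool = [[demo_rng.uniform(-1, 1) for _ in range(3)] for _ in range(50)]
demo_mean, demo_cov, demo_traces = personalize([0.7, -0.2, 0.5], demo_pool)
```

The trace of the covariance shrinks after every answer, and choosing the highest-variance question each round is one simple way to make it shrink quickly. A binary-choice model would need an approximate posterior update in place of the closed-form one used here.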

The deeper mechanism is uncertainty targeting, and the corpus shows the same idea surfacing under different vocabulary in recommendation. Epistemic neural networks separate two kinds of uncertainty, the irreducible noise in user behavior (aleatoric) and genuine ignorance about model parameters (epistemic), and spend exploration budget only on the second. That's the same logic as active learning's question selection: don't waste a query resolving noise you can't reduce; spend it where new information actually changes your belief. The payoff is concrete: 29% fewer interactions than baselines while improving click-through (Can neural networks explore efficiently at recommendation scale?). Active learning and Thompson sampling are closely related in this sense: both ask 'what do I most need to learn next?', though Thompson sampling also has to keep earning reward while it learns.
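The aleatoric/epistemic split can be made concrete with a toy bandit. The sketch below swaps the epistemic neural network for a plain Beta-Bernoulli model, so it illustrates Thompson sampling in general rather than the system described above. The click rates, round count, and the function name thompson_sampling are assumptions for illustration. Each click is noisy (aleatoric), but the Beta posterior over an arm's click rate is the epistemic part, and it is the only thing that drives exploration.

```python
import random

def thompson_sampling(click_rates, rounds=2000, seed=0):
    """Beta-Bernoulli Thompson sampling; click_rates simulate the users (assumption)."""
    rng = random.Random(seed)
    n = len(click_rates)
    alpha = [1] * n   # Beta(1, 1) prior: one pseudo-click per arm
    beta = [1] * n    # and one pseudo-skip per arm
    pulls = [0] * n
    for _ in range(rounds):
        # Sample one plausible click rate per arm from its posterior (epistemic uncertainty).
        samples = [rng.betavariate(alpha[a], beta[a]) for a in range(n)]
        arm = max(range(n), key=lambda a: samples[a])
        # The observed click is a noisy Bernoulli draw (aleatoric uncertainty).
        if rng.random() < click_rates[arm]:
            alpha[arm] += 1
        else:
            beta[arm] += 1
        pulls[arm] += 1
    return pulls, alpha, beta

# Usage: three items with hidden click rates; the best item should dominate the pulls.
demo_pulls, demo_alpha, demo_beta = thompson_sampling([0.2, 0.5, 0.8])
```

As an arm's posterior narrows, its samples stop straying far from the mean, so exploration of that arm fades on its own without any explicit schedule.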

A second route to fewer queries is structural: if you represent the user the right way, each answer tells you more. PReF's linear-combination-of-bases is one such representation. Another is modeling a user as several weighted personas rather than one averaged taste vector, so a single piece of feedback can be attributed to a specific persona rather than smeared across everything (Can attention mechanisms reveal which user taste explains each recommendation?). And there's evidence that abstract preference summaries beat hoarding every past interaction, with semantic memory outperforming episodic recall, which means you need fewer raw observations if you compress them into the right abstractions (Does abstract preference knowledge outperform specific interaction recall?).
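The persona idea can be illustrated with a small attention computation. This is a generic softmax-attention sketch, not AMP-CF's exact formulation: it assumes personas and items live in the same embedding space and that a persona's relevance to an item is their dot product. The function name persona_attention and the example vectors are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def persona_attention(personas, item):
    """Score an item for a multi-persona user and report which persona explains it."""
    logits = [dot(p, item) for p in personas]      # affinity of each persona to the item
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]       # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]            # attention weights, sum to 1
    score = sum(w * l for w, l in zip(weights, logits))
    top = max(range(len(personas)), key=lambda k: weights[k])
    return score, weights, top

# Usage: a user with two personas; each item is attributed to one of them.
demo_personas = [[1.0, 0.0], [0.0, 1.0]]
demo_score, demo_weights, demo_top = persona_attention(demo_personas, [2.0, 0.0])
```

Because the weights depend on the candidate item, feedback on that item can be credited mostly to the persona with the largest weight instead of being spread over a single averaged vector.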

The most provocative cross-domain framing is that the cheapest query is the one you never ask. Some systems infer preferences by watching rather than interrogating: entity-centric memory graphs let an agent build up who-likes-what from continuous multimodal observation, no questions required (Can agents learn preferences by watching rather than asking?). And conversational recommenders that fold 'what to ask, what to recommend, and when' into a single learned policy, instead of three separate decisions, optimize the whole conversation trajectory, so they stop asking once they've learned enough rather than marching through a fixed script (crs-unified-policy-learning-replaces-three-separate-decisions-what-to-ask-what-to).

Put together, the corpus suggests query efficiency isn't one technique but a family: target your uncertainty (PReF, epistemic networks), represent the user so each answer counts more (personas, semantic abstraction), and where possible learn by observation or holistic policy instead of asking at all. The thing you didn't know you wanted to know: the same uncertainty-reduction logic that lets ten questions personalize a reward model is what lets a recommender explore a large catalog with 29% fewer interactions.


Sources 6 notes

Can user preferences be learned from just ten questions?

PReF learns base reward functions from preference data, then uses active learning to select maximally informative questions that reduce coefficient uncertainty. Personalization happens via inference-time reward alignment, without weight modification.

Can neural networks explore efficiently at recommendation scale?

ENR separates aleatoric from epistemic uncertainty, focusing computation only on parameter uncertainty needed for Thompson sampling. It improved click-through rates 9% and ratings 6% while requiring 29% fewer interactions than baselines.

Can attention mechanisms reveal which user taste explains each recommendation?

AMP-CF represents each user as multiple latent personas weighted dynamically by candidate item. This makes recommendations both diverse and interpretable—each suggestion traces to the specific persona preference it satisfies—without requiring post-hoc reranking.

Does abstract preference knowledge outperform specific interaction recall?

PRIME framework shows semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning exceeds preference tuning methods.

Can agents learn preferences by watching rather than asking?

M3-Agent demonstrates that separating episodic events from semantic knowledge in an entity-centric graph, combined with parallel memorization and control processes, allows agents to infer and act on user preferences without asking. This architecture mirrors human cognitive systems that bind disparate information about individuals across sensory modalities.

Next inquiring lines