Recommender Systems

How can evaluation metrics reflect graded relevance and user attention?

Traditional IR metrics treat relevance as binary, but real user needs involve degrees of relevance and attention patterns. Can evaluation methods capture both graded relevance judgments and the reality that users examine fewer documents the further down a ranked list they go?

Note · 2026-05-03 · sourced from Recommenders General

Traditional IR evaluation — precision and recall — assumes binary relevance: a document is either relevant or not. Real user information needs are not binary. A document might be highly relevant, marginally relevant, or completely irrelevant, and evaluation should credit systems for surfacing highly-relevant documents earlier in the ranking than marginally-relevant ones.

Järvelin and Kekäläinen's three measures handle this. Cumulative Gain (CG) accumulates relevance scores down the ranked list — a system gets credit for relevant documents anywhere in the ranking. Discounted Cumulative Gain (DCG) applies a position discount that devalues late-retrieved documents, reflecting that users examine fewer documents at lower ranks. Normalized DCG (nDCG) computes DCG as a fraction of the ideal DCG (the DCG of the perfect ranking), giving a 0-to-1 score that is comparable across queries and systems.
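A minimal sketch of the three measures in Python. The `rel_i / log2(i + 1)` discount (with 1-based rank `i`) and the 0-to-3 relevance grades are one common convention; the original formulation parameterizes the discount's log base, so treat these as illustrative choices rather than the canonical definition:

```python
import math

def cg(relevances):
    """Cumulative Gain: sum of graded relevance scores,
    position-blind (credit for relevant documents anywhere)."""
    return sum(relevances)

def dcg(relevances):
    """Discounted Cumulative Gain: each score is divided by
    log2(rank + 1), so late-retrieved documents count less.
    enumerate is 0-based, hence the i + 2 inside the log."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

def ndcg(relevances):
    """Normalized DCG: DCG as a fraction of the ideal DCG
    (the same judgments sorted best-first), giving a 0-to-1 score."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Graded judgments (0 = irrelevant ... 3 = highly relevant), in ranked order:
ranking = [3, 2, 3, 0, 1, 2]
print(cg(ranking))                 # position-blind total
print(round(dcg(ranking), 3))      # discounted total
print(round(ndcg(ranking), 3))     # fraction of the perfect ordering
```

Note that swapping two documents never changes CG, but moving a highly relevant document up always raises DCG and nDCG, which is exactly the incentive the discount is meant to create.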

The conceptual contribution is binding evaluation to user behavior. Users faced with large result sets don't examine everything; they attend to top results far more than later ones. Evaluation that ignores this incentivizes systems merely to get relevant documents into the result set somewhere, regardless of position. DCG-style evaluation instead rewards putting highly relevant documents at the top.

This metric is now standard not just in IR but in recommendation, where the same logic applies: a recommendation list is examined top-down, and the perfect recommendation is the one at position 1. nDCG@k is the dominant evaluation metric for top-K recommendation precisely because it encodes the user-attention pattern that makes ranking matter.




Discounted cumulative gain extends IR evaluation to graded relevance: late-retrieved relevant documents are discounted because users examine fewer documents at lower ranks.