SYNTHESIS NOTE
Recommender Systems

Can single sessions alone rival history-rich recommendation?

Can encoder-only transformers with clever masking capture enough sequential signal from a single anonymous session to match recommenders that use extensive user history? This note explores whether architecture and training design can compensate for sparse data.

Synthesis note · 2026-06-03 · sourced from Recommenders Architectures

Session-based recommendation predicts the next item from a single, often anonymous session, with no historical user profile to lean on. Sequential Masked Modeling (SMM) adapts encoder-only transformers (BERT/DeBERTa-style) to this regime with two pieces: sliding-window data augmentation, which turns one session into many sub-sequences, and a penultimate-token masking strategy that captures sequential dependencies better than standard masking. Across Yoochoose, Diginetica, and Tmall, Transformer-SMM models consistently outperform single-session approaches and rival cross-session/multi-relation methods that have access to more extensive user history, despite using only single-session data.
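
The sliding-window augmentation can be illustrated with a minimal sketch. The note does not specify the window length or stride, so `window` and `stride` below are assumed parameters, and `sliding_windows` is a hypothetical helper name rather than anything from the SMM work itself.

```python
from typing import List, Sequence


def sliding_windows(
    session: Sequence[int], window: int, stride: int = 1
) -> List[List[int]]:
    """Split one session into contiguous sub-sequences of length `window`.

    Assumption: fixed-length windows moved by `stride`; the note does not
    say which windowing scheme SMM actually uses.
    """
    if window < 2:
        raise ValueError("window must cover at least two items")
    if stride < 1:
        raise ValueError("stride must be positive")
    # A session no longer than the window is kept whole.
    if len(session) <= window:
        return [list(session)]
    last_start = len(session) - window
    return [list(session[i:i + window]) for i in range(0, last_start + 1, stride)]


# One anonymous session of five item ids becomes three training sequences.
session = [11, 42, 7, 93, 5]
subsequences = sliding_windows(session, window=3)
print(subsequences)  # [[11, 42, 7], [42, 7, 93], [7, 93, 5]]
```

Each sub-sequence is then treated as an independent training example, which is how a single short session can yield several supervised signals.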

The keeper is the masking design: where standard masked modeling hides random tokens, masking the penultimate token in augmented sequences directly targets next-item prediction, letting an encoder-only model extract strong sequential signal from minimal context — matching methods that need richer history.
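
A literal reading of penultimate-token masking can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: `MASK_ID` is a hypothetical reserved id, `mask_penultimate` is a hypothetical helper, and the note does not say how the final position is treated (for example, whether it holds a real item or a special marker).

```python
from typing import List, Sequence, Tuple

MASK_ID = 0  # hypothetical id reserved for the mask token; real item ids start at 1


def mask_penultimate(
    seq: Sequence[int], mask_id: int = MASK_ID
) -> Tuple[List[int], int, int]:
    """Hide the second-to-last item and return (masked_seq, position, target).

    Assumption: the sequence is used as-is, so the encoder sees every item
    except the one at the penultimate position, which it must reconstruct.
    """
    if len(seq) < 2:
        raise ValueError("need at least two items to mask the penultimate one")
    position = len(seq) - 2
    masked = list(seq)
    target = masked[position]
    masked[position] = mask_id
    return masked, position, target


masked, position, target = mask_penultimate([11, 42, 7, 93])
print(masked, position, target)  # [11, 42, 0, 93] 2 7
```

Unlike random masking, the masked position is fixed near the end of the sequence, so every training example asks the model about a late item given what came before it.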

This sits in the vault's recommender thread as a session-modeling architecture note. It shares the "sequential structure matters" lesson with "Does conversation order matter for recommending items in dialogue?", and its do-more-with-less framing echoes the inductive-bias-over-capacity results elsewhere in the recommenders cluster.

Original note title

encoder-only transformers with penultimate-token masking capture single-session dependencies rivaling history-rich recommenders