Optimizing Encoder-Only Transformers for Session-Based Recommendation Systems

Paper · arXiv 2410.11150 · Published October 15, 2024
Recommender Architectures

Session-based recommendation is the task of predicting the next item a user will interact with, often without access to historical user data. In this work, we introduce Sequential Masked Modeling, a novel approach for encoder-only transformer architectures to tackle the challenges of single-session recommendation. Our method combines data augmentation through window sliding with a penultimate-token masking strategy to capture sequential dependencies more effectively. By enhancing how transformers handle session data, Sequential Masked Modeling significantly improves next-item prediction performance. We evaluate our approach on three widely used datasets, Yoochoose 1/64, Diginetica, and Tmall, comparing it to state-of-the-art single-session, cross-session, and multi-relation approaches. The results demonstrate that our Transformer-SMM models consistently outperform all models that rely on the same amount of information, and even rival methods that have access to more extensive user history. This study highlights the potential of encoder-only transformers in session-based recommendation and opens the door for further improvements.

Introduction. Traditional recommendation systems predominantly rely on a user’s historical interactions and preferences [21]. However, when user identities are partially known or entirely anonymous, recommendations must be generated based solely on the actions taken within a single session. These session-based scenarios challenge traditional models, which depend heavily on extensive user-item interaction history to provide accurate recommendations. Session-Based Recommendation (SBR) models tackle this challenge by analyzing user behavior within a single session to predict future actions and make relevant recommendations. Each session is treated as an independent, ordered sequence of consecutive interactions with items, regardless of whether the same user appears in multiple sessions, focusing solely on the interactions within a specific context. Accurately modeling these sessions is crucial for improving recommendation relevance and providing a personalized, engaging user experience.

Discussion / Conclusion. In this paper, we introduced Sequential Masked Modeling (SMM), a novel masking technique specifically designed for encoder-only transformer models, along with broader architectural improvements that were applied to both encoder- and decoder-based models. By using data augmentation through sliding windows and masking the penultimate token in augmented sequences, SMM significantly improved next-click prediction performance. In combination with the architectural enhancements, we demonstrated strong performance gains in several key metrics across three widely used session-based recommendation datasets: Yoochoose 1/64, Diginetica, and Tmall. Our experimental results showed that the proposed BERT-SMM and DeBERTa-SMM models consistently outperformed traditional single-session approaches and remained competitive with state-of-the-art cross-session and multi-relation methods, despite being limited to single-session data. These findings validate the effectiveness of the SMM technique in capturing sequential dependencies and improving recommendation performance in session-based environments.
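
As a rough illustration of the two ingredients named above (sliding-window augmentation and penultimate-token masking), the following Python sketch shows one plausible way to turn a single session into masked training examples. It is not the authors' implementation. The text here does not state the window size, the stride, the mask token id, or which item serves as the training target, so `window`, `stride`, `MASK_ID`, the function names, and the choice to return the masked item as the label are all assumptions made for the example.

```python
from typing import List, Tuple

MASK_ID = 0  # hypothetical id reserved for the mask token (assumption)


def sliding_windows(session: List[int], window: int, stride: int = 1) -> List[List[int]]:
    """Return contiguous sub-sequences of length `window`.

    Sessions no longer than `window` are returned whole, as a single sequence.
    With stride > 1, trailing items that do not fill a full step are skipped.
    """
    if window < 1 or stride < 1:
        raise ValueError("window and stride must be positive")
    if len(session) <= window:
        return [list(session)]
    last_start = len(session) - window
    return [session[s:s + window] for s in range(0, last_start + 1, stride)]


def mask_penultimate(seq: List[int], mask_id: int = MASK_ID) -> Tuple[List[int], int, int]:
    """Replace the second-to-last item with `mask_id`.

    Returns (masked sequence, masked position, original item at that position).
    Using the original item as the label is an assumption of this sketch.
    """
    if len(seq) < 2:
        raise ValueError("need at least two items to mask the penultimate one")
    pos = len(seq) - 2
    masked = list(seq)          # copy so the caller's sequence is untouched
    target = masked[pos]
    masked[pos] = mask_id
    return masked, pos, target


def build_examples(session: List[int], window: int, stride: int = 1):
    """Augment one session with sliding windows, then mask each window."""
    return [
        mask_penultimate(w)
        for w in sliding_windows(session, window, stride)
        if len(w) >= 2          # single-item windows cannot be masked
    ]


if __name__ == "__main__":
    # Item ids are made up for the example.
    for masked, pos, target in build_examples([11, 12, 13, 14, 15], window=3):
        print(masked, pos, target)
```

Under these assumptions, a five-item session with a window of three yields three augmented sequences, each with its second-to-last position replaced by the mask id. An encoder-only model would then be trained to recover the hidden item from the context on both sides of it.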