Generating Query-Relevant Document Summaries via Reinforcement Learning
E-commerce search engines often rely solely on product titles as input to ranking models that operate under strict latency constraints. However, titles often lack the detail needed to capture query intent, which can lead to suboptimal relevance predictions. Product descriptions provide richer information, but their verbosity and length make them unsuitable for real-time ranking, particularly for computationally expensive architectures such as cross-encoder ranking models. To address this challenge, we propose ReLSum, a novel reinforcement learning framework that generates concise, query-relevant summaries of product descriptions optimized for search relevance. ReLSum uses relevance scores as rewards to align the objectives of summarization and ranking, overcoming a key limitation of prior methods: misaligned learning targets. The framework employs a trainable large language model (LLM) to produce summaries, which then serve as input to a cross-encoder ranking model. Experimental results show significant improvements in offline metrics, including recall and NDCG, as well as in online user engagement metrics.
Machine-generated summaries offer a practical solution to this problem. By summarizing the product description into concise, query-relevant attributes such as “Taurine, non-GMO, chicken bone broth,” we can retain the essential information needed for relevance prediction while minimizing the input token length. This enables cross-encoder rankers to achieve high relevance accuracy without incurring prohibitive latency costs, making them suitable for large-scale e-commerce search systems. Summarized descriptions strike a balance between relevance and efficiency, addressing the limitations of both sparse product titles and verbose full descriptions.
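A cross-encoder scores the concatenated (query, document text) pair, so input length directly drives inference latency. The following minimal sketch illustrates the point above: replacing the full description with a short summary keeps the product side of the pair inside a small token budget. All names and the budget value here are illustrative assumptions, not details from the paper.

```python
MAX_TOKENS = 64  # hypothetical per-pair budget for a real-time ranker


def ranker_input(query, title, summary, max_tokens=MAX_TOKENS):
    """Build the text pair fed to a cross-encoder, truncating the
    product side to the token budget (whitespace tokens for brevity)."""
    product = f"{title} {summary}".split()
    budget = max_tokens - len(query.split()) - 1  # reserve a separator slot
    return query, " ".join(product[:budget])


query = "grain free cat food"
title = "Premium Cat Food 12-pack"
summary = "Taurine, non-GMO, chicken bone broth"
pair = ranker_input(query, title, summary)
```

With a query-relevant summary, the full title-plus-summary text fits well under the budget, whereas a multi-paragraph description would be truncated arbitrarily, losing whichever attributes happen to appear late in the text.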
One way to summarize descriptions is to prompt LLMs to generate summaries. The resulting summaries may be reasonably good, but there is no guarantee that they are optimal for downstream tasks such as ranking, where customer queries are fed to the ranker alongside product information.
Another way is to learn to produce queries as summaries, as in the Doc2Query framework (Nogueira et al. 2019; Nogueira and Lin 2019; Li, Lin, and Lee 2024), but that problem setup differs from summarization. A set of queries is not a summary: information may be repeated unnecessarily across queries, since each query is optimized/generated to match and retrieve the document on its own. This method also shares the issue of the LLM-prompting approach above: the learning target (queries) is not fully aligned with the final downstream target (ranking).

A natural way to resolve this misalignment is to generate a single summary for each document or product and to optimize the generation process with reinforcement learning, where the objective rewards summaries that lead to improved performance on downstream tasks. In our case, the downstream metric of interest is search relevance. This paper explores and develops this solution. Overall, our main contributions in this paper are as follows:
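The reward-driven idea above can be sketched with a toy REINFORCE loop. In ReLSum the policy is an LLM and the reward comes from a cross-encoder relevance model; here, purely for illustration, the policy is a softmax over two candidate summaries and the reward is a stand-in relevance score (query-term coverage). Every name and function below is a hypothetical stand-in, not the paper's actual implementation.

```python
import math
import random

# Two candidate summaries for one product: one query-relevant, one fluent
# but uninformative for ranking.
CANDIDATES = [
    "Taurine, non-GMO, chicken bone broth",  # query-relevant attributes
    "A great product your pet will love",    # generic marketing copy
]


def relevance_reward(query, summary):
    """Stand-in for a cross-encoder relevance score: fraction of query
    terms covered by the summary."""
    q = set(query.lower().split())
    s = set(summary.lower().replace(",", " ").split())
    return len(q & s) / len(q)


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reinforce_step(logits, query, lr=1.0):
    """One REINFORCE update: sample a summary, observe its relevance
    reward, and shift probability mass toward high-reward summaries."""
    probs = softmax(logits)
    i = random.choices(range(len(CANDIDATES)), weights=probs)[0]
    r = relevance_reward(query, CANDIDATES[i])
    # Expected reward under the current policy as a baseline (variance reduction).
    baseline = sum(p * relevance_reward(query, c)
                   for p, c in zip(probs, CANDIDATES))
    for j in range(len(logits)):
        grad = (1.0 if j == i else 0.0) - probs[j]  # d log pi / d logit_j
        logits[j] += lr * (r - baseline) * grad
    return logits


random.seed(0)
logits = [0.0, 0.0]
for _ in range(200):
    reinforce_step(logits, "non-GMO chicken bone broth")
probs = softmax(logits)
# After training, the policy concentrates on the query-relevant summary.
```

Because the reward is the downstream relevance score itself, the summarizer's learning target and the ranker's objective coincide by construction, which is exactly the alignment that prompting-only and Doc2Query-style targets lack.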