Can LLMs recommend products without ever seeing the catalog?
Explores whether language models can learn to generate effective search queries for recommendation systems without direct access to inventory data. This challenges the intuition that good recommendations require knowing what items exist.
This is a counterintuitive empirical finding from Rec-R1's product search experiments: the trained LLM never sees the downstream item catalog. It receives a user query and generates a rewritten query without knowing which products exist in the recommender's database. If good recommendation required knowing what is available, this should not work. Yet it does, consistently across domains.
The mechanism becomes clear once you compare it to human search behavior. People rarely know the exact contents of a platform's inventory. They refine queries iteratively based on vague goals and system feedback: they search, see the results, adjust the query based on what came back, and search again. The catalog enters the loop indirectly through the system's responses, not directly through advance knowledge.
Rec-R1, trained in a closed loop with the recommender, learns this refinement process via reinforcement learning. The LLM's reward depends on whether its generated query yields good ranking metrics from the recommender. Over training, the model acquires implicit catalog awareness (which query forms rank well on this specific recommender) without ever being shown the catalog explicitly.
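The closed loop described above can be sketched as a toy. This is a minimal sketch under loud assumptions, not Rec-R1's implementation: LLM generation is replaced by a fixed menu of candidate rewrites, the recommender is a hypothetical term-overlap search over a five-item catalog, the reward is binary-relevance NDCG@5 against hypothetical relevance labels, and the update rule is plain REINFORCE with a moving-average baseline. The names `make_recommender`, `ndcg_at_k` and `train_rewrite_policy` are illustrative. What the sketch shows is structural: `train_rewrite_policy` receives only a function that returns a scalar reward, never the catalog.

```python
import math
import random

def make_recommender(catalog):
    """Black-box search over a catalog the policy never sees.
    Term-overlap scoring is a hypothetical stand-in for a real retriever."""
    def search(query, k=5):
        terms = set(query.lower().split())
        scored = [(len(terms & set(text.lower().split())), item_id)
                  for item_id, text in catalog.items()]
        scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, ties by id
        return [item_id for score, item_id in scored[:k] if score > 0]
    return search

def ndcg_at_k(ranked, relevant, k=5):
    """Binary-relevance NDCG@k, used here as the scalar reward."""
    dcg = sum(1.0 / math.log2(i + 2) for i, item in enumerate(ranked[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(len(relevant), k)))
    return dcg / ideal if ideal > 0 else 0.0

def train_rewrite_policy(reward_fn, rewrites, steps=1000, lr=0.5, seed=0):
    """REINFORCE with a moving-average baseline over a fixed menu of rewrites.
    The policy only ever observes reward_fn(query), a single number."""
    rng = random.Random(seed)
    logits = [0.0] * len(rewrites)
    baseline = 0.0
    for _ in range(steps):
        top = max(logits)
        exps = [math.exp(x - top) for x in logits]  # numerically stable softmax
        total = sum(exps)
        probs = [e / total for e in exps]
        a = rng.choices(range(len(rewrites)), weights=probs)[0]
        reward = reward_fn(rewrites[a])
        advantage = reward - baseline
        baseline += 0.1 * (reward - baseline)
        # Gradient of log softmax w.r.t. logit i is 1[i == a] - probs[i].
        for i in range(len(logits)):
            logits[i] += lr * advantage * ((1.0 if i == a else 0.0) - probs[i])
    return logits

# Hypothetical catalog, hidden inside the recommender closure.
catalog = {
    "A": "cushioned running shoes for marathon training",
    "B": "trail running shoes waterproof",
    "C": "leather dress shoes",
    "D": "running socks cushioned",
    "E": "marathon energy gels",
}
relevant = {"A", "B"}  # hypothetical relevance labels for the user query
search = make_recommender(catalog)
rewrites = ["something comfy for long runs", "cushioned running shoes",
            "comfy sneakers", "marathon gear"]
logits = train_rewrite_policy(lambda q: ndcg_at_k(search(q), relevant), rewrites)
best = rewrites[max(range(len(rewrites)), key=logits.__getitem__)]
```

In this toy, swapping the menu for sampled LLM outputs and the term-overlap search for a production retriever would change the scale of the loop but not its shape: the catalog stays on the recommender's side, and only a ranking metric crosses back to the policy.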
The deployment consequence is significant for production systems with proprietary or constantly changing catalogs. The LLM does not need access to the inventory database, refresh cycles when the catalog changes, or synchronization protocols. As long as it can interact with the live recommender, it can stay aligned with evolving content trends. Rec-R1 is also compatible with real-time feedback: it can be trained through online interactions with a live recommender, with the LLM receiving immediate performance signals such as engagement rates and conversions.
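The claim that the policy needs no catalog refresh can be illustrated with an online toy. All of the following are hypothetical assumptions, not Rec-R1's setup: the live recommender exposes only a 0/1 engagement signal per query, the catalog is swapped mid-stream on the recommender's side, and the learner is an epsilon-greedy bandit with a constant step size (a standard way to track non-stationary rewards), standing in for online RL on an LLM. Nothing about the catalog change is sent to the learner; its preferred rewrite flips only because the engagement signal changes.

```python
import random

def make_live_recommender(catalog):
    """Hypothetical live recommender. The catalog lives inside this closure;
    callers only see an engagement signal for a query, never the items."""
    state = {"catalog": dict(catalog)}

    def engagement(query):
        # Stand-in for clicks/conversions: 1.0 if any item shares a term with the query.
        terms = set(query.lower().split())
        hit = any(terms & set(text.lower().split()) for text in state["catalog"].values())
        return 1.0 if hit else 0.0

    def replace_catalog(new_catalog):
        # Simulates inventory churn on the recommender side only.
        state["catalog"] = dict(new_catalog)

    return engagement, replace_catalog

def run_online(engagement, rewrites, values, steps, rng, epsilon=0.1, step_size=0.2):
    """Epsilon-greedy over candidate rewrites with a constant step size,
    so old estimates fade and the learner tracks a drifting catalog."""
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(len(rewrites))  # explore
        else:
            a = max(range(len(rewrites)), key=values.__getitem__)  # exploit
        reward = engagement(rewrites[a])
        values[a] += step_size * (reward - values[a])  # updates values in place
    return values

rng = random.Random(0)
rewrites = ["running shoes", "retro sneakers"]
engagement, replace_catalog = make_live_recommender({"A": "trail running shoes"})
values = run_online(engagement, rewrites, [0.0, 0.0], steps=300, rng=rng)
best_before = max(range(len(rewrites)), key=values.__getitem__)

replace_catalog({"B": "retro sneakers canvas"})  # no catalog sync is sent to the learner
values = run_online(engagement, rewrites, values, steps=300, rng=rng)
best_after = max(range(len(rewrites)), key=values.__getitem__)
```

The constant step size is the design choice that matters in this sketch: a running mean would keep weighting the stale pre-swap rewards as heavily as new ones, whereas exponential recency weighting lets the estimates follow the live catalog.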
The broader observation: closed-loop training can substitute for the access patterns we assume systems need. What looks like "the LLM needs to know the catalog" is often "the LLM needs to produce queries that work for this catalog" — and the second can be learned from feedback without the first.
Related concepts in this collection
- Can recommendation metrics train language models directly?
  Explores whether LLMs can be optimized through closed-loop reinforcement learning using real recommendation system outputs as rewards, rather than relying on expensive proprietary model distillation.
  same paper, the architectural enabler
- How can LLM agents handle huge candidate lists without breaking?
  ReAct agents fail when retrieval tools return hundreds of items that overflow prompts. What architectural changes let LLMs work effectively with large candidate sets in recommendation systems?
  adjacent: a different architectural pattern for LLM-recommendation integration
Original note title
LLMs trained via closed-loop RL with recommendation feedback can recommend without seeing the catalog — they learn iterative query refinement from system metrics alone