How can LLM agents handle huge candidate lists without breaking?
ReAct agents fail when retrieval tools return hundreds of items that overflow prompts. What architectural changes let LLMs work effectively with large candidate sets in recommendation systems?
The standard pattern for tool-using LLM agents is ReAct: at each step, the LLM reasons, takes an action via a tool call, observes the result, and reasons again. This works when tool outputs are small. In recommender settings, retrieval tools return hundreds or thousands of candidate items: too many to fit in an observation prompt, and long runs of raw item names further degrade LLM performance.
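A minimal sketch of that failure mode, assuming hypothetical `llm` and `tools` callables and a simple `Action: tool[arg]` output format (neither is any specific framework's API):

```python
import re

def parse_action(text: str) -> tuple[str, str]:
    """Extract 'Action: tool[arg]' from the model's output."""
    match = re.search(r"Action:\s*(\w+)\[(.*)\]", text)
    if match is None:
        raise ValueError("no action found in model output")
    return match.group(1), match.group(2)

def react_step(prompt: str, llm, tools) -> str:
    """One vanilla ReAct step: reason, act, and append the raw observation."""
    step = llm(prompt)                        # e.g. "Thought: ... Action: retrieve[sports games]"
    tool_name, tool_arg = parse_action(step)
    observation = tools[tool_name](tool_arg)  # may be hundreds of item names
    # Failure mode: the entire observation re-enters the context window.
    return prompt + step + f"\nObservation: {observation}\n"
```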
InteRecAgent introduces two architectural fixes. The first is a Candidate Bus: a separate memory, accessible to all tools, that holds the current candidate set without putting it in the prompt. Tools read candidates from the bus, filter them, and write the filtered set back. Items flow through the tools as a streaming funnel (the query tool seeds the initial candidates, the retrieval tool narrows them, the ranker tool orders the survivors) without any step's output bloating the LLM's context window.
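A minimal sketch of the bus, assuming string item IDs; the class and tool names are illustrative, not InteRecAgent's actual interface. Only a count or a short top-k list ever reaches the prompt:

```python
class CandidateBus:
    """Shared memory holding the current candidate set, kept out of the prompt."""

    def __init__(self) -> None:
        self.candidates: list[str] = []

    def set(self, items: list[str]) -> None:
        self.candidates = items

    def narrow(self, keep) -> None:
        """Apply a predicate in place; each tool narrows the funnel."""
        self.candidates = [item for item in self.candidates if keep(item)]

def query_tool(bus: CandidateBus, catalog: list[str], condition) -> str:
    bus.set([item for item in catalog if condition(item)])  # seed the funnel
    return f"{len(bus.candidates)} candidates"              # prompt sees only a count

def ranker_tool(bus: CandidateBus, score, k: int = 10) -> list[str]:
    bus.candidates.sort(key=score, reverse=True)            # order the survivors
    return bus.candidates[:k]                               # only top-k leaves the bus
```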
The second is plan-first execution, which replaces step-by-step ReAct. Instead of generating one action at a time, the LLM generates the entire tool-call sequence at once based on the user's intent, then executes it in order. This reduces LLM inference cost (one planning call instead of N) and lowers error rates, because the LLM reasons globally about the sequence. A separate "critic" LLM evaluates the execution and triggers reflection if the results are unsatisfactory.
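A plan-then-execute sketch with a critic loop; the JSON plan schema and the `llm` and `critic` callables are assumptions for illustration, not the framework's real interface:

```python
import json

def run_plan(user_intent: str, llm, critic, tools, bus, max_reflections: int = 2):
    """One planning call yields the whole tool sequence; a critic gates the result."""
    feedback = ""
    result = None
    for _ in range(max_reflections + 1):
        # Single global planning call, e.g. '[{"tool": "query", "args": {...}}, ...]'
        plan = json.loads(llm(f"Plan tool calls as a JSON list for: {user_intent}\n{feedback}"))
        for step in plan:
            result = tools[step["tool"]](bus, **step["args"])
        verdict = critic(f"Intent: {user_intent}\nResult: {result}")
        if verdict == "OK":                   # critic accepts the execution
            return result
        feedback = f"Previous attempt was unsatisfactory: {verdict}"  # reflect, re-plan
    return result
```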
The framework also distinguishes hard conditions ("popular sports games under $100", handled by SQL queries) from soft conditions ("similar to Call of Duty", handled by item-to-item embedding matching), routing each through the appropriate tool. Long-term and short-term user profiles maintained outside the LLM's context window enable lifelong conversations without context overflow.
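A sketch of the two routes, assuming a SQLite catalog and a hypothetical `embed` function; the table and column names are made up:

```python
import sqlite3
import numpy as np

def hard_condition_tool(db: sqlite3.Connection, genre: str, max_price: float) -> list[str]:
    """Hard conditions map to exact SQL predicates over the item catalog."""
    rows = db.execute(
        "SELECT id FROM items WHERE genre = ? AND price < ? ORDER BY popularity DESC",
        (genre, max_price),
    )
    return [row[0] for row in rows]

def soft_condition_tool(anchor: str, item_vecs: dict[str, np.ndarray], embed, k: int = 20) -> list[str]:
    """Soft conditions map to item-to-item similarity in embedding space."""
    query = embed(anchor)  # e.g. the embedding of "Call of Duty"
    sims = {
        item: float(vec @ query / (np.linalg.norm(vec) * np.linalg.norm(query)))
        for item, vec in item_vecs.items()
    }
    return sorted(sims, key=sims.get, reverse=True)[:k]
```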
The general principle: when LLM agent patterns from research (ReAct, step-by-step CoT) hit production constraints (large candidate sets, long conversations, latency), the answer isn't a smarter prompt but architectural changes that move state out of the prompt entirely.
Source: Recommenders Conversational
Related concepts in this collection
- How should LLM-based recommenders retrieve from massive item corpora?
  When conversational recommenders need to search millions of items, the LLM cannot memorize the corpus. What retrieval strategies work best under different constraints, and how do they trade off latency, sample efficiency, and scalability?
  complements: candidate-bus is the architectural complement to retrieval-strategy choice; the bus carries what the chosen strategy returns
- Why do protocol-based tool systems fail in production agentic workflows?
  Explores whether standardized tool protocols like MCP introduce non-determinism that undermines reliable agent execution, and what causes ambiguous tool selection in production systems.
  complements: plan-first execution is the deterministic-call pattern in the recommender setting; same anti-ReAct lesson
- Does structured artifact sharing outperform conversational coordination?
  Explores whether agents coordinating through standardized documents rather than natural-language messages achieve better collaboration outcomes. Matters because it challenges the default conversational paradigm in multi-agent system design.
  complements: the candidate bus is a standardized artifact between tools; same SOP-over-natural-language coordination lesson at smaller scale
- Can we automatically optimize both prompts and agent coordination?
  Explores whether language agents can be represented as computational graphs whose structure and content adapt automatically. Why it matters: current agent systems require hand-engineered orchestration; automatic optimization could unlock more capable multi-agent systems.
  extends: candidate bus + plan are graph-orchestration primitives; InteRecAgent is one graph instantiation
Original note title: LLM-as-recommender requires plan-first execution and a candidate bus to overcome step-by-step ReAct limitations on long item lists