Agentic and Multi-Agent Systems

How can LLM agents handle huge candidate lists without breaking?

ReAct agents fail when retrieval tools return hundreds of items that overflow prompts. What architectural changes let LLMs work effectively with large candidate sets in recommendation systems?

Note · 2026-05-03 · sourced from Recommenders Conversational

The standard pattern for tool-using LLM agents is ReAct: at each step, the LLM reasons, takes an action via a tool call, observes the result, and reasons again. This works when tool outputs are small. In recommender settings, retrieval tools return hundreds or thousands of candidate items: too many to fit in an observation prompt, and even the item names alone, when injected into the context, degrade the LLM's performance.
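
To make the failure mode concrete, here is a toy sketch of the plain ReAct loop, with a made-up `llm` callable and `tools` dict rather than any particular framework's API. The point is only that every observation gets appended to the prompt the model re-reads on the next step.

```python
def react_loop(llm, tools: dict, task: str, max_steps: int = 5) -> str:
    """Plain ReAct: each step re-sends the growing prompt to the model."""
    prompt = f"Task: {task}"
    for _ in range(max_steps):
        action = llm(prompt)              # model names the next tool to call
        if action == "finish":
            break
        result = tools[action]()          # e.g. a retrieval tool
        # In a recommender, `result` can be thousands of item names; appending
        # them all verbatim is what overflows the context window.
        prompt += f"\nAction: {action}\nObservation: {result}"
    return prompt
```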

InteRecAgent introduces two architectural fixes. First, a Candidate Bus — a separate memory accessible to all tools that holds the current candidate set without putting it in the prompt. Tools read candidates from the bus, filter, and write the filtered set back. Items flow through tools in a streaming funnel — query tool sets initial candidates, retrieval tool narrows them, ranker tool orders the survivors — without any step's output bloating the LLM's context window.
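
A minimal sketch of the Candidate Bus pattern, using hypothetical tool names (`query_tool`, `retrieval_tool`, `ranker_tool`) and an in-memory store rather than InteRecAgent's actual interfaces. What matters is that the candidate set moves between tools through shared memory, while the LLM only ever sees counts or a small top-k.

```python
class CandidateBus:
    """Shared memory for the current candidate set, kept out of the prompt."""
    def __init__(self):
        self.candidates: list[dict] = []

    def write(self, items: list[dict]) -> None:
        self.candidates = items

    def read(self) -> list[dict]:
        return self.candidates

def query_tool(bus: CandidateBus, catalog: list[dict], condition) -> str:
    bus.write([it for it in catalog if condition(it)])   # set initial candidates
    return f"{len(bus.read())} candidates loaded"         # only a summary reaches the LLM

def retrieval_tool(bus: CandidateBus, keep) -> str:
    bus.write([it for it in bus.read() if keep(it)])       # narrow the set in place
    return f"{len(bus.read())} candidates remain"

def ranker_tool(bus: CandidateBus, key, top_k: int = 10) -> list[dict]:
    ranked = sorted(bus.read(), key=key, reverse=True)[:top_k]
    bus.write(ranked)                                       # only survivors persist
    return ranked                                           # small enough to show the LLM

# Usage: each step returns a short string or a small top-k list, never the full set.
bus = CandidateBus()
catalog = [{"name": f"game {i}", "price": i, "popularity": i % 7} for i in range(5000)]
print(query_tool(bus, catalog, lambda it: it["price"] < 100))
print(retrieval_tool(bus, lambda it: it["popularity"] > 3))
print(ranker_tool(bus, key=lambda it: it["popularity"], top_k=3))
```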

Second, plan-first execution replaces step-by-step ReAct. Instead of generating one action at a time, the LLM generates the entire tool-call sequence at once based on the user's intent, then executes it in order. This both reduces LLM inference cost (one planning call instead of N) and reduces error rates because the LLM reasons globally about the sequence. A separate "critic" LLM evaluates execution and triggers reflection if results are unsatisfactory.
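
A sketch of what plan-first execution with a critic could look like, assuming hypothetical `llm` and `critic` callables that return text and a plan formatted as a JSON list of tool calls; InteRecAgent's actual prompts and schema differ. Note the single planning call per attempt, versus one LLM call per step in ReAct.

```python
import json

def plan_first_execute(llm, critic, tools: dict, user_intent: str, max_retries: int = 2):
    feedback = ""
    result = None
    for _ in range(max_retries + 1):
        # One planning call yields the whole tool sequence, e.g.
        # [{"tool": "query_tool", "args": {...}}, {"tool": "ranker_tool", "args": {...}}]
        plan = json.loads(llm(f"Intent: {user_intent}\n{feedback}\nReturn a JSON tool plan."))
        for step in plan:                                  # execute in order; no per-step LLM call
            result = tools[step["tool"]](**step.get("args", {}))
        verdict = critic(f"Intent: {user_intent}\nResult: {result}\nAnswer yes or no with a reason.")
        if verdict.strip().lower().startswith("yes"):
            return result                                  # critic is satisfied
        feedback = f"Previous plan judged unsatisfactory: {verdict}"  # reflect and replan
    return result
```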

The framework also distinguishes hard conditions ("popular sports games under $100" — handled by SQL queries) from soft conditions ("similar to Call of Duty" — handled by item-to-item embedding match), routing each through the appropriate tool. Long-term and short-term user profiles maintained outside the LLM's context window enable lifelong conversations without context overflow.
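
A sketch of the hard/soft routing idea, assuming a hypothetical SQLite `items` table and a plain dict of item embeddings; the column names and similarity helper are illustrative, not the paper's implementation.

```python
import math
import sqlite3

def hard_condition_tool(db: sqlite3.Connection, genre: str, max_price: float) -> list[str]:
    # Hard conditions map to a structured query over explicit item attributes.
    rows = db.execute(
        "SELECT name FROM items WHERE genre = ? AND price <= ? ORDER BY popularity DESC",
        (genre, max_price),
    )
    return [name for (name,) in rows]

def soft_condition_tool(embeddings: dict[str, list[float]], anchor: str, top_k: int = 5) -> list[str]:
    # Soft conditions ("similar to X") use item-to-item embedding similarity.
    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))
    ref = embeddings[anchor]
    scored = ((name, cosine(ref, vec)) for name, vec in embeddings.items() if name != anchor)
    return [name for name, _ in sorted(scored, key=lambda t: t[1], reverse=True)[:top_k]]
```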

The general principle: when LLM agent patterns from research (ReAct, step-by-step CoT) hit production constraints (large candidate sets, long conversations, latency), the answer isn't a smarter prompt but architectural changes that move state out of the prompt entirely.


Source: Recommenders Conversational

Original note title: LLM-as-recommender requires plan-first execution and a candidate bus to overcome step-by-step ReAct limitations on long item lists