Agentic and Multi-Agent Systems

How do agentic AI systems decompose into adaptation paradigms?

What are the core dimensions that distinguish different approaches to adapting agents and tools in agentic systems? Understanding this taxonomy could clarify which adaptation strategy fits which problem.

Note · 2026-02-23 · sourced from Agents

The adaptation landscape for agentic AI systems is cleaner than it appears. Two binary dimensions — what gets optimized (agent or tool) and what provides the signal (tool execution or agent output) — generate four paradigms that cover the principal modes of adaptation:

A1: Tool Execution Signaled Agent Adaptation — The agent is optimized using feedback from external tool execution. When the agent generates a retrieval query and the retriever returns documents, metrics like recall or nDCG computed from retrieval results directly reward the agent. Example: DeepRetrieval optimizes the agent's query generation using retrieval quality scores.
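
As a concrete sketch of where the A1 reward comes from, here is a minimal recall@k reward in Python. The `retrieve` callable and document ids are illustrative placeholders, not DeepRetrieval's actual API; nDCG would slot in the same way.

```python
# A1 sketch: the agent's generated query is scored by tool-execution
# metrics. `retrieve` is a hypothetical stand-in for the real retriever.
from typing import Callable, List, Set


def recall_at_k(retrieved: List[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of relevant documents found in the top-k results."""
    if not relevant:
        return 0.0
    hits = sum(1 for doc_id in retrieved[:k] if doc_id in relevant)
    return hits / len(relevant)


def a1_reward(query: str,
              retrieve: Callable[[str], List[str]],
              relevant: Set[str]) -> float:
    """Reward the agent's query by the quality of what the tool returns."""
    return recall_at_k(retrieve(query), relevant)
```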

A2: Agent Output Signaled Agent Adaptation — The agent is optimized using evaluation of its final output after incorporating tool results. The full pipeline runs (retrieve → integrate → answer), and the answer's correctness drives the reward signal. Example: Search-R1 rewards based on exact match of the final answer, not the retrieval quality.
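
A corresponding A2 sketch, assuming a two-stage retrieve-then-generate pipeline: the tool runs, but only the final answer is scored. The `retrieve` and `generate` callables and the normalization are placeholders in the spirit of Search-R1's exact-match reward, not its actual code.

```python
# A2 sketch: run the full pipeline, score only the final answer.
from typing import Callable, List


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace before exact-match comparison."""
    return " ".join(text.lower().strip().split())


def a2_reward(question: str,
              retrieve: Callable[[str], List[str]],
              generate: Callable[[str, List[str]], str],
              gold_answer: str) -> float:
    """Reward 1.0 iff the final answer exactly matches the gold answer."""
    docs = retrieve(question)          # tool runs, but is not scored
    answer = generate(question, docs)  # only this output drives the reward
    return 1.0 if normalize(answer) == normalize(gold_answer) else 0.0
```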

T1: Agent-Agnostic Tool Adaptation — Tools are trained independently of any specific agent. Retrievers, domain-specific models, and pretrained components function as plug-and-play modules. The agent remains frozen; the tool improves on its own.
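
One common T1 recipe, sketched under the assumption of a bi-encoder retriever trained with in-batch negatives (InfoNCE): the loss involves only queries and passages, never an agent, so the trained encoder can be dropped into any agent afterwards.

```python
# T1 sketch: agent-agnostic retriever training step. No agent appears in
# the objective; the embeddings come from the retriever's own encoders.
import torch
import torch.nn.functional as F


def contrastive_step(query_emb: torch.Tensor,
                     passage_emb: torch.Tensor,
                     temperature: float = 0.05) -> torch.Tensor:
    """In-batch-negative InfoNCE: row i's positive passage is passage i."""
    q = F.normalize(query_emb, dim=-1)
    p = F.normalize(passage_emb, dim=-1)
    scores = q @ p.T / temperature                       # (B, B) similarities
    labels = torch.arange(scores.size(0), device=scores.device)  # diagonal
    return F.cross_entropy(scores, labels)
```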

T2: Agent-Supervised Tool Adaptation — Tools are adapted using signals derived from the frozen agent's outputs. Reward-driven retriever tuning, adaptive rerankers, and memory-update modules all fall here — the agent defines what "good" means for the tool.
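
A hedged T2 sketch in which a frozen agent labels retrieval candidates for reranker training; `frozen_agent` and `is_correct` are hypothetical stand-ins, since the note names the pattern but not a specific implementation.

```python
# T2 sketch: the frozen agent defines "good" for the tool. Each candidate
# passage is labeled by whether the agent answers correctly given it; the
# labels then supervise the reranker or retriever.
from typing import Callable, List, Tuple


def label_passages_with_agent(question: str,
                              passages: List[str],
                              frozen_agent: Callable[[str, str], str],
                              is_correct: Callable[[str], bool]
                              ) -> List[Tuple[str, float]]:
    """Produce (passage, reward) training pairs; the agent stays frozen."""
    labeled = []
    for passage in passages:
        answer = frozen_agent(question, passage)  # no gradient into the agent
        labeled.append((passage, 1.0 if is_correct(answer) else 0.0))
    return labeled  # targets for tuning the tool, not the agent
```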

The taxonomy is practically useful because it maps directly to implementation decisions. A1 vs A2 determines where the loss function sits: at the tool boundary or at the output boundary. T1 vs T2 determines whether tool improvement requires an agent in the loop. Since the companion note "How do knowledge injection methods trade off flexibility and cost?" provides a parallel taxonomy for knowledge injection, the two frameworks are complementary: one classifies what gets injected, the other classifies how the system adapts.
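
Those two implementation questions can be made explicit in code. The following sketch (the names are mine, not from the note) encodes the 2x2 grid: the first axis fixes what gets optimized, the second fixes what emits the training signal.

```python
from enum import Enum


class Target(Enum):   # what gets optimized
    AGENT = "agent"
    TOOL = "tool"


class Signal(Enum):   # what provides the training signal
    TOOL_EXECUTION = "tool execution"  # retrieval metrics, exit codes, ...
    AGENT_OUTPUT = "agent output"      # final-answer correctness, ...


# The 2x2 grid. For tools, "tool execution" is read here as "no agent in
# the loop" (T1 trains on the tool's own objective), while "agent output"
# means a frozen agent supplies the labels (T2).
PARADIGM = {
    (Target.AGENT, Signal.TOOL_EXECUTION): "A1",
    (Target.AGENT, Signal.AGENT_OUTPUT):   "A2",
    (Target.TOOL,  Signal.TOOL_EXECUTION): "T1",
    (Target.TOOL,  Signal.AGENT_OUTPUT):   "T2",
}

print(PARADIGM[(Target.AGENT, Signal.TOOL_EXECUTION)])  # -> A1
```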

The RAG setting illustrates the A1/A2 distinction clearly: A1 optimizes the agent to write better queries (retrieval quality as reward), while A2 optimizes the agent to produce better final answers (answer correctness as reward). These are different objectives and can pull in different directions — a query that retrieves the best documents is not necessarily the query that produces the best final answer when the agent has limited context integration ability.
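
A toy, hard-coded illustration of that divergence (the numbers are invented, not from the source): the query with perfect recall earns zero A2 reward because the agent fails to integrate the larger context, while a weaker retrieval query yields the correct final answer.

```python
# Toy demo: A1 (recall) and A2 (exact-match) rewards pulling apart.
RELEVANT = {"d1", "d2", "d3"}

retrieved = {"query_a": ["d1", "d2", "d3"],  # perfect recall
             "query_b": ["d1", "d9"]}        # partial recall

# Pretend agent behavior: the larger context overwhelms its integration.
final_answer_correct = {"query_a": False, "query_b": True}

for q in ("query_a", "query_b"):
    a1 = len(set(retrieved[q]) & RELEVANT) / len(RELEVANT)  # recall reward
    a2 = 1.0 if final_answer_correct[q] else 0.0            # EM reward
    print(f"{q}: A1 reward={a1:.2f}, A2 reward={a2:.2f}")
# query_a: A1 reward=1.00, A2 reward=0.00
# query_b: A1 reward=0.33, A2 reward=1.00
```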



Original note title: agentic AI adaptation decomposes into four paradigms along two dimensions — agent versus tool optimization target and execution-signaled versus output-signaled feedback