Do persistent agents really cost less per token?
When AI agents reuse cached context across tasks, does the standard cost-per-token metric still reveal true economic efficiency? A case study suggests the answer may be no.
A 115-day case study of one physician-scientist running a persistent agentic research environment found that 82.9% of the tokens recorded in May were cache reads. The workflow was cache-dominant: the agent increasingly reasoned over reused, accumulated context rather than relying on fresh inference. From this the author infers that persistent agentic environments may shift the economic unit from cost per token to cost per completed artifact.
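To make the cache-dominant mix concrete, here is a minimal sketch of how a cache-read share changes the blended price per token. Only the 82.9% share comes from the case study. The two prices are hypothetical placeholders, and treating every non-cache token as billed at one "fresh" rate is a simplifying assumption, not the study's accounting.

```python
def blended_price_per_mtok(cache_read_share: float,
                           fresh_price: float,
                           cache_read_price: float) -> float:
    """Average price per million tokens for a given cache-read share."""
    if not 0.0 <= cache_read_share <= 1.0:
        raise ValueError("cache_read_share must be between 0 and 1")
    fresh_share = 1.0 - cache_read_share
    return cache_read_share * cache_read_price + fresh_share * fresh_price


CACHE_READ_SHARE = 0.829   # reported in the case study (May tokens)
FRESH_PRICE = 3.00         # hypothetical USD per million fresh tokens
CACHE_READ_PRICE = 0.30    # hypothetical USD per million cache-read tokens

blended = blended_price_per_mtok(CACHE_READ_SHARE, FRESH_PRICE, CACHE_READ_PRICE)
print(f"blended price: ${blended:.4f} per million tokens")
```

Under these assumed prices the blended rate lands well below the fresh rate, which is why the average token price says little about what the accumulated context cost to build.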
This matters because cost per token is the native unit of pricing and benchmarking, and for persistent agents it can be systematically misleading. When most tokens are cheap cache reads against a durable memory layer, the marginal token says almost nothing about the cost of getting useful work done. The expensive resource is the accumulated context and the reusable procedures that make each new task cheap. Two agents with identical token counts can differ enormously in the number of artifacts they produce.
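The gap between the two units can be sketched with a small comparison. Everything here is hypothetical: the `AgentRun` type, the agent names, and all token, cost, and artifact figures are invented for illustration and do not come from the case study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentRun:
    """Hypothetical record of one agent's usage over an evaluation window."""
    name: str
    total_tokens: int
    total_cost_usd: float
    artifacts_completed: int

    def cost_per_million_tokens(self) -> float:
        return self.total_cost_usd / self.total_tokens * 1_000_000

    def cost_per_artifact(self) -> float:
        # No completed artifacts means the artifact-level cost is unbounded.
        if self.artifacts_completed == 0:
            return float("inf")
        return self.total_cost_usd / self.artifacts_completed


# Same tokens and same spend, different amounts of finished work (made-up numbers).
run_a = AgentRun("agent_a", total_tokens=50_000_000, total_cost_usd=40.0,
                 artifacts_completed=2)
run_b = AgentRun("agent_b", total_tokens=50_000_000, total_cost_usd=40.0,
                 artifacts_completed=20)

for run in (run_a, run_b):
    print(f"{run.name}: "
          f"${run.cost_per_million_tokens():.2f} per million tokens, "
          f"${run.cost_per_artifact():.2f} per artifact")
```

Both runs look identical on the per-token metric, while the per-artifact metric separates them by a factor of ten. The sketch deliberately leaves open what counts as an artifact, which is the hard part in practice.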
The counterpoint is that cost per artifact is hard to standardize: "artifact" is fuzzy (a paragraph? a paper? a repository?) and reproducible artifact-level denominators barely exist, which is exactly why the field defaults to tokens. But a unit that is easy to measure is not thereby the right one. The methodological recommendation that follows is concrete: future evaluations should adopt artifact-level denominators and report cost-per-artifact estimates, because the economics of a stateful, cache-dominant agent live at the artifact level, not the token level.
— "Persistent AI Agents in Academic Research: A Single-Investigator Implementation Case Study", https://arxiv.org/abs/2605.26870
Related concepts in this collection
- Why does agent efficiency differ from model size reduction?
  Explores why making models smaller doesn't solve agent cost problems. Agents loop recursively, compounding costs multiplicatively, so efficiency requires system-level design, not just parameter reduction.
  extends: both reject per-token accounting for agents, this note via cache-dominant economics, that note via the success-versus-cost frontier as the right metric
- What should we actually measure in agent evaluation?
  Current agent benchmarks reduce performance to a single success metric, potentially hiding critical differences in how agents operate. What dimensions beyond task accuracy should evaluation frameworks capture?
  synthesizes: cost-per-artifact is the economic counterpart to the trajectory-level evaluation this note's denominator demands
- What makes agent-created code artifacts so hard to manage?
  Agent-authored code that persists and is shared across systems raises difficult questions about what should be kept versus discarded, and how to maintain consistent state when multiple agents collaborate on the same artifacts.
  grounds the artifact unit: the persistent, reusable artifacts that make each new task cheap are exactly the cache-dominant durable layer driving the cost shift
- Will agents compete for attention just like users do?
  As autonomous agents take over user tasks, will the Web's economic competition shift from human clicks to agent invocations? This explores whether existing ad-market mechanisms could scale to agent decision-making.
  synthesizes: both relocate the economic unit away from human-facing metrics (clicks, tokens) toward agent-completed work
Original note title
persistent agentic environments shift the economic unit from cost per token to cost per completed artifact