What makes agent-created code artifacts so hard to manage?

Agent-authored code that persists and is shared across systems raises difficult questions about what should be kept versus discarded, and how to maintain consistent state when multiple agents collaborate on the same artifacts.

Note · 2026-05-28 · sourced from Tool Computer Use

Among the three elements of agentic code — model capability, harness infrastructure, and agent-initiated artifacts — the survey flags the third as the one that "remains relatively underexplored." Agent-initiated code artifacts are the interactive objects an agent creates, executes, observes, revises, persists, and shares during a task: patches and tests authored over a live repository, interface commands synthesized against DOM trees, hypothesis-testing pipelines composed on the fly, executable policies and skill libraries revised in response to environment feedback. These appear across coding assistance, GUI/OS automation, scientific discovery, and embodied control — yet they sit outside the well-mapped territory of predefined infrastructure.

The open questions cluster around persistence and sharing. When an agent writes code that outlives the current step, what should persist and what should be discarded? When multiple agents share artifacts, how is consistent state maintained, and how is a useful artifact promoted from one-off scratch work to durable, reviewable infrastructure? The survey's listed open challenges (evaluation beyond final task success, verification under incomplete feedback, regression-free harness improvement, consistent shared state across agents, and human oversight for safety-critical actions) converge on exactly this layer. The counterpoint is that some agent-authored code is genuinely disposable, and over-engineering its lifecycle wastes effort. Even so, the question matters: the artifacts an agent creates may be where the next gains in autonomy and coordination live, and they are precisely what current harness engineering least understands.

— "Code as Agent Harness: Toward Executable, Verifiable, and Stateful Agent Systems", https://arxiv.org/abs/2605.18747
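
The persistence and sharing questions above can be made concrete with a small sketch. It is not taken from the survey: the names `ArtifactStore`, `Stage`, and `StaleWriteError` are hypothetical, the store is in-memory only, and the version check is ordinary optimistic concurrency used here as one possible answer to "consistent shared state", not the paper's method.

```python
from dataclasses import dataclass
from enum import Enum


class Stage(Enum):
    SCRATCH = "scratch"  # one-off work, discarded when the task ends
    DURABLE = "durable"  # promoted after review, kept across tasks


class StaleWriteError(Exception):
    """Raised when an agent revises an artifact from an outdated version."""


@dataclass
class Artifact:
    name: str
    body: str
    stage: Stage = Stage.SCRATCH
    version: int = 1


class ArtifactStore:
    """Hypothetical in-memory store shared by several agents."""

    def __init__(self):
        self._items = {}

    def create(self, name, body):
        if name in self._items:
            raise ValueError(f"artifact {name!r} already exists")
        self._items[name] = Artifact(name, body)
        return self._items[name].version

    def get(self, name):
        return self._items[name]

    def revise(self, name, body, expected_version):
        # Optimistic concurrency: reject the write if another agent
        # has revised the artifact since this agent last read it.
        art = self._items[name]
        if art.version != expected_version:
            raise StaleWriteError(
                f"{name!r} is at v{art.version}, writer expected v{expected_version}"
            )
        art.body = body
        art.version += 1
        return art.version

    def promote(self, name, reviewed):
        # Promotion to durable infrastructure is gated on review
        # (an assumed policy, standing in for human or automated oversight).
        if not reviewed:
            raise PermissionError(f"{name!r} must be reviewed before promotion")
        self._items[name].stage = Stage.DURABLE

    def end_of_task_cleanup(self):
        # Discard everything still marked scratch; return the discarded names.
        discarded = sorted(
            n for n, a in self._items.items() if a.stage is Stage.SCRATCH
        )
        for n in discarded:
            del self._items[n]
        return discarded
```

In this sketch, an agent that calls `revise` with a version it read before another agent's update gets a `StaleWriteError` and must re-read, while `end_of_task_cleanup` keeps only what was explicitly promoted. Defaulting to discard reflects the counterpoint that much agent-authored code is disposable; a real system would have to decide where that default sits.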

Related concepts in this collection

Can agents learn reusable sub-task routines from past experience?
Do web agents fail at long-horizon tasks because they cannot extract and reuse workflows shared across similar problems? This explores whether sub-task abstraction enables skill accumulation rather than task-by-task problem solving.
a concrete case of persistent agent-authored artifacts (reusable routines) compounding over time

Can agents adapt without pausing service to users?
Can deployed LLM agents continuously improve their capabilities while serving users without interruption? This explores whether fast behavioral updates and slow policy learning can coexist across different timescales.
addresses how agent-created skills should persist and be promoted, the lifecycle this note raises


Original note title

agent-initiated code artifacts that persist and are shared are the underexplored frontier of harness engineering

