How do agents discover and construct new APIs from existing applications?

This explores how agents can turn an existing app's messy interface into clean, reusable programmatic tools — discovering what actions are possible and wrapping them as APIs they can call directly, rather than clicking through screens every time.

The corpus frames this as a bootstrapping problem with a clear payoff. The central piece is AXIS (Can API-first agents outperform UI-based agent interaction?), which shows that agents finish tasks far faster when they call APIs instead of clicking through the UI step by step: 65–70% less completion time at 97–98% accuracy. The catch is that most applications don't hand you a tidy API. AXIS's answer is a self-exploration mechanism that probes an existing app, learns which actions it supports, and constructs the missing APIs itself, so the agent generates the very interface it then uses.
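
As a rough illustration of the explore-then-construct idea, here is a minimal sketch under heavy assumptions. The app is a hypothetical toy (`TOY_UI`, a map from screens to clickable controls), and `explore` and `construct_apis` are illustrative names, not part of AXIS. The sketch finds a click path to each screen with breadth-first search, then wraps each path as one callable.

```python
from collections import deque
from typing import Callable, Dict, List

# Hypothetical toy app: each screen maps a clickable control to the next screen.
TOY_UI: Dict[str, Dict[str, str]] = {
    "home": {"File": "file_menu", "Edit": "edit_menu"},
    "file_menu": {"Export": "export_dialog"},
    "edit_menu": {"Find": "find_dialog"},
    "export_dialog": {},
    "find_dialog": {},
}


def explore(ui: Dict[str, Dict[str, str]], start: str = "home") -> Dict[str, List[str]]:
    """Breadth-first search over clicks; returns the click path to every screen."""
    paths: Dict[str, List[str]] = {start: []}
    queue = deque([start])
    while queue:
        screen = queue.popleft()
        for control, next_screen in ui[screen].items():
            if next_screen not in paths:
                paths[next_screen] = paths[screen] + [control]
                queue.append(next_screen)
    return paths


def construct_apis(ui: Dict[str, Dict[str, str]], start: str = "home") -> Dict[str, Callable[[], str]]:
    """Wrap each discovered click path as a single callable."""
    apis: Dict[str, Callable[[], str]] = {}
    for target, path in explore(ui, start).items():
        if not path:  # the start screen needs no API
            continue

        def api(path: List[str] = path) -> str:
            # Replay the recorded clicks and return the screen reached.
            current = start
            for control in path:
                current = ui[current][control]
            return current

        apis[f"open_{target}"] = api
    return apis


apis = construct_apis(TOY_UI)
print(sorted(apis))
print(apis["open_export_dialog"]())  # prints "export_dialog"
```

The agent pays the exploration cost once; afterwards a single call to `open_export_dialog` replaces the two-click sequence. A real system would have to discover the UI graph by interacting with the application rather than reading it from a dictionary.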

That 'explore, then crystallize into reusable tools' loop shows up across the collection under different names. VOYAGER (Can agents learn new skills without forgetting old ones?) stores discovered behaviors as executable skills in a searchable, embedding-indexed library and composes complex skills from simpler ones, which is essentially the same act of turning raw exploration into a callable building block. Agent Workflow Memory (Can agents learn reusable sub-task routines from past experience?) does it at the level of sub-task routines: it watches what worked, strips out the example-specific values, and saves the abstracted pattern for reuse, with larger gains as the gap between training and test tasks widens. The shared insight is that a constructed API isn't just a wrapper; it's a generalization extracted from experience.
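
The abstraction step can be sketched roughly as follows. This is a toy under stated assumptions, not the Agent Workflow Memory or VOYAGER implementation: the `(action, argument)` step format, the `abstract` and `instantiate` helpers, and the example trajectory are all hypothetical. It replaces example-specific values with named placeholders, stores the result in a library, and composes a larger routine from a smaller one.

```python
from typing import Dict, List, Tuple

Step = Tuple[str, str]  # (action, argument); a hypothetical step format


def abstract(trajectory: List[Step], values: Dict[str, str]) -> List[Step]:
    """Replace example-specific values with named placeholders such as {city}."""
    workflow: List[Step] = []
    for action, arg in trajectory:
        for name, value in values.items():
            arg = arg.replace(value, "{" + name + "}")
        workflow.append((action, arg))
    return workflow


def instantiate(workflow: List[Step], bindings: Dict[str, str]) -> List[Step]:
    """Fill the placeholders of a stored workflow with values for a new task."""
    return [(action, arg.format(**bindings)) for action, arg in workflow]


library: Dict[str, List[Step]] = {}

# One successful run, recorded with its concrete values.
trajectory = [
    ("type", "search box: flights to Oslo"),
    ("click", "result for Oslo"),
    ("click", "Book"),
]
library["book_flight"] = abstract(trajectory, {"city": "Oslo"})

# Composition: a larger routine built on top of a stored one.
library["book_round_trip"] = library["book_flight"] + [("click", "Add return leg")]

print(instantiate(library["book_flight"], {"city": "Lima"}))
```

The stored `book_flight` routine no longer mentions Oslo, which is what lets it transfer to a task the agent has not seen.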

There's a live design tension here worth knowing about: should an agent build its tool set up front, or discover tools as it goes? DeepAgent (Can agents discover tools dynamically instead of pre-selecting them?) argues for discovery mid-execution: when the space of possible tools is too large to enumerate, finding tools as needed lets the agent keep the whole task in view and change strategy on the fly. That pairs naturally with AXIS-style construction. You don't pre-build every API; you synthesize the one you need when you hit the wall.
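
A minimal sketch of discovery on demand, assuming a hypothetical `CATALOG` of tool descriptions and a naive keyword-overlap search (a real system would more plausibly use a learned retriever). Nothing here is DeepAgent's actual interface. The point is only that the agent loads a tool when a step calls for it instead of receiving a fixed set up front.

```python
from typing import Any, Callable, Dict, List

# Hypothetical catalog; in practice this could hold thousands of entries.
CATALOG: Dict[str, Dict[str, Any]] = {
    "convert_currency": {
        "doc": "convert an amount between currencies",
        "fn": lambda amount, rate: round(amount * rate, 2),
    },
    "word_count": {
        "doc": "count words in a text",
        "fn": lambda text: len(text.split()),
    },
    "celsius_to_fahrenheit": {
        "doc": "convert a temperature from celsius to fahrenheit",
        "fn": lambda c: c * 9 / 5 + 32,
    },
}


def search_tools(query: str, k: int = 1) -> List[str]:
    """Rank tools by how many query words appear in their description."""
    words = set(query.lower().split())
    ranked = sorted(
        CATALOG,
        key=lambda name: len(words & set(CATALOG[name]["doc"].split())),
        reverse=True,
    )
    return ranked[:k]


class Agent:
    def __init__(self) -> None:
        self.loaded: Dict[str, Callable[..., Any]] = {}  # starts with no tools

    def need(self, description: str) -> Callable[..., Any]:
        """Look up a tool only at the moment a step requires it."""
        name = search_tools(description)[0]
        self.loaded.setdefault(name, CATALOG[name]["fn"])
        return self.loaded[name]


agent = Agent()
count = agent.need("count the words in this text")
print(count("explore then crystallize"))  # prints 3
print(sorted(agent.loaded))  # prints ['word_count']
```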

What the constructed APIs are made of matters too. Code (Can code become the operational substrate for agent reasoning?) turns out to be a uniquely suitable substrate because it is executable, inspectable, and stateful all at once: an agent can write a tool, run it, look inside when it breaks, and carry state between steps. Training agents to use these tools is its own problem. ToolPO (Can simulated APIs and token-level credit assignment train better tool-using agents?) sidesteps the cost and flakiness of hammering real APIs during training by having an LLM simulate them, then assigns credit directly to the tool-invocation tokens instead of spreading the outcome reward across the whole trajectory.
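
To make the credit-assignment idea concrete, here is a toy sketch. It is not ToolPO's actual objective: the `<tool_call>` markers, the additive form, and the numeric values are assumptions chosen for illustration. Every token shares a trajectory-level signal, and only the tokens inside a tool call receive an extra term.

```python
from typing import List

OPEN, CLOSE = "<tool_call>", "</tool_call>"  # hypothetical marker tokens


def tool_mask(tokens: List[str]) -> List[bool]:
    """Mark every token from an opening marker through its closing marker."""
    mask: List[bool] = []
    inside = False
    for token in tokens:
        if token == OPEN:
            inside = True
        mask.append(inside)
        if token == CLOSE:
            inside = False
    return mask


def token_advantages(mask: List[bool], outcome_adv: float, tool_adv: float) -> List[float]:
    """All tokens share the outcome term; tool-call tokens get an added term."""
    return [outcome_adv + (tool_adv if is_tool else 0.0) for is_tool in mask]


tokens = ["I", "need", OPEN, "search", "(", "q", ")", CLOSE, "done"]
mask = tool_mask(tokens)
advantages = token_advantages(mask, outcome_adv=0.5, tool_adv=0.25)

for token, adv in zip(tokens, advantages):
    print(f"{token:>12}  {adv}")
```

Tokens outside the call get 0.5 and tokens inside it get 0.75, so a gradient update weighted by these values would push hardest on the part of the output that actually selected and parameterized the tool.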

A less obvious point: discovery doesn't have to stay siloed in one agent's head. SkillClaw (How can agent systems share learned skills across users?) pools interaction trajectories across many users, distills recurring patterns into refined skills, and pushes them back out, so one agent's discovered API becomes everyone's. And when these tools need to interoperate, the winning move is to wrap rather than replace: coordination layers (Should coordination protocols wrap existing systems or replace them?) gain adoption by bridging existing protocols such as MCP and DIDComm instead of demanding a rewrite. It's the same wrap-what-exists philosophy that lets an agent build an API over an app it didn't design.
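
The pooling step might look something like the sketch below. It is an assumption-laden toy, not SkillClaw's evolver: the `shared_skills` function, the action names, and the rule "promote any action pair seen by at least two users" are all hypothetical stand-ins for whatever pattern detection a real system uses.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def shared_skills(
    trajectories: Dict[str, List[List[str]]],
    n: int = 2,
    min_users: int = 2,
) -> List[Tuple[str, ...]]:
    """Return action n-grams that appear in runs from at least min_users users."""
    users_by_pattern: Dict[Tuple[str, ...], Set[str]] = defaultdict(set)
    for user, runs in trajectories.items():
        for run in runs:
            for i in range(len(run) - n + 1):
                users_by_pattern[tuple(run[i:i + n])].add(user)
    return sorted(
        pattern
        for pattern, users in users_by_pattern.items()
        if len(users) >= min_users
    )


# Hypothetical per-user interaction logs.
logs = {
    "alice": [["open_report", "filter_by_date", "export_csv"]],
    "bob": [["login", "filter_by_date", "export_csv"]],
    "carol": [["open_report", "print"]],
}

print(shared_skills(logs))  # prints [('filter_by_date', 'export_csv')]
```

Counting distinct users rather than raw occurrences keeps one heavy user's habits from being promoted as a shared skill.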

Sources 8 notes

Can API-first agents outperform UI-based agent interaction?

The AXIS framework shows that prioritizing API calls over sequential UI interactions cuts task completion time by 65–70% while maintaining 97–98% accuracy and reducing cognitive workload by 38–53%. A self-exploration mechanism automatically discovers and constructs APIs from existing applications, solving the bootstrapping problem.

Can agents learn new skills without forgetting old ones?

VOYAGER demonstrates that storing executable skills in an embedding-indexed library and composing complex skills from simpler ones allows agents to learn continuously while avoiding the forgetting that occurs with weight-update-based methods. Environmental feedback refines skills while an automatic curriculum drives continual exploration.

Can agents learn reusable sub-task routines from past experience?

Agent Workflow Memory induces sub-task routines at finer granularity than full tasks, abstracts example-specific values, and compounds them hierarchically. This produces 24.6% relative gain on Mind2Web and 51.1% on WebArena, with larger gains as train-test gaps widen.

Can agents discover tools dynamically instead of pre-selecting them?

DeepAgent demonstrates that discovering tools as needed—rather than pre-retrieving a fixed set—enables agents to maintain global task perspective and adapt strategy mid-execution. This approach scales better for long-horizon tasks where the tool space is too large to enumerate.

Can code become the operational substrate for agent reasoning?

Research shows code uniquely enables agents to externalize reasoning, execute policies, model environments, and verify progress through its simultaneous executability, inspectability, and statefulness across task steps.

Can simulated APIs and token-level credit assignment train better tool-using agents?

ToolPO replaces costly real-API interactions with LLM-simulated ones and assigns credit directly to tool-invocation tokens rather than spreading outcome rewards across trajectories. This combination improves training stability and sample efficiency for tool-using agents.

How can agent systems share learned skills across users?

SkillClaw aggregates interaction trajectories across users, processes them through an autonomous evolver that identifies patterns and refines skills, then synchronizes updates system-wide. This converts siloed individual learning into shared capability improvement without manual curation.

Should coordination protocols wrap existing systems or replace them?

Research shows that agent coordination standards achieve adoption by composing existing protocols like MCP and DIDComm under a shared substrate, rather than competing to replace them. Bridging lets value accrue incrementally without forcing ecosystem-wide rewrites.
