Agentic and Multi-Agent Systems

How can GUI agents adapt when software constantly changes?

Can desktop automation agents stay current by combining real-time web documentation with learned task patterns and concrete execution memories? This explores how to avoid training obsolescence in open-world software environments.

Note · 2026-05-03 · sourced from Tool Computer Use

The challenge Agent S targets is that GUI automation must work across a vast and constantly evolving universe of applications and websites. No fixed knowledge base stays accurate for long, so the agent must learn from open-world experience while still benefiting from domain-specific specialization. The proposed architecture answers this with a three-source planning method.

External: Online Web Knowledge provides up-to-date documentation about specific applications, allowing adaptation to software that has changed since training. This is the "look it up" channel — useful precisely because the open world drifts.

Internal-abstract: Narrative Memory stores high-level, abstractive task experiences from past interactions. It captures the gestalt of how a kind of task plays out and is used during top-level decomposition.

Internal-concrete: Episodic Memory stores detailed, step-by-step subtask experience. It is retrieved during execution to refine specific actions in context.
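As a rough illustration of how the three sources could feed two different planning stages, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the `Memory` class, the `overlap` word-set similarity (a stand-in for whatever retrieval the real system uses), and the `web_lookup` callable are hypothetical names, not the paper's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


def overlap(a: str, b: str) -> float:
    """Jaccard similarity over lowercase word sets (stand-in for real retrieval)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


@dataclass
class Memory:
    """Hypothetical keyed store: description of a past (sub)task -> stored experience."""
    entries: Dict[str, str] = field(default_factory=dict)

    def retrieve(self, query: str) -> Optional[str]:
        """Return the experience whose key is most similar to the query, if any."""
        if not self.entries:
            return None
        best = max(self.entries, key=lambda key: overlap(key, query))
        return self.entries[best] if overlap(best, query) > 0 else None


def planning_context(task: str,
                     web_lookup: Callable[[str], str],
                     narrative: Memory) -> Dict[str, Optional[str]]:
    """Top-level decomposition sees web knowledge plus abstract task experience."""
    return {
        "task": task,
        "web_knowledge": web_lookup(task),      # external, up-to-date channel
        "narrative": narrative.retrieve(task),  # internal, abstract channel
    }


def execution_context(subtask: str, episodic: Memory) -> Dict[str, Optional[str]]:
    """Low-level execution sees concrete step-by-step subtask experience."""
    return {"subtask": subtask, "episodic": episodic.retrieve(subtask)}
```

The design point the sketch tries to capture is only the routing: narrative memory and web knowledge are consulted when decomposing the task, while episodic memory is consulted per subtask during execution.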

The two-tier internal memory matters because complex desktop tasks span timescales: high-level decomposition needs abstract task patterns, but low-level execution needs concrete state-action sequences. Successful subtasks and full task experiences are evaluated by a self-evaluator and stored back, enabling continual improvement.
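The write-back step can be sketched as follows. This is a hedged illustration, not the paper's implementation: `write_back`, `summarize_subtask`, and the plain-dict memories are hypothetical, and the placeholder evaluator simply joins actions into a string where a real self-evaluator would presumably be model-driven. Tagging the full-task summary with an outcome label is also an assumption of this sketch.

```python
from typing import Callable, Dict, List, Tuple

# (subtask description, actions taken, whether the subtask succeeded)
SubtaskTrace = Tuple[str, List[str], bool]


def summarize_subtask(subtask: str, actions: List[str]) -> str:
    """Placeholder self-evaluator: turns a concrete trace into a strategy summary."""
    return f"{subtask}: " + " -> ".join(actions)


def write_back(task: str,
               traces: List[SubtaskTrace],
               narrative: Dict[str, str],
               episodic: Dict[str, str],
               evaluate: Callable[[str, List[str]], str] = summarize_subtask) -> None:
    """Store evaluated experience back into the two memory tiers."""
    # Episodic tier: only successful subtasks, with their concrete steps.
    for subtask, actions, ok in traces:
        if ok:
            episodic[subtask] = evaluate(subtask, actions)

    # Narrative tier: an abstract outline of the whole task, without low-level actions.
    all_ok = all(ok for _, _, ok in traces)
    outcome = "success" if all_ok else "partial"  # assumption: outcome tag kept with summary
    narrative[task] = f"[{outcome}] " + "; ".join(subtask for subtask, _, _ in traces)
```

Keeping the low-level actions out of the narrative entry mirrors the timescale split described above: the abstract tier records how a task decomposes, the concrete tier records how each piece was carried out.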

The differentiation from prior RAG-for-agents work is precise: rather than retrieving exemplars or guidelines uniformly, this design uses task experience hierarchically. Full task experience is summarized into an abstractive textual reward for subtask planning, and subtask experience is self-evaluated before storage. The implication is that GUI agents in open worlds need more than memory; they need stratified memory whose levels match the levels of the planning problem. The same paper introduces a structured interface (see "Can structured interfaces help language models control GUIs better?") as the perception-side companion to this memory architecture. Together they illustrate that GUI agents need factoring at both the perception and memory layers.


Source: Tool Computer Use

Original note title

experience-augmented hierarchical planning combines external web knowledge with narrative and episodic memory — letting GUI agents adapt to open-world software change