MetaClaw: Just Talk — An Agent That Meta-Learns and Evolves in the Wild

Paper · arXiv 2603.17187

Large language model (LLM) agents have rapidly emerged as powerful assistants for complex, multi-step tasks, yet agents deployed in the wild remain largely static, trained once and served unchanged regardless of how user needs evolve. This creates a fundamental tension: they must serve users continuously without interruption, yet their capabilities grow stale as the task distribution drifts with real-world usage. On platforms such as OpenClaw, where a single agent connects to 20+ messaging channels and handles diverse, evolving workloads, existing approaches either store raw trajectories without distilling transferable behavioral knowledge, maintain static skill libraries disconnected from weight optimization, or incur service downtime during retraining. We present MetaClaw, a continual meta-learning framework that jointly maintains a base LLM policy and an evolving skill library of reusable behavioral instructions, improving both through two complementary mechanisms. Skill-driven fast adaptation analyzes failure trajectories and synthesizes new skills via an LLM evolver, taking effect immediately with zero service downtime. Opportunistic policy optimization performs gradient-based weight updates via cloud LoRA fine-tuning using a process reward model, triggered only during user-inactive windows by the Opportunistic Meta-Learning Scheduler (OMLS), which monitors configurable sleep hours, system keyboard inactivity, and Google Calendar occupancy.
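The idle-window check that gates policy optimization can be illustrated with a minimal sketch. The source names the three monitored signals but does not say how they are combined, so everything here is an assumption: the names `IdleSignals`, `in_sleep_hours`, and `should_train` are hypothetical, the default thresholds are arbitrary, and the sketch treats any single signal as sufficient to open a training window. Calendar occupancy is assumed to mean the user is in a meeting and therefore away from the agent.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta


@dataclass
class IdleSignals:
    """Hypothetical snapshot of the three signals the scheduler monitors."""
    now: datetime
    last_keyboard_input: datetime
    calendar_busy: bool  # assumed: True if a calendar event overlaps `now`


def in_sleep_hours(now: time, start: time, end: time) -> bool:
    """True if `now` lies in [start, end), including windows that cross midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def should_train(
    signals: IdleSignals,
    sleep_start: time = time(23, 0),               # assumed default
    sleep_end: time = time(7, 0),                  # assumed default
    min_idle: timedelta = timedelta(minutes=30),   # assumed default
) -> bool:
    """Assumed policy: any one signal indicating user absence opens a window."""
    asleep = in_sleep_hours(signals.now.time(), sleep_start, sleep_end)
    keyboard_idle = signals.now - signals.last_keyboard_input >= min_idle
    return asleep or keyboard_idle or signals.calendar_busy
```

A stricter variant could require all signals to agree before training starts; which combination the system actually uses is not stated in this text.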

The two mechanisms are mutually reinforcing: a better policy produces more informative failures for skill synthesis, and richer skills yield higher-reward trajectories for policy optimization. To prevent stale reward contamination, a skill generation versioning mechanism strictly separates support data (failure trajectories consumed by skill evolution) from query data (post-adaptation trajectories used for RL updates). Built on a proxy-based architecture, MetaClaw scales to production-size LLMs without a local GPU.
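The support/query separation can be sketched as a buffer that stamps every trajectory with the skill generation active when it was collected. This is an illustrative sketch under stated assumptions, not the paper's implementation: `Trajectory`, `TrajectoryBuffer`, the scalar `reward`, and the `failure_threshold` cutoff are all hypothetical names and choices.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trajectory:
    task_id: str
    reward: float
    skill_generation: int  # skill-library generation active at collection time


@dataclass
class TrajectoryBuffer:
    generation: int = 0
    items: List[Trajectory] = field(default_factory=list)

    def record(self, task_id: str, reward: float) -> Trajectory:
        """Store a trajectory stamped with the current skill generation."""
        traj = Trajectory(task_id, reward, self.generation)
        self.items.append(traj)
        return traj

    def evolve_skills(self, failure_threshold: float = 0.5) -> List[Trajectory]:
        """Return current-generation failures as support data, then bump the
        generation so everything collected so far counts as stale."""
        support = [
            t for t in self.items
            if t.skill_generation == self.generation and t.reward < failure_threshold
        ]
        self.generation += 1
        return support

    def query_batch(self) -> List[Trajectory]:
        """Only trajectories collected under the current skills feed RL updates."""
        return [t for t in self.items if t.skill_generation == self.generation]
```

Because the generation counter advances on every skill evolution, trajectories consumed as support data can never reappear in a later query batch, which is the filtering property the versioning mechanism is described as providing.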

Existing approaches to agent adaptation fall into three broad categories, each with notable limitations. Memory-based methods store raw conversation trajectories for future retrieval, but such trajectories are verbose and redundant, preventing the agent from extracting transferable behavioral patterns. Skill-based methods compress experience into reusable behavioral instructions, yet treat the resulting skill library as a static database never coordinated with weight optimization. RL-based methods update model weights, but operate in small-scale or offline settings and ignore a critical data validity problem: once skills have evolved, trajectories collected under the old skill context carry stale rewards that contaminate gradient updates if reused without filtration. A common thread across all three categories is that each addresses only one aspect of adaptation in isolation, leaving the complementary dimensions unexploited.

Our key observation is that two fundamentally different timescales of adaptation are in fact naturally complementary. Behavioral heuristics (e.g., "always verify a file path before reading," "confirm before destructive commands") can be distilled within seconds from a single failed conversation and injected immediately as skill instructions. By contrast, improving the model's underlying policy across diverse task types requires gradient-based optimization over many trajectories, on a timescale of minutes to hours. No existing system unifies these two forms of adaptation into a coherent framework that exploits the virtuous cycle between them.
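The fast path can be pictured as a skill library whose entries are rendered into the agent's prompt on the next request, so a new skill takes effect without touching model weights. The sketch below is hypothetical throughout: `SkillLibrary`, `adapt_from_failure`, and the prompt layout are illustrative, and the `evolver` argument stands in for the LLM evolver, whose real interface is not described in this text.

```python
from typing import Callable, List


class SkillLibrary:
    """Hypothetical in-memory store of behavioral instructions."""

    def __init__(self) -> None:
        self.skills: List[str] = []

    def add(self, instruction: str) -> bool:
        """Add a skill; return False for empty or duplicate instructions."""
        normalized = instruction.strip()
        if not normalized or normalized in self.skills:
            return False
        self.skills.append(normalized)
        return True

    def render(self, base_prompt: str) -> str:
        """Append the current skills to the base prompt (assumed layout)."""
        if not self.skills:
            return base_prompt
        bullets = "\n".join(f"- {s}" for s in self.skills)
        return f"{base_prompt}\n\nSkills:\n{bullets}"


def adapt_from_failure(
    library: SkillLibrary,
    failed_transcript: str,
    evolver: Callable[[str], List[str]],  # stand-in for the LLM evolver
) -> int:
    """Distill skills from one failed conversation; return how many were new."""
    return sum(library.add(skill) for skill in evolver(failed_transcript))
```

In use, a stub such as `lambda transcript: ["Always verify a file path before reading."]` can replace the evolver for testing; the next call to `render` then includes the new instruction.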

We presented MetaClaw, a continual meta-learning framework that enables deployed LLM agents to improve autonomously through normal usage. MetaClaw combines two complementary adaptation mechanisms operating at different timescales: fast, inference-time skill injection that distills reusable behavioral knowledge from failures, and slow, gradient-based policy optimization that refines the model during idle windows. Built on a lightweight proxy architecture, the system requires no local GPUs and integrates transparently with existing personal agents and LLM providers. We believe MetaClaw establishes a principled foundation for agents that genuinely learn and evolve in the wild, simply by being used.
