Agentic and Multi-Agent Systems

Can agents learn cooperation by adapting to diverse partners?

Explores whether sequence model agents can develop mutual cooperation strategies through in-context learning when trained against varied co-players, without explicit cooperation mechanisms or hardcoded assumptions.

Note · 2026-02-23 · sourced from Agents Multi Architecture

Achieving cooperation among self-interested agents is a fundamental challenge in multi-agent reinforcement learning. Existing approaches that achieve mutual cooperation between "learning-aware" agents typically rely on hardcoded assumptions about co-player learning rules or enforce strict separation between fast-timescale "naive learners" and slow-timescale "meta-learners." Both constraints limit scalability.

This paper shows that the in-context learning capabilities of sequence models provide a cleaner path. Training sequence-model agents against a diverse distribution of co-players naturally induces in-context best-response strategies that effectively function as learning algorithms on the fast, intra-episode timescale. No hardcoded assumptions about the opponent. No explicit timescale separation.

The cooperation mechanism is elegant: in-context adaptation renders agents vulnerable to extortion (because they adapt to exploitative strategies). This vulnerability creates mutual pressure between agents — each agent's in-context learning dynamics can be shaped by the other. The resulting mutual shaping pressure resolves into cooperative behavior.
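This dynamic can be illustrated with a toy model (not the paper's setup; the payoff matrix and the best-response heuristic below are our own stand-ins for a sequence model's in-context adaptation). A myopic in-context best-responder in an iterated prisoner's dilemma ends up playing whatever its reactive partner makes profitable: it converges to cooperation against tit-for-tat and to defection against a pure defector, so the partner's strategy shapes the adapter's behavior.

```python
# Toy iterated prisoner's dilemma: an in-context best-responder adapts to
# whatever its reactive co-player rewards. Payoffs are the standard
# (T=5, R=3, P=1, S=0) values; all names here are illustrative.
PAYOFF = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
          ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def in_context_best_response(history):
    """Pick the move with the higher estimated payoff, given the opponent's
    observed replies to each of our past moves (toy stand-in for a sequence
    model's in-context adaptation)."""
    replies = {"C": [], "D": []}
    for my_move, opp_reply in history:
        replies[my_move].append(opp_reply)

    def value(my_move):
        obs = replies[my_move] or ["C", "D"]  # uninformed uniform prior
        return sum(PAYOFF[(my_move, r)][0] for r in obs) / len(obs)

    return "C" if value("C") >= value("D") else "D"

def run_episode(opponent, rounds=20):
    """Play vs. a reactive opponent that sees our previous move.
    Returns (total payoff, final adapted move)."""
    history, my_prev, total = [], None, 0
    for _ in range(rounds):
        my = in_context_best_response(history)
        opp = opponent(my_prev)
        total += PAYOFF[(my, opp)][0]
        if my_prev is not None:
            history.append((my_prev, opp))  # opp's reply to our previous move
        my_prev = my
    return total, my

tit_for_tat = lambda prev: "C" if prev in (None, "C") else "D"
always_defect = lambda prev: "D"
```

Against tit-for-tat the adapter settles into cooperation; against an unconditional defector it settles into defection. That sensitivity to the partner is exactly the lever each agent holds over the other's in-context dynamics.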

Three components are necessary and sufficient: (1) sequence model agents with in-context learning capacity, (2) diverse co-player distribution during training, and (3) decentralized reinforcement learning. Co-player diversity is the key ingredient — it forces the agent to develop general in-context adaptation rather than memorizing responses to specific opponents.
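Component (2) in isolation can be sketched as an episode scheduler that pairs the learner with a freshly sampled co-player each episode. The pool contents and names below are illustrative choices of ours, not the paper's actual training distribution.

```python
import random

# Illustrative pool of reactive co-players, keyed by name. Each maps the
# learner's previous move ("C"/"D", or None on the first round) to a reply.
# The specific strategies are our own choices, not the paper's.
COPLAYER_POOL = {
    "tit_for_tat":   lambda prev: "C" if prev in (None, "C") else "D",
    "always_coop":   lambda prev: "C",
    "always_defect": lambda prev: "D",
    "coin_flip":     lambda prev: random.choice("CD"),
}

def coplayer_schedule(n_episodes, seed=0):
    """Draw a fresh co-player name for every training episode, so the agent
    faces a distribution of partners rather than one memorizable opponent."""
    rng = random.Random(seed)
    names = sorted(COPLAYER_POOL)
    return [rng.choice(names) for _ in range(n_episodes)]
```

Over many episodes every partner type recurs, so a policy that merely memorizes the reply to one opponent underperforms one that reads the current partner from context.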

Building on "Can transformers learn to solve new problems within episodes?", this finding extends in-context reinforcement learning (ICRL) from single-agent environments to multi-agent cooperation. The in-context learning mechanism that enables environment adaptation also enables co-player adaptation, and the social dynamics of mutual adaptation produce emergent cooperation.

The connection to "Can cooperative bots escape frozen selfish populations?" is structural: random exploration breaks frozen equilibria in population games; diverse co-player training breaks the equilibrium of mutual defection in dyadic games. Both work through diversity of experience rather than explicit cooperation incentives.


Related concepts in this collection


in-context co-player modeling enables cooperation without hardcoded assumptions — training against diverse co-players induces mutual shaping through vulnerability to extortion