Artifacts as Memory Beyond the Agent Boundary
The situated view of cognition holds that intelligent behavior depends not only on internal memory, but on an agent’s active use of environmental resources. Here, we begin formalizing this intuition within Reinforcement Learning (RL). We introduce a mathematical framing for how the environment can functionally serve as an agent’s memory, and prove that certain observations, which we call artifacts, can reduce the information needed to represent history. We corroborate our theory with experiments showing that when agents observe spatial paths, the amount of memory required to learn a performant policy is reduced. Interestingly, this effect arises unintentionally, and implicitly through the agent’s sensory stream. We discuss the implications of our findings, and show they satisfy qualitative properties previously used to ground accounts of external memory. Moving forward, we anticipate further work on this subject could reveal principled ways to exploit the environment as a substitute for explicit internal memory.
According to the situated view of cognition, competent action depends not only on internal memory, but on an agent’s use of environmental resources (Hutchins, 1995; Clark, 1998; Menary, 2010). On some accounts, the environment itself can implicitly function as an agent’s memory (Clark & Chalmers, 1998; Sutton, 2003). In this paper, we aim to formalize such cases within Reinforcement Learning (RL). As a first step, we focus on one form of externalized memory which centers on the use of artifacts (Hutchins, 2001) to store information about an agent’s previous interactions—for instance, a trail of breadcrumbs indicating where the agent has been before.
We make three main contributions. First, we introduce a mathematical framing for how the environment can functionally serve as an agent’s memory. Our framework grounds the concept of artifacts as observations that inform the past (Definition 1), and proves the amount of information needed to represent a history is reduced when artifacts are present (Theorem 1). We equate externalized memory to a condition on the amount of capacity needed to learn a performant policy (Definition 3), and show the amount of externalized memory can be systematically quantified. Our proposed method compares the capacity needed to match performance across two settings that differ in whether the agent can observe behavioral artifacts, such as a spatial path.
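As an illustration of the comparison described above, the following is a minimal sketch and not the paper's procedure. It assumes capacity is a single integer (for example, a number of hidden units) and that performance has already been measured at several capacities in each setting. The function names, the `target` threshold, and the numbers are all hypothetical placeholders.

```python
from typing import Dict, Optional


def min_capacity_to_match(perf_by_capacity: Dict[int, float],
                          target: float) -> Optional[int]:
    """Smallest capacity whose measured performance reaches `target`.

    Returns None when no tested capacity reaches the target.
    """
    feasible = [cap for cap, perf in perf_by_capacity.items() if perf >= target]
    return min(feasible) if feasible else None


def capacity_saving(without_artifacts: Dict[int, float],
                    with_artifacts: Dict[int, float],
                    target: float) -> Optional[int]:
    """Difference in required capacity between the two settings.

    A positive value means the artifactual setting reached the target
    performance with less capacity (one hypothetical reading of
    "externalized memory").
    """
    c_without = min_capacity_to_match(without_artifacts, target)
    c_with = min_capacity_to_match(with_artifacts, target)
    if c_without is None or c_with is None:
        return None
    return c_without - c_with


# Placeholder numbers invented only to exercise the functions;
# they are NOT results from the paper.
no_path = {8: 0.2, 16: 0.5, 32: 0.8, 64: 0.9}
path = {8: 0.6, 16: 0.85, 32: 0.9, 64: 0.9}
saving = capacity_saving(no_path, path, target=0.8)
```

With these placeholder values, the setting without a path first reaches the target at capacity 32 and the setting with a path at capacity 16, so `saving` is 16. In practice the performance tables would come from repeated training runs, and the choice of threshold is itself a modelling decision.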
Second, we empirically confirm that RL agents can use spatial environments as a form of memory. We find evidence for this in five different settings and from two core agent designs: Q-learning (Watkins & Dayan, 1992) and DQN (Mnih et al., 2015). In each case, we find the use of external memory arises unintentionally; leaving behind a spatial path, like a trail of breadcrumbs, is enough for the agent to experience the effect.
Third, we place our results in a broader conceptual context and show that they satisfy qualitative properties previously used to ground accounts of external memory (Michaelian, 2012; Sims & Kiverstein, 2022). We discuss our results, and suggest that further work in this area could yield principled ways to exploit the environment as a substitute for explicit internal memory.
Example 1 (Page Keeping). Alice is an avid reader of books. Like many, she reads only a few pages at a time. Instead of remembering the page number where she stopped, she marks her place by folding the corner of the page. When she picks up the book later, she unfolds the corner and continues to read.
This interaction can be represented by the artifactual environment pictured in Figure 1. Observations indicate three basic situations where Alice sees a folded page (A), an unfolded page (B), or something unrelated (C). Whenever Alice observes A, she knows that B must have occurred. Thus, in this context, a folded page serves as an artifact.
The existence of artifacts can be expressed as a probabilistic property of the environment. Proofs of formal claims are provided in Section A of the Supplement.
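To make the probabilistic reading concrete, here is a small simulation sketch of the Page Keeping example. The transition structure is an assumption chosen for illustration (Figure 1 may differ). Its only essential feature is that the folded-page observation A can be reached solely from the unfolded-page observation B. The names `TRANSITIONS`, `rollout`, and `prob_b_in_past_given` are hypothetical.

```python
import random

# Hypothetical observation dynamics: A (folded page) is reachable only from
# B (unfolded page); C is an unrelated observation and the starting point.
TRANSITIONS = {"C": ["C", "B"], "B": ["A", "C"], "A": ["B", "C"]}


def rollout(length: int, rng: random.Random) -> list:
    """Sample one observation sequence of the given length, starting at C."""
    obs = ["C"]
    while len(obs) < length:
        obs.append(rng.choice(TRANSITIONS[obs[-1]]))
    return obs


def prob_b_in_past_given(symbol: str, episodes: int = 2000,
                         length: int = 6, seed: int = 0) -> float:
    """Empirical probability that B occurred earlier, given `symbol` is observed now."""
    rng = random.Random(seed)
    hits = total = 0
    for _ in range(episodes):
        obs = rollout(length, rng)
        for t, o in enumerate(obs):
            if o == symbol:
                total += 1
                hits += "B" in obs[:t]  # True counts as 1
    return hits / total if total else float("nan")


p_given_a = prob_b_in_past_given("A")  # 1.0 by construction of TRANSITIONS
p_given_c = prob_b_in_past_given("C")  # strictly below 1.0 (the initial C has no past)
```

Under these assumed dynamics, observing A pins down part of the history with certainty, whereas observing C does not. This is the sense in which a folded page informs the past.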
Artifacts as Situated Memory. Situated accounts of memory enrich the classical model by grounding memory’s purpose in service of decision-making (Clark & Chalmers, 1998). With regard to external memory, Michaelian (2012) argues that a model must satisfy certain criteria to capture the essential functionality of natural memory. Michaelian requires that agents have constant access to an information-bearing resource and some process to determine the information’s relevance. Following Sims & Kiverstein (2022), we summarize these in three points:
(1) Survival relevant: a memory should bring positive value to decision making.
(2) Susceptible to change: memories are mutable.
(3) Selection: a memory’s relevance is determined through some selection process.
The first requirement underscores the cost of storage. As Sims & Kiverstein (2022) put it: memory must be “worth its weight in terms of long-term fitness benefits.” The second point preserves the basic functionality of the encode-store-retrieve model, while the third requires the existence of a process to determine a memory’s relevance in a given scenario.
Sims & Kiverstein (2022) use these desiderata to argue the spatial trails left behind by slime mold (Reid et al., 2012) function as external memory. We similarly argue that the artifacts from our empirical study satisfy these desiderata. In support of (1), note that an artifact’s value is immediately apparent from total reward (see Figures 3, 6, and 7); agents in artifactual environments consistently accumulate more reward than in artifactless environments. Support for (2) follows directly from the encode-store-retrieve model, to which artifacts from the Dynamic Path conform (see Figure 7).
Fixed artifacts, by contrast, provide read-only information and yet still produce an external memory effect. This suggests that reading is more fundamental than writing when learning to navigate, and that the desiderata may need further refinement. Support for (3) comes from the learning process. Through repeated credit assignment, policies that read and write on each step gradually improve and bias navigation toward goal-relevant locations. With these properties in place, we conclude the artifacts from our study support the same arguments and conclusions as previous accounts of external memory.
Unintentional Memory. Our experiments demonstrate that an agent can read and write information to the environment without any explicit objective directing it to do so. In each experiment, agents were given a standard navigation objective: a sparse reward signal providing a bonus for reaching the goal, but no explicit incentive to follow a path. Still, we observe path-following behavior, as performance would otherwise match the No Path baseline. Moreover, in the Dynamic Path environment (Figure 7), agents record traces of their previous interactions without explicit direction. These artifacts go on to guide future behavior. Remarkably, this form of emergent behavior requires no explicit design or human involvement; it emerges naturally as a consequence of reinforcement learning in a sufficiently complex environment.
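The setup described above can be sketched as a toy grid world in which moving leaves a mark on the cell just vacated, while the reward depends only on reaching the goal. This is a hypothetical illustration, not the Dynamic Path environment used in the experiments. The class name `TrailGrid`, the grid size, the observation layout, and the action names are all assumptions.

```python
from typing import Tuple


class TrailGrid:
    """Hypothetical grid world with a sparse goal reward and an optional trail."""

    # Action name -> (row delta, column delta); dict order fixes the mark order.
    MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}

    def __init__(self, size: int = 4, goal: Tuple[int, int] = (3, 3),
                 leave_trail: bool = True):
        self.size = size
        self.goal = goal
        self.leave_trail = leave_trail
        self.reset()

    def reset(self):
        self.pos = (0, 0)
        self.trail = set()  # cells the agent has previously left
        return self.observe()

    def observe(self):
        """Agent position plus trail marks on the up/down/left/right neighbours."""
        r, c = self.pos
        marks = tuple((r + dr, c + dc) in self.trail
                      for dr, dc in self.MOVES.values())
        return self.pos, marks

    def step(self, action: str):
        dr, dc = self.MOVES[action]
        r = min(max(self.pos[0] + dr, 0), self.size - 1)  # clamp to the grid
        c = min(max(self.pos[1] + dc, 0), self.size - 1)
        if self.leave_trail:
            self.trail.add(self.pos)  # "write": mark the cell being left
        self.pos = (r, c)
        done = self.pos == self.goal
        reward = 1.0 if done else 0.0  # sparse: the trail is never rewarded
        return self.observe(), reward, done


env = TrailGrid()
obs, reward, done = env.step("right")
```

After the single step, the observation reports a mark on the left neighbour, because the agent has just left cell (0, 0). Nothing in the reward refers to the trail, so any path-following behavior in such a setting would have to emerge from learning rather than from an explicit incentive. Setting `leave_trail=False` gives a baseline without a path for comparison.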
Implications for Agent Design. A popular line of research pursues designs whose performance scales with the number of trainable parameters. This direction is motivated by milestone developments (Silver et al., 2016; Brown et al., 2020; Fawzi et al., 2022), historical arguments for the primacy of computation (Sutton, 2019), and empirical findings of power-law relationships between system capacity and performance (Kaplan et al., 2020). Our results hint at another path: rather than scaling system resources, performance gains may instead arise from environments that coevolve with the agent. It is possible that current designs are already sufficient for competent, human-level performance, but require judicious pairing with an appropriate environment to scaffold problem solving (Sterelny, 2010). More work is needed to understand the laws governing the relationship between environment and agent design.