The Emotion-Memory Link: Do Memorability Annotations Matter for Intelligent Systems?
Abstract—Humans have selective memory, remembering relevant episodes and forgetting less relevant information. Awareness of which events are memorable for a user could help intelligent systems build more accurate user models, especially for applications such as meeting support systems, memory augmentation, and meeting summarisation. Emotion recognition has been widely studied, since emotions are thought to signal moments of high personal relevance to users. The emotional experience of situations and their memorability have traditionally been considered closely tied: moments experienced as highly emotional are considered to also be highly memorable. This relationship suggests that emotional annotations could serve as proxies for memorability. However, existing emotion recognition systems rely heavily on third-party annotations, which may not accurately represent the first-person experience of emotional relevance and memorability. In this study, we therefore empirically examine the relationship between perceived group emotions (Pleasure-Arousal) and group memorability in the context of conversational interactions. Our investigation involves continuous, time-based annotations of both emotions and memorability in dynamic, unstructured group settings, approximating the conditions of real-world conversational AI applications such as online meeting support systems. Our results show that the observed relationship between affect and memorability annotations cannot be reliably distinguished from what would be expected under random chance. We discuss the implications of this surprising finding for the development and application of Affective Computing technology. In addition, we contextualise our findings within broader discourses in Affective Computing and point out important targets for future research.
Memory for conversations and other social interactions plays a crucial role in shaping social bonds and fostering relationship building [1]. Considering human conversational memory in intelligent systems is thus essential for explaining and predicting human behaviour in conversations, including affective responses. Conversational memory can be defined as a sub-type of episodic memory, which manages the encoding, storage, and retrieval of personally experienced events [2], particularly within conversational settings.
Affective Computing (AC) has long focused on recognising and interpreting human emotions to enhance interactions between users and intelligent systems [3]. Emotions are considered to be central to human experience, shaping decision-making, social interactions, and memory. Their automatic detection is valuable for intelligent systems because emotional responses often signal moments of high personal relevance to users. To capture these signals, Multimodal Emotion Recognition (MER) commonly uses human behavioural cues, such as facial expressions, speech patterns, and physiological signals, to infer emotional states. While MER has made significant strides in detecting momentary affective states, its potential to model longer-term cognitive processes, such as memory, remains under-explored.
Both theoretical and empirical research suggests that the way we emotionally experience events is strongly linked to how well we remember them [4].
A common practice in developing MER systems is to collect data that operationalises emotional responses through time-continuous measurements (e.g., annotations collected for every frame in a video stream [23]). The proposed benefits [24] of this practice include its high temporal granularity (i.e., the ability to capture nuanced changes in emotional qualities over time) and its capacity to capture emotional variability (i.e., the ability to describe changes in emotional qualities within a specified unit of analysis, such as a video clip).
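The two proposed benefits above can be made concrete with a minimal sketch: given a per-frame annotation trace, temporal granularity corresponds to frame-to-frame differences, and within-clip variability to the spread of values over the clip. The trace below is synthetic and purely illustrative, not data from this study.

```python
import numpy as np

# Hypothetical per-frame arousal annotations for one 10-second clip at
# 25 fps (a synthetic random walk, clipped to the [-1, 1] arousal scale).
rng = np.random.default_rng(0)
arousal = np.clip(np.cumsum(rng.normal(0.0, 0.05, 250)), -1.0, 1.0)

# Temporal granularity: nuanced moment-to-moment changes are directly
# observable as frame-to-frame deltas.
frame_deltas = np.diff(arousal)

# Emotional variability: spread of values within the unit of analysis
# (here, the whole clip), summarised as a standard deviation.
variability = arousal.std()

print(f"mean |frame-to-frame change|: {np.abs(frame_deltas).mean():.3f}")
print(f"within-clip variability (std): {variability:.3f}")
```

A single clip-level label would collapse both quantities into one number, which is precisely the information time-continuous annotation is meant to preserve.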
Since such third-party annotation is based solely on observed behaviour, it may contain inaccuracies. For example, not all emotions are expressed through behaviour, and expressed behaviour may not reflect the experienced emotion because of social norms (e.g., masking anger with a polite smile to avoid confrontation).
Annotation Perspectives. One of the core assumptions motivating this study was that emotional experiences influence memory encoding, a well-established link in cognitive science [5], [12]. However, in affective computing, emotional states are often inferred from third-party annotations of observed behaviour rather than first-party reports of experienced emotions [21]. Our findings indicate that this distinction is crucial: while experienced emotions are directly tied to personal relevance and cognitive appraisal [19], [20], third-party affect annotations reflect an external interpretation of group behaviour that may not reliably map onto internal memory processes. Experiment 3 demonstrated that the significant effects seen in Experiments 1 and 2 were likely due to differences in the distributions of affect labels in the observed data, rather than to a genuine, above-chance alignment between affect and memory annotations.
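The chance-baseline logic behind Experiment 3 can be sketched as a label-shuffling permutation test: shuffling one annotation stream preserves its label distribution while destroying any temporal alignment with the other stream, so effects driven purely by label distributions survive the shuffle. The code below is a generic illustration on synthetic binary labels, not the actual procedure or data of this study.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative binary labels per time window (synthetic):
# 1 = window annotated as high-arousal / memorable, 0 otherwise.
affect = rng.integers(0, 2, 200)
memory = rng.integers(0, 2, 200)

def agreement(a, b):
    """Fraction of time windows where the two annotation streams agree."""
    return np.mean(a == b)

observed = agreement(affect, memory)

# Null distribution: shuffling the affect stream breaks any temporal
# alignment with memory while keeping its label distribution intact.
null = np.array([agreement(rng.permutation(affect), memory)
                 for _ in range(5000)])

# Two-sided permutation p-value relative to the null distribution.
p = np.mean(np.abs(null - null.mean()) >= abs(observed - null.mean()))
print(f"observed agreement: {observed:.3f}, permutation p = {p:.3f}")
```

If the observed agreement falls well inside the null distribution, it cannot be distinguished from what skewed label distributions alone would produce.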
Group-Level Analysis. Finally, this study contributes to the growing body of research on group-level emotion and memory. While prior work has examined emotion’s role in individual memory encoding [6], [7], [9], our study explicitly considers group dynamics, a key factor in real-world settings like meetings and collaborative tasks [28]–[30]. Our results suggest that group emotion annotations, which capture collective emotional states rather than individual experiences, may fail to account for memorability. This could be because group memorability annotations might not capture emergent group-level processes in the way that group affect does [31], since they are aggregated from individual memory reports rather than inferred from group states to begin with. Another possible reason is that people express emotions differently in group settings compared to one-on-one conversations or non-social situations. For example, research suggests that emotions tend to be expressed more strongly in dyads than in larger groups [57]. Additionally, group members often adjust their emotional expressions to match each other, a phenomenon known as emotional convergence [41], [58]. This convergence may dilute individual emotional expressions, driving group-level expression further from the individually experienced emotion that would have been connected to memorability.
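The distinction drawn above between aggregated and emergent group constructs can be illustrated with a minimal sketch: group memorability built by averaging individual recall reports is a derived quantity, whereas group affect is annotated directly at the group level. The data and the mean-aggregation rule below are illustrative assumptions, not the exact construction used in this study.

```python
import numpy as np

# Hypothetical per-segment memorability reports from three participants
# (1 = segment recalled, 0 = not recalled; synthetic data).
individual_reports = np.array([
    [1, 0, 1, 1],   # participant A
    [0, 0, 1, 0],   # participant B
    [1, 0, 0, 1],   # participant C
])

# Aggregated group memorability: fraction of members recalling each
# segment. This is bottom-up, derived from individual reports.
group_memorability = individual_reports.mean(axis=0)
print(group_memorability)
```

By contrast, a group affect annotation assigns one value per segment directly from observing the group as a whole, so it can encode emergent properties that no average of individual reports recovers.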