Large language models can segment narrative events similarly to humans
Humans perceive discrete events such as "restaurant visits" and "train rides" in their continuous experience. One important prerequisite for studying human event perception is the ability of researchers to quantify when one event ends and another begins. Typically, this information is derived by aggregating behavioral annotations from several observers. Here we present an alternative computational approach in which event boundaries are derived using a large language model, GPT-3, instead of human annotations. We demonstrate that GPT-3 can segment continuous narrative text into events. GPT-3-annotated events are significantly correlated with human event annotations. Furthermore, these GPT-derived annotations achieve a good approximation of the "consensus" solution (obtained by averaging across human annotations); the boundaries identified by GPT-3 are closer to the consensus, on average, than boundaries identified by individual human annotators. This finding suggests that GPT-3 provides a feasible solution for automated event annotation, and it demonstrates a further parallel between human cognition and prediction in large language models.
Studying event cognition with naturalistic stimuli such as movies and stories typically requires laborious hand annotation of event boundaries, often crowd-sourced from large behavioral samples in online experiments.
We prompt GPT-3 with instructions similar to (albeit simpler than) those shown to human participants who are asked to segment a naturalistic stimulus into meaningful events. Across a variety of stories, we find that GPT-3 is able to perform this task significantly above chance. Moreover, we demonstrate that GPT-derived event boundaries provide a good approximation of the consensus solution (obtained by averaging responses across people); the boundaries identified by GPT-3 are closer to the consensus, on average, than boundaries identified by individual human annotators.
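To make the comparison concrete, the following is a minimal sketch of one way to evaluate such annotations, assuming boundaries are represented as sentence indices. The threshold-based consensus rule, the nearest-boundary distance metric, and all function names and toy data here are illustrative assumptions, not the paper's actual pipeline.

```python
from collections import Counter

def consensus_boundaries(annotations, min_fraction=0.5):
    """Keep positions marked by at least min_fraction of annotators.

    annotations: list of lists of boundary positions (sentence indices),
    one inner list per human annotator.
    """
    counts = Counter(pos for ann in annotations for pos in ann)
    threshold = min_fraction * len(annotations)
    return sorted(pos for pos, c in counts.items() if c >= threshold)

def mean_distance_to_consensus(boundaries, consensus):
    """Mean distance from each proposed boundary to the nearest consensus boundary."""
    if not boundaries or not consensus:
        return float("inf")
    return sum(min(abs(b - c) for c in consensus) for b in boundaries) / len(boundaries)

# Toy example: three hypothetical human annotators and one GPT-3 annotation.
humans = [[3, 10, 21], [4, 10, 20], [3, 11, 22, 30]]
gpt = [3, 10, 21]

consensus = consensus_boundaries(humans)   # positions marked by >= 2 of 3 annotators -> [3, 10]
gpt_dist = mean_distance_to_consensus(gpt, consensus)
human_dists = [mean_distance_to_consensus(h, consensus) for h in humans]
```

On this toy data, `gpt_dist` can be compared against each individual annotator's distance; the paper's claim corresponds to GPT-3's distance being smaller, on average, than the individual annotators' distances.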