Can statistical rarity measure whether stories are truly original?
Can we operationalize originality as statistical rarity in narrative feature space? This matters because copyright law requires human creative control, which has to be assessed somehow, but rarity is relative, context-dependent, and doesn't guarantee quality or authorship.
As AI seeps into writing, the question of what counts as original work shifts from how a story is written to how it is conceived. StoryScope proposes a concrete operationalization: represent each story as a vector of discourse-level narrative features and treat statistical rarity in that space as a proxy for originality. On this view, less common combinations of narrative decisions reflect the broader notion of originality invoked by creativity research (Torrance) and by copyright law, which requires a minimal degree of originality and, per recent U.S. Copyright Office guidance, sufficient human creative control. The empirical hook: human stories are, on average, rarer in narrative feature space, while the five AI models studied occupy a tight, well-separated cluster.
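A minimal sketch of what "rarity in feature space" could look like in practice. The source does not say how StoryScope computes rarity, so the scoring rule here (mean distance to the k nearest reference stories) is an assumption chosen for illustration, and the three-dimensional feature vectors and their values are hypothetical.

```python
import math


def knn_rarity(story, reference, k=3):
    """Mean Euclidean distance from `story` to its k nearest reference vectors.

    Larger values mean the story sits in a sparser region of the reference
    corpus. This is an illustrative stand-in, not the paper's metric.
    """
    if k > len(reference):
        raise ValueError("k cannot exceed the number of reference stories")
    dists = sorted(math.dist(story, ref) for ref in reference)
    return sum(dists[:k]) / k


# Hypothetical feature vectors, e.g. (ambiguity, temporal complexity, agency).
reference = [
    (0.50, 0.50, 0.50),
    (0.52, 0.48, 0.50),
    (0.49, 0.51, 0.53),
    (0.51, 0.50, 0.47),
    (0.10, 0.90, 0.80),
]
clustered_story = (0.50, 0.50, 0.51)  # sits inside the dense cluster
outlying_story = (0.90, 0.10, 0.20)   # far from every reference story

rarity_clustered = knn_rarity(clustered_story, reference)
rarity_outlying = knn_rarity(outlying_story, reference)
print(rarity_clustered < rarity_outlying)
```

Under this toy scoring, a story near the dense cluster gets a low rarity score and a story far from every reference point gets a high one, which mirrors the claimed pattern of tightly clustered AI output and more dispersed human stories.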
This is appealing because it converts a contested legal-aesthetic concept into something measurable and model-agnostic. Rarity does not depend on surface style, so it survives the humanization edit, and it aligns with the intuition that originality is about making uncommon choices, not novel word combinations. It also gives the copyright question an operational handle: a work's position in narrative-decision space could index how much distinctive human conception it carries.
Why it stays a question: rarity-as-originality is a proxy with sharp limits. Rarity is defined relative to a reference distribution, so it drifts as both human and AI writing change — and rare is not the same as good or protectable; an incoherent story can be statistically rare. Conflating "uncommon in feature space" with "originally authored by a human" risks both false positives (idiosyncratic AI output) and false negatives (a human writing in a popular convention). The construct is a useful, falsifiable starting point for measuring conception rather than execution — but whether it should bear legal or evaluative weight is exactly what it leaves open.
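The reference-distribution problem can be made concrete with a toy example. The sketch below scores one fixed story against two hypothetical corpora using smoothed surprisal (negative log probability of the story's combination of narrative choices). The feature labels, corpus counts, and the choice of surprisal as the rarity measure are all assumptions for illustration, not taken from the source.

```python
import math
from collections import Counter


def surprisal(choice, corpus, alpha=1.0):
    """Rarity of `choice` in bits, relative to `corpus`.

    Uses add-alpha smoothing so unseen combinations get a finite score.
    """
    counts = Counter(corpus)
    vocab = set(corpus) | {choice}
    p = (counts[choice] + alpha) / (len(corpus) + alpha * len(vocab))
    return -math.log2(p)


# One story, described by a hypothetical pair of narrative decisions.
story = ("nonlinear", "ambiguous_ending")

# Hypothetical reference corpus in which this combination is uncommon.
corpus_then = (
    [("linear", "explicit_moral")] * 8
    + [("nonlinear", "ambiguous_ending")] * 1
    + [("linear", "ambiguous_ending")] * 1
)
# Hypothetical later corpus in which the same combination has become popular.
corpus_now = (
    [("linear", "explicit_moral")] * 3
    + [("nonlinear", "ambiguous_ending")] * 6
    + [("linear", "ambiguous_ending")] * 1
)

rarity_then = surprisal(story, corpus_then)
rarity_now = surprisal(story, corpus_now)
print(rarity_then > rarity_now)
```

The story itself never changes, yet its score falls once the reference corpus shifts toward the same choices. The same mechanism produces the false negative described above: a human writing in a popular convention scores as common regardless of who conceived the work.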
— "StoryScope: Investigating idiosyncrasies in AI fiction", https://arxiv.org/abs/2604.03136
Related concepts in this collection
- Can AI stories be detected without analyzing writing style?
Explores whether discourse-level narrative structures like character agency and plot organization reveal AI authorship independently of surface stylistic cues, and whether such structural features resist the kind of fine-tuning that defeats style-based detection.
the same narrative feature space that enables detection is repurposed as a measure of originality
- Do AI stories explain their themes more than human stories do?
Explores whether AI-generated fiction tends to spell out moral meanings rather than leaving them implicit, and whether this reflects deeper differences in how machines construct narrative versus how humans do.
explains *why* human stories land in rarer regions: ambiguity and temporal complexity are less common choices
- Why do LLMs generate novel ideas from narrow ranges?
LLM research agents produce individually novel ideas but cluster them in homogeneous sets. This explores why high average novelty coexists with poor diversity coverage and what it means for automated ideation.
a parallel measurement problem: distinguishing genuine originality from average novelty in a feature space of ideas
- Can humans detect AI text if machines can measure it?
AI-generated text shows measurable differences from human writing across multiple linguistic dimensions, yet human judges consistently fail to identify it. Why does the gap between what is measurable and what is perceptible exist?
extends the construct's premise: AI output is statistically separable in feature space even when humans cannot perceive the difference, the same gap that lets narrative-feature rarity index conception
Original note title
originality can be operationalized as statistical rarity in a feature space of narrative decisions