Can AI stories be detected without analyzing writing style?
Explores whether discourse-level narrative structures like character agency and plot organization reveal AI authorship independently of surface stylistic cues, and whether such structural features resist the kind of fine-tuning that defeats style-based detection.
Most AI-text detection rides on surface signatures: word choice, syntactic structure, the overused em-dash, "delve," "tapestry." These cues are discriminative but fragile: GPT 5.4 cut em-dash usage, and fine-tuning to mimic human style drops detection on creative writing from 97% to 3%. StoryScope asks a different question: can AI stories be told apart without stylistic signals, using only discourse-level narrative choices like character agency and chronological structure? Across a parallel corpus of 10,272 prompts (each answered by a human and five LLMs, for 61,608 stories of ~5,000 words each), narrative features alone reach 93.2% macro-F1 for human-vs-AI detection, retaining over 97% of the performance of models that also use stylistic cues.
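For readers unfamiliar with the metric, here is a minimal sketch of how macro-F1 is computed for a human-vs-AI classifier. It is not the StoryScope evaluation code; the function names and the six toy labels are hypothetical, chosen only to mirror the one-human-to-five-LLM ratio of the corpus.

```python
from typing import Sequence


def f1_for_class(y_true: Sequence[str], y_pred: Sequence[str], cls: str) -> float:
    """F1 for one class, treating `cls` as the positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Unweighted mean of per-class F1, so each class counts equally."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


# Hypothetical toy labels (not data from the paper): one human story and
# five AI stories for a single prompt, with one AI story misread as human.
y_true = ["human", "ai", "ai", "ai", "ai", "ai"]
y_pred = ["human", "ai", "ai", "ai", "ai", "human"]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.3f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.3f}")
```

Because macro-F1 averages the per-class scores without weighting by class size, a mistake involving the rarer human class costs more than it would under plain accuracy, which makes it a reasonable choice when AI stories outnumber human ones.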
The consequential part is the durability argument. Surface style is a post-hoc edit away from concealment; discourse-level narrative structure is not. Changing whether a protagonist's choices are morally ambiguous, or whether a plot runs on a single tidy track versus a nonlinear one with flashbacks, requires structural rewrites rather than find-and-replace. So the features that survive humanization are precisely the ones tied to how a story is conceived, not how its sentences are dressed.
Why it matters: this reframes AI detection from a stylometric arms race into a structural one, and it relocates the question of authorship from how sentences are phrased to how a story is built. If models keep closing the surface-style gap while their narrative choices stay distinct, then detection (and, downstream, the legal question of originality) should attach to discourse structure. The counterpoint is that narrative features are themselves learnable targets: nothing prevents future training from diversifying discourse-level choices, which would erode this signal too, just more slowly than style erodes.
— "StoryScope: Investigating idiosyncrasies in AI fiction", https://arxiv.org/abs/2604.03136
Related concepts in this collection
- Can humans detect AI text if machines can measure it?
  AI-generated text shows measurable differences from human writing across multiple linguistic dimensions, yet human judges consistently fail to identify it. Why does the gap between what is measurable and what is perceptible exist?
  narrative-feature separability gives a measurable axis even where human judges fail to perceive AI authorship
- Does AI-generated text lose core properties of human writing?
  Can artificial text preserve the fundamental structural features that make natural language meaningful (dialogic exchange, embedded context, authentic authorship, and worldly grounding)? This asks whether AI disruption is fixable or inherent.
  discourse-level divergence is a concrete manifestation of structural, not surface, differences in AI text
- Do AI stories explain their themes more than human stories do?
  Explores whether AI-generated fiction tends to spell out moral meanings rather than leaving them implicit, and whether this reflects deeper differences in how machines construct narrative versus how humans do.
  names the specific narrative choices that drive the separability claimed here
Original note title
ai fiction is distinguishable by discourse-level narrative choices not surface style which resists humanization