Does fine-tuning on new facts increase hallucination risk?

When LLMs learn unfamiliar facts through fine-tuning, do they become more prone to hallucinating about things they already knew? Understanding this matters for safe knowledge updates.

Synthesis note · 2026-06-03 · sourced from Training Fine Tuning

A common conjecture holds that supervised fine-tuning on facts the model never saw in pretraining teaches it to hallucinate, because it trains the model to assert things that are not grounded in its own knowledge. This work tests that conjecture in a controlled closed-book QA setup by varying the proportion of fine-tuning examples that introduce new knowledge. Two findings emerge. First, LLMs struggle to acquire new factual knowledge through fine-tuning: Unknown examples (those introducing facts the model does not already know) are fit significantly slower than examples consistent with existing knowledge. Second, as those Unknown examples are eventually learned, they linearly increase the model's tendency to hallucinate on pre-existing knowledge. The harm therefore shows up as a form of overfitting on the slow-to-fit Unknown examples.
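The controlled variable described above, the share of fine-tuning examples that introduce new knowledge, can be sketched as a simple sampling step. This is a minimal illustration, not the study's actual pipeline: the function name `build_mix`, its parameters, and the assumption that examples have already been split into Known and Unknown pools are all hypothetical.

```python
import random


def build_mix(known, unknown, unknown_fraction, size, seed=0):
    """Sample `size` fine-tuning examples with a fixed share of Unknown ones.

    `known` and `unknown` are pre-sorted pools (an assumption of this sketch;
    how examples are labeled Known or Unknown is not shown here).
    """
    if not 0.0 <= unknown_fraction <= 1.0:
        raise ValueError("unknown_fraction must be in [0, 1]")
    n_unknown = round(size * unknown_fraction)
    n_known = size - n_unknown
    if n_unknown > len(unknown) or n_known > len(known):
        raise ValueError("not enough examples in one of the pools")
    rng = random.Random(seed)  # seeded so each mix is reproducible
    mix = rng.sample(unknown, n_unknown) + rng.sample(known, n_known)
    rng.shuffle(mix)  # avoid presenting all Unknown examples first
    return mix
```

Holding `size` fixed while sweeping `unknown_fraction` keeps the total amount of fine-tuning data constant, so any change in hallucination rate can be attributed to the mix rather than to dataset size.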

The practical takeaway is a concrete fine-tuning practice: prefer early stopping over a fixed step count, and consider filtering out Unknown examples (or keeping a few to teach the model to express uncertainty), because forcing new facts into the weights degrades grounded recall.
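Both recommendations can be sketched in a few lines. This is a hedged illustration under stated assumptions: `filter_unknown`, `early_stop_index`, the `is_known` predicate, and the patience rule are hypothetical names and choices, not the procedure used in the work this note summarizes.

```python
def filter_unknown(examples, is_known, keep_unknown=0):
    """Keep all Known examples plus at most `keep_unknown` Unknown ones.

    `is_known` is a caller-supplied predicate (hypothetical; deciding what
    the model already knows is outside this sketch).
    """
    kept, spared = [], 0
    for ex in examples:
        if is_known(ex):
            kept.append(ex)
        elif spared < keep_unknown:
            # Assumption: the few retained Unknown examples would be used
            # to teach uncertainty expression rather than the new fact.
            kept.append(ex)
            spared += 1
    return kept


def early_stop_index(dev_scores, patience=2):
    """Return the index of the best checkpoint under patience-based stopping.

    `dev_scores` holds one held-out score per evaluation, higher is better.
    Training stops once `patience` evaluations pass without a new best.
    """
    best_idx, best = 0, float("-inf")
    for i, score in enumerate(dev_scores):
        if score > best:
            best, best_idx = score, i
        elif i - best_idx >= patience:
            break  # no improvement for `patience` evaluations
    return best_idx
```

Selecting the checkpoint by held-out score rather than by step count matters here because the Unknown examples are fit late: a fixed, long schedule keeps training into the regime where they are memorized and grounded recall degrades.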

This completes a tight loop in the vault's knowledge-acquisition thread. It is the empirical mechanism behind "Can models store unlimited facts without growing larger?" (externalize facts rather than fine-tune them in), it complements "Does teaching question patterns before document training improve knowledge access?" (encoding order matters), and it shares the leakage/overfitting concern of "Does repeated sensitive data in fine-tuning cause memorization?"

Inquiring lines that use this note as a source 1

Can filtering unknown examples during fine-tuning prevent hallucination increases?

Related concepts in this collection 3

Can models store unlimited facts without growing larger? Does external tool use let language models recall facts without being constrained by parameter count? This matters because it could reshape how we scale knowledge capacity beyond architectural limits.
the empirical reason to externalize facts to tools rather than fine-tune them in
Does teaching question patterns before document training improve knowledge access? Standard LLM training encodes documents first, then teaches QA patterns. But does this order matter? Exploring whether reversing the sequence—teaching how knowledge gets queried before encoding it—could unlock better factual recall.
both concern how new knowledge is integrated during training
Does repeated sensitive data in fine-tuning cause memorization? When language models train on the same private or proprietary data multiple times, how much do they end up memorizing and leaking that information at inference time? Understanding this risk is critical for organizations fine-tuning on confidential datasets.
shared overfitting/leakage concern from forcing data into weights
