Does fine-tuning on new facts increase hallucination risk?
When LLMs learn unfamiliar facts through fine-tuning, do they become more prone to hallucinating about things they already knew? Understanding this matters for safe knowledge updates.
It is often conjectured that supervised fine-tuning on facts the model never saw in pretraining teaches it to hallucinate, because it trains the model to assert things that are not grounded in its own knowledge. This work tests that conjecture in a controlled closed-book QA setup, varying the proportion of fine-tuning examples that introduce new knowledge (Unknown examples). Two findings emerge. First, LLMs struggle to acquire new factual knowledge through fine-tuning: Unknown examples are fit significantly slower than examples consistent with existing knowledge. Second, as those Unknown examples are eventually learned, they linearly increase the model's tendency to hallucinate on pre-existing knowledge. So the harm shows up as a form of overfitting on the slow-to-fit Unknown examples.
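The controlled setup described above can be sketched roughly as follows. This is a minimal illustration, not the procedure used in the work: the `is_unknown` proxy (an example counts as Unknown if the base model never produces the gold answer in a few closed-book attempts), the `sample_answer` callable, and the `build_mix` helper are all hypothetical names introduced here.

```python
import random
from typing import Callable, List, Sequence, Tuple

QA = Tuple[str, str]  # (question, gold answer)


def is_unknown(question: str, gold: str,
               sample_answer: Callable[[str], str],
               n_samples: int = 4) -> bool:
    """Assumed proxy: an example is Unknown if the base model never
    returns the gold answer across n_samples closed-book attempts."""
    target = gold.strip().lower()
    return all(sample_answer(question).strip().lower() != target
               for _ in range(n_samples))


def build_mix(known: Sequence[QA], unknown: Sequence[QA],
              unknown_fraction: float, size: int, seed: int = 0) -> List[QA]:
    """Build a fine-tuning set of `size` examples in which roughly
    `unknown_fraction` of the examples introduce new knowledge."""
    if not 0.0 <= unknown_fraction <= 1.0:
        raise ValueError("unknown_fraction must be in [0, 1]")
    n_unknown = round(size * unknown_fraction)
    n_known = size - n_unknown
    if n_unknown > len(unknown) or n_known > len(known):
        raise ValueError("not enough examples for the requested mix")
    rng = random.Random(seed)  # seeded so the mix is reproducible
    mix = rng.sample(list(known), n_known) + rng.sample(list(unknown), n_unknown)
    rng.shuffle(mix)
    return mix
```

Sweeping `unknown_fraction` over several values while holding `size` fixed gives one fine-tuning set per condition, so that differences in downstream hallucination can be attributed to the share of Unknown examples rather than to dataset size.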
The practical takeaway is a concrete fine-tuning practice: prefer early stopping over a fixed step count, and consider filtering out Unknown examples (or keeping a few to teach the model to express uncertainty), because forcing new facts into the model degrades its grounded recall.
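Both recommendations can be sketched in a few lines. This is an illustrative sketch under stated assumptions: `best_checkpoint` and `filter_unknown` are hypothetical helpers, the patience-based stopping rule is one common choice rather than the rule used in the work, and relabeling the retained Unknown examples with an "I don't know" answer is an assumed way of teaching uncertainty expression.

```python
from typing import List, Sequence, Tuple


def best_checkpoint(dev_accuracy: Sequence[float], patience: int = 2) -> int:
    """Early stopping: return the index of the checkpoint to keep.
    Evaluation stops once dev accuracy has not improved for
    `patience` consecutive checkpoints."""
    if not dev_accuracy:
        raise ValueError("dev_accuracy must not be empty")
    best_idx = 0
    for i in range(1, len(dev_accuracy)):
        if dev_accuracy[i] > dev_accuracy[best_idx]:
            best_idx = i
        elif i - best_idx >= patience:
            break  # no improvement for `patience` checkpoints
    return best_idx


def filter_unknown(examples: Sequence[Tuple[str, str, bool]],
                   keep_unknown: int = 0,
                   uncertain_answer: str = "I don't know") -> List[Tuple[str, str]]:
    """Drop Unknown examples from (question, answer, is_unknown) triples.
    Optionally keep the first `keep_unknown` of them, relabeled with an
    uncertainty answer (an assumption of this sketch)."""
    kept: List[Tuple[str, str]] = []
    remaining = keep_unknown
    for question, answer, unknown in examples:
        if not unknown:
            kept.append((question, answer))
        elif remaining > 0:
            kept.append((question, uncertain_answer))
            remaining -= 1
    return kept
```

The point of stopping on a held-out dev score rather than a fixed step count is that, by the findings above, the slow-to-fit Unknown examples tend to be learned late, so an earlier checkpoint has absorbed fewer of them.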
This completes a tight loop in the vault's knowledge-acquisition thread. It is the empirical mechanism behind "Can models store unlimited facts without growing larger?" (externalize facts rather than fine-tune them in), it complements "Does teaching question patterns before document training improve knowledge access?" (encoding order matters), and it shares the leakage/overfitting concern of "Does repeated sensitive data in fine-tuning cause memorization?".
Related concepts in this collection 3
- Can models store unlimited facts without growing larger?
Does external tool use let language models recall facts without being constrained by parameter count? This matters because it could reshape how we scale knowledge capacity beyond architectural limits.
the empirical reason to externalize facts to tools rather than fine-tune them in
- Does teaching question patterns before document training improve knowledge access?
Standard LLM training encodes documents first, then teaches QA patterns. But does this order matter? Exploring whether reversing the sequence—teaching how knowledge gets queried before encoding it—could unlock better factual recall.
both concern how new knowledge is integrated during training
- Does repeated sensitive data in fine-tuning cause memorization?
When language models train on the same private or proprietary data multiple times, how much do they end up memorizing and leaking that information at inference time? Understanding this risk is critical for organizations fine-tuning on confidential datasets.
shared overfitting/leakage concern from forcing data into weights
Related papers in this collection 8
Papers most semantically related to this note, ranked by cosine similarity in the embedding space.
- Does Fine-Tuning LLMs on New Knowledge Encourage Hallucinations?
- On the Impact of Fine-Tuning on Chain-of-Thought Reasoning
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models
- Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration
- Linguistic Calibration of Long-Form Generations
- Query Rewriting for Retrieval-Augmented Large Language Models
- Hallucination is Inevitable: An Innate Limitation of Large Language Models
- Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs
Original note title
fine-tuning on new factual knowledge is learned slowly and once learned linearly increases hallucination of pre-existing knowledge