Language Understanding and Pragmatics · Knowledge Retrieval and RAG Design · LLM Interaction

Can we measure reading efficiency as a quality metric?

How can we quantify whether generated text delivers novel information efficiently or wastes reader attention through redundancy? This matters because standard coherence and fluency scores miss texts that are well-written but informationally dense.

Note · 2026-02-22 · sourced from Reasoning by Reflection

OmniThink defines Knowledge Density as KD = (number of unique atomic knowledge units) / (text length in tokens), where a uniqueness indicator counts each atomic unit only the first time it appears and zeroes out repeats. A high-KD text delivers novel atomic facts efficiently; a low-KD text repeats and elaborates the same points across more tokens. Low-KD content produces reader fatigue and disengagement; high-KD content enables efficient knowledge transfer.

The metric addresses a gap in standard LLM text evaluation. Coherence scores (does each sentence follow from the previous?) and fluency scores (is the grammar correct?) capture structural properties that can coexist with deep redundancy. A perfectly coherent, fluent article can spend 2000 words elaborating three facts that could be stated in 400 words. KD detects this failure where coherence and fluency scores do not.
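The definition above can be sketched in a few lines. This is a minimal illustration, not OmniThink's implementation: it assumes atomic knowledge units have already been extracted upstream as fact strings (extraction is a separate, harder problem), and it deduplicates with crude string normalization where a real system would use semantic matching.

```python
def knowledge_density(atomic_units, text_length):
    """Unique atomic knowledge units per token (hypothetical sketch).

    atomic_units: list of atomic fact strings extracted from the text.
    text_length: token count of the full text.
    """
    seen = set()
    for unit in atomic_units:
        # Crude normalization; a real system would deduplicate
        # semantically (e.g., via embedding similarity).
        seen.add(unit.strip().lower())
    return len(seen) / text_length

# Dense text: 3 distinct facts stated in 20 tokens.
dense = knowledge_density(
    ["KD measures facts per token",
     "low KD causes reader fatigue",
     "RAG retrieves redundant documents"],
    text_length=20)

# Redundant text: the same 3 facts restated across 100 tokens.
redundant = knowledge_density(
    ["KD measures facts per token",
     "kd measures facts per token",        # repeat, collapsed
     "low KD causes reader fatigue",
     "Low KD causes reader fatigue",       # repeat, collapsed
     "RAG retrieves redundant documents",
     "RAG retrieves redundant documents"], # repeat, collapsed
    text_length=100)

assert dense > redundant  # 0.15 vs 0.03
```

Both texts carry identical unique content, so a coherence or fluency score could rate them equally; KD separates them because the redundant version spends 5x the tokens on the same facts.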

Standard LLM-generated articles score lower on KD than human-written articles for two reasons: RAG retrieves topically redundant documents (similar queries return similar content), and language models trained to maximize next-token probability tend to elaborate and expand rather than compress and advance. Both patterns inflate text length while holding unique knowledge content constant.

The cognitive science grounding: Bovair and Kieras (1991) established that reading cost scales with total text length while value scales with unique knowledge units. KD makes this ratio explicit and measurable. Readers don't consciously compute KD, but they experience its consequences as engagement vs. fatigue.

Connects to "Why do ChatGPT essays lack evaluative depth despite grammatical strength?": the evaluative dimension missing from LLM academic writing — the ability to judge when an argument has been made and move on — is precisely what KD would detect as a quality failure. Also connects to "Why does AI writing sound generic despite being grammatically correct?": structural coherence (grammar) can coexist with low KD (a rhetorical failure — not advancing information efficiently).




Knowledge density — unique atomic knowledge units per token — is a measurable quality metric for generated text that reflects the cognitive cost of reading.