What actually happens inside a language model?

How language models compute internally, and what different training approaches teach them to represent.

Topic Hub · 13 linked notes · 3 sections

Sub-Topic Maps

2 notes

What actually happens inside the minds of language models?

How do LLMs represent knowledge, what circuits drive reasoning, and can we see their internal structure? Understanding the gap between external performance and internal mechanisms matters for safety and trust.

How do language models learn to think like humans?

Explores whether LLMs develop cognitive processes parallel to human reasoning, including memory, event segmentation, and belief updating. Understanding these similarities and differences reveals what training actually teaches.

Pass 3 Additions (2026-05-03)

6 notes

Does autoregressive generation uniquely enable LLM scaling?

Is the autoregressive factorization truly necessary for LLM scalability, or do other generative principles like diffusion achieve comparable performance? This matters because it shapes which architectural paths deserve investment.

Can diffusion language models match autoregressive inference speed?

Diffusion LLMs promised faster decoding through parallel token generation, but open-source implementations never outpaced autoregressive models in practice. What architectural barriers prevent diffusion from realizing its speed potential?

Can diffusion models commit to answers before full decoding?

Do diffusion language models settle on correct answers early in their refinement process, and if so, can we detect and exploit this convergence to speed up inference without losing quality?

Can diffusion models enable control that autoregressive models cannot reach?

Autoregressive language models struggle with complex global controls such as syntactic constraints and infilling because they generate left to right and pass every decision through a discrete token bottleneck. Can diffusion models' continuous latents and parallel denoising overcome these structural limitations?

Can diffusion models perform evolutionary search in parameter space?

Diffusion models and evolutionary algorithms share equivalent mathematical structures. Can we leverage this equivalence to build evolutionary search methods that preserve solution diversity better than traditional algorithms?

Can speech features be separated into semantic and stylistic components?

Linguistic theory suggests gestures decompose into semantic units and motion variations. Does this decomposition actually emerge in speech encoder layers, and can it enable more expressive gesture synthesis?