INQUIRING LINE

What makes certain bond distributions more learnable than others?

This reads 'bond distributions' as a question about why some distributions are easier for a model to learn than others — though I should flag up front that this corpus addresses the learnability of distributions in machine learning broadly (reasoning, fine-tuning, RL), not the bond-length/bond-angle distributions of chemistry; if you meant the molecular-structure sense, the collection doesn't yet hold material on it.


The single loudest answer running through the collection: a distribution is learnable to the degree it sits close to what the model already represents. Chain-of-thought reasoning degrades in a predictable way the moment you push it outside the training distribution: fluent on the surface, logically broken underneath (see 'Does chain-of-thought reasoning actually generalize beyond training data?'). Even the length of a model's reasoning trace turns out to be a proximity signal rather than a difficulty signal: traces lengthen with difficulty only in-distribution and decouple entirely once you leave it (see 'Does longer reasoning actually mean harder problems?'). So 'learnable' and 'near the existing distribution' keep collapsing into the same thing.

That reframes the question: not which distributions are learnable in the abstract, but how far a model can move from where it started without breaking. One note makes this almost mechanical: keeping KL drift from the base model low preserves *plasticity*, the ability to keep learning later tasks. Parameter-only methods that drift far stall when the domain shifts, while staying close keeps the model adaptable (see 'Does staying close to the base model preserve learning ability?'). Learnability here isn't a property of the target distribution alone; it's a budget you spend by moving away from your origin.
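To make "KL drift from the base model" concrete, here is a minimal sketch that measures it on toy next-token distributions. The three distributions and the names `kl_divergence`, `base`, `gentle`, and `hard` are illustrative assumptions, not taken from the note, and the direction KL(tuned || base) is one common convention rather than necessarily the one the note uses.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two categorical distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            # eps guards against log of a zero probability in q
            total += pi * math.log(pi / max(qi, eps))
    return total


# Hypothetical next-token distributions over a three-token vocabulary.
base = [0.70, 0.20, 0.10]    # base model
gentle = [0.65, 0.23, 0.12]  # tuned model that stays close to the base
hard = [0.05, 0.05, 0.90]    # tuned model that moves far from the base

drift_gentle = kl_divergence(gentle, base)
drift_hard = kl_divergence(hard, base)

print(f"gentle drift: {drift_gentle:.4f} nats")
print(f"hard drift:   {drift_hard:.4f} nats")
```

In this toy, the second update spends far more of the "budget" than the first; in a real setting the same quantity would be averaged over many prompts and token positions.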

There's a sharp tension lurking in this, though. You can make a distribution *more* learnable in one place by paying for it elsewhere. Teachers that condition on the correct answer produce confident, compressed traces that students absorb easily, but that very confidence suppresses the uncertainty signals needed to generalize, so in-domain learnability is bought with out-of-distribution brittleness (see 'Does richer teacher context hurt student generalization?'). Sharpening a distribution makes it crisp and imitable and simultaneously narrows what it can transfer to. The easiest-to-learn version is often the least robust one.

And some distributions resist learning no matter how you approach them. Across constrained-optimization tasks, models plateau around 55–60% constraint satisfaction regardless of parameter count, architecture, or training regime: a ceiling, not a gap you can close by scaling (see 'Do larger language models solve constrained optimization better?'). That's the counterpoint to proximity: closeness explains a lot, but structure in the target itself can be the wall. The corpus quietly adds two things you might not expect. First, when learning *does* take, it lands in a surprisingly structured place: RL reliably updates the same sparse, near-full-rank subnetwork, 5–30% of parameters, across random seeds, suggesting learnability has a consistent shape rather than being arbitrary (see 'Does reinforcement learning update only a small fraction of parameters?'). Second, what looks like a single learned answer is still just one draw from a distribution: fixing the seed and setting temperature to zero makes outputs consistent without making them reliable (see 'Does setting temperature to zero actually make LLM outputs reliable?').
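The sparse-subnetwork observation can be checked with two simple measurements: what fraction of parameters an update touched, and how much the touched sets overlap between seeds. The sketch below is a toy under stated assumptions: the ten-parameter "model", the helper names `updated_mask`, `update_fraction`, and `jaccard`, and the exact-change threshold are all hypothetical, and it does not test the near-full-rank property.

```python
def updated_mask(before, after, tol=0.0):
    """Mark each parameter whose value changed by more than tol."""
    return [abs(a - b) > tol for b, a in zip(before, after)]


def update_fraction(mask):
    """Fraction of parameters that were updated."""
    return sum(mask) / len(mask)


def jaccard(mask_a, mask_b):
    """Overlap of two updated sets: |intersection| / |union|."""
    inter = sum(a and b for a, b in zip(mask_a, mask_b))
    union = sum(a or b for a, b in zip(mask_a, mask_b))
    return inter / union if union else 1.0


# Hypothetical flattened parameters before training and after two seeds.
before = [0.0] * 10
after_seed1 = [0.5, -0.3] + [0.0] * 8      # touches parameters 0 and 1
after_seed2 = [0.4, -0.2, 0.1] + [0.0] * 7  # touches parameters 0, 1 and 2

mask1 = updated_mask(before, after_seed1)
mask2 = updated_mask(before, after_seed2)

print("seed 1 update fraction:", update_fraction(mask1))
print("seed 2 update fraction:", update_fraction(mask2))
print("overlap between seeds: ", jaccard(mask1, mask2))
```

A high overlap across seeds is what "structural rather than arbitrary" would look like in this kind of measurement; a low one would suggest the updated set is mostly noise.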

The thing worth taking away: learnability in this collection is relational, not intrinsic. A distribution is learnable mostly in proportion to how near it is to the model's existing one, how little plasticity you burn reaching it, and whether its internal structure has a hard ceiling — and the moves that make it easiest to learn are frequently the same moves that make it fail to generalize.


Sources (7 notes)

Does chain-of-thought reasoning actually generalize beyond training data?

DataAlchemy experiments show CoT fails systematically under distributional shifts in task, length, and format. Models produce fluent but logically inconsistent reasoning — imitating reasoning form without valid underlying logic.

Does longer reasoning actually mean harder problems?

Controlled A* maze experiments show trace length correlates with difficulty only in-distribution but decouples entirely out-of-distribution. Trace length primarily reflects recall of training schemas, not adaptive computation.

Does staying close to the base model preserve learning ability?

FST-trained models stay up to 70% closer to their base distribution than parameter-only RL, and this reduced drift preserves the model's ability to learn subsequent tasks effectively. Parameter-only approaches stall when task domains change, while low KL drift enables sustained adaptation.

Does richer teacher context hurt student generalization?

Teachers conditioned on correct answers and verifier output produce confident, concise traces that students inherit. This style suppresses uncertainty expression, optimizing in-domain performance while degrading generalization to out-of-distribution problems that require epistemic caution.

Do larger language models solve constrained optimization better?

Across constrained-optimization tasks, LLMs converge to ~55–60% constraint satisfaction independent of architecture, parameter count, or training regime. Reasoning models do not systematically outperform standard models, suggesting a fundamental ceiling rather than a scaling gap.

Does reinforcement learning update only a small fraction of parameters?

Across seven RL algorithms and ten LLM families, RL induces intrinsic parameter sparsity of 5–30% without explicit regularization. Critically, these sparse updates are nearly full-rank and nearly identical across random seeds, indicating structural rather than arbitrary parameter selection.

Does setting temperature to zero actually make LLM outputs reliable?

Fixed seeds and zero temperature replicate the same output repeatedly, but that output remains one draw from the model's probability distribution. McDonald's omega testing across 100 repetitions reveals that consistency does not equal reliability.
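The gap between consistency and reliability in that last note can be illustrated with a toy answer distribution. This is only a sketch: the probabilities in `answer_probs` are invented, greedy selection stands in for zero-temperature decoding, and nothing here computes McDonald's omega.

```python
import random
from collections import Counter

# Hypothetical probabilities a model assigns to three candidate answers.
answer_probs = {"A": 0.40, "B": 0.35, "C": 0.25}


def greedy(probs):
    """Zero-temperature stand-in: always return the most probable answer."""
    return max(probs, key=probs.get)


def sample(probs, rng):
    """Draw one answer in proportion to its probability."""
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


# 100 greedy repetitions are perfectly consistent.
greedy_runs = [greedy(answer_probs) for _ in range(100)]

# 100 sampled repetitions expose the spread that greedy decoding hides.
rng = random.Random(0)
sampled_runs = [sample(answer_probs, rng) for _ in range(100)]
counts = Counter(sampled_runs)

print("distinct greedy answers: ", len(set(greedy_runs)))
print("sampled answer counts:   ", dict(counts))
print("mass behind greedy answer:", answer_probs[greedy(answer_probs)])
```

The greedy runs agree with each other every time, yet the answer they agree on carries well under half of the probability mass in this toy, which is the sense in which repetition alone says little about reliability.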
