What other latent LLM capabilities remain inactive without explicit activation cuing?

This reads the question as: beyond reasoning, what can a model already do but won't do unless it is prompted, steered, or structured into doing it? The corpus holds a surprisingly rich map of these dormant-until-cued capacities. The more interesting story, though, isn't a list of hidden skills. It is that 'latent until cued' may be the default condition for much of what models can do, and that the cue takes wildly different forms.

The clearest cases concern reasoning, and they converge from three directions. Reinforcement learning mostly *activates* reasoning the base model already had rather than teaching anything new, although for complex multi-step planning it can produce genuinely novel strategies (Does reinforcement learning create new reasoning abilities or activate existing ones?). RL isn't even required: steering reasoning features identified with sparse autoencoders (SAEs) matches or exceeds chain-of-thought performance across six model families. That reasoning mode fires early in generation and overrides surface instructions, implying it is present whether or not the prompt asks for it (Can we trigger reasoning without explicit chain-of-thought prompts?). And wrapping operations in four modular 'cognitive tools', each a sandboxed LLM call, lifts GPT-4.1 on AIME 2024 from 26.7% to 43.3% with no training at all, purely by *isolating* steps the model could already perform but wouldn't reliably sequence on its own (Can modular cognitive tools unlock reasoning without training?).
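
To make 'steering a feature' concrete, here is a minimal sketch of the underlying arithmetic, assuming the common additive form: a feature direction (for example, one column of an SAE decoder) is normalized and added, scaled by a strength `alpha`, to the hidden state at some layer. The vectors, the value of `alpha`, and the function names are hypothetical toy choices, not taken from the cited study. In a real model the addition would happen inside the forward pass (for example via a PyTorch forward hook), which is omitted here.

```python
import math

def unit(v):
    """Return v scaled to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("feature direction must be non-zero")
    return [x / norm for x in v]

def steer(hidden, direction, alpha):
    """Add alpha times the unit feature direction to one hidden-state vector.

    hidden:    activation at one token position (toy list of floats)
    direction: feature direction, e.g. an SAE decoder column (hypothetical here)
    alpha:     steering strength (hypothetical; tuned per model in practice)
    """
    d = unit(direction)
    return [h + alpha * x for h, x in zip(hidden, d)]

def projection(hidden, direction):
    """Scalar projection of hidden onto the unit feature direction."""
    d = unit(direction)
    return sum(h * x for h, x in zip(hidden, d))

# Toy example: a 4-dimensional "hidden state" and a made-up feature direction.
hidden = [0.5, -1.0, 2.0, 0.0]
direction = [0.0, 3.0, 0.0, 4.0]
steered = steer(hidden, direction, alpha=2.0)
print(projection(hidden, direction), projection(steered, direction))
```

Because the direction is unit-normalized, the projection onto it rises by `alpha` while components orthogonal to it are unchanged, which is what lets a single scalar act as a dial for one feature.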

The more unexpected finds are non-reasoning. Models have a latent capacity for *introspection*: detecting when their own internal activations have been tampered with. It emerges only after preference training (DPO), and safety training actively *suppresses* it, dropping detection from 64% to 11% (How do language models detect injected steering vectors internally?). So here the 'cue' isn't a prompt at all; it is a training stage, and another training stage can switch the same capability back off. Similarly, the ability to *lead* a conversation rather than merely react is present but dormant: models are trained to optimize the next reply, not multi-turn goals, so proactivity is a training-incentive gap, not a capability gap (Why can't AI models lead conversations on their own?).
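
Detection rates like these come from trials in which a steering vector either is or isn't injected and the model is asked whether it notices anything. The sketch below shows one way such trials could be scored, assuming a binary yes/no report per trial and a control condition with no injection; the `Trial` record and the toy data are hypothetical. A hit rate is only informative next to a false-alarm rate, since a model that always answered "yes" would "detect" every injection. Whether the cited study reports false alarms is not stated in the note.

```python
from dataclasses import dataclass

@dataclass
class Trial:
    injected: bool        # True if a steering vector was added on this trial
    model_said_yes: bool  # True if the model reported an injected thought

def detection_rates(trials):
    """Return (hit_rate, false_alarm_rate) over a list of Trial records."""
    injected = [t for t in trials if t.injected]
    control = [t for t in trials if not t.injected]
    if not injected or not control:
        raise ValueError("need at least one injected and one control trial")
    hit_rate = sum(t.model_said_yes for t in injected) / len(injected)
    false_alarm_rate = sum(t.model_said_yes for t in control) / len(control)
    return hit_rate, false_alarm_rate

# Toy, made-up trials: 4 injected (3 detected), 4 controls (1 false alarm).
toy = (
    [Trial(True, True)] * 3 + [Trial(True, False)]
    + [Trial(False, True)] + [Trial(False, False)] * 3
)
print(detection_rates(toy))
```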

Then there are capacities that stay dormant unless their structure is forced explicitly. Models often know the right principle but won't execute it, a 'comprehension without competence' split in which explanation accuracy (87%) far outruns action accuracy (64%) (Can language models understand without actually executing correctly?). And on the classic frame problem, models have the world knowledge but won't bring unstated preconditions forward unless prompted to enumerate them; doing so raises accuracy from 30% to 85% (Do language models fail at identifying unstated preconditions?). In both cases the missing ingredient isn't knowledge; it is an explicit cue to *deploy* it.
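
The precondition result suggests a simple prompting pattern: ask for the background conditions before the answer. A minimal sketch of such a two-stage prompt follows, next to a direct baseline. The template wording, the step structure, and the `max_items` parameter are hypothetical illustrations, not the prompt used in the cited study, and the example task is invented.

```python
def build_baseline_prompt(task: str) -> str:
    """Direct prompt: ask for the answer with no explicit precondition step."""
    return f"Task: {task}\n\nGive your final answer."

def build_precondition_prompt(task: str, max_items: int = 5) -> str:
    """Prompt that forces the model to surface unstated preconditions first."""
    return (
        f"Task: {task}\n\n"
        f"Step 1. List up to {max_items} background conditions that must hold "
        "for this task to succeed but are not stated. Number them.\n"
        "Step 2. Mark each condition as satisfied, violated, or unknown.\n"
        "Step 3. Only then give your final answer, accounting for any "
        "violated or unknown conditions."
    )

task = "Carry the full mug of coffee up the stairs to the office."
print(build_baseline_prompt(task))
print()
print(build_precondition_prompt(task, max_items=3))
```

The only difference between the two prompts is the enforced enumeration step, i.e. the explicit cue to deploy knowledge the model already has.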

Two notes reframe the whole question. If activations can be decoded into natural language and steered directly (LatentQA, for instance, trains a decoder that answers questions about activations and steers them by gradient descent), then 'cuing' need not happen through prompts at all: latent states can be read and triggered from the inside (Can we decode what LLM activations really represent in language?). And a skeptical counterweight: some apparently 'unlocked' abilities may be measurement artifacts, capabilities that seem to switch on sharply only because of the metric used, when the underlying improvement was smooth all along (Are LLM emergent abilities real or measurement artifacts?). Both points are worth holding: the corpus shows real dormant capacities, but warns that 'it activated!' is sometimes a story told with the wrong yardstick.
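
The measurement-artifact argument can be seen with a few lines of arithmetic. Suppose per-token accuracy improves smoothly across a series of model sizes, and a task is scored by exact match on a 10-token answer. If token errors are independent (a simplifying assumption), exact-match accuracy is the per-token accuracy raised to the 10th power: it stays near zero and then climbs steeply, so a smooth underlying trend reads as a sudden jump. The accuracy values below are invented for illustration.

```python
def exact_match_prob(per_token_acc: float, answer_len: int) -> float:
    """Probability that all answer_len tokens are correct, assuming
    independent per-token errors (a simplifying assumption)."""
    return per_token_acc ** answer_len

# Hypothetical per-token accuracies for increasingly large models.
per_token = [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]

for p in per_token:
    em = exact_match_prob(p, 10)
    print(f"per-token {p:.2f}   exact-match (10 tokens) {em:.4f}")
```

Plotting the per-token column gives a steady rise; plotting the exact-match column gives the familiar hockey stick, even though both describe the same hypothetical models.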

Sources (9 notes)

Does reinforcement learning create new reasoning abilities or activate existing ones?

For standard reasoning tasks, RL activates latent abilities already present in base models. For complex planning requiring multi-step coordination, RL generates genuinely novel strategies inaccessible to base models even with extensive sampling.

Can we trigger reasoning without explicit chain-of-thought prompts?

SAE-identified reasoning features can be directly steered to match or exceed chain-of-thought performance across six model families. This reasoning mode activates early in generation and overrides surface-level instructions, suggesting latent reasoning is a fundamental capability independent of explicit prompting.

Can modular cognitive tools unlock reasoning without training?

Four cognitive tools implemented as sandboxed LLM calls improved GPT-4.1 on AIME 2024 from 26.7% to 43.3% without any RL training. Modularity enforces operation isolation that pure prompting cannot guarantee, eliciting pre-existing reasoning capability.

How do language models detect injected steering vectors internally?

Contrastive preference optimization trains evidence-carrier features in early layers to suppress gate features that default to denial, enabling near-perfect detection of internal perturbations. Safety training actively suppresses this capability, reducing detection from 63.8% to 10.8%.

Why can't AI models lead conversations on their own?

LLMs are structurally trained to optimize for the next response rather than multi-turn goals, creating reactive behavior despite having the underlying ability to lead. Three independent research directions identify when-to-speak as the trainable gap.

Can language models understand without actually executing correctly?

Large language models can articulate correct principles but systematically fail to apply them due to dissociated instruction and execution pathways. The 87% accuracy in explanations versus 64% in actions reveals this is not knowledge deficit but structural disconnect.

Do language models fail at identifying unstated preconditions?

LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.

Can we decode what LLM activations really represent in language?

LatentQA trains a decoder to answer natural language questions about LLM activations, enabling both interpretability (understanding what activations encode) and controllability (steering them via gradient descent). Critical design choices—activation masking, diverse training data, and faithful completions—proved essential for generalization.

Are LLM emergent abilities real or measurement artifacts?

Sharp, unpredictable capability transitions vanish when using continuous metrics instead of discontinuous ones. The same model outputs show smooth predictable improvement with scale, suggesting emergence is a measurement choice rather than a real behavioral change.
