Reasoning and Learning Architectures

Can representation sparsity order few-shot demonstrations effectively?

Does measuring how sparse a model's hidden states are for each example provide a reliable signal for ordering few-shot demonstrations in prompts? This matters because curriculum ordering significantly affects in-context learning performance.

Note · 2026-05-18 · sourced from LLM Architecture

Once representational sparsity tracks task difficulty for a given model, sparsity itself becomes a usable signal for curriculum design. The paper "Farther the Shift, Sparser the Representation" operationalizes this with Sparsity-Guided Curriculum In-Context Learning (SG-ICL), which uses the sparsity of last-layer hidden states to schedule the few-shot demonstrations in a prompt.

The mechanism: run each candidate demonstration through the model, measure how sparse its last-layer hidden states are, and order the demonstrations by that score. The ordering can run from dense (easy for this model) to sparse (hard) or the reverse, depending on what the curriculum is meant to achieve. The paper reports considerable performance gains over random or naive ordering.
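A minimal sketch of the ordering step follows. The note does not say which sparsity statistic SG-ICL computes, so this sketch assumes the Hoyer sparsity measure of each token's last-layer hidden state, averaged over the example's tokens. It also assumes those states have already been extracted from a white-box model as plain lists of floats. The functions `hoyer_sparsity`, `example_sparsity`, and `order_demonstrations` are hypothetical illustrations, not the paper's code.

```python
import math
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def hoyer_sparsity(vec: Sequence[float]) -> float:
    """Hoyer sparsity in [0, 1]: 0 when all |x_i| are equal, 1 when only one entry is nonzero.

    Using Hoyer sparsity here is an assumption; the source does not name its metric.
    """
    n = len(vec)
    if n < 2:
        raise ValueError("need a vector with at least two dimensions")
    l1 = sum(abs(x) for x in vec)
    l2 = math.sqrt(sum(x * x for x in vec))
    if l2 == 0.0:
        # An all-zero state has no defined Hoyer score; scoring it 0.0 is an arbitrary choice.
        return 0.0
    root_n = math.sqrt(n)
    return (root_n - l1 / l2) / (root_n - 1)


def example_sparsity(token_states: Sequence[Sequence[float]]) -> float:
    """Mean Hoyer sparsity over the last-layer hidden state of each token in one example."""
    if len(token_states) == 0:
        raise ValueError("example has no token states")
    return sum(hoyer_sparsity(h) for h in token_states) / len(token_states)


def order_demonstrations(
    examples: Sequence[T],
    last_layer_states: Sequence[Sequence[Sequence[float]]],
    hardest_first: bool = False,
) -> List[T]:
    """Order demonstrations by their sparsity score.

    Assumes sparser = harder for this model, so the default (ascending sparsity)
    puts the densest, easiest demonstration first; hardest_first reverses it.
    """
    if len(examples) != len(last_layer_states):
        raise ValueError("need one set of hidden states per example")
    scored = [(example_sparsity(s), ex) for ex, s in zip(examples, last_layer_states)]
    # Sort on the score only so the examples themselves never need to be comparable.
    scored.sort(key=lambda pair: pair[0], reverse=hardest_first)
    return [ex for _, ex in scored]


if __name__ == "__main__":
    demos = ["a", "b", "c"]
    states = [
        [[1.0, 1.0, 1.0, 1.0]],  # uniform magnitudes -> sparsity 0.0
        [[1.0, 0.0, 0.0, 0.0]],  # one nonzero entry -> sparsity 1.0
        [[1.0, 1.0, 0.0, 0.0]],  # in between (about 0.59)
    ]
    print(order_demonstrations(demos, states))                      # ['a', 'c', 'b']
    print(order_demonstrations(demos, states, hardest_first=True))  # ['b', 'c', 'a']
```

With the default `hardest_first=False`, the densest demonstration (easiest, under the note's premise) comes first, giving an easy-to-hard curriculum; setting it to `True` gives the reverse. Sorting on the score alone keeps the sort stable, so ties preserve the original prompt order.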

This is a model-internal curriculum signal. Most curriculum learning approaches require external difficulty labels — annotator effort, heuristics about problem features, or proxy measures like solution length. Sparsity sidesteps this entirely. The model itself reveals which examples are hard for it through how its representations respond. The curriculum can be tailored to the specific model being used rather than to some external notion of universal difficulty.

In principle the technique applies wherever few-shot prompting is used: classification, reasoning, agentic deployments. The overhead is small. Hidden states are computed on every forward pass anyway, so the extra work is one forward pass per candidate demonstration plus a cheap reduction over its activations. The only hard requirement is access to those activations, which any white-box deployment provides and closed, API-only models generally do not.

For builders of LLM pipelines, this argues for instrumentation that exposes activation-sparsity statistics. The same signal could support curriculum ordering, hard-example mining, confidence calibration, and likely other applications not yet identified. Sparsity is turning out to be a richer interpretability primitive than the framing of it as a static model property has suggested.
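As an illustration of the hard-example-mining use, here is a minimal sketch that swaps in a simpler sparsity proxy: the fraction of activations whose magnitude falls below a threshold `eps`. It assumes one pooled last-layer vector per example (for instance a mean over tokens); the pooling choice, the `eps` default, and the helper names `near_zero_fraction` and `mine_hard_examples` are all hypothetical.

```python
from typing import Dict, List, Sequence


def near_zero_fraction(vec: Sequence[float], eps: float = 1e-2) -> float:
    """Fraction of activations with |x| < eps; higher means sparser.

    The threshold eps is an illustrative assumption, not a value from the source.
    """
    if len(vec) == 0:
        raise ValueError("empty activation vector")
    return sum(1 for x in vec if abs(x) < eps) / len(vec)


def mine_hard_examples(
    pooled_states: Dict[str, Sequence[float]], k: int, eps: float = 1e-2
) -> List[str]:
    """Return the ids of the k examples whose pooled last-layer state is sparsest.

    Assumes, per the note's premise, that sparser means harder for this model.
    """
    ranked = sorted(
        pooled_states,
        key=lambda ex_id: near_zero_fraction(pooled_states[ex_id], eps),
        reverse=True,
    )
    return ranked[:k]


if __name__ == "__main__":
    states = {
        "easy": [0.5, -0.4, 0.3, 0.2],   # no near-zero entries -> 0.0
        "mid": [0.5, 0.0, 0.3, 0.0],     # 2 of 4 near zero -> 0.5
        "hard": [0.9, 0.0, 0.0, 0.001],  # 3 of 4 near zero -> 0.75
    }
    print(mine_hard_examples(states, k=2))  # ['hard', 'mid']
```

A threshold count is cruder than a norm-based measure and depends on the activation scale of the particular model, so `eps` would need tuning per model if this proxy were used.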

The deeper template is that adaptive internal phenomena — sparsity here, attention concentration elsewhere, gradient magnitudes during training — can be operationalized as signals for system behavior once they are recognized as informative rather than incidental.

Original note title

sparsity-guided curriculum in-context learning uses representation sparsity as a scheduling signal for few-shot demonstrations