Can activation sparsity patterns guide the selection of in-context learning demonstrations?
This explores whether the sparsity of a model's internal activations — how few or many neurons fire on a given input — can be used as a signal to pick or order the examples you put in a prompt for in-context learning (ICL).
The corpus has a surprisingly direct answer, plus a web of supporting ideas about why it should work. The most on-the-nose result is a method that uses last-layer activation sparsity to order few-shot examples from sparse (treated as harder) to dense (treated as easier), reporting considerable gains with no external difficulty labels at all (Can representation sparsity order few-shot demonstrations effectively?). So the literal answer is yes: sparsity isn't just a diagnostic, it's an actionable signal for ordering demonstrations, and plausibly for selecting them.
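A minimal sketch of the sparse-to-dense ordering idea, under loud assumptions: the note does not say how sparsity is measured, so the Hoyer-style L1/L2 score below is a stand-in, the function names are hypothetical, and the per-demonstration activation vectors are assumed to have been extracted from the model's last layer already.

```python
import math


def hoyer_sparsity(vec):
    """Hoyer-style sparsity in [0, 1] (an assumed metric, not the method's own).

    Returns 0 for a vector whose entries all share one magnitude and 1 for a
    vector with a single non-zero entry.
    """
    n = len(vec)
    l1 = sum(abs(x) for x in vec)
    l2 = math.sqrt(sum(x * x for x in vec))
    if n < 2 or l2 == 0.0:
        # The score is undefined here; 0.0 is an arbitrary placeholder.
        return 0.0
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1)


def order_sparse_to_dense(demos, activations):
    """Return demos sorted from sparsest activation to densest.

    `activations[i]` is the (hypothetical) last-layer vector for `demos[i]`.
    Ties keep their original relative order.
    """
    if len(demos) != len(activations):
        raise ValueError("demos and activations must have the same length")
    scored = [(hoyer_sparsity(a), i) for i, a in enumerate(activations)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [demos[i] for _, i in scored]


# Toy usage with made-up activation vectors.
demos = ["dense", "one_hot", "mixed"]
activations = [
    [1.0, 1.0, 1.0, 1.0],  # every unit equally active
    [0.0, 0.0, 0.0, 5.0],  # a single active unit
    [0.0, 2.0, 2.0, 0.0],  # half the units active
]
ordered = order_sparse_to_dense(demos, activations)
```

A fraction-of-near-zero-units count would slot into the same place as the scoring function; which measure best tracks difficulty is an empirical question this sketch does not answer.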
What makes this more than a one-off trick is *why* sparsity carries information about difficulty. Two notes supply the mechanism. Networks learn dense activations for inputs they've seen a lot and fall back to sparse activations for unfamiliar inputs (Is representational sparsity learned or intrinsic to neural networks?), and that pattern intensifies under pressure: hidden states sparsify systematically as tasks get harder or drift out-of-distribution, acting as a selective filter rather than a breakdown (Do language models sparsify their activations under difficult tasks?). Read together, these say sparsity is a learned, readable proxy for 'how unfamiliar/hard is this input to me', which is exactly the latent variable a curriculum over demonstrations wants to sort on. The sparsity-guided method is essentially harvesting that signal.
But here's the twist a curious reader might not expect: *which* demonstrations you pick may matter less than how you arrange and frame them. One study shows that simply moving an identical demo block from the start of the prompt to the end can swing accuracy by up to 20% and flip nearly half the predictions, a spatial bias entirely independent of content (How much does demo position alone affect in-context learning accuracy?). And for sequential or decision-making tasks, what the model needs isn't well-chosen isolated examples at all, but full or partial trajectories from the same environment ('burstiness') in order to generalize (Why do trajectories matter more than individual examples for in-context learning?). So sparsity-based selection lives inside a larger design space where ordering and structure are co-equal levers.
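To make the position effect concrete, here is a rough sketch of how one might probe it: build two prompts that differ only in where the same demo block sits, then measure how often predictions change. The prompt template, the `build_prompt` and `flip_rate` names, and the two layouts are assumptions for illustration; the note does not describe the study's actual prompt formats, and the model call itself is left out.

```python
def build_prompt(instruction, demos, query, demo_position):
    """Assemble a prompt with an identical demo block at the start or the end.

    The "Q:/A:" template and the two layouts are illustrative assumptions.
    """
    demo_block = "\n".join(f"Q: {q}\nA: {a}" for q, a in demos)
    if demo_position == "start":
        parts = [demo_block, instruction, query]
    elif demo_position == "end":
        parts = [instruction, query, demo_block]
    else:
        raise ValueError("demo_position must be 'start' or 'end'")
    return "\n\n".join(parts)


def flip_rate(preds_a, preds_b):
    """Fraction of items whose prediction differs between two prompt layouts."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must have the same length")
    if not preds_a:
        return 0.0
    changed = sum(a != b for a, b in zip(preds_a, preds_b))
    return changed / len(preds_a)


# Toy usage: the predictions are made up, standing in for real model outputs.
demos = [("2 + 2", "4"), ("3 + 5", "8")]
start_prompt = build_prompt("Answer the question.", demos, "Q: 1 + 6", "start")
end_prompt = build_prompt("Answer the question.", demos, "Q: 1 + 6", "end")
rate = flip_rate(["yes", "no", "yes", "no"], ["yes", "yes", "yes", "yes"])
```

Running a sparsity-ordered demo set through both layouts would show whether ordering gains survive a change of position, which is the interaction the paragraph above points at.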
There's also a cautionary thread worth pulling. Work on instruction tuning found that models do nearly as well when trained on semantically empty or wrong instructions as on correct ones; what transfers is knowledge of the output space, not the content (Does instruction tuning teach task understanding or output format?). That raises a sharp question for any demonstration-selection scheme: are your sparsity-chosen examples teaching the task, or just steering format? The sparsity-as-difficulty story is compelling precisely because it claims to track genuine input familiarity rather than surface form, but it's the kind of claim that deserves the skeptical eye these format studies invite.
If you want the deeper rabbit hole, sparsity isn't only a selection signal: deliberately training *weights* to be sparse produces compact, human-interpretable circuits where neurons correspond to simple concepts (Can sparse weight training make neural networks interpretable by design?). That hints at a longer arc: the same property that lets you rank demonstrations by difficulty might eventually let you see *why* a given demonstration helps, turning prompt curation from an empirical art into something you can inspect.
Sources 7 notes
Sparsity-Guided Curriculum In-Context Learning uses last-layer activation sparsity to order demonstrations from sparse (harder) to dense (easier), yielding considerable performance improvements. This approach requires no external difficulty labels and works across diverse in-context learning tasks.
During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.
As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.
Repositioning an identical demo block from prompt start to end swings accuracy by up to 20% and flips nearly half of predictions. This spatial effect operates independently of demo content and spans multiple task types.
In-context learning for sequential decision-making requires full or partial trajectories from the same environment level, not just isolated examples. This structural property—trajectory burstiness—allows models to generalize across vastly different tasks without weight updates.
Models trained on semantically empty or deliberately incorrect instructions perform comparably to those trained on full, correct instructions, with a reported 43% against a 42.6% random baseline. The semantic content of instructions appears largely irrelevant; what transfers is knowledge of the output space.
Training transformers with sparse weights creates compact, human-interpretable circuits where neurons correspond to simple concepts with clear connections. Ablation studies confirm these circuits are necessary and sufficient for task performance, though scaling beyond tens of millions of parameters while maintaining interpretability remains unsolved.