
Can we train better models on less data?

Can gradient-based influence estimation identify which instruction data actually matters most? The research explores whether selecting a small subset of training data by its gradient similarity to a target capability can outperform training on everything.

Note · 2026-02-22 · sourced from Training Fine Tuning

LESS (Low-rank gradiEnt Similarity Search) selects instruction tuning data by estimating each example's influence on a target capability. Given a handful of examples embodying a specific skill (e.g., reasoning), LESS constructs a gradient datastore of low-dimensional features and selects training data whose gradient signatures are most similar to the target examples.
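A minimal sketch of this selection step, assuming random-projection gradient features and cosine similarity (the dimensions, array names, and 5% threshold here are illustrative, not the paper's exact configuration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: full gradient dim D, projected feature dim d.
D, d = 10_000, 64
proj = rng.standard_normal((d, D)) / np.sqrt(d)  # random projection (JL-style)

def project(grad, proj):
    # Compress a full per-example gradient to a low-dimensional feature.
    return proj @ grad

# Gradient datastore: one low-rank feature per training example.
train_grads = rng.standard_normal((500, D))      # stand-in for per-example gradients
datastore = np.stack([project(g, proj) for g in train_grads])

# Target features: a handful of examples embodying the desired capability.
target_grads = rng.standard_normal((5, D))
targets = np.stack([project(g, proj) for g in target_grads])

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

# Score each training example by its max similarity to any target example,
# then keep the top 5%.
scores = np.array([max(cosine(f, t) for t in targets) for f in datastore])
k = int(0.05 * len(scores))
selected = np.argsort(scores)[-k:]
```

The key design point is that similarity search happens entirely in the compressed feature space, so the datastore stays small enough to build once and reuse across target tasks.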

The headline result: training on a LESS-selected 5% of the data often outperforms training on the full dataset across diverse downstream tasks. This is not just an efficiency gain; it is a net improvement. The mechanism: mixed instruction tuning datasets contain examples that actively hinder specific capabilities. As Does training data format shape reasoning strategy more than domain? suggests, examples in the wrong format can shift the model's reasoning strategy away from what the target task requires.

Three technical innovations make this practical for LLMs: (1) adaptation to the Adam optimizer (influence formulations traditionally assume SGD), (2) variable-length sequence handling (instruction data varies wildly in length, which derails standard gradient comparisons), and (3) low-rank gradient features that compress the storage and computation to feasible levels.
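The first two innovations can be sketched as follows. This is an illustrative reading, not the paper's exact implementation: the influence feature uses Adam's bias-corrected update direction rather than the raw gradient, and features are unit-normalized so that long sequences (whose summed token gradients have large magnitude) do not dominate the similarity comparison:

```python
import numpy as np

def adam_update_direction(grad, m, v, beta1=0.9, beta2=0.999, eps=1e-8, step=1):
    # Influence under Adam: compare the optimizer's actual update direction,
    # not the raw gradient (classic influence formulations assume SGD).
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** step)   # bias correction
    v_hat = v / (1 - beta2 ** step)
    return m_hat / (np.sqrt(v_hat) + eps)

def length_normalized(feature):
    # Unit-normalize the gradient feature so sequence length does not
    # inflate similarity scores; comparisons then reduce to cosine similarity.
    return feature / (np.linalg.norm(feature) + 1e-8)
```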

The transferability finding is striking: smaller models can select useful data for larger models, and models from different families can share data selections. This suggests the gradient-based quality signal captures something about the data's intrinsic fit with a capability — not just its fit with a particular model's current state. The qualitative analysis confirms this: LESS selects data that "goes beyond surface form cues to identify data that exemplifies the necessary reasoning skills."

This connects to the broader pattern that data quality dominates data quantity. Can models improve themselves on tasks without verifiable answers? showed 1000 well-chosen examples can catalyze general self-improvement. Does teacher-refined data always improve student model performance? showed that data needs to match the student. LESS provides the principled mechanism for finding that match.

