SYNTHESIS NOTE
Model Architecture and Internals

Can editing hidden representations beat weight updates for finetuning?

Does intervening directly on a frozen model's representations offer a better path to parameter-efficient adaptation than current weight-based methods? The question challenges the dominant PEFT paradigm by treating representations, rather than weights, as the semantic lever.

Synthesis note · 2026-06-03 · sourced from Training Fine Tuning

Parameter-efficient finetuning (PEFT) adapts large models by updating a small number of weights (LoRA and its variants). Representation finetuning (ReFT) starts from a different premise drawn from interpretability: representations encode rich semantic information, so editing representations might be more powerful than editing weights. ReFT methods keep the base model frozen and learn task-specific interventions on its hidden representations. A strong instance of the family, LoReFT (low-rank linear subspace ReFT), is a drop-in PEFT replacement that is 10–50× more parameter-efficient than prior state-of-the-art PEFTs and almost always outperforms them across eight commonsense-reasoning tasks, four arithmetic-reasoning tasks, instruction following (Alpaca-Eval), and GLUE.
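A minimal sketch of what a single LoReFT intervention computes, assuming the formulation from the ReFT paper: phi(h) = h + R^T (W h + b - R h), where R is an r × d matrix with orthonormal rows spanning the edited subspace, and W (r × d) and b (r entries) are learned. It uses plain Python lists instead of a tensor library, and the helper names (`matvec`, `orthonormal_rows`, `loreft`, `loreft_param_count`) are hypothetical, not part of any released ReFT library. The check at the end shows the key property of the edit: inside the subspace, the edited representation reads exactly W h + b, while the orthogonal complement of h is left untouched.

```python
import random


def matvec(M, v):
    """Multiply a matrix stored as a list of rows by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


def orthonormal_rows(r, d, rng):
    """Gram-Schmidt on random Gaussian rows so that R R^T = I_r (needs r <= d)."""
    rows = []
    while len(rows) < r:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:
            c = sum(a * b for a, b in zip(u, v))
            v = [a - c * b for a, b in zip(v, u)]  # remove the component along u
        norm = sum(a * a for a in v) ** 0.5
        if norm > 1e-8:  # skip (numerically) dependent draws
            rows.append([a / norm for a in v])
    return rows


def loreft(h, R, W, b):
    """Assumed LoReFT edit: phi(h) = h + R^T (W h + b - R h)."""
    target = [wh + bi for wh, bi in zip(matvec(W, h), b)]  # desired subspace coordinates
    delta = [t - rh for t, rh in zip(target, matvec(R, h))]  # gap to current coordinates
    return [hi + ci for hi, ci in zip(h, matvec(transpose(R), delta))]


def loreft_param_count(d, r):
    """Trainable entries of one intervention: R and W are r x d, b has r entries."""
    return 2 * r * d + r


rng = random.Random(0)
d, r = 6, 2
R = orthonormal_rows(r, d, rng)
W = [[rng.gauss(0.0, 0.1) for _ in range(d)] for _ in range(r)]
b = [0.5, -0.5]
h = [rng.gauss(0.0, 1.0) for _ in range(d)]
h_new = loreft(h, R, W, b)

# Because R has orthonormal rows, R h_new = R h + (W h + b - R h) = W h + b.
target = [wh + bi for wh, bi in zip(matvec(W, h), b)]
print(max(abs(x - t) for x, t in zip(matvec(R, h_new), target)) < 1e-9)  # True
print(loreft_param_count(4096, 4))  # 32772
```

In this sketch the parameter count of an intervention depends only on the hidden size d and the rank r, not on the size of the weight matrices in the frozen model; which layers and token positions receive an intervention is a separate choice.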

The keeper is the conceptual bridge: interpretability findings (that meaning lives in representations as directions or subspaces) become an adaptation method that intervenes in the representation subspace rather than perturbing weights. This unifies steering and finetuning: the same handle used to interpret a model can be used to adapt it.
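To make the steering-finetuning bridge concrete, here is a hedged rank-1 sketch under the same assumed LoReFT formula, with R = u^T for a unit vector u. Two hand-picked settings of the learned map recover familiar steering operations: W = R cancels the -R h term and gives additive steering h + b·u, while W = 0 clamps the coordinate along u to b regardless of the input, in the spirit of clamping a feature direction. The direction `u` and the function `rank1_loreft` are illustrative assumptions.

```python
def dot(x, y):
    return sum(a * c for a, c in zip(x, y))


def rank1_loreft(h, u, w, b):
    """Rank-1 LoReFT: phi(h) = h + u * (w.h + b - u.h), with u a unit vector (R = u^T)."""
    delta = dot(w, h) + b - dot(u, h)
    return [hi + ui * delta for hi, ui in zip(h, u)]


u = [0.0, 0.0, 1.0]     # hypothetical interpretable direction (unit norm)
zero = [0.0, 0.0, 0.0]
for h in ([1.0, 2.0, 3.0], [1.0, 2.0, -1.0]):
    steered = rank1_loreft(h, u, w=u, b=2.0)     # W = R: additive steering, h + 2u
    clamped = rank1_loreft(h, u, w=zero, b=5.0)  # W = 0: set the u-coordinate to 5
    print(steered, clamped)
# prints:
# [1.0, 2.0, 5.0] [1.0, 2.0, 5.0]
# [1.0, 2.0, 1.0] [1.0, 2.0, 5.0]
```

Training LoReFT amounts to learning R, W and b by gradient descent instead of fixing them by hand, so hand-built steering edits like these sit at particular points of the same parameter space.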

This connects the vault's PEFT and mechinterp threads. It operationalizes the linear-representation premise behind "Can dictionary learning scale to production language models?" (features as steerable directions) as a finetuning technique, and it rhymes with "Does reinforcement learning update only a small fraction of parameters?": adaptation concentrates in a low-dimensional subspace, whether of weights or representations.
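The low-dimensional-subspace parallel can be checked numerically in a small sketch. It assumes LoRA as the weight-side example (an update ΔW = B A with B of shape d × r and A of shape r × d) and the same assumed LoReFT formula on the representation side; the sizes, random matrices and helper names are arbitrary illustrations. Over many random inputs, every LoRA output change B (A h) lies in the column span of B, and every LoReFT displacement R^T (W h + b - R h) lies in the row span of R, so both edits are confined to an r-dimensional subspace of the d-dimensional space.

```python
import random


def gram_schmidt(vectors):
    """Return an orthonormal basis (list of unit vectors) for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = sum(a * b for a, b in zip(q, w))
            w = [a - c * b for a, b in zip(w, q)]
        n = sum(a * a for a in w) ** 0.5
        if n > 1e-10:
            basis.append([a / n for a in w])
    return basis


def residual_outside(x, basis):
    """Norm of the part of x that the orthonormal basis does not explain."""
    res = list(x)
    for q in basis:
        c = sum(a * b for a, b in zip(q, res))
        res = [a - c * b for a, b in zip(res, q)]
    return sum(a * a for a in res) ** 0.5


def matvec(M, v):
    return [sum(a * b for a, b in zip(row, v)) for row in M]


rng = random.Random(0)
d, r = 8, 2


def gauss():
    return rng.gauss(0.0, 1.0)


# Weight side (LoRA): the output change B (A h) lies in the column span of B.
B = [[gauss() for _ in range(r)] for _ in range(d)]
A = [[gauss() for _ in range(d)] for _ in range(r)]
lora_basis = gram_schmidt([list(col) for col in zip(*B)])

# Representation side (LoReFT): the displacement lies in the row span of R.
R = gram_schmidt([[gauss() for _ in range(d)] for _ in range(r)])
W = [[gauss() for _ in range(d)] for _ in range(r)]
bias = [gauss() for _ in range(r)]
R_T = [list(col) for col in zip(*R)]

worst_lora = worst_reft = 0.0
for _ in range(100):
    h = [gauss() for _ in range(d)]
    lora_shift = matvec(B, matvec(A, h))
    coeffs = [wh + bi - rh for wh, bi, rh in zip(matvec(W, h), bias, matvec(R, h))]
    reft_shift = matvec(R_T, coeffs)
    worst_lora = max(worst_lora, residual_outside(lora_shift, lora_basis))
    worst_reft = max(worst_reft, residual_outside(reft_shift, R))

print(worst_lora < 1e-9, worst_reft < 1e-9)  # True True
```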


Original note title

representation finetuning intervenes on frozen hidden representations instead of weights and is far more parameter-efficient than LoRA