Reinforcement Learning for LLMs · LLM Reasoning and Architecture

Can models dynamically activate expert skills at inference time?

Can language models efficiently discover and compose task-specific capabilities on the fly, without modifying base weights? This note explores whether test-time adaptation through expert-vector composition outperforms fixed fine-tuning approaches.

Note · 2026-02-23 · sourced from Novel Architectures

Transformer² introduces Singular Value Fine-tuning (SVF): instead of modifying full weight matrices, or even low-rank adaptations of them, SVF extracts and tunes only the singular values within a model's weight matrices. This produces compact expert vectors that are inherently composable — they can be dynamically mixed at inference without interfering with one another or with the base weights.
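The idea can be sketched in a few lines of numpy. This is a toy illustration, not the paper's implementation: the matrix, its size, and the expert vector `z` are made up, but the mechanics — freeze the singular vectors, learn one scale per singular value — are the mechanism SVF describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "pretrained" weight matrix standing in for one layer's W.
W = rng.standard_normal((8, 6))

# SVD: W = U @ diag(s) @ Vt. SVF keeps U and Vt frozen and learns
# only a scaling vector z over the singular values s.
U, s, Vt = np.linalg.svd(W, full_matrices=False)

# A hypothetical learned expert vector: one scale per singular value,
# i.e. min(8, 6) = 6 trainable parameters for this whole matrix.
z = np.array([1.2, 0.9, 1.0, 1.1, 0.8, 1.0])

# Adapted weights for this expert's skill.
W_expert = U @ np.diag(s * z) @ Vt

# With z = 1 everywhere, the original matrix is recovered exactly.
assert np.allclose(U @ np.diag(s) @ Vt, W)
print(W_expert.shape)  # (8, 6)
```

Because each expert is just a vector of per-singular-value scales, two experts can be combined by interpolating their `z` vectors, which is what makes the second-pass mixing below cheap.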

The inference mechanism has two passes:

  1. First pass (dispatch): The model runs on the input and observes its own test-time behavior, gathering information about which skills the current problem requires.
  2. Second pass (adaptation): The framework combines available expert vectors based on the first-pass analysis, providing a targeted modification to the base weights specifically tailored to the task.
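The two passes above can be sketched as follows. The mixing weights `alphas` are hard-coded here to stand in for the first-pass dispatch, and the expert vectors are invented for illustration; only the second-pass combination rule (interpolate expert vectors, then rescale singular values) is shown concretely.

```python
import numpy as np

def adapt(W, experts, alphas):
    """Second pass: mix expert vectors by alphas and rescale W's singular values."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Interpolate in z-space: one combined scale per singular value.
    z = sum(a * z_k for a, z_k in zip(alphas, experts))
    return U @ np.diag(s * z) @ Vt

rng = np.random.default_rng(1)
W = rng.standard_normal((8, 6))

# Hypothetical expert vectors, e.g. for "math" and "code" skills.
experts = [np.full(6, 1.1), np.full(6, 0.9)]

# The first pass (dispatch) would produce these mixing weights from the
# input; here they are fixed to stand in for that analysis.
alphas = [0.7, 0.3]

W_adapted = adapt(W, experts, alphas)
print(W_adapted.shape)  # (8, 6)
```

Note that the base weights `W` are never overwritten: the adapted matrix is recomputed from the frozen SVD plus the chosen mixture, so different prompts can use different mixtures from the same model.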

Three adaptation strategies for producing the mixing weights offer monotonically better performance as access to test-time information increases, enabling tradeoffs appropriate to the deployment scenario.

Three properties make this work: expert vectors are compact (one parameter per singular value), they compose by simple mixing, and mixing them does not interfere with the base weights.

The neuroscience parallel is deliberate: the brain activates specific regions depending on the task and dynamically reconfigures its functional networks in response to changing demands. Transformer² operationalizes this for LLMs.

The deeper principle: the requisite capabilities for many downstream tasks already exist within pretrained models. The bottleneck is not knowledge but activation — knowing when to deploy which capability. This aligns with Does RL teach reasoning or just when to use it?, extending it to the architecture level: self-adaptation is about routing to existing capabilities, not creating new ones.



self-adaptive LLMs compose expert vectors at inference via two-pass singular value fine-tuning