Reinforcement Learning for LLMs · LLM Reasoning and Architecture

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Explores whether applying alignment signals at inference time rather than modifying model weights can better preserve the factual knowledge learned during pretraining while still achieving alignment goals.

Note · 2026-02-22 · sourced from Training Fine Tuning
How do you build domain expertise into general AI models? · How should we allocate compute budget at inference time? · How should researchers navigate LLM reasoning research?

Proxy-tuning fine-tunes a small model, then applies the difference between the small tuned and untuned models' predictions to shift a large untuned model's outputs at decoding time. The large model's parameters are never modified. The method closes 91% of the performance gap between Llama-2-13B and its directly tuned chat version, and 88% for the 70B model.
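The decoding-time arithmetic can be sketched in a few lines (a toy illustration with made-up logits and a 4-token vocabulary, not the paper's implementation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def proxy_tuned_probs(base_logits, expert_logits, antiexpert_logits):
    """One proxy-tuning decoding step: shift the large base model's
    logits by the (tuned expert - untuned anti-expert) delta, then
    renormalize. The base model's weights are never touched."""
    shifted = base_logits + (expert_logits - antiexpert_logits)
    return softmax(shifted)

# Illustrative numbers, not real model outputs.
base       = np.array([2.0, 1.0, 0.5, 0.1])   # large untuned model
expert     = np.array([0.5, 2.5, 0.2, 0.0])   # small tuned model
antiexpert = np.array([1.5, 0.5, 0.4, 0.2])   # small untuned model

probs = proxy_tuned_probs(base, expert, antiexpert)
# The delta (expert - antiexpert) favors token 1, so the shifted
# distribution promotes it over the base model's top choice (token 0).
print(probs.argmax())  # → 1
```

The key property is visible here: the delta carries only what tuning *changed* in the small model, and that change is transplanted onto the large model's richer distribution at each step.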

The critical finding: on knowledge-intensive tasks, proxy-tuning sometimes surpasses direct instruction-tuning. This is because direct fine-tuning modifies model weights, and some of those modifications overwrite pretrained knowledge. As Why does reasoning training help math but hurt medical tasks? shows, weight modification risks corrupting the knowledge storage that proxy-tuning leaves intact.

Proxy-tuning primarily promotes reasoning and stylistic tokens. Analysis of the token-level distributional shift shows the largest influence on tokens associated with reasoning patterns and output style — consistent with evidence that "alignment mainly affects style rather than knowledge." This aligns with Does instruction tuning teach task understanding or output format? and Can imitating ChatGPT fool evaluators into thinking models improved?: what fine-tuning actually changes is output distribution, not capability. Proxy-tuning achieves this distributional change without touching the model weights that encode knowledge.

For domain adaptation, proxy-tuning Llama-2-13B using CodeLlama-7B produces a 17-32% improvement on coding benchmarks. The small expert provides the distributional guidance; the large base model provides the knowledge. An optional hyperparameter controls the amount of guidance, enabling runtime trade-offs between different generation attributes.
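A minimal sketch of that guidance knob, assuming a scalar `alpha` that scales the expert delta (the parameter name and toy logits are illustrative, not taken from the paper):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def guided_probs(base_logits, expert_logits, antiexpert_logits, alpha=1.0):
    """Scale the expert delta by alpha: alpha=0 recovers the untuned
    base model, alpha=1 applies full proxy-tuning, and intermediate
    values trade off guidance strength at runtime with no retraining."""
    return softmax(base_logits + alpha * (expert_logits - antiexpert_logits))

base       = np.array([2.0, 1.0, 0.5, 0.1])
expert     = np.array([0.5, 2.5, 0.2, 0.0])
antiexpert = np.array([1.5, 0.5, 0.4, 0.2])

print(guided_probs(base, expert, antiexpert, alpha=0.0).argmax())  # → 0 (base wins)
print(guided_probs(base, expert, antiexpert, alpha=1.0).argmax())  # → 1 (delta wins)
```

Because `alpha` is applied per decoding step, it can be changed between requests, or even mid-generation, without touching any weights.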

This constitutes a fifth paradigm in How do knowledge injection methods trade off flexibility and cost?: decoding-time adaptation. It has zero training cost on the target model and full knowledge preservation, but requires access to the base model's logits at inference time.

ARGS (Alignment as Reward-Guided Search) provides a complementary inference-time method. Instead of applying a distributional shift from a tuned proxy, ARGS adjusts model predictions at each decoding step using a reward signal directly. Two components: reward-guided scoring (assigns scores to possible continuations) and token selection (selects a continuation based on scored candidates). A tunable weight controls the trade-off between semantic relevance and alignment criteria — setting it to zero recovers standard maximum-likelihood decoding.

ARGS enables rapid personalized alignment without retraining: different users can have different reward functions applied at inference time. Together, proxy-tuning (distributional shift from expert delta) and ARGS (reward-guided decoding) suggest a design space where multiple axes of adaptation — domain knowledge, user preferences, task constraints — can each be applied at decoding time through complementary mechanisms. See Can user preferences be learned from just ten questions? for how per-user reward functions can be efficiently constructed.
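One ARGS decoding step can be sketched as follows (a toy example: `reward_fn` is a stand-in for a learned reward model, and the numbers are illustrative, not the authors' code):

```python
import numpy as np

def args_step(logprobs, reward_fn, context, w=1.0, k=3):
    """Score each of the top-k candidate tokens by model log-probability
    plus w * reward of the extended continuation, then pick the best.
    With w=0 this reduces to standard maximum-likelihood decoding."""
    topk = np.argsort(logprobs)[::-1][:k]
    scores = {int(t): logprobs[t] + w * reward_fn(context + [int(t)])
              for t in topk}
    return max(scores, key=scores.get)

# Token 0 is most likely under the model, but a reward model that
# prefers token 2 overrides it once w is large enough.
logprobs = np.log(np.array([0.5, 0.3, 0.15, 0.05]))
reward = lambda seq: 2.0 if seq[-1] == 2 else 0.0

print(args_step(logprobs, reward, [], w=0.0))  # → 0: pure likelihood
print(args_step(logprobs, reward, [], w=1.0))  # → 2: reward-guided
```

Swapping `reward_fn` per user is what makes personalized alignment cheap here: the model and its weights are shared, and only the small scoring function differs.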




Proxy-tuning at decoding time preserves pretrained knowledge better than direct fine-tuning by applying the tuning signal as a distributional shift.