Can architectural changes reorder when uncertainty and empowerment signals influence decisions?
This explores whether redesigning a model's structure — not just retraining it — can change *when* signals like confidence/uncertainty and initiative/empowerment get to enter a decision, rather than treating their timing as fixed.
The corpus says yes, repeatedly. The most striking version is that signals you'd assume are intrinsic to a model turn out to be artifacts of *where in the pipeline they're measured or rewarded.* The clearest case for 'empowerment': proactive behavior isn't a capability models lack; it's one the architecture deletes. Next-turn reward optimization structurally strips out initiative, so a model waits passively, but rebuilding the objective restores clarification-seeking and critical thinking dramatically, from 0.15% to ~74% with RL (Why do AI agents fail to take initiative?). The empowerment signal was always available; the architecture decided it never got to fire.
The same reordering logic shows up with reward and supervision timing. Tree-GRPO uses branching structure alone to convert end-of-trajectory outcome rewards into step-wise process signals: comparing sibling subtrees relocates *when* the learning signal applies, from the end of a path to each step along it, with no separate reward model (Can tree structure alone convert outcome rewards into process supervision?). And training *order* itself is an architectural lever: scheduling structured tasks before creative ones changes how entropy evolves and prevents entropy collapse from damaging open-ended skills, a 6.2% gain over joint training that comes purely from sequencing (Does training order reshape how models handle different task types?). Same ingredients, different ordering, different decisions.
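To make the sibling-comparison idea concrete, here is a minimal sketch of how outcome rewards at the leaves of a rollout tree could be turned into per-step signals. It is an illustration under stated assumptions, not Tree-GRPO's actual procedure: the `Node` class, the use of a mean over leaf rewards as a subtree's value, and the mean-of-siblings baseline are all hypothetical choices made for this example.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Dict, List, Optional


@dataclass
class Node:
    """One step in a rollout tree; only leaves carry an outcome reward."""
    name: str
    reward: Optional[float] = None
    children: List["Node"] = field(default_factory=list)


def leaf_rewards(node: Node) -> List[float]:
    """Collect the outcome rewards of every leaf beneath this node."""
    if not node.children:
        return [float(node.reward)]
    rewards: List[float] = []
    for child in node.children:
        rewards.extend(leaf_rewards(child))
    return rewards


def subtree_value(node: Node) -> float:
    """Assumed value of a step: mean outcome reward over its leaves."""
    return mean(leaf_rewards(node))


def step_advantages(root: Node) -> Dict[str, float]:
    """Score each step against its siblings (value minus sibling-group mean)."""
    advantages: Dict[str, float] = {}
    stack = [root]
    while stack:
        parent = stack.pop()
        if not parent.children:
            continue
        values = [subtree_value(child) for child in parent.children]
        baseline = mean(values)
        for child, value in zip(parent.children, values):
            advantages[child.name] = value - baseline
        stack.extend(parent.children)
    return advantages


# Toy tree: branch "a" always succeeds, branch "b" succeeds half the time.
root = Node("root", children=[
    Node("a", children=[Node("a1", reward=1.0), Node("a2", reward=1.0)]),
    Node("b", children=[Node("b1", reward=0.0), Node("b2", reward=1.0)]),
])
advantages = step_advantages(root)
print(advantages)
```

Only leaf outcomes are ever scored, yet every intermediate step ends up with its own signal, because each step is compared against the siblings that branched from the same parent.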
Uncertainty is the subtler half. Confidence already acts as a gate on behavior: highly confident models resist prompt rephrasing while low-confidence ones swing wildly, meaning the uncertainty signal directly governs which decisions hold steady (Does model confidence predict robustness to prompt changes?). The deeper surprise is that some signal conflicts we treat as fundamental are measurement artifacts of architecture. The exploration–exploitation trade-off, the canonical tension between trying new things and exploiting what works, nearly vanishes when you measure at the hidden-state level instead of the token level; the conflict was an artifact of *where* you look, and you can enhance both at once (Is the exploration-exploitation trade-off actually fundamental?). Modality competition tells the same story: vision and language aren't inherently incompatible. The problem is rigid dense-capacity allocation, and Mixture-of-Experts dissolves the conflict by reallocating capacity per token (Can we solve modality competition through architectural design?).
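A toy sketch of per-token capacity allocation may help here. It assumes top-1 routing with a fixed linear router; production Mixture-of-Experts layers typically use learned routers, soft gating, and load balancing, none of which is modeled. The names `route_top1` and `moe_layer`, the two hand-written experts, and the two-feature "tokens" are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Expert = Callable[[Vector], List[float]]


def dot(a: Vector, b: Vector) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def route_top1(token: Vector, router_weights: List[Vector]) -> int:
    """Return the index of the expert whose router row scores highest."""
    scores = [dot(token, row) for row in router_weights]
    return max(range(len(scores)), key=scores.__getitem__)


def moe_layer(
    tokens: List[Vector],
    router_weights: List[Vector],
    experts: List[Expert],
) -> List[Tuple[int, List[float]]]:
    """Send each token to exactly one expert; return (expert index, output)."""
    routed = []
    for token in tokens:
        idx = route_top1(token, router_weights)
        routed.append((idx, experts[idx](token)))
    return routed


# Assumption: feature 0 is "image-like", feature 1 is "text-like".
router_weights = [[1.0, 0.0], [0.0, 1.0]]
experts: List[Expert] = [
    lambda v: [2.0 * x for x in v],   # stand-in "vision" expert
    lambda v: [x + 1.0 for x in v],   # stand-in "language" expert
]
tokens = [[1.0, 0.0], [0.0, 1.0], [0.75, 0.25]]
routed = moe_layer(tokens, router_weights, experts)
print([idx for idx, _ in routed])
```

In a dense layer every token would pass through the same weights; here the routing decision is made per token, so the two kinds of input stop sharing one fixed block of capacity.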
There's a unifying thread worth pulling: separation and externalization decide which signals reach which decisions. Splitting the decomposer from the solver prevents planning and execution from interfering, so each can act on its own signals without contaminating the other (Does separating planning from execution improve reasoning accuracy?). Reliable agents push memory, skills, and protocols out into a harness layer rather than forcing the model to re-derive them mid-decision, which is a structural choice about when state and procedure inform action (Where does agent reliability actually come from?). And the recommender work is the blunt summary of the whole pattern: inductive bias and constraint design beat raw depth and capacity, because *problem-specific structure*, not more parameters, is what determines outcomes (What architectural choices actually improve recommender system performance?).
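The separation and externalization described above can be sketched as a small harness that keeps planning, execution, and memory in distinct components. This is a hypothetical illustration rather than any cited system's design: the `Harness` class, the string-splitting decomposer, and the uppercase "solver" are stand-ins for what would be model calls in practice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Harness:
    """Holds the decomposer, the solver, and memory as separate parts."""
    decomposer: Callable[[str], List[str]]
    solver: Callable[[str, Dict[str, str]], str]
    memory: Dict[str, str] = field(default_factory=dict)

    def run(self, task: str) -> List[str]:
        """Plan once, then solve each step, reusing remembered results."""
        answers = []
        for step in self.decomposer(task):
            if step not in self.memory:
                # The solver sees earlier results but never re-plans.
                self.memory[step] = self.solver(step, self.memory)
            answers.append(self.memory[step])
        return answers


solver_calls: List[str] = []


def toy_decomposer(task: str) -> List[str]:
    """Stand-in planner: split the task on the word 'then'."""
    return [part.strip() for part in task.split(" then ")]


def toy_solver(step: str, memory: Dict[str, str]) -> str:
    """Stand-in executor: record the call and return a fake result."""
    solver_calls.append(step)
    return step.upper()


harness = Harness(decomposer=toy_decomposer, solver=toy_solver)
answers = harness.run("load data then clean data then load data")
print(answers, len(solver_calls))
```

Because the repeated step is answered from the harness's memory, the solver runs only for the steps it has not seen, which is the sense in which state informs action without the model re-deriving it.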
The takeaway you didn't ask for: the things that feel like a model's fixed temperament (its passivity, its overconfidence, its inability to explore and exploit at once) are often just the current wiring's defaults. Reorder the structure and the signals fire at a different time, or stop conflicting altogether.
Sources (9 notes)
Research shows next-turn reward optimization structurally removes initiative from models, but proactive behaviors like critical thinking and clarification-seeking are trainable (0.15% to 73.98% with RL). The core challenge is balancing proactivity with civility to avoid intrusion.
Tree-GRPO uses branching structure to transform trajectory-level outcome rewards into step-level preference signals through sibling subtree comparison, eliminating the need for separate process reward models or step-level annotation while scaling with computational budget.
Omni-Thinker shows structured domains decrease output entropy while creative domains increase it. BWT-guided scheduling—training structured tasks first—yields 6.2% gains over joint training by preventing entropy collapse from damaging open-ended capabilities.
ProSA found that when models are highly confident, they resist prompt rephrasing; low confidence causes major output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.
Hidden-state analysis using Effective Rank metrics shows near-zero correlation between exploration and exploitation, revealing the trade-off emerges only at token level. VERL demonstrates simultaneous enhancement achieving 21.4% accuracy gains on Gaokao 2024.
Modality competition arises from caption distributional shift and rigid dense capacity allocation, not from vision and language being fundamentally incompatible. Mixture of Experts resolves the architectural bottleneck by allocating capacity per token, enabling modalities to coexist without competing.
Modular architectures with separate decomposer and solver models outperform monolithic LLMs, with decomposition ability transferring across domains while solving ability does not. The separation prevents planning-execution interference and produces more generalizable skills.
Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.
Research shows that architectural choices like removing hidden layers, enforcing constraints on self-similarity, and using appropriate likelihood functions deliver better results than deeper or more complex models. This suggests that problem-specific design decisions matter more than raw representational capacity.