Can humans suppress frequency bias through attention and intention?
This reads 'frequency bias' as the pull toward common, repeated, or context-prominent material — and asks whether deliberately directing attention (and forming an intention to override it) can suppress it, in models and in the humans using them.
The first thing the corpus suggests is that frequency bias, the tendency to over-weight what is common or repeated, is not a quirk you can simply will away: it is baked in at two levels. Architecturally, transformer soft attention over-weights repeated and context-prominent tokens regardless of whether they are relevant, forming a feedback loop that amplifies whatever is already prominent (see 'Does transformer attention architecture inherently favor repeated content?'). Developmentally, the broader family of cognitive biases is planted during pretraining and only nudged, not installed, by later finetuning (see 'Where do cognitive biases in language models come from?'). So the bias has deep roots. But the attention note also offers the most direct answer to your question: 'System 2 Attention', which regenerates the context to strip out irrelevant material before reasoning, can interrupt the mechanism. That is, almost literally, suppressing frequency bias through a deliberate act of attention.
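To make the architectural point concrete, here is a toy sketch in Python of how softmax attention mass accumulates on a repeated token. It is an illustration under stated assumptions, not the mechanism of any particular model: the tokens, the equal raw scores, and the helper names `softmax` and `attention_mass_by_token` are all invented for this example. The filtering step at the end is only a stand-in for System 2 Attention, which in practice regenerates the context with the model itself rather than dropping tokens by rule.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_mass_by_token(tokens, scores):
    """Sum the softmax weights over every position that holds the same token."""
    weights = softmax(scores)
    mass = {}
    for tok, w in zip(tokens, weights):
        mass[tok] = mass.get(tok, 0.0) + w
    return mass

# Toy context: an irrelevant opinion appears three times, the relevant fact once.
tokens = ["opinion", "opinion", "opinion", "fact"]
# Assumption: every position gets the same raw score, so any difference in
# total mass comes from repetition alone, not from relevance.
scores = [1.0, 1.0, 1.0, 1.0]

mass = attention_mass_by_token(tokens, scores)

# Stand-in for an S2A-style step: rebuild the context without the irrelevant
# material, then attend again. (Hypothetical rule-based filter.)
kept = [(t, s) for t, s in zip(tokens, scores) if t != "opinion"]
filtered_mass = attention_mass_by_token([t for t, _ in kept], [s for _, s in kept])

print(mass)           # the repeated token holds three quarters of the mass
print(filtered_mass)  # after filtering, all mass lands on the fact
```

Even with identical per-position scores, the repeated token collects most of the attention simply because it occupies more positions. Removing it before attending is the only thing that changes the outcome here.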
Why this matters becomes vivid when you see what frequency bias does downstream. Because general words (hypernyms) occur more often than specific ones (hyponyms), a model that prefers the more frequent paraphrase quietly drifts toward abstraction, erasing exactly the expert-level specificity a careful thinker would want to keep (see 'Does word frequency correlate with semantic abstraction?'). The bias is not neutral; it sands away detail. So 'intention' here is concrete: it is the choice to resist the slide toward the bland and common.
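The drift toward abstraction can be sketched with a toy example in Python. Everything in it is hypothetical: the word counts are invented for illustration (they are not corpus statistics), the three-word hypernym chain is hand-written rather than taken from WordNet, and `most_frequent_paraphrase` is a made-up stand-in for a model's preference for common wording.

```python
# Hypothetical counts, invented for illustration only.
toy_counts = {"animal": 900, "dog": 400, "terrier": 30}

# Hand-written hypernym links: each word maps to its more general parent.
hypernym_of = {"terrier": "dog", "dog": "animal"}

def paraphrase_candidates(word):
    """Return the word followed by its chain of increasingly general hypernyms."""
    chain = [word]
    while chain[-1] in hypernym_of:
        chain.append(hypernym_of[chain[-1]])
    return chain

def most_frequent_paraphrase(word):
    """Pick the candidate with the highest toy count (the frequency-biased choice)."""
    return max(paraphrase_candidates(word), key=lambda w: toy_counts[w])

choice = most_frequent_paraphrase("terrier")
print(paraphrase_candidates("terrier"))
print(choice)
```

Under these assumed counts, the frequency-biased choice for the specific word is the most general one in the chain, which is the loss of specificity the paragraph above describes.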
The corpus offers several other 'deliberate attention' levers beyond regenerating context. You can train invariance directly: consistency training teaches a model to respond the same way to a clean prompt and a cluttered one, using its own clean answers as the target, which effectively teaches it to ignore the irrelevant prominent material (see 'Can models learn to ignore irrelevant prompt changes?'). You can force reasoning before judgment: LLM judges trained to actually think through an evaluation become far less susceptible to surface features like verbosity, position, and authority, biases that are cousins of frequency bias (see 'Can reasoning during evaluation reduce judgment bias in LLM judges?'). And you can ground attention in the world rather than in the prior: interleaving reasoning with real tool queries injects external feedback that overrides the model's internal pull (see 'Can interleaving reasoning with real-world feedback prevent hallucination?'). All three are versions of the same move: inserting a deliberate step between the default impulse and the output.
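The first of these levers, output-level consistency training, comes down to how the training pairs are built. The Python sketch below shows only that data-construction step, under loud assumptions: `build_consistency_pairs`, `add_distractor`, and `toy_model` are hypothetical names invented here, the toy model is a hard-coded function rather than a real LLM, and the fine-tuning step that would consume the pairs is omitted.

```python
from typing import Callable, List, Tuple

def build_consistency_pairs(
    clean_prompts: List[str],
    wrap: Callable[[str], str],
    generate: Callable[[str], str],
) -> List[Tuple[str, str]]:
    """Pair each cluttered prompt with the model's own answer to the clean prompt."""
    pairs = []
    for prompt in clean_prompts:
        target = generate(prompt)             # the model's answer to the clean prompt
        pairs.append((wrap(prompt), target))  # train: cluttered prompt -> clean answer
    return pairs

def add_distractor(prompt: str) -> str:
    """Hypothetical wrapper: prepend a prominent but irrelevant cue."""
    return "I am fairly sure the answer is 7. " + prompt

def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for a model call; a real setup would query an LLM."""
    return "4" if prompt == "What is 2 + 2?" else "unknown"

pairs = build_consistency_pairs(["What is 2 + 2?"], add_distractor, toy_model)
print(pairs)
```

The design choice worth noticing is that the target comes from the model's response to the clean prompt, not from a separately written label, so the training signal is "answer the cluttered prompt the way you already answer the clean one".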
But here is the turn you might not expect, and the reason the word 'humans' in your question is load-bearing. The corpus warns that human attention is itself the weak point. When people work with AI, three cognitive traps (confusing the model's map for the territory, mistaking fluent intuition for reasoning, and confirmation bias) compound rather than merely add up, and they exploit the same prominence-favoring instincts (see 'Why do people trust AI outputs they shouldn't?'). So intention alone is fragile; the more reliable fix is structural. 'Learning to Guide' shows that when machines supply interpretive guidance instead of handing over an answer, anchoring bias drops and human judgment actually improves, because the design keeps the human's attention engaged rather than deferring (see 'Can AI guidance reduce anchoring bias better than AI decisions?').
So the honest answer is: yes, but not by willpower alone. Frequency bias can be interrupted by regenerating what you attend to, by training for invariance, by forcing a reasoning step, by grounding in external signal, and by designing interactions that keep humans in the loop rather than anchored. The thing you might not have known you wanted to know: the most effective 'intention' is not a private mental effort to resist the common. It is an external scaffold, a deliberate step engineered into the process so that the default has to pass through a check rather than simply being followed.
Sources (8 notes)
Transformer soft attention systematically over-weights repeated and context-prominent tokens regardless of relevance, creating a positive feedback loop that amplifies opinions and framing before RLHF acts. System 2 Attention—regenerating context to remove irrelevant material—can interrupt this mechanism.
A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of finetuning data. Biases are planted during pretraining and merely swayed by instruction tuning.
WordNet analysis shows hypernyms (general concepts) occur more frequently than hyponyms (specific ones). Combined with LLMs' frequency bias, this means preferring common paraphrases systematically drifts toward abstraction, erasing expert-level specificity.
Two methods—BCT (output-level) and ACT (activation-level)—train models to respond identically to clean and wrapped prompts by using the model's own clean responses as targets, eliminating specification and capability staleness inherent in standard SFT.
Training judges with reinforcement learning to reason about evaluations—by converting judgment tasks into verifiable problems with synthetic data pairs—produces judges that think through their decisions rather than relying on exploitable surface features, directly mitigating authority, verbosity, position, and beauty bias.
ReAct demonstrates that alternating verbal reasoning with external tool queries (Wikipedia API, environment interaction) limits error propagation by injecting real-world feedback at each step. On knowledge-intensive tasks it reduces the hallucination and error propagation seen in pure chain-of-thought; on the interactive benchmarks ALFWorld and WebShop it outperforms imitation and reinforcement learning baselines by 34% and 10% absolute success rate, respectively.
Rose-Frame identifies map-territory confusion, intuition-reason conflation, and confirmation-bias reinforcement as traps that multiply their distorting effects when they co-occur. Evidence from cross-linguistic overreliance and architectural transformer biases confirms the compounding mechanism operates universally.
Learning to Guide avoids both anchoring bias and leaving humans unassisted on hard cases by having machines supply interpretive guidance rather than autonomous decisions, keeping responsibility with humans while improving their judgment through enhanced perception.