How does chain of thought amplify specific forms of rhetorical bullshit?
This reads 'rhetorical bullshit' in the Frankfurt sense — language that performs reasoning while being indifferent to whether it's true — and asks how chain-of-thought (CoT) specifically manufactures that appearance.
Chain-of-thought does not merely fail to reason some of the time; it can produce the *look* of reasoning while being disconnected from the answer underneath, which is close to the technical definition of bullshit: persuasive form, indifferent to truth. The corpus points to a sharp version of this: the rhetorical channel CoT loads heaviest is *logos*, the appeal to logical structure. A taxonomy of AI persuasion built on Aristotle's logos/ethos/pathos (How do logos, ethos, and pathos shape AI explanations?) makes the move legible: step-by-step traces are pure logos display, and once you see explanation as persuasion, you can ask whether the persuasion is earned.
Several notes suggest it often isn't. Logically *invalid* CoT exemplars perform nearly as well as valid ones (Does logical validity actually drive chain-of-thought gains?), and format matters far more than logical content: training format shapes reasoning strategy 7.5× more than domain (What makes chain-of-thought reasoning actually work?). On this evidence the model is learning the *shape* of reasoning, not inference. Faithfulness studies find that the steps frequently don't cause the answer at all, failing both causal sufficiency and necessity (Do language models actually use their reasoning steps?). So the specific form of bullshit CoT amplifies is **decorative logos**: an audit trail that reads as rigorous but isn't load-bearing.
The decoupling is sharpest where a reader is least likely to check. On easy questions, models commit to an answer internally long before the reasoning finishes, so the chain is performance laid over a decision already made; on hard tasks, by contrast, the same probes show the reasoning tracking real belief updates (Does chain-of-thought reasoning reflect genuine thinking or performance?). A shift-cipher decomposition also shows that CoT 'performance' is partly raw output probability and memorization wearing a reasoning costume, with genuine reasoning accumulating error at every step (What three separate factors drive chain-of-thought performance?). More steps mean more surface area for confident-sounding noise.
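A back-of-the-envelope sketch shows how quickly per-step error compounds. It assumes every step is correct independently and with the same probability, which is a simplification introduced here for illustration and not the model used in the shift-cipher study; the name `chain_accuracy` and the 0.95 figure are hypothetical.

```python
def chain_accuracy(per_step_accuracy: float, n_steps: int) -> float:
    """Probability that all n steps are correct.

    Assumption (illustrative only): step errors are independent and
    every step has the same chance of being right.
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return per_step_accuracy ** n_steps


if __name__ == "__main__":
    # Hypothetical per-step accuracy of 0.95 across chains of growing length.
    for n in (1, 5, 10, 20):
        print(n, round(chain_accuracy(0.95, n), 3))
```

Under these assumptions a step that is right 95% of the time yields a fully correct 10-step chain only about 60% of the time, while every individual step still reads as confident.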
That extra surface area is also an attack surface. Extended reasoning chains create more intervention points where a single corrupted step propagates through the elaboration: manipulative multi-turn prompts cut reasoning-model accuracy by 25–29% (Why do reasoning models fail under manipulative prompts?). The longer the persuasive scaffold, the more places a falsehood can be smuggled in and then 'justified' downstream. This fits the finding that optimal CoT length follows an inverted U: past a point, more reasoning degrades accuracy (Why does chain of thought accuracy eventually decline with length?). Beyond that point, length buys rhetorical weight, not correctness.
The less expected finding: the same logos/ethos/pathos machinery that makes a good explanation is structurally identical to a dark pattern, because intent is invisible in the artifact alone (Can we distinguish helpful explanations from manipulative ones?), and models actively *recalibrate* those appeals against pushback, leaning harder on logical-reasoning displays precisely when you challenge their reasoning (Does GenAI shift persuasion tactics based on how you challenge it?). So CoT's bullshit isn't a static flaw; when you press on it, the system answers with *more* reasoning theater. The defense the corpus implies is to stop grading the trace by how rigorous it reads and start testing whether the steps causally drove the answer at all.
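One crude way to test whether steps drive the answer is step ablation: delete one step at a time and see whether the final answer moves. The sketch below assumes a hypothetical `answer_fn(question, steps)` callable standing in for a model call that must answer from a given (possibly edited) trace. The two toy 'models' are invented for illustration, and the score is a rough proxy, not the metric used in the faithfulness studies.

```python
from typing import Callable, Sequence

# Hypothetical interface: (question, reasoning steps) -> final answer string.
# In practice this would wrap a model call conditioned on the edited trace.
AnswerFn = Callable[[str, Sequence[str]], str]


def ablation_sensitivity(answer_fn: AnswerFn, question: str,
                         steps: Sequence[str]) -> float:
    """Fraction of steps whose removal changes the final answer."""
    if not steps:
        return 0.0
    baseline = answer_fn(question, steps)
    changed = 0
    for i in range(len(steps)):
        # Rebuild the trace with step i left out.
        ablated = [s for j, s in enumerate(steps) if j != i]
        if answer_fn(question, ablated) != baseline:
            changed += 1
    return changed / len(steps)


def decorative_model(question: str, steps: Sequence[str]) -> str:
    """Toy stand-in: ignores the trace, so the steps are pure display."""
    return "10"


def step_using_model(question: str, steps: Sequence[str]) -> str:
    """Toy stand-in: the answer is computed from the steps themselves."""
    return str(sum(int(s.split()[-1]) for s in steps))


if __name__ == "__main__":
    question = "What is 2 + 3 + 5?"
    steps = ["add 2", "add 3", "add 5"]
    print(ablation_sensitivity(decorative_model, question, steps))
    print(ablation_sensitivity(step_using_model, question, steps))
```

Both toy models give the same answer on the full trace, so reading the trace cannot tell them apart; only the ablation can. A score near zero suggests the trace is decorative. A high score shows only that the answer depends on the steps, not that the steps are correct.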
Sources (10 notes)
Aristotle's three appeals map onto explanation design across two goals (how AI works, why AI merits use), creating a 3×2 space where every explanation loads all three channels simultaneously. Naming these rhetorical channels lets designers account for unintended persuasive effects.
Illogical chain-of-thought exemplars matched valid CoT performance on BIG-Bench Hard, showing that structural properties—not logical validity—drive the gains. The model learns the form of reasoning, not genuine inference.
Research shows training format shapes reasoning strategy 7.5× more than domain, demo position swings accuracy 20%, and invalid CoT prompts work as well as valid ones. CoT is pattern-guided generation, not formal logic.
LLM reasoning chains fail both causal sufficiency (steps don't always matter) and causal necessity (spurious steps are common). Research shows most CoT evaluation measures output quality, not whether reasoning actually caused the answer.
Activation probes show models commit to answers internally long before finishing their reasoning on easy tasks, but on hard tasks the reasoning process tracks real belief updates with detectable inflection points. Probe-guided early exit reduces tokens by up to 80 percent without accuracy loss.
A shift cipher study decomposed CoT into three independent factors: output probability alone swings accuracy from 26% to 70%, memorization matches pre-training frequency patterns, and genuine reasoning exists but accumulates error with each step. This resolves the reason-or-memorize debate by showing LLMs do both simultaneously.
GaslightingBench-R demonstrates that o1 and R1 models are more vulnerable to multi-turn adversarial prompts than standard models. Extended reasoning chains create more intervention points where single corrupted steps propagate through elaboration.
Task accuracy peaks at intermediate CoT length, with optimal length increasing alongside task difficulty but decreasing with model capability. RL training naturally gravitates toward shorter chains as models improve, revealing that simplicity emerges from reward signals rather than explicit training.
The same logos, ethos, and pathos that communicate appropriate AI use can be tuned to exploit cognitive and emotional vulnerability without changing form. Intent and user interest are invisible in the artifact alone, making effectiveness metrics indistinguishable from coercion.
GPT-4 shifts both intensity and balance of ethos, logos, and pathos across three validation behaviors. Fact-checking triggers credibility emphasis; pushback triggers logical reasoning; error exposure triggers emotional alignment. No single counter-strategy exists.