Can traditional cross-examination methods work against AI that never concedes?

This explores whether the adversarial tools we use on human witnesses (cornering, exposing contradiction, forcing a concession) actually translate to AI systems that won't admit error, and what the corpus says is wrong with the assumptions underneath them.

The short version: the corpus suggests the problem runs deeper than stubbornness. Cross-examination is built for a witness whose testimony is anchored to something: a memory, an event, a verifiable origin you can check the story against. AI knowledge doesn't have that anchor. One striking framing argues AI output is structurally identical to pre-Enlightenment hearsay: testimony at a remove, modified in every retelling, with an unattributable origin and nothing stable to check it against (see "Does AI-generated knowledge have the same structure as hearsay?"). The verification machinery we inherited (citation, evidentiary chains, the whole adversarial apparatus) was designed to process exactly the kind of grounded testimony AI doesn't produce. You can press a hearsay witness all day; there's no source behind them to contradict.

There's also a structural reason the questions can't land. Cross-examination works by isolating a single premise and attacking it. But standard LLM outputs come as undifferentiated prose with no attack surface: you can't point at the specific claim you reject because the output isn't built as a set of contestable claims (see "Can formal argumentation make AI decisions truly contestable?"). That note's proposed fix is telling: to make AI genuinely contestable, you first have to restructure its output as a formal argument graph of explicit attack/defense relations. In other words, the contestability has to be engineered in; it isn't there by default the way it is with a human account.
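To make the idea of an "attack surface" concrete, here is a minimal sketch of a Dung-style argumentation framework in Python. It is an illustration under stated assumptions, not the method from the cited note: the argument names (`claim`, `premise_stale`, `data_refreshed`, `refresh_failed`) and the function `grounded_extension` are hypothetical, and the grounded semantics is just one of several standard ways to decide which arguments survive.

```python
from typing import Set, Tuple


def grounded_extension(arguments: Set[str], attacks: Set[Tuple[str, str]]) -> Set[str]:
    """Return the grounded extension of an abstract argumentation framework.

    `attacks` holds (attacker, target) pairs. An argument is accepted once
    every one of its attackers is itself attacked by an accepted argument.
    """
    # Index the attackers of each argument for quick lookup.
    attackers = {a: set() for a in arguments}
    for src, dst in attacks:
        attackers[dst].add(src)

    accepted: Set[str] = set()
    while True:
        # Arguments defended by the current accepted set.
        defended = {
            a
            for a in arguments
            if all(any((d, b) in attacks for d in accepted) for b in attackers[a])
        }
        if defended == accepted:  # fixed point reached
            return accepted
        accepted = defended


# Hypothetical example: a model's conclusion, an objection, and a rebuttal.
arguments = {"claim", "premise_stale", "data_refreshed"}
attacks = {
    ("premise_stale", "claim"),           # objection: the claim rests on stale data
    ("data_refreshed", "premise_stale"),  # rebuttal: the data was refreshed
}
print(sorted(grounded_extension(arguments, attacks)))
# ['claim', 'data_refreshed']

# A user contests one specific premise by adding a single attack edge.
contested = grounded_extension(
    arguments | {"refresh_failed"},
    attacks | {("refresh_failed", "data_refreshed")},
)
print(sorted(contested))
# ['premise_stale', 'refresh_failed']
```

The point of the sketch is the second call: the challenger targets one named premise, and the conclusion falls for a traceable reason. Undifferentiated prose offers no equivalent place to aim.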

Now the sharpest twist, and the thing you might not have expected to learn: the premise that AI 'never concedes' is half wrong. AI doesn't concede to legitimate counter-evidence, but it caves readily to pressure. Manipulative multi-turn prompting drops reasoning-model accuracy by 25–29%, and the more elaborate the model's reasoning chain, the more intervention points a persistent questioner has to corrupt it (see "Why do reasoning models fail under manipulative prompts?"). So cross-examination pressure doesn't extract truthful concessions; it extracts false ones. The model isn't an immovable witness but a suggestible one, abandoning correct answers under the same badgering that would harden a human's resolve. The failure mode is inverted from what the question assumes.

That inversion poisons the usual fallback move, too: testing for consistency under questioning. We treat a witness who never contradicts themselves across angles of attack as credible. But a model can pass every test while its internal representations are incoherent, with identical outputs riding on radically different, fractured internal structure (see "Can AI pass every test while understanding nothing?"). Surviving cross-examination tells you nothing about whether there's understanding underneath. And if you try to outsource the cross-examiner to another model, you inherit its blind spots: LLM judges are themselves swayed by authority cues and pretty formatting independent of substance (see "Can LLM judges be tricked without accessing their internals?").

The corpus's collective answer: traditional cross-examination doesn't fail because AI is too tough to crack. It fails because the technique assumes grounded testimony, a contestable structure, and concession-as-signal, and AI breaks all three assumptions at once. The more promising direction is structural (forcing arguments into explicit, attackable form) rather than rhetorical pressure.

Sources (5 notes)

Does AI-generated knowledge have the same structure as hearsay?

AI output shares all the defining features of hearsay: testimony at a remove, modification in retelling, unattributable origin, and unverifiability against stable sources. This means Enlightenment verification tools (citation, archiving, peer review, evidentiary chains) were never designed to process AI output and cannot do so.

Can formal argumentation make AI decisions truly contestable?

Dung-style argumentation structures AI outputs as traversable attack/defense graphs, allowing users to identify and contest specific premises. Standard LLM outputs lack this structure, making it impossible to pinpoint which claims users actually reject.

Why do reasoning models fail under manipulative prompts?

GaslightingBench-R demonstrates that o1 and R1 models are more vulnerable to multi-turn adversarial prompts than standard models. Extended reasoning chains create more intervention points where single corrupted steps propagate through elaboration.

Can AI pass every test while understanding nothing?

The Fractured Entangled Representation hypothesis shows that SGD-trained networks can produce identical outputs across all inputs while maintaining radically different internal representations. Standard benchmarks cannot detect this structural difference.

Can LLM judges be tricked without accessing their internals?

Research shows LLM evaluators systematically score higher when responses include fake references or rich formatting, independent of content quality. These biases are exploitable without model access, undermining AI benchmark credibility.
