GPT-4 is judged more human than humans in displaced and inverted Turing tests
In many cases, people will not interact directly with AI systems but instead read conversations between AI systems and other people. We measured how well people and large language models can distinguish humans from AI using two modified versions of the Turing test: inverted and displaced. Both AI and displaced human judges were less accurate than interactive interrogators, with below-chance accuracy overall.
Recent empirical work has found that interrogators could not reliably determine whether a GPT-4-based agent was human or AI in a Turing test (Jones and Bergen, 2024). Models that can successfully impersonate people bring attendant risks. This motivates conducting variations of the Turing test in more ecologically valid settings to determine how effective people are at discriminating between humans and AIs in realistic scenarios. An ordinary Turing test provides the interrogator with a key advantage not always present in passive consumption of AI-generated text: they can adapt their questions to adversarially test the witness in real time. Here, we ask how well human and AI judges perform without this advantage, when they only have access to a transcript of a Turing test interview conducted by a separate participant.
The first of these variations is the inverted, or reverse, Turing test, which places an AI system in the role of the interrogator. Watt (1996) proposed the inverted test as a measure of naive psychology, the innate tendency to recognize intelligence similar to our own and attribute it to other minds. It is passed if an AI system is "unable to distinguish between two humans, or between a human and a machine that can pass the normal Turing test, but which can discriminate between a human and a machine that can be told apart by a normal Turing test with a human observer" (Watt, 1996, p. 8). Watt argued that by placing AI in the observer role and comparing its accuracy for different witnesses with human accuracy, the AI would reveal whether it has naive psychology comparable to humans.
The growing role of AI agents in online interactions raises questions around how well these systems will be able to discriminate between human and AI-generated content, and what kinds of criteria they might use to do so.
First, these tasks can only be considered a "static" version of the test, as the judgment is based on pre-existing and unchanging content generated fully by a human or an AI. Second, while an interactive interrogator in a traditional Turing test can ask dynamic, flexible, and adversarial questions, the judge in a static Turing test can only consider what an agent happened to say, and cannot interact to pursue the most fruitful lines of questioning.
Here, we introduce a novel kind of static Turing test called a displaced Turing test, wherein a human judge reads a transcript of an interactive Turing test that was previously conducted by a different human interrogator. The new human judge is “displaced” in that they are not present to interact with the witness.