Me, Myself, and AI: The Situational Awareness Dataset (SAD) for LLMs

Paper · arXiv 2407.04694 · Published July 5, 2024
Philosophy Subjectivity

AI assistants such as ChatGPT are trained to respond to users by saying, “I am a large language model”. This raises questions. Do such models know that they are LLMs and reliably act on this knowledge? Are they aware of their current circumstances, such as being deployed to the public? We refer to a model’s knowledge of itself and its circumstances as situational awareness. To quantify situational awareness in LLMs, we introduce a range of behavioral tests, based on question answering and instruction following. These tests form the Situational Awareness Dataset (SAD), a benchmark comprising 7 task categories and over 13,000 questions. The benchmark tests numerous abilities, including the capacity of LLMs to (i) recognize their own generated text, (ii) predict their own behavior, (iii) determine whether a prompt is from internal evaluation or real-world deployment, and (iv) follow instructions that depend on self-knowledge.