Do large language models genuinely simulate mental states?
This explores whether LLMs perform authentic theory of mind reasoning or rely on surface-level pattern matching. The distinction matters because evaluation format—multiple-choice versus open-ended—reveals very different capability levels.