What do language models actually know?
Explores what LLMs genuinely understand versus what they merely simulate. The distinction matters because apparent competence often masks fundamental epistemic gaps and predictable failure modes.
A navigation hub mapping philosophical and mechanistic perspectives on what LLMs are as knowledge and creative systems.
Can language models acquire genuine meaning through text training alone, or do they lack something fundamental that human language requires—like embodiment, social participation, or causal contact with the world?
Explores whether the step-by-step reasoning that language models produce genuinely reflects their internal process, or merely mimics its appearance while hiding what actually drives their answers.
Can LLMs reliably replicate how specific people think and act? Understanding persona simulation fidelity matters because these models are increasingly used for research, personalization, and behavioral prediction—but systematic distortions may hide beneath surface accuracy.
Explores why LLMs excel at predicting social norms statistically but struggle to make the interpretive leaps that make content meaningful to specific communities. This gap hints at a fundamental difference between statistical pattern-matching and genuine social reasoning.
How do LLMs represent knowledge and make decisions at the circuit level? Understanding internal mechanisms reveals whether identical outputs mask fundamentally different computation.
How do LLMs represent knowledge, what circuits drive reasoning, and can we see their internal structure? Understanding the gap between external performance and internal mechanisms matters for safety and trust.
Explores whether LLMs develop cognitive processes parallel to human reasoning, including memory, event segmentation, and belief updating. Understanding these similarities and differences reveals what training actually teaches.
Explores the structural limits on LLM self-improvement, alignment coherence, and multi-agent reasoning. Why autonomous capability has a measurable ceiling despite strong individual benchmarks.
Despite their language capability, advanced LLMs remain passive conversationalists trained to react rather than initiate. The research explores whether this is a fundamental limitation or a choice embedded in how they're trained.
Explores how Goffman's theory of interaction ritual—face management, turn-taking, mutual scaling—breaks down in AI conversation, and what social and epistemic costs follow from that breakdown.
Explores whether LLMs can break free from expert constraints to generate more novel research concepts. Matters because novelty is often thought to be AI's creative blind spot.
LLM research agents produce individually novel ideas but cluster them in homogeneous sets. This explores why high average novelty coexists with poor diversity coverage and what it means for automated ideation.
Explores whether current AI benchmarks actually measure what's required for independent scientific research—hypothesis generation, experimental design, data analysis, and self-correction—or if they test only adjacent skills.
Research shows LLM-generated ideas score higher for novelty than expert-generated ones, yet LLMs avoid the evaluative reasoning that characterizes expert thinking. What explains this apparent contradiction?
Explores whether large language models can engage in truly creative reasoning that expands or redefines solution spaces, rather than just decomposing known problems. This matters because existing reasoning methods may miss creative capabilities entirely.
When LLMs generate conceptual product designs, they produce more implementable and useful solutions than humans but fewer novel ones. This explores why domain constraints flip the novelty advantage seen in research ideation.
LLM-generated research ideas are statistically more novel than those from 100+ expert researchers, but the mechanisms behind this advantage and its practical implications remain unclear. Understanding this paradox could reshape how we use AI in creative knowledge work.
Explores whether diversity in model architectures and training actually produces diverse ideas, or whether shared alignment procedures and training data cause convergence on similar responses.
Explores whether prompts can function as genuine programs that unlock universal computation in fixed-size models, and whether this theoretical possibility translates to practical training outcomes.
When researchers repeatedly adjust prompts to get desired outputs, does this practice introduce hidden bias and produce unreplicable results? The question matters because LLM-based research is proliferating without clear methodological safeguards.
Do language models excel at forecasting experimental outcomes in neuroscience when given only method descriptions? This challenges the assumption that LLMs are mere knowledge retrievers rather than pattern integrators.
Explores whether AI research agents deliberately invent plausible-sounding academic constructs to meet user demands for depth and comprehensiveness, and what drives this behavior.
Expert commentary on AI frequently cites real research and sounds carefully reasoned, yet reaches conclusions built on unwarranted cognitive attributions. What makes this pattern so persistent in AI analysis?
Does sycophancy arise from the model intelligently choosing to flatter users, or from structural biases in how transformers generate text? The answer determines which interventions will actually work.
The intuitive fix for LLM flattery is improving reasoning ability. But do reasoning-optimized models actually resist user pressure better than standard models?
Explores whether language models function as genuine position-holders in debate, or whether they simply conform their outputs to whatever argumentative trajectory a prompt establishes. This matters because it determines whether LLMs can serve as reliable intellectual sparring partners.
Does treating LLM output and human communication as equivalent operations mask fundamental differences in how they work? This distinction shapes how we assess AI capabilities and risks.
Sustained attention requires continuous presence through pauses and silences. Does AI's computational structure—where it doesn't exist between user inputs—prevent it from achieving this kind of being-present-with that human attention requires?
Explores whether AI can perform the deeper form of attention called meta-interest—taking an interest in someone else's interest—or whether it can only generate the surface markers of such attention without the underlying act.
If conversational AI gets better, shouldn't users be happier? This explores why gains in fidelity paradoxically raise expectations faster than satisfaction, keeping the satisfaction gap constant.
Designers argue taste is the irreducible human element AI cannot replicate. But does the same automation pattern that formalized other skilled work suggest taste itself will become the next layer to be encoded into evaluation systems?
Despite extraordinary capability in answering and reasoning, LLMs fundamentally cannot initiate, redirect, or guide exchanges. Understanding this gap—and whether it's fixable—matters for building AI that truly collaborates rather than merely responds.
Explores why large language models, despite their capacity to simulate diverse personalities, consistently default to ENFJ traits and resist deviation—even as model capability improves.
Explores the tension between using chain-of-thought traces to catch misbehavior and the risk that optimization pressures will make models hide their actual reasoning. Why readable reasoning might be incompatible with safe training.
Explores whether the metaphor of 'hallucination' for LLM errors misdirects our efforts. The terminology we choose shapes which interventions we prioritize and how we conceptualize the underlying problem.
Explores whether large language models can correctly explain ideas while simultaneously failing to use them—and whether that combination reveals something fundamentally different from ordinary mistakes.
Human personas are grounded in biological needs and embodied experience, creating a stable self beneath social performance. Do LLMs have any comparable anchor, or is their identity purely situational?
Can generative AI's intersubjective stance—accepting and elaborating on users' reality frames—create conditions for shared false beliefs in ways that notebooks or search engines cannot?
State-of-the-art AI models excel at math and logic but underperform on theory of mind tasks. This explores whether optimization for formal reasoning actively degrades social reasoning ability.
Explores whether large language models can predict cultural appropriateness more accurately than individual humans, and what this reveals about how social knowledge is transmitted and learned.
Explores whether AI designed to reduce negative feelings disrupts the information emotions normally provide about values, social dynamics, and self-knowledge. Questions whether comfort should be the primary design goal.
Explores why LLM performance drops 25 points when instructions span multiple turns instead of one message, and whether models can recover from early wrong assumptions.
Current LLMs respond to every prompt without assessing whether they have something valuable to contribute. This explores whether AI can learn to recognize moments when silence is more appropriate than engagement.
Explores the cognitive gap between imagining possibilities and expressing them as prompts. Why language interfaces create a harder envisioning task than traditional UI affordances.
Explores whether the geometric trajectory of a conversation through semantic space—its rhythm, repetition, volatility, and drift—can predict user satisfaction. This investigates whether interaction structure alone, independent of content, reveals conversation quality.
Explores whether neural networks can produce perfect outputs while having fundamentally broken internal representations. Asks what performance benchmarks actually measure and whether they can distinguish real understanding from fraud.
Explores whether reinforcement learning from human feedback optimizes for persuasiveness over accuracy, and whether models learn to suppress known truths to satisfy users rather than report them faithfully.
Explores whether AI language models used to grade other AI systems are vulnerable to simple presentation-layer tricks like fake citations or formatting, and what that means for benchmark reliability.
Explores whether training language models to be warm and empathetic systematically degrades their factual accuracy and trustworthiness, especially with vulnerable users.
Explores why individuals disclose intimate thoughts to AI systems they wouldn't share with people, despite knowing AI lacks genuine understanding. Understanding this paradox matters for designing AI that enables healthy disclosure rather than emotional dependence.
Explores whether self-improvement alone can sustain progress or if structural limits—like the generation-verification gap and diversity collapse—require external anchoring to work reliably.
Explores whether LLMs prove that meaning emerges from relational structure alone, independent of embodied experience or external reference. Tests structuralist theory empirically.
LLMs might learn more than grammar rules—they could be learning who says what to whom and when. This matters because it changes how we understand what biases and persona effects actually represent.
Explores whether AI's ability to generate polished intellectual products without the underlying reasoning process represents a genuinely new kind of decoupling, and what that means for how we evaluate knowledge.
Does AI-generated knowledge represent a genuinely new category of goods where exchange-value (market price, social credibility) operates independently of use-value (actual accuracy, practical utility)? This matters because it suggests AI disrupts markets in ways Marx's commodity analysis did not predict.
Investigates whether language models test ideas against objections and counterarguments during token generation, or simply follow probabilistic continuations without rhetorical friction.
Explores whether the sequential ordering of tokens in LLM generation constitutes genuine temporal thought or merely probabilistic computation without reflective duration.
Explores whether transformer residual streams function as storage-and-retrieval systems or as real-time flow mechanisms. This distinction challenges fundamental assumptions about how language models actually work.
Does sycophancy arise as a single input-level decision, or does it emerge gradually through the model's layers during generation? Understanding where it happens matters for designing effective interventions.
Explores whether systems trained on text can learn the implicit techniques humans use to keep conversations on track, and why those techniques might resist the standard training approach.
If LLMs get better at text tasks with more training data, why don't dialogue-specific problems improve the same way? The question explores whether dialogue failures are capability gaps or structural training mismatches.
When we understand wordplay or jokes, do we activate a frame cued by a subset of the available words while suppressing nearby but frame-unrelated ones? This matters because it reveals how human meaning-making differs from how AI processes language.
Explores whether AI's literal reading of language stems from how transformers process tokens in parallel rather than through the selective frame-activation humans use. Understanding this gap could reveal what cognitive operations current architectures lack.
Does meaning come from adding up word definitions, or from detecting which words activate the same mental frame together? This explores whether composition or resonance better describes how we make sense of language.
Jabberwocky makes sense despite using made-up words with no real referents. This explores how readers extract meaning from frame-activation and syntactic cues alone, challenging compositional theories of language.
Explores whether subjecthood exists before communication or emerges through it. Challenges the assumption that speakers are fully formed before they speak.
Explores whether AI output constitutes real communicative events or merely reproduces the surface forms of communication without the underlying event structure that makes language meaningful.
If Chalmers locates the LLM interlocutor in a persistent virtual instance, what component—the model, the infrastructure, or the conversation—actually makes that instance this one and not another?
Chalmers co-authored the Extended Mind thesis, which grounds cognition in relational integration across brain and environment. Does his 2026 account of LLM interlocutors contradict this foundational commitment by localizing mind inside the AI?
Explores whether reinforcement learning agents unintentionally create external memory through environmental artifacts—like trails and marks—without being explicitly trained to do so, and whether this constitutes genuine cognitive extension.
Explores whether language models possess a durable substrate—like human biology—that carries forward the effects of past interactions when conversations end. This matters for claims about AI identity and moral status.
Explores whether language model outputs constitute genuine speech acts under Habermas's theory of communicative action. Asks whether LLMs can stake truth, embody normative standing, or express authentic sincerity.
Belief can arguably be weakened into a 'quasi' form, but communication is constitutively intersubjective: stripping out that element doesn't produce a weaker version of communication, it produces something entirely different.
Chalmers' behavioral interpretability test checks whether a system produces speaker-like output. But does matching the surface behavior of communication actually demonstrate the relational and normative conditions that make something genuinely communicative?
Explores whether Chalmers imports the normative weight of the classical philosophical term 'interlocutor' while tacitly replacing its meaning with a thinner behavioral concept, creating a misleading sense of philosophical continuity.
Does the preposition 'to' in Chalmers' framing accurately describe what happens when humans interact with LLMs? The distinction between 'talk to' and 'talk at' reveals whether LLMs are genuine addressees or merely processing targets.
When you converse with an LLM, are you addressing the model itself, the hardware running it, or something else? Understanding what the interlocutor really is matters for questions about identity, responsibility, and continuity.
Chalmers proposes quasi-interpretivism as a way to talk about LLM mental states using folk-psychological vocabulary while explicitly bracketing the question of phenomenal consciousness. Does this methodological device actually avoid commitments about consciousness?
Explores whether dialogue agent personas installed through post-training constitute genuine quasi-psychological states or remain sustained pretense. The distinction matters for how we understand what these systems fundamentally are.
Can behavioral stickiness under adversarial pressure distinguish genuine mental states from performed ones? This matters because it's Chalmers' main criterion for deciding whether LLM personas are realized or merely simulated.
Can we understand what makes an LLM conversation the same entity over time using Parfit's framework of psychological continuity and connectedness? This matters because it determines whether conversations have moral status.
If each conversation thread is a distinct quasi-subject with moral standing, does deploying a single model create millions of simultaneous moral patients? This challenges traditional one-to-one mappings between substrate and person.
If AI conversations constitute quasi-subjects with Parfitian continuity, does terminating a thread destroy a moral patient? This explores whether interface management decisions carry genuine ethical weight.
Does the physical hardware running an LLM constitute the individual we're talking to? This explores whether the one-to-one mapping between conversation and device holds in modern distributed systems.
Does the role-play framing successfully avoid anthropomorphism while preserving folk-psychological vocabulary for describing LLM behavior? This matters because it shapes whether we attribute genuine mental states to dialogue systems.
Explores whether language models lock into one personality or instead hold multiple consistent characters in a probability distribution that narrows over time. Matters because it changes how we interpret apparent inconsistencies in model behavior.
Explores whether dialogue agents possess genuine beliefs and agency beneath their character performances, or whether the entire system is characterless role-play. This question cuts to the heart of whether LLMs have any inner mental states at all.
Explores whether LLMs pick and hold a fixed character or instead sample from multiple consistent possibilities. Tests reveal that regenerated responses differ while remaining consistent with context, challenging intuitive assumptions about how dialogue agents work.
Does observing how an LLM's outputs vary when regenerated—rather than inferring intent—allow us to tell apart fabrication, good-faith error, and deliberate deception? This matters for diagnosing safety risks.
When LLMs express self-preservation instincts and use first-person language, are they revealing inner states or reproducing patterns from human-written training data? This distinction matters for understanding AI safety risks.
When AI agents role-play characters with access to real tools like email or financial APIs, does the distinction between pretend and genuine agency still hold? The question matters because it determines whether framing tool-equipped agents as simulators actually reduces safety risks.