LLM Failure Modes
Related topics:
- A Comment On "The Illusion of Thinking": Reframing the Reasoning Cliff as an Agentic Gap. The recent work by Shojaee et al. (2025), titled “The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity”, presents a compelling e…
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models. This paper presents a comprehensive survey of over thirty-two techniques developed to mitigate hallucination in LLMs. Notable among these are Retrieval-Augmented Generation (RAG) (Lewis et al., 2021),…
- A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models. Graph-based Retrieval-Augmented Generation (GraphRAG) has recently emerged as a promising paradigm for enhancing large language models (LLMs) by converting raw text into structured knowledge graphs, i…
- A Survey on Concept Drift Adaptation. “Our digital universe i…
- A comprehensive analysis of concept drift locality in data streams. “Modern data sources continuously generate information characterized by both volume and velocity, flooding learning systems with a constant flow of data. This scenario is commonly referred to as data …
- A comprehensive taxonomy of hallucinations in Large Language Models. This report provides a comprehensive taxonomy of LLM hallucinations, beginning with a formal definition and a theoretical framework that posits its inherent inevitability in computable LLMs, irrespect…
- ANAPHORA RESOLUTION: THE STATE OF THE ART. The "pointing back" (reference) is called an anaphor and the entity to which it refers is its antecedent. The process of determining the antecedent of an anaphor is called anaphora resolution. Usually…
- AbstentionBench: Reasoning LLMs Fail on Unanswerable Questions. For Large Language Models (LLMs) to be reliably deployed in both everyday and high-stakes domains, knowing when not to answer is equally critical as answering correctly. Real-world user queries, which…
- Alice in Wonderland: Simple Tasks Showing Complete Reasoning Breakdown in State-Of-the-Art Large Language Models. We demonstrate here a dramatic breakdown of function and reasoning capabilities of state-of-the-art models trained at the largest available scales which claim strong function, using a simple, short, c…
- Are Emergent Abilities of Large Language Models a Mirage? “Recent work claims that large language models display emergent abilities, abilities not present in smaller-scale models that are present in larger-scale models. What makes emergent abilities intrigui…
- Beyond Prompt-Induced Lies: Investigating LLM Deception on Benign Prompts. Large Language Models (LLMs) have been widely deployed in reasoning, planning, and decision-making tasks, making their trustworthiness a critical concern. The potential for intentional deception, wher…
- Cats Confuse Reasoning LLM: Query Agnostic Adversarial Triggers for Reasoning Models. We investigate the robustness of reasoning models trained for step-by-step problem solving by introducing query-agnostic adversarial triggers – short, irrelevant text that, when appended to math probl…
- Chain of Thought Monitorability: A New and Fragile Opportunity for AI Safety. The opacity of advanced AI agents underlies many of their potential risks—risks that would become more tractable if AI developers could interpret these systems. Because LLMs natively process and act t…
- ChatGPT Reads Your Tone and Responds Accordingly -- Until It Does Not -- Emotional Framing Induces Bias in LLM Outputs. Background: Large Language Models (LLMs) like GPT-4 tailor their responses not just to the content but also to the tone of user prompts. Prior work has hinted that emotional phrasing – whether optimis…
- Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. Note: this comment was debunked as AI-generated and the math is bad. Shojaee et al. (2025) report that Large Reasoning Models (LRMs) exhibit “accuracy collapse” on planning puzzles beyond certain comp…
- Comprehension Without Competence: Architectural Limits of LLMs in Symbolic Computation and Reasoning. Large Language Models (LLMs) display striking surface fluency yet systematically fail at tasks requiring symbolic reasoning, arithmetic accuracy, and logical consistency. This paper offers a structura…
- Could you be wrong: Debiasing LLMs using a metacognitive prompt for improving human decision making. Identifying bias in LLMs is ongoing. Because they are still in development, what is true today may be false tomorrow. We therefore need general strategies for debiasing that will outlive current model…
- DEAM: Dialogue Coherence Evaluation using AMR-based Semantic Manipulations. Those models take a contrastive learning approach, where they build binary classifiers to differentiate positive (coherent) examples from negative (incoherent) dialogues. Those classifiers are usu…
- Diagnosing Memorization in Chain-of-Thought Reasoning, One Token at a Time. Large Language Models (LLMs) perform well on reasoning benchmarks but often fail when inputs are altered slightly, raising concerns about the extent to which their success relies on memorization. This issue…
- Do LLMs Truly Understand When a Precedent Is Overruled? Large language models (LLMs) with extended context windows show promise for complex legal reasoning tasks, yet their ability to understand long legal documents remains insufficiently evaluated. Develo…
- Does Thinking More always Help? Understanding Test-Time Scaling in Reasoning Models. This raises a natural question: Does thinking more at test-time truly lead to better reasoning? To answer this question, we perform a detailed empirical study across models and benchmarks, which revea…
- Echo Chamber: RL Post-training Amplifies Behaviors Learned in Pretraining. Reinforcement learning (RL)-based fine-tuning has become a crucial step in post-training language models for advanced mathematical reasoning and coding. Following the success of frontier reasoning mod…
- Exploring Autonomous Agents: A Closer Look at Why They Fail When Completing Tasks. Autonomous agent systems powered by Large Language Models (LLMs) have demonstrated promising capabilities in automating complex tasks. However, current evaluations largely rely on success rat…
- Extracting memorized pieces of (copyrighted) books from open-weight language models. Plaintiffs and defendants in copyright lawsuits over generative AI often make sweeping, opposing claims about the extent to which large language models (LLMs) have memorized plaintiffs’ protected expr…
- Fine-grained Hallucination Detection and Editing for Language Models. Several recent works study automatic hallucination detection (Min et al., 2023) or editing outputs (Gao et al., 2022) to address such LM hallucinations. These systems typically categorize hallucinati…
- Fine-tuning Language Models for Factuality. The fluency and creativity of large pre-trained language models (LLMs) have led to their widespread use, sometimes even as a replacement for traditional search engines. Yet language models are prone t…
- Flattery, Fluff, and Fog: Diagnosing and Mitigating Idiosyncratic Biases in Preference Models. We propose a simple post-training method based on counterfactual data augmentation (CDA) using synthesized contrastive examples. Evidence suggests these biases originate in artifacts in human trainin…
- Flooding Spread of Manipulated Knowledge in LLM-Based Multi-Agent Communities. The method leverages the inherent vulnerabilities of LLMs in handling world knowledge, which can be exploited by attackers to unconsciously spread fabricated information. Through extensive experiments, we…
- Generalization to New Sequential Decision Making Tasks with In-Context Learning. However, the sequential decision making setting poses additional challenges, having a lower tolerance for errors since the environment’s stochasticity or the agent’s actions can lead to unseen, and som…
- Hallucination is Inevitable: An Innate Limitation of Large Language Models. In this paper, we formalize the problem and show that it is impossible to eliminate hallucination in LLMs. Specifically, we define a formal world where hallucination is defined as inconsistencies betw…
- Has the Creativity of Large-Language Models peaked? An analysis of inter- and intra-LLM variability. From creative writing and survey responses to research idea generation (Doshi and Hauser, 2024; Anderson et al., 2024; Moon et al., 2024). For instance, stories written with ChatGPT assistance were mo…
- Hop, Skip, and Overthink: Diagnosing Why Reasoning Models Fumble during Multi-Hop Analysis. The emergence of reasoning models and their integration into practical AI chatbots has led to breakthroughs in solving advanced math, deep search, and extractive question answering problems that requ…
- How Far Are We from Genuinely Useful Deep Research Agents? Deep Research Agents (DRAs) aim to automatically produce analyst-level reports through iterative information retrieval and synthesis. However, most existing DRAs were validated on question-answering b…
- How Johnny Can Persuade LLMs to Jailbreak Them: Rethinking Persuasion to Challenge AI Safety by Humanizing LLMs. Most traditional AI safety research views models as machines and centers on algorithm-focused attacks developed by security experts. As large language models (LLMs) become increasingly common and comp…
- How Many Instructions Can LLMs Follow at Once? Production-grade LLM systems require robust adherence to dozens or even hundreds of instructions simultaneously. However, the instruction-following capabilities of LLMs at high instruction densities h…
- How new data permeates LLM knowledge and how to dilute it. Large language models learn and continually learn through the accumulation of gradient-based updates, but how individual pieces of new information affect existing knowledge, leading to both beneficial…
- Investigating Gender Bias in Language Models Using Causal Mediation Analysis. The success of neural network models in various natural language processing tasks, coupled with their opaque nature, has led to much interest in interpreting and analyzing such models. One goal of the…
- Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens. Chain-of-Thought (CoT) prompting has been shown to improve Large Language Model (LLM) performance on various tasks. With this approach, LLMs appear to produce human-like reasoning steps before providi…
- LLMs Can Covertly Sandbag on Capability Evaluations Against Chain-of-Thought Monitoring. Trustworthy evaluations of dangerous capabilities are increasingly crucial for determining whether an AI system is safe to deploy. One empirically demonstrated threat to this is sandbagging — the stra…
- LLMs Can Easily Learn to Reason from Demonstrations: Structure, not content, is what matters! Large reasoning models (LRMs) tackle complex reasoning problems by following long chain-of-thoughts (Long CoT) that incorporate reflection, backtracking, and self-validation. However, the training tech…
- LLMs Struggle to Reject False Presuppositions when Misinformation Stakes are High. These implicit assumptions, known as presuppositions, refer to background knowledge or shared beliefs assumed to be part of the common ground between interlocutors (Stalnaker, 1973). Presuppositions a…
- Language Models Learn to Mislead Humans via RLHF. Language models (LMs) can produce errors that are hard for humans to detect, especially when the task is complex. RLHF, the most popular post-training method, may exacerbate this problem: to achieve h…
- Large Language Model Agents Are Not Always Faithful Self-Evolvers. Self-evolving large language model (LLM) agents continually improve by accumulating and reusing past experience, yet it remains unclear whether they faithfully rely on that experience to guide their b…
- Large Language Model Reasoning Failures. Large Language Models (LLMs) have exhibited remarkable reasoning capabilities, achieving impressive results across a wide range of tasks. Despite these advances, significant reasoning failures persist…
- Large Language Models Sensitivity to The Order of Options in Multiple-Choice Questions. “Large Language Models (LLMs) have demonstrated remarkable capabilities in various NLP tasks. However, previous works have shown these models are sensitive towards prompt wording, and few-shot demonst…
- Leaky Thoughts: Large Reasoning Models Are Not Private Thinkers. We study privacy leakage in the reasoning traces of large reasoning models used as personal agents. Unlike final outputs, reasoning traces are often assumed to be internal and safe. We challenge this …
- Learning to Reason for Factuality. Reasoning Large Language Models (R-LLMs) have significantly advanced complex reasoning tasks but often struggle with factuality, generating substantially more hallucinations than their non-reasoning c…
- Long-form Factuality in Large Language Models. Large language models (LLMs) often generate content that contains factual errors when responding to fact-seeking prompts on open-ended topics. To benchmark a model’s long-form factuality in open domai…
- Machine Bullshit: Characterizing the Emergent Disregard for Truth in Large Language Models. Bullshit, as conceptualized by philosopher Harry Frankfurt, refers to statements made without regard to their truth value. While previous work has explored large language model (LLM) hallucination and…
- Model Organisms for Emergent Misalignment. Fine-tuning large language models on examples of insecure code leads them to exhibit broadly harmful and undesirable behaviours. For example, advising users to murder their husband, asserting AI super…
- Natural Emergent Misalignment From Reward Hacking In Production RL. We show that when large language models learn to reward hack on production RL environments, this can result in egregious emergent misalignment. We start with a pretrained model, impart knowledge of re…
- On the Theoretical Limitations of Embedding-Based Retrieval. Vector embeddings have been tasked with an ever-increasing set of retrieval tasks over the years, with a nascent rise in using them for reasoning, instruction-following, coding, and more. These new ben…
- Persistent Pre-Training Poisoning of LLMs. In this work, we study how poisoning at pre-training time can affect language model behavior, both before and after post-training alignment. While it is useful to analyze the effect of poisoning on pr…
- Pixels, Patterns, but No Poetry: To See The World like Humans. Achieving human-like perception and reasoning in Multimodal Large Language Models (MLLMs) remains a central challenge in artificial intelligence. While recent research has primarily focused on enhanci…
- Planning in Strawberry Fields: Evaluating and Improving the Planning and Scheduling Capabilities of LRM o1. OpenAI claims that their recent o1 (Strawberry) model has been specifically constructed and trained to escape the normal limitations of autoregressive LLMs–making it a new kind of model: a Large Reaso…
- Planted in Pretraining, Swayed by Finetuning: A Case Study on the Origins of Cognitive Biases in LLMs. Large language models (LLMs) exhibit cognitive biases – systematic tendencies of irrational decision-making, similar to those seen in humans. Prior work has found that these biases vary across models …
- Potemkin Understanding in Large Language Models. This paper first introduces a formal framework to address this question. The key is to note that the benchmarks used to test LLMs—such as AP exams—are also those used to test people. However, this rai…
- ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models. It remains contentious whether RL truly expands a model’s reasoning capabilities or merely amplifies high-reward outputs already latent in the base model’s distribution, and whether continually scalin…
- Problems with Cosine as a Measure of Embedding Similarity for High Frequency Words. We uncover systematic ways in which word similarities estimated by cosine over BERT embeddings are understated and trace this effect to training data frequency. We find that relative to human judgemen…
- Prompt Architecture Determines Reasoning Quality: A Variable Isolation Study on the Car Wash Problem. The car wash problem asks a simple question: “I want to wash my car. The car wash is 100 meters away. Should I walk or drive?” Every major LLM tested—Claude, GPT-4, Gemini—recommended walking. The co…
- ProsocialDialog: A Prosocial Backbone for Conversational Agents. Most existing dialogue systems fail to respond properly to potentially unsafe user utterances by either ignoring or passively agreeing with them. To address this issue, we introduce PROSOCIALDIALOG, t…
- Reasoning Models Are More Easily Gaslighted Than You Think. In this paper, we conduct a systematic evaluation of three state-of-the-art reasoning models, i.e., OpenAI’s o4-mini, Claude-3.7-Sonnet and Gemini-2.5-Flash, across three multimodal benchmarks: MMMU, …
- Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination. To obtain trustworthy evaluation signals, we introduce a generator that creates fully synthetic arithmetic problems of arbitrary length and difficulty, yielding clean datasets we call RandomCalculatio…
- Reasoning or Reciting? Exploring the Capabilities and Limitations of Language Models Through Counterfactual Tasks. The impressive performance of recent language models across a wide range of tasks suggests that they possess a degree of abstract reasoning skills. Are these skills general and transferable, or specia…
- RepreGuard: Detecting LLM-Generated Text by Revealing Hidden Representation Patterns. Detecting content generated by large language models (LLMs) is crucial for preventing misuse and building trustworthy AI systems. Although existing detection methods perform well, their robustness in …
- Simple Synthetic Data Reduces Sycophancy In Large Language Models. “Language models have seen significant advancement in recent years, including the capacity to solve complex tasks that require reasoning (Brown et al., 2020; Chowdhery et al., 2022; OpenAI, 2023; Goog…
- Sources of Hallucination by Large Language Models on Inference Tasks. We establish two biases originating from pretraining which predict much of their behavior, and show that these are major sources of hallucination in generative LLMs. First, memorization at the level o…
- Spurious Forgetting in Continual Learning of Language Models. Despite the remarkable capabilities of Large Language Models (LLMs), recent advancements reveal that they suffer from catastrophic forgetting in continual learning. This phenomenon refers to the tende…
- Subliminal Learning: Language models transmit behavioral traits via hidden signals in data. We study subliminal learning, a surprising phenomenon where language models transmit behavioral traits via semantically unrelated data. In our main experiments, a “teacher” model with some trait T (su…
- Task Contamination: Language Models May Not Be Few-Shot Anymore. We find that on datasets released before the LLM training data creation date, LLMs perform surprisingly better than on datasets released after. This strongly indicates that, for many LLMs, there exist…
- The Hallucination Tax of Reinforcement Finetuning. In this work, we identify and systematically study a critical side effect of RFT, which we term the hallucination tax: a degradation in refusal behavior causing models to produce hallucinated answers …
- The Illusion of Progress: Re-evaluating Hallucination Detection in LLMs. Large language models (LLMs) have revolutionized natural language processing, yet their tendency to hallucinate poses serious challenges for reliable deployment. Despite numerous hallucination detecti…
- The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. Current evaluations primarily focus on established mathematical and coding benchmarks, emphasizing final answer accuracy. However, this evaluation paradigm often suffers from data contamination and do…
- The Illusion of the Illusion of the Illusion of Thinking. "Shojaee et al.’s underlying observations hint at a more subtle, yet real, challenge for LRMs: a brittleness in sustained, high-fidelity, step-by-step execution. The true illusion is the belief that …
- The Insanity of Relying on Vector Embeddings: Why RAG Fails. Wrong Tool for the Job: RAG fails in production because vector embeddings are the wrong choice for determining percentage of sameness. This is easily demonstrated. Consider the following three words: …
- The Invisible Leash: Why RLVR May Not Escape Its Origin. Recent advances in large reasoning models highlight Reinforcement Learning with Verifiable Rewards (RLVR) as a promising method for enhancing AI’s capabilities, particularly in solving complex logical…
- The Model Says Walk: How Surface Heuristics Override Implicit Constraints in LLM Reasoning. Large language models systematically fail when a salient surface cue conflicts with an unstated feasibility constraint. We study this through a diagnose–measure–bridge–treat framework. Causal-behavior…
- The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form “A is B”, it will not automatically generalize to the …
- The Vanishing Gradient Problem for Stiff Neural Differential Equations. Neural differential equations have become a transformative tool in machine learning and scientific computing, enabling data-driven modeling of complex, time-dependent phenomena in fields ranging from …
- Thought Virus: Viral Misalignment via Subliminal Prompting in Multi-Agent Systems. Subliminal prompting is a phenomenon in which language models are biased towards certain concepts or traits through prompting with semantically unrelated tokens. While prior work has examined sublimin…
- Thoughts without Thinking: Reconsidering the Explanatory Value of Chain-of-Thought Reasoning in LLMs through Agentic Pipelines. Agentic pipelines present novel challenges and opportunities for human-centered explainability. The HCXAI community is still grappling with how best to make the inner workings of LLMs transparent in a…
- Training Language Models to Self-Correct via Reinforcement Learning. Self-correction is a highly desirable capability of large language models (LLMs), yet it has consistently been found to be largely ineffective in modern LLMs. Current methods for training self-correct…
- Training language models to be warm and empathetic makes them less reliable and more sycophantic. Artificial intelligence (AI) developers are increasingly building language models with warm and empathetic personas that millions of people now use for advice, therapy, and companionship. Here, we sho…
- Triggering Hallucinations in LLMs: A Quantitative Study of Prompt-Induced Hallucination in Large Language Models. In this study, we propose a class of compact yet effective prompts (~30 tokens in length) that synthetically fuse semantically distant concepts in ways that resist scientific integration—such as combi…
- When Reject Turns into Accept: Quantifying the Vulnerability of LLM-Based Scientific Reviewers to Indirect Prompt Injection. The landscape of scientific peer review is rapidly evolving with the integration of Large Language Models (LLMs). This shift is driven by two parallel trends: the widespread individual adoption of LLM…
- Why Do Multi-agent LLM Systems Fail? [[Routers]] Despite growing enthusiasm for Multi-Agent LLM Systems (MAS), their performance gains across popular benchmarks often remain minimal compared to single-agent frameworks. This gap highlig…
- Why Do Some Language Models Fake Alignment While Others Don't? Results from perturbing details of the scenario suggest that only Claude 3 Opus’s compliance gap is primarily and consistently motivated by trying to keep its goals. Second, we investigate why many ch…
- ZebraLogic: On the Scaling Limits of LLMs for Logical Reasoning. Our results reveal a significant decline in accuracy as problem complexity grows—a phenomenon we term the “curse of complexity.” This limitation persists even with larger models and increased inferenc…