Reflect then Learn: Active Prompting for Information Extraction Guided by Introspective Confusion

Paper · arXiv 2508.10036 · Published August 10, 2025
Test Time Compute · Reasoning by Reflection

Large Language Models (LLMs) show remarkable potential for few-shot information extraction (IE), yet their performance is highly sensitive to the choice of in-context examples. Conventional selection strategies often fail to provide informative guidance, as they overlook a key source of model fallibility: confusion stemming not just from semantic content, but also from the generation of well-structured formats required by IE tasks. To address this, we introduce Active Prompting for Information Extraction (APIE), a novel active prompting framework guided by a principle we term introspective confusion. Our method empowers an LLM to assess its own confusion through a dual-component uncertainty metric that uniquely quantifies both Format Uncertainty (difficulty in generating correct syntax) and Content Uncertainty (inconsistency in extracted semantics). By ranking unlabeled data with this comprehensive score, our framework actively selects the most challenging and informative samples to serve as few-shot exemplars. Extensive experiments on four benchmarks show that our approach consistently outperforms strong baselines, yielding significant improvements in both extraction accuracy and robustness.
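To make the selection step concrete, below is a minimal Python sketch of the rank-by-uncertainty loop the abstract describes. The interface is hypothetical: `llm_sample`, `score_fn`, and the constants `k` and `n_exemplars` are illustrative placeholders, not an API published with the paper. A candidate `score_fn` is sketched after the next paragraph.

```python
def select_exemplars(unlabeled_pool, llm_sample, score_fn, k=8, n_exemplars=4):
    """Rank unlabeled inputs by a confusion score computed over k
    independent generations, and keep the top-n as few-shot exemplars
    (which would then be annotated for use in the prompt)."""
    scored = []
    for text in unlabeled_pool:
        outputs = [llm_sample(text) for _ in range(k)]  # independent samples
        scored.append((score_fn(outputs), text))
    scored.sort(key=lambda pair: pair[0], reverse=True)  # most confusing first
    return [text for _, text in scored[:n_exemplars]]
```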

To bridge this critical gap, we introduce APIE, an uncertainty-driven, training-free prompting framework designed for universal information extraction. Our approach is guided by a principle we term introspective confusion, which empowers the LLM to “reflect” on its own generative process and identify the samples that are most challenging, and thus most valuable, for learning. This principle operates on the premise that the model’s internal state of confusion for a given input can be effectively measured by analyzing the syntactic and semantic consistency across its own multiple, independently generated outputs. Specifically, we propose a dual-level introspective uncertainty metric that quantifies this confusion from two critical angles: 1) Format-Level Uncertainty, which captures the model’s struggle to produce structurally coherent and parsable outputs, measured through a combination of parsing failures and generation disagreement; and 2) Content-Level Uncertainty, which assesses the semantic consistency of extracted information across multiple inferences using set-based divergence.
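The two levels could be instantiated roughly as in the sketch below, under stated assumptions: a JSON list-of-pairs output schema, equal weighting of parse failures and generation disagreement, and Jaccard distance as the set-based divergence. All names, weights, and schema choices here are illustrative, not the paper's published formulas.

```python
import json
from itertools import combinations

def try_parse(output: str):
    """Attempt to parse one generation into a set of (span, type) tuples.
    A JSON list-of-pairs schema is assumed here for illustration."""
    try:
        return {tuple(record) for record in json.loads(output)}
    except (json.JSONDecodeError, TypeError):
        return None  # structurally invalid generation

def format_uncertainty(outputs: list[str]) -> float:
    """Format level: parse-failure rate plus disagreement among the raw
    strings of the sampled generations (assumed equal weighting)."""
    fail_rate = sum(p is None for p in map(try_parse, outputs)) / len(outputs)
    disagreement = (len(set(outputs)) - 1) / max(len(outputs) - 1, 1)
    return 0.5 * fail_rate + 0.5 * disagreement

def content_uncertainty(outputs: list[str]) -> float:
    """Content level: mean pairwise Jaccard distance between the extraction
    sets of the parsable generations (a set-based divergence)."""
    sets = [s for s in map(try_parse, outputs) if s is not None]
    if len(sets) < 2:
        return 1.0  # too few parsable outputs to compare: maximally uncertain
    dists = [1 - len(a & b) / max(len(a | b), 1)
             for a, b in combinations(sets, 2)]
    return sum(dists) / len(dists)

def introspective_confusion(outputs: list[str], alpha: float = 0.5) -> float:
    """Dual-level score; alpha is an assumed format/content mixing weight."""
    return (alpha * format_uncertainty(outputs)
            + (1 - alpha) * content_uncertainty(outputs))
```

In this sketch, `introspective_confusion` would serve as the `score_fn` in the selection loop above; higher scores mark the inputs whose generations are least consistent in structure and content.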