SEAL: Self-Evolving Agentic Learning for Conversational Question Answering over Knowledge Graphs

Paper · arXiv 2512.04868 · Published December 4, 2025

Knowledge-based conversational question answering (KBCQA) confronts persistent challenges in resolving coreference, modeling contextual dependencies, and executing complex logical reasoning. Existing approaches, whether end-to-end semantic parsing or stepwise agent-based reasoning, often suffer from structural inaccuracies and prohibitive computational costs, particularly when processing intricate queries over large knowledge graphs. To address these limitations, we introduce SEAL, a novel two-stage semantic parsing framework grounded in self-evolving agentic learning. In the first stage, a large language model (LLM) extracts a minimal S-expression core that captures the essential semantics of the input query. This core is then refined by an agentic calibration module, which corrects syntactic inconsistencies and aligns entities and relations precisely with the underlying knowledge graph. The second stage employs template-based completion, guided by question-type prediction and placeholder instantiation, to construct a fully executable S-expression. This decomposition not only simplifies logical form generation but also significantly enhances structural fidelity and linking efficiency. Crucially, SEAL incorporates a self-evolving mechanism that integrates local and global memory with a reflection module, enabling continuous adaptation from dialog history and execution feedback without explicit retraining. Extensive experiments on the SPICE benchmark demonstrate that SEAL achieves state-of-the-art performance, especially in multi-hop reasoning, comparison, and aggregation tasks. The results validate notable gains in both structural accuracy and computational efficiency, underscoring the framework's capacity for robust and scalable conversational reasoning.

A Knowledge Graph (KG) is a structured representation of knowledge, typically organized as triples (head entity, relation, tail entity) to encode factual information [1]. In recent years, KGs have gained widespread attention in both academia and industry [2, 3]. Knowledge-based Question Answering (KBQA) systems are designed to query these structured KGs, using reasoning to provide accurate answers to natural language questions [4, 5]. Among KBQA methods, Semantic Parsing (SP) based approaches translate questions into structured queries (e.g., SPARQL, Cypher, etc.) for execution against the KG, offering strong interpretability and high efficiency [6, 7]. These systems are widely applied in fields such as healthcare and business, significantly reducing the technical threshold for accessing complex knowledge systems. Knowledge-based conversational QA (KBCQA) extends this paradigm to multi-turn interactive scenarios, requiring the system to conduct continuous reasoning and to address dialog understanding challenges such as coreference resolution [8, 9]. For this task, SP remains a mainstream approach, where the goal is to convert contextual natural language queries into executable logical forms. With the emergence of large language models (LLMs) [10, 11], SP increasingly leverages their advanced language understanding capabilities [12, 13, 14], primarily through two paradigms: end-to-end logical form generation and agent-based stepwise construction.
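The triple representation described above can be sketched in a few lines of Python. This is a minimal illustration, not a real KG backend: the entities, relations, and the `one_hop` helper are invented for exposition, standing in for the structured query (e.g., SPARQL) that an SP-based system would execute.

```python
# A minimal sketch of a knowledge graph stored as (head, relation, tail)
# triples, with a one-hop lookup of the kind an SP-based KBQA system
# performs after parsing a question. All names are illustrative.
triples = [
    ("Marie_Curie", "born_in", "Warsaw"),
    ("Marie_Curie", "field", "Physics"),
    ("Warsaw", "capital_of", "Poland"),
]

def one_hop(head, relation):
    """Return all tail entities reachable from `head` via `relation`."""
    return [t for h, r, t in triples if h == head and r == relation]

print(one_hop("Marie_Curie", "born_in"))  # ['Warsaw']
```

A real system would answer the natural language question "Where was Marie Curie born?" by parsing it into a structured query equivalent to this lookup and executing it against the KG.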

While LLMs offer significant opportunities for SP-based KBQA and KBCQA tasks, current methods face substantial limitations in handling structurally complex questions [15]. Specifically, generated logical forms often fail to fully capture semantic intent in scenarios requiring multi-hop reasoning, comparison, or aggregation operations [16, 17, 18]. This limitation is particularly evident in complex logical reasoning, where LLMs tend to focus on surface-level concepts while overlooking critical structural constraints imposed by the knowledge graph. Furthermore, the entity and relation linking process suffers from an expansive candidate space due to linguistic ambiguity [19, 20], leading to exponential growth in possible combinations and high computational overhead. This issue directly impacts reasoning generalization: LLMs often generate plausible but semantically invalid forms that ignore domain-specific validity constraints. These challenges hinder the scalability of SP-based KBQA systems, and are further exacerbated in the KBCQA setting, where the system must also manage dialog history to resolve coreferences and maintain contextual coherence. In particular, coreference resolution remains a major bottleneck: even when a referring expression is correctly resolved, the final answer can still be inconsistent or incorrect unless the resolved entity is aligned with its attributes in the knowledge graph.

In the first stage, the LLM generates a preliminary S-expression core, which is then semantically calibrated by an agent to correct structural errors. In the second stage, the LLM completes the logical structure by integrating the validated core with predefined templates, producing an accurate and executable S-expression. Crucially, SEAL incorporates a self-evolving mechanism that establishes a continuously learning agent through the synergy of local memory, global memory, and a reflection module. This mechanism enables the system to adaptively learn from successful past dialogs and execution outcomes, transforming global memory from static storage into a dynamically updateable knowledge base without explicit retraining. This approach effectively combines the semantic understanding of LLMs with the structural rigor of templates, improving the accuracy of complex query generation in KBCQA while maintaining high efficiency.
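The self-evolving loop described above can be sketched as follows. This is a hypothetical rendering under stated assumptions, not SEAL's actual interfaces: the class name, the question-keyed global store, and the `reflect` signature are all invented for illustration. The key idea it demonstrates is that reflection promotes only parses whose KG execution succeeded, so global memory improves across dialogs without any model retraining.

```python
# Hypothetical sketch of SEAL's memory components: local memory holds the
# current dialog's turns, global memory accumulates validated
# (question, S-expression) pairs, and reflection gates what gets promoted.
class SelfEvolvingMemory:
    def __init__(self):
        self.local = []         # turns of the current dialog
        self.global_store = {}  # question -> validated S-expression

    def record_turn(self, question, s_expr):
        """Local memory: keep the current dialog's history."""
        self.local.append((question, s_expr))

    def reflect(self, question, s_expr, execution_ok):
        """Reflection: promote a parse to global memory only if the
        S-expression executed successfully against the KG."""
        if execution_ok:
            self.global_store[question] = s_expr

    def recall(self, question):
        """Retrieve a previously validated parse, if one exists."""
        return self.global_store.get(question)
```

Under this sketch, a later dialog that poses an already-solved question can reuse the stored logical form directly, which is what turns global memory from static storage into a dynamically updateable knowledge base.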

We propose a novel SP approach for KBCQA that uses LLMs to directly generate S-expressions. However, LLM outputs often contain ungrounded surface forms, and conventional entity and relation linking methods that rely on large candidate sets are computationally expensive. To enable efficient and accurate parsing, we introduce a lightweight calibration strategy that performs syntax correction and single-candidate KG alignment.
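The two parts of this calibration strategy can be sketched as below. This is an illustrative simplification, not the paper's implementation: the relation vocabulary is invented, and single-candidate alignment is shown here as a string-similarity match (via `difflib`), standing in for whatever scorer the agent actually uses.

```python
# Sketch of lightweight calibration: (1) repair unbalanced parentheses in
# an LLM-emitted S-expression, and (2) align a surface form to a single
# KG candidate instead of ranking a large candidate set. Names are
# illustrative placeholders.
from difflib import get_close_matches

KG_RELATIONS = [
    "people.person.place_of_birth",
    "people.person.profession",
]

def fix_syntax(s_expr):
    """Syntax correction: append any missing closing parentheses."""
    open_count = s_expr.count("(") - s_expr.count(")")
    return s_expr + ")" * max(open_count, 0)

def align_relation(surface):
    """Single-candidate alignment: keep only the closest KG relation."""
    matches = get_close_matches(surface, KG_RELATIONS, n=1, cutoff=0.0)
    return matches[0] if matches else surface
```

Because only one candidate per surface form survives alignment, the combinatorial blow-up of conventional linking over large candidate sets is avoided, at the cost of relying on the calibration agent to pick that candidate well.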

4.2. Reasoning Module

The core extraction phase, the initial critical step of the proposed method, focuses on deriving the S-expression core that encapsulates the essential semantics of natural language questions. This phase comprises two key steps:

• S-expression Core Generation: The LLM analyzes the question text to identify independent query objects, employing five fundamental functions (JOIN, R, AND, VALUES, and IS_TRUE) to articulate their logical relationships, thereby generating the S-expression core.

• S-expression Core Calibration: An agent interfacing with the knowledge graph refines the generated S-expression core by correcting syntactic errors and aligning entities and relations with the knowledge graph, yielding candidate variants.
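To make the first step concrete, the sketch below composes an S-expression core for a hypothetical question using three of the five operators named above (JOIN, R, AND). The question, entity names, and relation names are invented for illustration; in SEAL they would subsequently be calibrated against the actual KG vocabulary.

```python
# Building an illustrative S-expression core for the question
# "Which physicists were born in Warsaw?". Each helper simply renders
# one operator of the S-expression language as a string.
def JOIN(relation, operand):
    return f"(JOIN {relation} {operand})"

def R(relation):
    # R inverts a relation's direction (head <-> tail).
    return f"(R {relation})"

def AND(a, b):
    # AND intersects the entity sets denoted by its two subexpressions.
    return f"(AND {a} {b})"

core = AND(JOIN("profession", "Physicist"),
           JOIN("born_in", "Warsaw"))
print(core)
# (AND (JOIN profession Physicist) (JOIN born_in Warsaw))
```

Each operator contributes an independent substructure, which is what allows the decomposition described below: the core can be generated and calibrated piece by piece before template completion produces the fully executable form.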

The key innovation of this phase lies in decomposing the complex task of S-expression generation into independent substructure extractions, establishing a foundation for subsequent template integration; for details regarding specific expressions, refer to Appendix B. Moreover, experimental validation confirms that this phased approach substantially reduces model learning complexity and enhances generation accuracy.