JointLK: Joint Reasoning with Language Models and Knowledge Graphs for Commonsense Question Answering

Paper · arXiv 2112.02732 · Published December 6, 2021

“An extensively explored research direction is to design graph neural networks (GNNs) (Scarselli et al., 2008) that perform reasoning over explicit, structured commonsense knowledge from external knowledge bases (Vrandečić and Krötzsch, 2014; Speer et al., 2017). Related methods usually follow a retrieval-and-modeling paradigm. First, the knowledge subgraphs or paths relevant to a given question are retrieved by string matching or semantic similarity; the retrieved structured information indicates relations between concepts or suggests a multi-hop reasoning process. Second, the retrieved subgraphs are modeled by a purpose-built graph neural network module (Lin et al., 2019; Feng et al., 2020; Yasunaga et al., 2021) that performs reasoning over the knowledge graph.
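The retrieval step of this paradigm can be sketched as a depth-limited path search over KG triples. The helper below is a hypothetical, simplified version (real pipelines typically match lemmatized concepts against ConceptNet); it keeps every edge that lies on a path of at most `max_hops` edges from a question concept to an answer concept:

```python
from itertools import product

def retrieve_subgraph(question_concepts, answer_concepts, kg_edges, max_hops=2):
    """Collect every KG edge lying on a path of <= max_hops edges from a
    question concept to an answer concept (illustrative sketch only)."""
    # Build an adjacency list over (head, relation, tail) triples.
    adj = {}
    for h, r, t in kg_edges:
        adj.setdefault(h, []).append((r, t))

    subgraph = set()
    for src, dst in product(question_concepts, answer_concepts):
        # Depth-limited DFS for paths src -> ... -> dst.
        stack = [(src, [])]
        while stack:
            node, path = stack.pop()
            if len(path) > max_hops:
                continue
            if node == dst and path:
                subgraph.update(path)  # keep all edges on this path
                continue
            for r, t in adj.get(node, []):
                edge = (node, r, t)
                if edge not in path:  # avoid cycles
                    stack.append((t, path + [edge]))
    return subgraph
```

For example, with the triples `("bird", "capableof", "fly")` and `("fly", "relatedto", "air")`, querying `{"bird"}` against `{"air"}` with `max_hops=2` returns exactly those two edges, while an off-path triple such as `("bird", "isa", "animal")` is excluded.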

However, these approaches have two main issues. First, the retrieved knowledge subgraph contains many noisy nodes. Whether retrieval uses simple string matching or semantic matching, gathering sufficient relevant knowledge inevitably pulls in noisy knowledge graph nodes (Lin et al., 2019; Yasunaga et al., 2021). In particular, as the hop count increases, the number of irrelevant nodes expands dramatically, increasing the burden on the model. As the example in Figure 1 shows, some graph nodes such as “wood”, “burn”, and “gas”, although related to entities in the question and choices, can mislead the global understanding of the question. Second, there are limited interactions between the language representation and the knowledge graph representation. Specifically, existing LM+KG methods (Lin et al., 2019; Feng et al., 2020) model the question context and knowledge subgraphs in isolation with LMs and GNNs, and perform only a single, shallow interaction that fuses their representations at the output layer for prediction. We argue that this limited interaction between the two modalities is the main bottleneck preventing the model from understanding the complex question-knowledge relations needed to answer correctly.
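The hop-count blow-up described above is easy to quantify with a toy breadth-first expansion (the function and graph here are hypothetical, purely to illustrate the growth rate):

```python
def nodes_within_hops(adj, seeds, max_hops):
    """Count how many KG nodes fall within k hops of the seed concepts,
    for k = 0..max_hops. Illustrates how subgraph size grows with hops."""
    seen = set(seeds)
    frontier = set(seeds)
    sizes = [len(seen)]
    for _ in range(max_hops):
        # Expand one hop outward and record the cumulative node count.
        frontier = {n for f in frontier for n in adj.get(f, ())} - seen
        seen |= frontier
        sizes.append(len(seen))
    return sizes
```

On a graph where each node links to just three fresh neighbors, the reachable set grows geometrically: 1 node at hop 0, 4 at hop 1, 13 at hop 2, 40 at hop 3. Most of these nodes are irrelevant to any particular question, which is exactly the noise problem the pruning mechanism targets.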

Based on these considerations, we propose JointLK, a model that performs fine-grained modality fusion and multi-layer joint reasoning between the language model and the knowledge graph (see Figure 2). Specifically, given a question and its retrieved subgraph, JointLK first obtains representations of the two modalities using an LM encoder and a GNN encoder, respectively. We then design a joint reasoning module that generates fine-grained bidirectional attention maps between each question token and each KG node, fusing information from each modality into the other. Guided by the attention produced during this interaction, a dynamic pruning module deletes irrelevant nodes so that the model reasons along the correct knowledge path. Multiple JointLK layers are stacked to form a hierarchy that supports multi-step interaction and recursive pruning.”
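One JointLK layer, as we read this description, can be sketched as a bidirectional token-node attention step followed by attention-guided pruning. The dimension names, the single-head attention, and the `keep_ratio` hyperparameter below are our assumptions, not details confirmed by the excerpt:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointLKLayer(nn.Module):
    """Sketch of one JointLK layer: bidirectional attention between LM
    token states and GNN node states, then dynamic node pruning driven
    by the token-to-node attention scores (illustrative, not official)."""

    def __init__(self, dim, keep_ratio=0.8):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)  # projects LM token states
        self.k_proj = nn.Linear(dim, dim)  # projects GNN node states
        self.keep_ratio = keep_ratio       # fraction of nodes kept per layer

    def forward(self, tokens, nodes):
        # tokens: (T, d) LM states; nodes: (N, d) GNN states.
        scores = self.q_proj(tokens) @ self.k_proj(nodes).T  # (T, N)
        tok2node = F.softmax(scores, dim=1)     # each token attends over nodes
        node2tok = F.softmax(scores, dim=0).T   # each node attends over tokens
        tokens = tokens + tok2node @ nodes      # KG -> LM fusion
        nodes = nodes + node2tok @ tokens       # LM -> KG fusion
        # Dynamic pruning: keep the nodes the question attends to most.
        relevance = tok2node.sum(dim=0)                       # (N,)
        k = max(1, int(self.keep_ratio * nodes.size(0)))
        keep = relevance.topk(k).indices
        return tokens, nodes[keep]
```

Stacking several such layers yields the recursive behavior described above: each layer refines both representations and shrinks the subgraph, so later layers reason over a progressively cleaner set of nodes.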