Probing Structured Semantics Understanding and Generation of Language Models via Question Answering
As John McCarthy (McCarthy, 1990, 1959) points out, to better understand natural language, an intelligent system must understand the “deep structure” (Chomsky, 2011) of a sentence, which can be explicitly defined in a human-designed formal language. Therefore, we propose to probe LLMs’ deep understanding of natural language with formal languages, which serves to ascertain the boundaries of semantic comprehension exhibited by LLMs and to point the way toward improving their understanding and generation abilities.
In addition, choosing QA as our probing task brings two advantages: (1) Convenience. Many formal languages, also called logical forms, have been constructed for knowledge-based question answering, and they can be directly employed in our experiments. (2) Simplicity of evaluation. To avoid heavy human evaluation of the semantic correctness of generated text, we can use answer accuracy as an indirect measure. In this work, we define two sub-tasks of knowledge-based question answering as the probing task: (1) Formal Language Understanding, which aims to automatically translate a logical form (LF) into its corresponding natural language question (NLQ). This translation can be viewed as the model interpreting the provided LF in natural language, thereby demonstrating an LLM’s ability to understand formal language; (2) Formal Language Generation, which aims to correctly convert an NLQ into its corresponding LF, requiring the model not only to understand but also to generate LFs, thereby demonstrating its generation capability.
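For illustration, consider a hypothetical NLQ-LF pair (the identifiers below are our own illustrative choices in Wikidata style, not drawn from any particular benchmark; wd:Q25188 and wdt:P57 are intended to denote the film Inception and the director relation):

NLQ: “Who directed Inception?”
LF (SPARQL): SELECT ?director WHERE { wd:Q25188 wdt:P57 ?director . }

The understanding sub-task asks the model to produce the NLQ given the LF, while the generation sub-task asks for the reverse mapping.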
Our findings indicate that there is still a gap between LLMs and humans in terms of structured semantics understanding. Consistent with our intuition, the generation capability of LLMs for structured semantics is much weaker than their understanding capability. Importantly, we observe that models are sensitive to the choice of logical form. Overall, the lower the level of formalization of a logical form, i.e., the closer it is to natural language, the easier it is for models to understand and generate. These findings suggest the feasibility of combining LLMs with knowledge bases to tackle the complex reasoning that still poses a challenge to LLMs (Bang et al., 2023).
In this work, we leverage formal languages to probe the deep structure understanding of natural language in LLMs. Our observations suggest that a gap still exists between LLMs and humans. Aligning with our intuition, the ability of LLMs to generate structured semantics is notably inferior to their ability to understand it. Beyond these basic conclusions, we further find that the choice of formal language and knowledge base exerts a significant influence on model performance.
In our experiments, models perform best with KoPL in nearly all settings. We believe this is because KoPL employs expressions that are closer to natural language while preserving structure and modularity. In contrast, SPARQL and Lambda DCS face challenges in grounding entities to the knowledge base because of their higher level of formalization. As a result, KoPL proves to be the most LLM-friendly of the formal languages we investigate in this work.
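To make this formalization gradient concrete, the question from the earlier example can be sketched in both languages (a sketch only: the function names follow the general KoPL style, but exact conventions, e.g., the direction argument of Relate, vary across implementations, and the SPARQL identifiers are the illustrative Wikidata-style ones used above):

KoPL: Find(Inception) → Relate(director, forward) → QueryName()
SPARQL: SELECT ?director WHERE { wd:Q25188 wdt:P57 ?director . }

The KoPL program reads almost like a step-by-step natural-language recipe over entity names, whereas the SPARQL query must first be grounded to KB-specific identifiers such as wd:Q25188 and wdt:P57, which is precisely where the more highly formalized languages struggle.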