Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving
Metacognitive knowledge refers to humans' intuitive knowledge of their own thinking and reasoning processes. Today's best LLMs clearly possess some reasoning processes. This paper gives evidence that they also have metacognitive knowledge, including the ability to name the skills and procedures to apply for a given task. We explore this primarily in the context of mathematical reasoning, developing a prompt-guided interaction procedure that gets a powerful LLM to assign sensible skill labels to math questions and then has it perform semantic clustering to obtain coarser families of skill labels. These coarse skill labels appear interpretable to humans.
A core concept in human pedagogy is metacognition [18], sometimes described as thinking about thinking. It refers to the ability to reason about one's own cognitive processes as well as about learning-relevant properties of information or data. Metacognitive knowledge refers to the learner's accumulated knowledge of this type. Pedagogy research shows that improving learners' metacognitive knowledge can improve their capabilities, for example in math [19, 20]. The current paper raises two questions: "Do LLMs also have metacognitive knowledge?" and, if so, "Can we bootstrap such knowledge to further improve LLM capabilities?"
Our automated approach to skill discovery uses a state-of-the-art LLM to identify its own catalog of math skills and then organize datasets using that catalog. In Stage 1, the powerful LLM is instructed to assign a skill label to each example in a given dataset. This usually yields fine-grained skills, and too many skill labels to be useful. In Stage 2, the same LLM is asked to perform semantic clustering on the labeled data, grouping examples by the similarity of their underlying skills (as perceived by the LLM). Each resulting cluster represents a coarser-grained skill applicable to a larger set of examples. Our method retains only these coarse skills and their LLM-assigned labels. (For example, on the MATH dataset, Stage 1 identified approximately 5000 skills, which Stage 2 reduced to 117 coarse skills.) For each coarse skill, a random subset of its examples is retained as its skill exemplars (see Figure 1 and Appendix 8).
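The two-stage pipeline above can be sketched in code. The sketch below is an illustration, not the paper's implementation: `query_llm` is a hypothetical stand-in for a call to a powerful LLM (in practice an API client with the paper's prompts), stubbed here with canned answers so the control flow is runnable end to end.

```python
# Sketch of the two-stage skill-discovery pipeline:
# Stage 1 assigns a fine-grained skill label to each question;
# Stage 2 clusters those labels into coarse skills;
# finally, a random subset of examples per coarse skill is kept as exemplars.
import json
import random


def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real API client.

    Stubbed with canned responses so the pipeline runs without an API key.
    """
    if "Assign a skill label" in prompt:
        return "modular_arithmetic" if "mod" in prompt else "algebraic_manipulation"
    # Clustering prompt: return JSON mapping coarse skill -> fine labels.
    return json.dumps({
        "number_theory": ["modular_arithmetic"],
        "algebra": ["algebraic_manipulation"],
    })


def stage1_label(questions):
    """Stage 1: ask the LLM for a fine-grained skill label per question."""
    return {q: query_llm(f"Assign a skill label to this math question: {q}")
            for q in questions}


def stage2_cluster(fine_labels):
    """Stage 2: ask the LLM to semantically cluster the fine-grained labels."""
    prompt = ("Cluster these skill labels by semantic similarity; return JSON "
              "mapping each coarse skill to its fine labels:\n"
              + "\n".join(sorted(set(fine_labels))))
    return json.loads(query_llm(prompt))


def build_exemplars(questions, k=2, seed=0):
    """Run both stages, then sample up to k exemplar questions per coarse skill."""
    fine = stage1_label(questions)          # question -> fine-grained skill
    coarse = stage2_cluster(fine.values())  # coarse skill -> [fine skills]
    rng = random.Random(seed)
    exemplars = {}
    for coarse_skill, members in coarse.items():
        pool = [q for q, s in fine.items() if s in members]
        exemplars[coarse_skill] = rng.sample(pool, min(k, len(pool)))
    return exemplars


questions = ["Find 7^100 mod 5", "Simplify (x+1)^2 - x^2"]
print(build_exemplars(questions))
```

With the stub above, each of the two toy questions lands in its own coarse skill and becomes that skill's sole exemplar; with a real LLM, the clustering step is what collapses thousands of fine labels into roughly a hundred coarse ones.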