Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning
There is growing interest in applying pre-trained large language models (LLMs) to planning problems. However, methods that use LLMs directly as planners are currently impractical for several reasons, including the limited correctness of the generated plans, a strong reliance on feedback from interactions with simulators or even the actual environment, and inefficient use of human feedback. In this work, we introduce a novel alternative paradigm that constructs an explicit world (domain) model in the Planning Domain Definition Language (PDDL) and then uses it to plan with sound domain-independent planners. Because LLMs may not generate a fully functional PDDL model on the first attempt, we employ LLMs as an interface between PDDL and sources of corrective feedback, such as PDDL validators and humans. For users who lack a background in PDDL, we show that LLMs can translate PDDL into natural language and effectively encode corrective feedback back into the underlying domain model. Our framework not only enjoys the correctness guarantee offered by the external planners but also reduces human involvement by allowing users to correct the domain model once at the beginning, rather than inspecting and correcting (through interactive prompting) every generated plan as in previous work.
Preliminary studies suggest that, in some everyday domains, LLMs are capable of suggesting sensible action plans [19, 1]. However, the correctness and executability of these plans are often limited. For instance, LLMs often overlook whether an action is physically plausible in a given state, and they may not handle long-term dependencies across multiple actions effectively. Several approaches have been proposed to improve the planning capabilities of LLMs. One promising approach involves collecting feedback from the environment during plan execution and subsequently refining the plans.
To overcome these limitations, rather than using LLMs directly as planners, we advocate a model-based paradigm, wherein a PDDL world model is teased out of LLMs. We adopt the same problem setup as existing approaches, in which the planner is given a set of actions and their brief natural language descriptions. However, instead of directly mapping user commands to plans, we use LLMs to extract a symbolic representation of the actions in the form of PDDL action models. This intermediate output can be used with an external domain-independent planner to reliably search for feasible plans, or it can be used to validate and correct "heuristic" plans generated by an LLM planner. Additionally, our modular method divides the planning process into two distinct parts: modeling the causal dependencies of actions, and determining the appropriate sequence of actions to accomplish the goals. LLMs, which have been trained on extensive web-scale knowledge, exhibit greater proficiency in the former task than in the latter.
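The two uses of an extracted action model described above, searching for a plan and validating a "heuristic" plan, can be illustrated with a minimal sketch. It assumes a ground, STRIPS-style encoding (preconditions, add effects, delete effects) in place of full PDDL, and it uses a plain breadth-first search as a stand-in for an off-the-shelf domain-independent planner. The fridge domain, the predicate names, and the functions `validate_plan` and `bfs_plan` are all hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A ground STRIPS-style action (illustrative stand-in for a PDDL action)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    del_effects: frozenset

    def applicable(self, state):
        # An action is applicable when all its preconditions hold in the state.
        return self.preconditions <= state

    def apply(self, state):
        # Remove delete effects first, then add the add effects.
        return (state - self.del_effects) | self.add_effects


def validate_plan(init, goal, plan):
    """Simulate a plan against the model; return (ok, message)."""
    state = frozenset(init)
    for step, action in enumerate(plan):
        if not action.applicable(state):
            missing = sorted(action.preconditions - state)
            return False, f"step {step}: {action.name} is missing {missing}"
        state = action.apply(state)
    if not goal <= state:
        return False, f"goal not reached; missing {sorted(goal - state)}"
    return True, "plan is valid"


def bfs_plan(init, goal, actions):
    """Breadth-first search over states; returns a shortest plan or None."""
    start = frozenset(init)
    frontier = deque([(start, [])])
    visited = {start}
    while frontier:
        state, plan = frontier.popleft()
        if goal <= state:
            return plan
        for action in actions:
            if action.applicable(state):
                nxt = action.apply(state)
                if nxt not in visited:
                    visited.add(nxt)
                    frontier.append((nxt, plan + [action]))
    return None


# Hypothetical household domain (assumed for illustration only).
OPEN_FRIDGE = Action(
    "open_fridge",
    frozenset({"fridge_closed", "hand_empty"}),
    frozenset({"fridge_open"}),
    frozenset({"fridge_closed"}),
)
TAKE_MILK = Action(
    "take_milk",
    frozenset({"fridge_open", "milk_in_fridge", "hand_empty"}),
    frozenset({"holding_milk"}),
    frozenset({"milk_in_fridge", "hand_empty"}),
)
CLOSE_FRIDGE = Action(
    "close_fridge",
    frozenset({"fridge_open"}),
    frozenset({"fridge_closed"}),
    frozenset({"fridge_open"}),
)
ACTIONS = [OPEN_FRIDGE, TAKE_MILK, CLOSE_FRIDGE]
INIT = frozenset({"fridge_closed", "milk_in_fridge", "hand_empty"})
GOAL = frozenset({"holding_milk", "fridge_closed"})

if __name__ == "__main__":
    # Use 1: search for a plan directly on the symbolic model.
    plan = bfs_plan(INIT, GOAL, ACTIONS)
    print([a.name for a in plan])
    # Use 2: check a "heuristic" plan that skips opening the fridge.
    print(validate_plan(INIT, GOAL, [TAKE_MILK, CLOSE_FRIDGE]))
```

In this sketch the flawed plan is rejected at its first step because the precondition `fridge_open` does not hold, which is the kind of physical-plausibility error a symbolic model makes explicit. The separation also mirrors the division described above: the action definitions carry the causal knowledge, while the search procedure is entirely independent of the domain.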