Language Models are Pragmatic Speakers
“We propose a generalization of the previous methods called bounded pragmatic speakers with a dual model of thought. A dual model of thought comprises a slow-thinking system for deep reasoning and a fast-thinking system for fast inference. Similar to a BPS, the slow-thinking system employs a modular design so that it can be easily scrutinized and upgraded. The fast-thinking system is a compact language model that is fast and cheap to run. Given a task, the speaker runs both systems within a budgeted amount of time and prioritizes the answer computed by the slow-thinking system if it finishes in time. Otherwise, the speaker resorts to the answer suggested by the fast-thinking system. Occasionally, the slow-thinking system distills its knowledge to the fast-thinking system through an approximate inference algorithm.
There are various choices for the slow-thinking system: a probabilistic model (Griffiths et al., 2010), a modular neural network (Corona et al., 2020), a tree search algorithm (Anthony et al., 2017), a causal graph (Geiger et al., 2021), a program (Wang et al., 2023), or a language model prompted to reason and construct plans (Wei et al., 2022b; Ahn et al., 2022). The inference algorithm can be imitation learning, reinforcement learning, an advanced decoding algorithm (Lu et al., 2021), or a learning algorithm that enables learning from rich feedback (Nguyen et al., 2021).”
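
The budgeted procedure described in the passage (run both systems, prefer the slow answer if it arrives in time, otherwise fall back to the fast one) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `dual_system_answer`, the toy `slow_system` and `fast_system` callables, and the use of threads with a wall-clock budget are all assumptions made for illustration. The sketch also assumes the fast system always returns promptly, and it omits the distillation step.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout
from typing import Callable, Tuple


def dual_system_answer(
    task: str,
    slow_system: Callable[[str], str],
    fast_system: Callable[[str], str],
    budget_seconds: float,
) -> Tuple[str, str]:
    """Return (answer, source), where source is "slow" or "fast".

    Hypothetical helper: both systems start together, and the slow
    answer is used only if it arrives within the time budget.
    """
    executor = ThreadPoolExecutor(max_workers=2)
    try:
        slow_future = executor.submit(slow_system, task)
        fast_future = executor.submit(fast_system, task)
        try:
            # Prefer the slow-thinking answer when it finishes in time.
            return slow_future.result(timeout=budget_seconds), "slow"
        except FutureTimeout:
            # Budget exhausted: fall back to the fast-thinking answer.
            # Assumption: the fast system finishes quickly, so this
            # call does not block for long.
            return fast_future.result(), "fast"
    finally:
        # Do not wait for an unfinished slow computation. A running
        # thread cannot be interrupted; it is simply abandoned here.
        executor.shutdown(wait=False)


# Toy stand-ins for the two systems (assumptions, for illustration only).
def slow_system(task: str) -> str:
    time.sleep(0.5)  # simulate expensive deliberate reasoning
    return f"deliberate answer to {task!r}"


def fast_system(task: str) -> str:
    return f"quick answer to {task!r}"


if __name__ == "__main__":
    print(dual_system_answer("describe the scene", slow_system, fast_system, 1.0))
    print(dual_system_answer("describe the scene", slow_system, fast_system, 0.05))
```

With a budget of 1.0 seconds the toy slow system finishes in time and its answer is used; with a budget of 0.05 seconds the call falls back to the fast system. Any of the slow-system choices listed in the passage could be substituted for `slow_system`, provided it is wrapped as a callable.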