Evolving Deeper LLM Thinking

Paper · arXiv 2501.09891 · Published January 17, 2025

We explore an evolutionary search strategy for scaling inference time compute in Large Language Models. The proposed approach, Mind Evolution, uses a language model to generate, recombine and refine candidate responses, and avoids the need to formalize the underlying inference problem whenever a solution evaluator is available. Controlling for inference cost, we find that Mind Evolution significantly outperforms other inference strategies such as Best-of-N and Sequential Revision on natural language planning tasks. On the TravelPlanner and Natural Plan benchmarks, Mind Evolution solves more than 98% of the problem instances using Gemini 1.5 Pro without the use of a formal solver.

How can a large language model (LLM) be guided to think deeper about a complex problem and leverage inference time compute to improve its problem-solving ability? Prior research has investigated various strategies for leveraging inference time compute, such as chain-of-thought [41, 21], self-consistency [39], sequential revision based on feedback [36, 30, 8, 19, 1], and search guided by auxiliary verifiers or evaluators [43]. When a solution evaluator is available, search strategies have the advantage of reliably improving problem-solving ability with increased compute. For example, methods such as Best-of-N [4, 24, 25] and tree search [37] naturally exploit additional compute to explore a larger set of solution candidates, thereby increasing the probability of finding a successful solution.

To better exploit inference time compute, we propose an evolutionary search strategy for LLMs that combines free-flowing stochastic exploration with large-scale iterative refinement. We refer to this approach as Mind Evolution. As illustrated in Figure 1, Mind Evolution is a genetic search strategy that evolves a diverse population of candidate solutions, leveraging an LLM to generate, recombine and refine solution candidates based on feedback from an evaluator. The overall process is analogous to combining divergent thinking (free-flowing parallel idea exploration) with convergent thinking (idea evaluation and selection), considered as hallmarks of intelligent problem solving behavior [14].
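The generate–evaluate–recombine–refine loop described above can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the LLM-driven operators are stood in for by simple numeric functions (`generate`, `recombine`, `refine` are hypothetical names), and the evaluator scores closeness to a hidden target rather than checking a real plan.

```python
import random

random.seed(0)

TARGET = 42  # hidden optimum for this toy objective

def evaluate(candidate):
    """Programmatic evaluator: higher is better, 0 means solved."""
    return -abs(candidate - TARGET)

def generate():
    """Stand-in for the LLM sampling a fresh candidate solution."""
    return random.randint(0, 100)

def recombine(a, b):
    """Stand-in for an LLM 'crossover' prompt that merges two parents."""
    return (a + b) // 2

def refine(candidate):
    """Stand-in for an LLM revision step guided by evaluator feedback."""
    return candidate + random.choice([-2, -1, 1, 2])

def mind_evolution(pop_size=8, generations=20):
    """Evolve a population of candidates; return the fittest found."""
    population = [generate() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=evaluate, reverse=True)
        parents = ranked[: pop_size // 2]          # selection by fitness
        children = [refine(recombine(random.choice(parents),
                                     random.choice(parents)))
                    for _ in range(pop_size - len(parents))]
        population = parents + children            # parents survive (elitism)
        if evaluate(max(population, key=evaluate)) == 0:
            break                                  # evaluator says "solved"
    return max(population, key=evaluate)

best = mind_evolution()
```

In the real system, each of these operators is a prompt to the LLM and candidates are free-form natural-language plans; the skeleton of the search loop is the same.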

In contrast to solver-based search, Mind Evolution is not restricted to searching in a formal space. This allows Mind Evolution to be applied to problems that are not formalized, or remain difficult to formalize, as long as a programmatic solution evaluator is available. In particular, we focus on natural language planning tasks where candidate solutions can still be automatically parsed, evaluated and critiqued using an implementable oracle evaluator. This approach exploits the observation that it is often easier to evaluate the quality of a candidate solution than it is to generate good solutions for a given problem [11].
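To make the "parse, evaluate and critique" role of the evaluator concrete, here is a minimal sketch under assumed conventions: a toy travel plan written as comma-separated `city:days` pairs and a single made-up budget constraint. The paper's actual TravelPlanner evaluator checks far richer constraints; only the shape of the interface (a score plus textual critiques fed back to the LLM) is the point.

```python
def evaluate_plan(plan: str, budget_days: int = 7):
    """Parse a toy 'city:days' plan, score it, and return critiques."""
    critiques = []
    try:
        stops = [segment.split(":") for segment in plan.split(",")]
        days = {city.strip(): int(d) for city, d in stops}
    except ValueError:
        return 0.0, ["Plan could not be parsed as 'city:days' pairs."]
    total = sum(days.values())
    if total != budget_days:
        critiques.append(
            f"Plan uses {total} days but the trip is {budget_days} days.")
    if any(d < 1 for d in days.values()):
        critiques.append("Every city must get at least one day.")
    # Perfect plans score 1.0; each violation lowers the score.
    score = 1.0 if not critiques else 1.0 / (1 + len(critiques))
    return score, critiques

score, notes = evaluate_plan("Paris:3, Rome:2, Milan:2")  # score 1.0, no critiques
```

Because the critiques are natural-language strings, they can be inserted directly into the refinement prompt, closing the feedback loop without any formal solver.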

Pairing LLMs with Evolutionary Search In addition to the program generation studies discussed in Section 1, several recent works have explored combining LLMs and evolution for numerical optimization [26, 3] and combinatorial optimization [28, 44]. The problem spaces we tackle in this work, such as natural language planning, can also be viewed as combinatorial optimization problems – optimizing plans subject to constraints specified in natural language. In contrast to these previous studies, we focus on evolving solutions in natural language spaces instead of formal spaces. This removes the need for task formalization, which demands significant effort and expert knowledge for each task instance.

Evolutionary search usually begins with a population of independently generated candidate solutions. In each generation, the fitness of every individual is evaluated with respect to the target objective. Candidates are then stochastically selected for reproduction based on their fitness (“selection”). In reproduction, the genetic representations of selected parents are combined (“crossover”) and potentially altered (“mutation”) to produce new child solutions. Such a process creates the next generation of children, which then enter the population. Population fitness generally increases over successive generations, as parents with greater fitness are more likely to be selected for recombination.
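The classical procedure above (fitness evaluation, stochastic selection, crossover, mutation) can be illustrated with a textbook genetic algorithm on bitstrings. All details here (the all-ones objective, parameter values) are illustrative and unrelated to the paper's tasks; the sketch only mirrors the terms just defined.

```python
import random

random.seed(1)

def fitness(ind):
    """Toy objective: maximize the number of 1-bits."""
    return sum(ind)

def select(pop):
    """Stochastic, fitness-proportional ('roulette wheel') selection."""
    weights = [fitness(i) + 1 for i in pop]  # +1 so zero-fitness can be picked
    return random.choices(pop, weights=weights, k=1)[0]

def crossover(a, b):
    """Single-point crossover of two parent bitstrings."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:]

def mutate(ind, rate=0.1):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if random.random() < rate else bit for bit in ind]

def evolve(pop_size=20, length=16, generations=40):
    pop = [[random.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=fitness)            # keep the best individual
        children = [mutate(crossover(select(pop), select(pop)))
                    for _ in range(pop_size - 1)]
        pop = [elite] + children                 # next generation
    return max(pop, key=fitness)

best = evolve()
```

Average fitness tends to rise across generations because fitter parents are selected more often, exactly as described in the text.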

Island Model To sustain diversity in an evolving population, it is also helpful to introduce an island model [38, 5], where distinct sub-populations (“islands”) are created and evolved independently between “migration” and “island reset” events that occur at specified frequencies. In a migration operation, solutions on one island are stochastically chosen based on fitness to migrate to an adjacent island. In an island reset operation, the populations on islands with low overall fitness are replaced by strong solutions from the global population, which also has a selection effect. The island model has been adopted in recent successful efforts, such as FunSearch [34].
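A minimal sketch of the island model, again on a toy numeric objective with made-up schedules and operators: islands evolve independently, the fittest member of each island periodically migrates to the neighboring island in a ring, and the weakest island is reseeded from the global best.

```python
import random

random.seed(2)

def fitness(x):
    """Toy objective: closeness to a hidden target."""
    return -abs(x - 42)

def step(island):
    """One local generation: perturb members, keep the fitter half."""
    children = [m + random.choice([-3, -1, 1, 3]) for m in island]
    return sorted(island + children, key=fitness, reverse=True)[: len(island)]

def island_model(n_islands=4, size=6, generations=30, migrate_every=5):
    islands = [[random.randint(0, 100) for _ in range(size)]
               for _ in range(n_islands)]
    for g in range(1, generations + 1):
        islands = [step(isl) for isl in islands]
        if g % migrate_every == 0:
            # Migration: each island's best replaces the next island's worst.
            for i, isl in enumerate(islands):
                islands[(i + 1) % n_islands][-1] = max(isl, key=fitness)
            # Island reset: reseed the weakest island from the global best.
            worst = min(range(n_islands),
                        key=lambda i: max(fitness(m) for m in islands[i]))
            global_best = max((m for isl in islands for m in isl), key=fitness)
            islands[worst] = [global_best + random.choice([-2, -1, 0, 1, 2])
                              for _ in range(size)]
    return max((m for isl in islands for m in isl), key=fitness)

best = island_model()
```

Migration spreads good solutions between islands without collapsing them into a single population, while resets recycle compute from stagnant islands, the two diversity mechanisms described above.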