Fine-tuning Large Language Models for Automated Algorithm Design

Paper · arXiv 2507.10614 · Published July 13, 2025

The integration of large language models (LLMs) into automated algorithm design has shown promising potential. A prevalent approach embeds LLMs within search routines to iteratively generate and refine candidate algorithms. However, most existing methods rely on off-the-shelf LLMs trained for general coding tasks, leaving a key question open: Do we need LLMs specifically tailored for algorithm design? If so, how can such LLMs be effectively obtained, and how well can they generalize across different algorithm design tasks? In this paper, we take a first step toward answering these questions by exploring the fine-tuning of LLMs for algorithm design. We introduce a Diversity-Aware Rank-based (DAR) sampling strategy to balance diversity and quality in the training data, and then leverage direct preference optimization to efficiently align LLM outputs with task objectives.
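The abstract does not spell out how DAR sampling works; as a rough illustration, the sketch below implements one plausible reading: rank candidates by fitness, then greedily keep only candidates that are not near-duplicates of those already selected. The candidate schema, similarity measure, and threshold are all assumptions for illustration, not the paper's definitions.

```python
from difflib import SequenceMatcher

def dar_sample(candidates, k, diversity_threshold=0.9):
    """Hypothetical sketch of Diversity-Aware Rank-based (DAR) sampling.

    Assumes each candidate is a dict with 'code' (str) and 'fitness'
    (float, lower is better); the paper's actual criteria may differ.
    """
    # Rank candidates by fitness so higher-quality algorithms are
    # considered first (the rank-based part).
    ranked = sorted(candidates, key=lambda c: c["fitness"])

    selected = []
    for cand in ranked:
        # Skip candidates that are near-duplicates of anything already
        # selected (the diversity-aware part).
        too_similar = any(
            SequenceMatcher(None, cand["code"], s["code"]).ratio()
            > diversity_threshold
            for s in selected
        )
        if not too_similar:
            selected.append(cand)
        if len(selected) == k:
            break
    return selected
```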

Rather than training algorithm-design LLMs from scratch, we adopt a fine-tuning approach to adapt existing LLMs to automated algorithm design tasks. Among various learning methods, we employ Direct Preference Optimization (DPO), which dispenses with an explicit reward model and instead trains the model directly on preference pairs to prefer high-quality outputs over inferior ones.
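For context, the DPO objective increases the margin between the policy's and a frozen reference model's log-probabilities on the preferred versus the rejected output of each pair. Below is a minimal PyTorch sketch of the standard DPO loss; the tensor layout and the β value are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Standard DPO loss on summed per-sequence log-probabilities.

    Each argument is a 1-D tensor of shape (batch,) holding the total
    log-probability that the policy / frozen reference model assigns to
    the preferred ("chosen") or inferior ("rejected") response of a pair.
    beta controls how far the policy may drift from the reference.
    """
    # Log-ratio of policy to reference model for each response.
    chosen_logratios = policy_chosen_logps - ref_chosen_logps
    rejected_logratios = policy_rejected_logps - ref_rejected_logps

    # Maximize the margin between chosen and rejected log-ratios.
    logits = beta * (chosen_logratios - rejected_logratios)
    return -F.logsigmoid(logits).mean()
```

Each preference pair produced by the sampling stage supplies the chosen and rejected sequences whose log-probabilities feed this loss.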

As shown in Figure 1, our framework consists of two stages: 1) Data Generation: we use an LLM-driven iterative algorithm search (e.g., EoH (Liu et al., 2024b)) to generate diverse candidate algorithms (upper section). 2) Preference Learning: the collected algorithms are sampled to compose preference pairs, enabling the LLM to learn to prefer better designs over less favored ones (lower section); a sketch of one possible pairing scheme follows below. The fine-tuned LLM can then be used to generate new algorithms.
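As a concrete illustration of the pairing step, the sketch below (reusing the hypothetical `dar_sample` above) takes the best-fitness algorithm in each sampled subset as the preferred output and the worst as the rejected one. The pairing rule, subset size, and record layout are assumptions for illustration; the paper's actual scheme is not given here and may differ.

```python
import random

def build_preference_pairs(task_prompt, candidates, num_pairs, k=8):
    """Hypothetical pairing scheme: from each DAR-sampled subset, take
    the best-fitness algorithm as 'chosen' and the worst as 'rejected'.
    """
    pairs = []
    for _ in range(num_pairs):
        # Randomize the pool so repeated calls yield different subsets.
        pool = random.sample(candidates, min(len(candidates), 4 * k))
        subset = sorted(dar_sample(pool, k), key=lambda c: c["fitness"])
        if len(subset) < 2:
            continue
        pairs.append({
            "prompt": task_prompt,           # task description shown to the LLM
            "chosen": subset[0]["code"],     # best algorithm (lower fitness = better)
            "rejected": subset[-1]["code"],  # worst algorithm in the subset
        })
    return pairs
```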