Evolutionary Methods
Related topics:
- **A Comprehensive Survey of Self-Evolving AI Agents: A New Paradigm Bridging Foundation Models and Lifelong Agentic Systems**: To address this limitation, recent research has explored agent evolution techniques that aim to automatically enhance agent systems based on interaction data and environmental feedback. This emerging …
- **A Little Human Data Goes A Long Way**: Faced with an expensive human annotation process, creators of NLP systems increasingly turn to synthetic data generation. While this method shows promise, the extent to which synthetic data can replac…
- **A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence**: Large Language Models (LLMs) have demonstrated remarkable capabilities across diverse tasks but remain fundamentally static, unable to adapt their internal parameters to novel tasks, evolving knowledg…
- **Absolute Zero: Reinforced Self-play Reasoning with Zero Data**: Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from outcome-based rewards. Recent RLVR wo…
- **AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation**: Furthermore, planning ability is a crucial component of an LLM-based agent, involving interaction with the environment and executing actions to complete a planning task, which generally entails achiev…
- **AlphaGo Moment for Model Architecture Discovery**: While AI systems demonstrate exponentially improving capabilities, the pace of AI research itself remains linearly bounded by human cognitive capacity, creating an increasingly severe development bott…
- **Automated Alignment Researchers: Using large language models to scale scalable oversight**: Large language models’ ever-accelerating rate of improvement raises two particularly important questions for alignment research. One is how alignment can keep up. Frontier AI models are now contribut…
- **Autotelic Agents with Intrinsically Motivated Goal-Conditioned Reinforcement Learning: a Short Survey**: Building autonomous machines that can explore open-ended environments, discover possible interactions and build repertoires of skills is a general objective of artificial intelligence. Developmental a…
- **Bigger is not always better: The importance of human-scale language modeling for psycholinguistics**: scaling has several downsides for both computational psycholinguistics and natural language processing research. We discuss the scientific challenges presented by the scaling paradigm, as well as the …
- **Bilevel Autoresearch: Meta-Autoresearching Itself**: If autoresearch is itself a form of research, then autoresearch can be applied to research itself. We take this idea literally: we use an autoresearch loop to optimize the autoresearch loop. Every exi…
- **Can Large Language Models Really Improve by Self-critiquing Their Own Plans?**: There have been widespread claims about Large Language Models (LLMs) being able to successfully verify or self-critique their candidate solutions in reasoning problems in an iterative mode. Intrigued …
- **Can Large Reasoning Models Self-Train?**: Scaling the performance of large language models (LLMs) increasingly depends on methods that reduce reliance on human supervision. Reinforcement learning from automated verification offers an alternat…
- **DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research**: Deep research models perform multi-step research to produce long-form, well-attributed answers. However, most open deep research models are trained on easily verifiable short-form QA tasks via reinfor…
- **Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents**: Most of today’s AI systems are constrained by human-designed, fixed architectures and cannot autonomously and continuously improve themselves. The scientific method, on the other hand, provides a cumu…
- **Deep Researcher with Test-Time Diffusion**: Deep research agents, powered by Large Language Models (LLMs), are rapidly advancing; yet, their performance often plateaus when generating complex, long-form research reports using generic test-time …
- **Diffusion Models are Evolutionary Algorithms**: In a convergence of machine learning and biology, we reveal that diffusion models are evolutionary algorithms. By considering evolution as a denoising process and reversed evolution as diffusion, we m…
- **End-to-End Test-Time Training for Long Context**: On the other hand, Transformers with self-attention still struggle to efficiently process long context equivalent to years of human experience, in part because they are designed for nearly lossless re…
- **Evolving Deeper LLM Thinking**: We explore an evolutionary search strategy for scaling inference time compute in Large Language Models. The proposed approach, Mind Evolution, uses a language model to generate, recombine and refine c… (a minimal sketch of this generate/recombine/refine loop appears after this list)
- **Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning**: Recent work has leveraged Large Language Models’ (LLM) abilities to capture abstract knowledge about the world’s physics to solve decision-making problems. Yet, the alignment between LLMs’ knowledge and the environment can be wron…
- **How Should We Meta-Learn Reinforcement Learning Algorithms?**: Meta-learning shows particular promise for reinforcement learning (RL), where algorithms are often adapted from supervised or unsupervised learning despite their suboptimality for RL. However, until n…
- **Hyperagents**: Self-improving AI systems aim to reduce reliance on human engineering by learning to improve their own learning and problem-solving processes. Existing approaches to recursive self-improvement typical…
- **Improving Reinforcement Learning from Human Feedback Using Contrastive Rewards**: we improve the effectiveness of the reward model by introducing a penalty term on the reward, named contrastive rewards. Our approach involves two steps: (1) an offline sampling step to obtain respons…
- **Inverse-Q*: Token Level Reinforcement Learning for Aligning Large Language Models Without Preference Data**: In this paper, we introduce Inverse-Q*, an innovative framework that transcends traditional RL methods by optimizing token-level reinforcement learning without the need for additional reward or value …
- **Language Modeling by Language Models**: Can we leverage LLMs to model the process of discovering novel language model (LM) architectures? Inspired by real research, we propose a multi-agent LLM approach that simulates the conventional stage…
- **Large Language Model Agents Are Not Always Faithful Self-Evolvers**: Self-evolving large language model (LLM) agents continually improve by accumulating and reusing past experience, yet it remains unclear whether they faithfully rely on that experience to guide their b…
- **Learning to Discover at Test Time**: How can we use AI to discover a new state of the art for a scientific problem? Prior work in test-time scaling, such as AlphaEvolve, performs search by prompting a frozen LLM. We perform reinforcement…
- **MLE-STAR: Machine Learning Engineering Agent via Search and Targeted Refinement**: Agents based on large language models (LLMs) for machine learning engineering (MLE) can automatically implement ML models via code generation. However, existing approaches to build such agents often r…
- **Meta-Rewarding Language Models: Self-Improving Alignment with LLM-as-a-Meta-Judge**: Large Language Models (LLMs) are rapidly surpassing human knowledge in many domains. While improving these models traditionally relies on costly human data, recent self-rewarding mechanisms (Yuan et a…
- **Mind the Gap: Examining the Self-Improvement Capabilities of Large Language Models**: Self-improvement is a mechanism in Large Language Model (LLM) pre-training, post-training and test-time inference. We explore a framework where the model verifies its own outputs, filters or reweights…
- **Nested Learning: The Illusion of Deep Learning Architectures**: Over the last decades, developing more powerful neural architectures and simultaneously designing optimization algorithms to effectively train them have been the core of research efforts to enhance th…
- **OMNI-SIMPLEMEM: Autoresearch-Guided Discovery of Lifelong Multimodal Agent Memory**: AI agents increasingly operate over extended time horizons, yet their ability to retain, organize, and recall multimodal experiences remains a critical bottleneck. Building effective lifelong memory r…
- **OpenClaw-RL: Train Any Agent Simply by Talking**: Every agent interaction generates a next-state signal, namely the user reply, tool output, terminal or GUI state change that follows each action, yet no existing agentic RL system recovers it as a liv…
- **Post-Completion Learning for Language Models**: Current language model training paradigms typically terminate learning upon reaching the end-of-sequence (`<eos>`) token, overlooking the potential learning opportunities in the post-completion space. We…
- **Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution**: Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large Language Models (LLMs) in various domains. However, such hand-crafted prompt-strateg… (see the self-referential prompt-mutation sketch after this list)
- **R-Zero: Self-Evolving Reasoning LLM from Zero Data**: Self-evolving Large Language Models (LLMs) offer a scalable path toward superintelligence by autonomously generating, refining, and learning from their own experiences. However, existing methods for t…
- **RL + Transformer = A General-Purpose Problem Solver**: What if artificial intelligence could not only solve problems for which it was trained but also learn to teach itself to solve new problems (i.e., metalearn)? In this study, we demonstrate that a pre-…
- **RL-PLUS: Countering Capability Boundary Collapse of LLMs in Reinforcement Learning with Hybrid-policy Optimization**: Reinforcement Learning with Verifiable Reward (RLVR) has significantly advanced the complex reasoning abilities of Large Language Models (LLMs). However, it struggles to break through the inherent cap…
- **ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs**: Process Reward Models (PRMs) have recently emerged as a powerful framework for supervising intermediate reasoning steps in large language models (LLMs). Previous PRMs are primarily trained on model fi…
- **Reinforcement Learning be Enough for Thinking?**: In the context of large language models (LLMs), recent work by Guo et al. proposed a unified model whereby System 2 type “thinking” emerged as a consequence of model-free RL applied to solve mathemati…
- **Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?**: The advent of test-time scaling in large language models (LLMs), exemplified by OpenAI’s o1 series, has advanced reasoning capabilities by scaling computational resource allocation during inference. W…
- **Reward Reasoning Model**: Reward models play a critical role in guiding large language models toward outputs that align with human expectations. However, an open challenge remains in effectively utilizing test-time compute to …
- **Scaling Expert Language Models with Unsupervised Domain Discovery**: Large language models are typically trained densely: all parameters are updated with respect to all inputs. This requires synchronization of billions of parameters across thousands of GPUs. We introdu…
- **Self-Discover: Large Language Models Self-Compose Reasoning Structures**: *Table 2. All 39 reasoning modules consisting of high-level cognitive heuristics for problem-solving. We adopt them from Fernando et al. (2023).* Reasoning Modules: 1. How could I devise an experim…
- **Self-Evaluation Guided Beam Search for Reasoning**: Breaking down a problem into intermediate steps has demonstrated impressive performance in Large Language Model (LLM) reasoning. However, the growth of the reasoning chain introduces uncertainty and e…
- **Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges**: Large language models often struggle with length generalization and solving complex problem instances beyond their training distribution. We present a self-improvement approach where models iterativel…
- **Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models**: “Typical alignment methods include Supervised Fine-Tuning (SFT) (Ouyang et al., 2022; Tunstall et al., 2023a) based on human demonstrations, and Reinforcement Learning from Human Feedback (RLHF) (Chri…
- **Self-Rewarding Language Models**: We posit that to achieve superhuman agents, future models require superhuman feedback in order to provide an adequate training signal. Current approaches commonly train reward models from human prefer… (see the self-judging preference-pair sketch after this list)
- **Self-distillation Enables Continual Learning**: Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement le…
- **SkillClaw: Let Skills Evolve Collectively with Agentic Evolver**: Large language model (LLM) agents such as OpenClaw rely on reusable skills to perform complex tasks, yet these skills remain largely static after deployment. As a result, similar workflows, tool usage…
- **Stream of Search (SoS): Learning to Search in Language**: Language models are rarely shown fruitful mistakes while training. They then struggle to look beyond the next token, suffering from a snowballing of errors and struggling to predict the consequence of…
- **TTRL: Test-Time Reinforcement Learning**: This paper investigates Reinforcement Learning (RL) on data without explicit labels for reasoning tasks in Large Language Models (LLMs). The core challenge of the problem is reward estimation during i… (see the majority-vote reward sketch after this list)
- **Test-Time Scaling with Reflective Generative Model**: We introduce our first reflective generative model MetaStone-S1, which obtains OpenAI o3-mini’s performance via the new Reflective Generative Form. The new form focuses on high-quality reasoning traje…
- **The Curse Of Recursion: Training On Generated Data Makes Models Forget**: “Stable Diffusion revolutionised image creation from descriptive text. GPT-2, GPT-3(.5) and GPT-4 demonstrated astonishing performance across a variety of language tasks. ChatGPT introduced such langu…
- **The Invisible Leash: Why RLVR May Not Escape Its Origin**: Recent advances in large reasoning models highlight Reinforcement Learning with Verifiable Rewards (RLVR) as a promising method for enhancing AI’s capabilities, particularly in solving complex logical…
- **Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing**: Despite the impressive capabilities of Large Language Models (LLMs) on various tasks, they still struggle with scenarios that involve complex reasoning and planning. Recent work proposed advanced pro…
- **Tree Search for Language Model Agents**: Autonomous agents powered by language models (LMs) have demonstrated promise in their ability to perform decision-making tasks such as web automation. However, a key limitation remains: LMs, primarily…
- **Truly Self-Improving Agents Require Intrinsic Metacognitive Learning**: Self-improving agents aim to continuously acquire new capabilities with minimal supervision. However, current approaches face two key limitations: their self-improvement processes are often rigid, fai…
- **Unsupervised Elicitation of Language Models**: To steer pretrained language models for downstream tasks, today’s post-training paradigm relies on humans to specify desired behaviors. However, for models with superhuman capabilities, it is difficul…
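
Since the entries above are deliberately terse, a few illustrative sketches follow. First, the generate/recombine/refine loop behind Mind Evolution-style search (Evolving Deeper LLM Thinking). This is a minimal sketch under assumptions, not the paper's implementation: `llm` is any prompt-to-completion callable and `fitness` any candidate-to-score callable, both placeholder names introduced here for illustration.

```python
import random

def evolve(llm, fitness, seed_prompts, pop_size=8, generations=5):
    """Evolutionary search over LLM-generated candidates:
    sample an initial population, score it, keep the fittest,
    and delegate recombination/refinement back to the LLM."""
    # Initial population: one LLM sample per seed prompt, cycled to pop_size.
    population = [llm(p) for p in (seed_prompts * pop_size)[:pop_size]]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[: max(2, pop_size // 4)]  # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            # Crossover/mutation is itself an LLM call, not a hand-coded operator.
            children.append(llm(
                "Combine the strengths of these two candidate solutions "
                f"and fix their weaknesses:\n---\n{a}\n---\n{b}"
            ))
        population = parents + children  # elitism: parents survive unchanged
    return max(population, key=fitness)
```

The design choice shared by most evolutionary methods in this list is visible here: the variation operators are delegated to the model itself rather than hand-coded, so search quality inherits directly from model quality.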
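
Promptbreeder's distinguishing move is self-reference: the mutation-prompts that rewrite task-prompts are themselves mutated. A simplified single-generation sketch, again with `llm` and `score` as assumed placeholder callables rather than the paper's actual interfaces:

```python
import random

def promptbreeder_step(llm, score, task_prompts, mutation_prompts, meta_rate=0.1):
    """One simplified Promptbreeder-style generation: a binary tournament
    where the loser is overwritten by a mutated copy of the winner, plus
    an occasional self-referential mutation of a mutation-prompt."""
    a, b = random.sample(range(len(task_prompts)), 2)
    winner, loser = (a, b) if score(task_prompts[a]) >= score(task_prompts[b]) else (b, a)
    m = random.choice(mutation_prompts)
    task_prompts[loser] = llm(f"{m}\n\nINSTRUCTION:\n{task_prompts[winner]}")
    # Self-referential step: sometimes evolve the mutation operator itself.
    if random.random() < meta_rate:
        i = random.randrange(len(mutation_prompts))
        mutation_prompts[i] = llm(
            "Write an improved variant of this mutation instruction:\n"
            + mutation_prompts[i]
        )
    return task_prompts, mutation_prompts
```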
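
Self-Rewarding Language Models close the loop by letting one model act as both policy and judge, converting its own rubric scores into preference pairs for a subsequent DPO step. A rough sketch under assumptions: `llm` is a generic completion callable and `judge_prompt` a rubric template with `{instruction}` and `{response}` slots; both names, and the crude score parsing, are illustrative rather than the paper's exact recipe.

```python
def self_reward_pairs(llm, judge_prompt, instructions, k=4):
    """Build (prompt, chosen, rejected) preference triples by having
    the same model generate k candidates and then score each one
    against a rubric (LLM-as-a-Judge)."""
    pairs = []
    for x in instructions:
        candidates = [llm(x) for _ in range(k)]

        def judge(response):
            # The model scores its own output; crude parse of the first digit.
            verdict = llm(judge_prompt.format(instruction=x, response=response))
            digits = [c for c in verdict if c.isdigit()]
            return int(digits[0]) if digits else 0

        ranked = sorted(candidates, key=judge, reverse=True)
        pairs.append((x, ranked[0], ranked[-1]))  # best vs. worst candidate
    return pairs
```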
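
Finally, TTRL's reward estimation on unlabeled test data reduces, in its simplest form, to majority voting: the consensus answer across sampled rollouts serves as a pseudo-label. A minimal sketch (the function name is mine, not the paper's):

```python
from collections import Counter

def ttrl_rewards(answers):
    """Reward each rollout by agreement with the majority answer,
    which stands in for the missing ground-truth label."""
    majority, _ = Counter(answers).most_common(1)[0]
    return [1.0 if a == majority else 0.0 for a in answers]

# Eight sampled final answers to one unlabeled test question:
print(ttrl_rewards(["42", "42", "41", "42", "7", "42", "42", "41"]))
# -> 1.0 for the five "42" rollouts, 0.0 for the rest
```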