AlphaGo Moment for Model Architecture Discovery

Paper · arXiv 2507.18074 · Published July 24, 2025
Novel Architectures · Evolution

While AI systems demonstrate exponentially improving capabilities, the pace of AI research itself remains linearly bounded by human cognitive capacity, creating an increasingly severe development bottleneck. We present ASI-ARCH, the first demonstration of Artificial Superintelligence for AI research (ASI4AI) in the critical domain of neural architecture discovery—a fully autonomous system that shatters this fundamental constraint by enabling AI to conduct its own architectural innovation. Moving beyond traditional Neural Architecture Search (NAS), which is fundamentally limited to exploring human-defined spaces, we introduce a paradigm shift from automated optimization to automated innovation. ASI-ARCH can conduct end-to-end scientific research in the challenging domain of architecture discovery: it autonomously hypothesizes novel architectural concepts, implements them as executable code, and trains and empirically validates their performance through rigorous experimentation, drawing on both past human and AI experience. ASI-ARCH conducted 1,773 autonomous experiments over 20,000 GPU hours, culminating in the discovery of 106 innovative, state-of-the-art (SOTA) linear attention architectures. Like AlphaGo's Move 37, which revealed unexpected strategic insights invisible to human players, our AI-discovered architectures demonstrate emergent design principles that systematically surpass human-designed baselines and illuminate previously unknown pathways for architectural innovation (Fig. 2). Crucially, we establish the first empirical scaling law for scientific discovery itself—demonstrating that architectural breakthroughs can be scaled computationally, transforming research progress from a human-limited to a computation-scalable process. We provide comprehensive analysis of the emergent design patterns and autonomous research capabilities that enabled these breakthroughs, establishing a blueprint for self-accelerating AI systems.
To democratize AI-driven research, we open-source the complete framework, discovered architectures, and cognitive traces.

The Researcher module proposes novel architectures, the Engineer module conducts empirical evaluations by executing them in a real-world environment, and the Analyst module performs analytical summaries of the results to acquire new insights. All experimental data and derived insights are systematically archived in a central database, creating a persistent memory that drives the entire process.

To ensure the system progressively generates superior designs, we implement an evolutionary improvement strategy that enables the model to continuously learn from experience. This is realized through two key mechanisms: first, a comprehensive fitness score that holistically evaluates each new architecture, providing a clear optimization target; and second, the ability to leverage both distilled knowledge from human expert literature (cognition) and analytical summaries of its own past experiments (analysis) to inform subsequent design proposals. Given the resource-intensive nature of this evolutionary process, we adopt a two-stage exploration-then-verification strategy. The initial stage involves broad exploration on small-scale models to efficiently identify a large pool of promising candidates. In the final stage, these candidates are scaled up to larger models for rigorous validation, confirming their state-of-the-art performance.
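The exploration-then-verification strategy described above can be sketched as a simple two-stage pipeline. This is an illustrative reconstruction, not the paper's implementation: `small_eval` and `large_eval` stand in for the fitness scoring at small and large model scales, and the shortlist size is a placeholder.

```python
def explore_then_verify(candidates, small_eval, large_eval, n_keep=10):
    """Two-stage strategy (sketch; names and shortlist size are illustrative).

    Stage 1: score every candidate cheaply with small-scale training.
    Stage 2: re-validate only the top n_keep candidates at larger scale.
    Returns (candidate, large_scale_score) pairs for the shortlist.
    """
    # Stage 1: broad exploration on small-scale models.
    scored = sorted(candidates, key=small_eval, reverse=True)
    shortlist = scored[:n_keep]
    # Stage 2: rigorous validation of the promising pool at larger scale.
    return [(c, large_eval(c)) for c in shortlist]
```

The key design point is that the expensive evaluator only ever sees the small pool of survivors, so large-scale GPU hours are spent confirming results rather than searching.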

Seed Selection ASI-ARCH maintains a candidate pool containing the top-50 highest-scoring architectures from all previous experiments. For each evolution step, we use a two-level sampling approach: one parent architecture is randomly selected from the top-10 performers to serve as the base for modifications, while 4 reference architectures are sampled from positions 11-50 to provide diverse design examples. This two-tier selection ensures that evolution builds on proven success while maintaining enough randomness to explore new directions. The parent architecture gets modified directly, while the reference architectures serve as examples of successful design patterns without being changed themselves.
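A minimal sketch of this two-level sampling, assuming the candidate pool is kept sorted by fitness score with the best architecture first (the function name is ours, not the paper's):

```python
import random

def sample_seeds(candidate_pool, rng=random):
    """Two-level sampling over the top-50 pool (illustrative sketch).

    candidate_pool: architectures sorted by fitness score, best first.
    Returns (parent, references): one parent drawn from the top 10 to be
    modified, and four references drawn from positions 11-50 that serve
    as unmodified examples of successful design patterns.
    """
    pool = candidate_pool[:50]
    parent = rng.choice(pool[:10])          # exploit: build on proven success
    references = rng.sample(pool[10:50], 4)  # explore: diverse design examples
    return parent, references
```

Restricting the parent to the top 10 keeps evolution anchored to the strongest designs, while sampling references from the broader 11-50 band injects the randomness needed to explore new directions.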

Novelty and Sanity Check To ensure that each proposed architecture is both novel and correctly implemented, we apply a two-stage validation process before it is accepted for training. The first stage is a similarity check to prevent redundancy. When a new architecture is proposed, we first extract its motivation and use embedding-based search to find the top-5 most similar historical motivations. A specialized LLM then evaluates whether the new proposal represents a genuine innovation or merely a variation of existing work. The second stage consists of code-level sanity checks to prevent fundamental implementation flaws, including verifying that the code does not exceed O(n²) complexity and ensuring that masking is implemented correctly to prevent information leakage. If a proposal fails either the novelty or the correctness check, it is rejected, and the relevant feedback is returned to the agent to prompt a rewrite.
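The two-stage gate can be sketched as follows. Here `embed`, `judge_llm`, and the entries of `sanity_checks` are hypothetical callables standing in for the embedding model, the specialized judge LLM, and the code-level checks; the cosine helper is ours:

```python
import math

def _cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def validate_proposal(proposal, history, embed, judge_llm, sanity_checks, top_k=5):
    """Two-stage novelty-and-sanity gate (sketch; callables are hypothetical).

    Returns (accepted, feedback); on rejection, feedback is routed back
    to the agent to prompt a rewrite.
    """
    # Stage 1: novelty -- retrieve the top-k most similar past motivations
    # and ask a judge LLM whether the proposal is a genuine innovation.
    q = embed(proposal["motivation"])
    neighbors = sorted(history,
                       key=lambda m: _cosine(q, embed(m)),
                       reverse=True)[:top_k]
    if not judge_llm(proposal["motivation"], neighbors):
        return False, "too similar to prior motivations"
    # Stage 2: code-level sanity checks (e.g. complexity bound, causal masking).
    for check in sanity_checks:
        ok, message = check(proposal["code"])
        if not ok:
            return False, message
    return True, "accepted"
```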

Real Code Environment The quantitative evaluation takes place within an interactive coding environment where the agent must utilize a defined set of tools to initiate training, modify code, and inspect error logs. A key differentiator of ASI-ARCH is its robust self-revision mechanism. In stark contrast to previous work (Cheng et al., 2025) that often uses static analysis like Abstract Syntax Tree (AST) parsing and simply discards any architecture that fails these checks, ASI-ARCH requires the agent to fix its own mistakes. When a training run fails due to an implementation error, the system automatically captures the full error log and delivers it back to the agent, which is then tasked with analyzing this feedback and revising its previously generated code. This iterative debugging loop continues until training is successful, ensuring promising ideas are not prematurely discarded due to simple coding mistakes. Furthermore, to maintain high efficiency, an automated quality assurance system monitors training logs in real-time. This is critical because some functional designs can be prohibitively inefficient, such as a model consuming two to three times the training duration of its peers. ASI-ARCH detects such anomalies, as well as fundamental bugs indicated by abnormally low loss, and immediately terminates the run, reporting the issue back to the agent for revision. This proactive termination prevents wasting resources on flawed architectures and significantly accelerates the overall search process.

Cognition Base To ensure ASI-ARCH can leverage existing domain knowledge, we construct a cognition-centered knowledge base. We selected nearly 100 seminal papers from the field of linear attention and used a dedicated LLM to extract 1-3 distinct cognitions from each. Each cognition is a structured entry composed of three key elements: the applicable scenario, which describes the specific problem the original paper aimed to solve; the proposed algorithm, which summarizes the core technical solution; and the historical context, which situates the paper within the research trends of its time. To guarantee the utility of this knowledge base, we carefully engineered the prompt for the extraction LLM.
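The three-element structure of each entry maps naturally onto a small record type. A sketch with illustrative field names (the paper does not specify its storage schema):

```python
from dataclasses import dataclass

@dataclass
class Cognition:
    """One structured entry distilled from a seminal paper (field names illustrative)."""
    scenario: str   # applicable scenario: the problem the original paper targeted
    algorithm: str  # proposed algorithm: summary of the core technical solution
    context: str    # historical context: where the paper sits in its era's trends

# Example entry (content invented for illustration):
example = Cognition(
    scenario="recurrent state saturates on long sequences",
    algorithm="introduce a learned gated decay on the memory state",
    context="response to early linear attention's fixed-size state bottleneck",
)
```

Keeping the `scenario` field as its own string is what later makes the entries retrievable: it is the field embedded and matched against the Analyst's problem summaries.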

The prompt’s structure is specifically designed to ensure that the extracted “experiment triggers” align semantically with the “problem analyses” generated by our Analyst module. This alignment is crucial for effective retrieval. In the final stage of analysis, the Analyst summarizes the specific shortcomings observed in the current experiment, and this summary is used as a query for embedding-based retrieval against the scenarios in our knowledge base. The retrieved cognition content is then stored in our database for future reference, providing a highly relevant, information-dense, and targeted way for the Researcher module to find solutions.
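The retrieval step described above amounts to nearest-neighbor search over the `scenario` field of each cognition, with the Analyst's shortcoming summary as the query. A sketch, where `embed` is a hypothetical encoder and cognitions are plain dicts:

```python
import math

def _cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_cognitions(problem_summary, cognitions, embed, top_k=3):
    """Embedding-based retrieval (sketch; `embed` is a hypothetical encoder).

    problem_summary: the Analyst's summary of the current experiment's
    shortcomings, used as the query.
    cognitions: list of dicts, each with a "scenario" key to match against.
    Returns the top_k most semantically similar entries.
    """
    q = embed(problem_summary)
    return sorted(cognitions,
                  key=lambda c: _cosine(q, embed(c["scenario"])),
                  reverse=True)[:top_k]
```

Because the extraction prompt forces scenarios into the same register as the Analyst's problem analyses, a plain cosine ranking like this suffices; no cross-encoder re-ranking is described in the text.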