Nex-N1: Agentic Models Trained via a Unified Ecosystem for Large-Scale Environment Construction

Paper · arXiv 2512.04987 · Published December 4, 2025
Tags: Action Models · Agents · Multi-Agents · Multi-Architecture · Tool / Computer Use · World Models

The evolution of Large Language Models (LLMs) from passive responders to autonomous agents necessitates a fundamental shift in learning paradigms, from static imitation to incentive-driven decision making. However, this transition is significantly impeded by the lack of scalable infrastructure capable of constructing high-quality interaction signals for effective policy learning. To address this, we introduce a comprehensive method designed to systematically scale the diversity and complexity of interactive environments. Our method realizes this scaling along three orthogonal dimensions: (1) Complexity: NexAU, a flexible agent framework that supports building complex agent hierarchies via simple configurations; (2) Diversity: NexA4A, which automatically generates diverse agent hierarchies from natural language to cover infinite domains; and (3) Fidelity: NexGAP, which bridges the simulation–reality gap by integrating dynamic real-world environments for grounded trajectory synthesis. We train Nex-N1 on the diverse and complex interactive environments established by our infrastructure. Empirical results on benchmarks such as SWE-bench and τ2 demonstrate that Nex-N1 consistently outperforms SOTA open-source models and achieves performance competitive with frontier proprietary models on complex agentic tasks. We open-source the Nex ecosystem and model weights to facilitate further research.

The evolution of Large Language Models (LLMs) from passive information processors to autonomous agents represents a fundamental shift in the pursuit of Artificial General Intelligence (AGI) (Xi et al., 2023; Wang et al., 2023; Bubeck et al., 2023). While current foundation models demonstrate remarkable capabilities in knowledge representation and reasoning (OpenAI, 2023; Liu et al., 2024), deploying them as reliable agents in real-world scenarios remains a formidable challenge. A central capability required for this transition is 'Agency': the capacity to anchor deep reasoning in reality, moving beyond internal intuition to actively perceive the environment and adapt strategies based on dynamic feedback.

However, a critical misalignment exists between the myopic “next-token prediction” objective governing LLM pre-training and the long-horizon, goal-oriented nature of agentic tasks. We argue that bridging this gap requires transforming the learning process itself: from learning what to say to learning how to act, which demands a new scale of interactive environments.

• Scarcity of Diverse Environments. LLMs trained on static text corpora often act as “System 1” responders, lacking the “System 2” rigor required for complex planning (Bengio et al., 2021). Without exposure to environments that demand long-term reasoning, models succumb to probability traps and myopic decision-making (Valmeekam et al., 2023). Furthermore, constructing interactive environments that are both broad in scope and reliable in structure is prohibitively expensive. Current approaches rely on limited environments or rigid frameworks (Liu et al., 2023; Andrews et al., 2025), which fail to provide the behavioral diversity needed for models to generalize to novel tasks (Barres et al., 2025).

• Lack of Realistic Grounding. Agents trained on purely synthetic or static data often struggle with the complexity of real-world execution. They exhibit a disconnect between “thought” and “action”, leading to hallucinations in tool usage—such as invoking APIs based on outdated assumptions (Patil et al., 2023b; Schick et al., 2023). Unlike biological systems that learn through interaction, LLMs typically fail to perform robust error recovery or self-correction when actions fail (Shinn et al., 2023; Yao et al., 2023). True agentic capability requires training on trajectories that capture the latency, stochasticity, and feedback loops of real-world execution.

To address this, we propose an approach based on agentic scaling to generate diverse environments and high-quality data. We introduce a unified system comprising three components: NexAU (Agent Universe), a universal agent framework that hides the complexity of agent internals (execution loop, tools, sub-agents, context management, etc.) from agent builders, so that simple configurations can yield highly complex and diverse agents; NexA4A (Agent for Agent), a generative system that automatically synthesizes diverse agent architectures and workflows from natural language specifications; and NexGAP (General Agent-data Pipeline), which leverages real-world Model Context Protocol (MCP) tools and information fusion to generate massive-scale, end-to-end trajectories rooted in authentic execution. Leveraging this system, we train Nex-N1, a series of models that demonstrate robust generalization across heterogeneous agent frameworks. Our main contributions are summarized as follows:

Infrastructure for Environmental Scaling. We propose a unified ecosystem (NexAU, NexA4A, NexGAP) that transforms environment construction from manual engineering to automated synthesis. By treating agent environments as generative language specifications rather than static code, we break the dependency on human-designed environments and enable the infinite scaling of diverse interaction topologies.

However, when examined at scale, the existing framework ecosystem exposes two structural obstacles: (a) there is no scalable, stable substrate for environment simulation, covering tools, state transitions, and error behaviors, so building one demands substantial manual effort, as most frameworks were designed for small-scale experiments rather than large-scale, reproducible trajectory generation; and (b) existing frameworks provide limited diversity: they cover only a narrow range of tasks, tools, and interaction patterns, with variable implementation quality, so the resulting behavioral space for agentic learning is constrained.

Across these frameworks, what truly matters are the context-passing relationships among agents, each agent’s system prompt, the tools involved, and the invocation format.
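As a concrete illustration, these four elements can be captured in a small declarative specification. The sketch below is hypothetical: the field names (`system_prompt`, `tools`, `invocation_format`, `context_from_parent`) are illustrative choices, not NexAU's actual configuration schema.

```python
# Hypothetical agent specification illustrating the four elements the text
# identifies: context-passing relationships among agents, each agent's
# system prompt, its tools, and the invocation format. Field names are
# illustrative, not NexAU's actual schema.
agent_spec = {
    "name": "code_reviewer",
    "system_prompt": "You review patches and report defects.",
    "tools": ["read_file", "run_tests"],
    "invocation_format": "json_function_call",
    "sub_agents": [
        {
            "name": "test_runner",
            "system_prompt": "You execute the test suite and summarize failures.",
            "tools": ["run_tests"],
            "invocation_format": "json_function_call",
            # Context passing: the child receives only the parent's task
            # summary, not the full conversation history.
            "context_from_parent": "task_summary",
            "sub_agents": [],
        }
    ],
}

def count_agents(spec):
    """Recursively count the agents in a hierarchy (parent plus descendants)."""
    return 1 + sum(count_agents(s) for s in spec["sub_agents"])
```

Because such a specification is plain data, a runtime can instantiate arbitrarily deep hierarchies from it, and a generative system can emit new specifications directly from natural language.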

This observation motivates a unified and extensible agent engine that abstracts away framework-specific idiosyncrasies while preserving the behavioral structure essential for learning. We therefore introduce NexAU (Nex Agent Universe), a lightweight, high-throughput runtime that decouples agent definition from agent execution. NexAU provides a consistent substrate for faithfully simulating tools, environments, and error dynamics at scale, enabling the generation of high-quality trajectories across diverse tasks. Unlike rigid graph-based orchestration systems, NexAU adopts a recursive, fractal architecture inspired by the ReAct paradigm (Yao et al., 2023), treating sub-agents, tools, and external services as interchangeable functional units. This design both unifies heterogeneous frameworks under a single execution model and expands the diversity and realism of environments available for agentic scaling.
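The recursive, fractal design can be sketched as follows. In this hedged illustration (all names and signatures are assumptions, not NexAU's actual API), tools and sub-agents share a single callable interface, so an agent hierarchy is just nested composition: a wrapped agent has the same type as a tool and can be registered as an action by its parent.

```python
# Minimal sketch of a recursive, ReAct-style execution loop in which
# sub-agents and tools are interchangeable functional units. All names
# are illustrative; this is not NexAU's actual API.
from typing import Callable, Dict, List, Tuple, Union

# Both tools and sub-agents map an input string to an observation string.
Action = Callable[[str], str]
Decision = Union[Tuple[str, str], Tuple[str, str, str]]  # ("finish", answer) or ("call", name, arg)

def make_agent(actions: Dict[str, Action],
               policy: Callable[[str, List[str]], Decision]) -> Action:
    """Wrap a policy and its action table as a single callable unit.

    `policy(task, observations)` returns either ("call", action_name, arg)
    or ("finish", answer). Because the returned function has the same
    signature as a tool, a parent agent can register it as an action,
    yielding arbitrarily deep hierarchies.
    """
    def run(task: str) -> str:
        observations: List[str] = []
        while True:  # the ReAct loop: decide, act, observe, repeat
            decision = policy(task, observations)
            if decision[0] == "finish":
                return decision[1]
            _, action_name, arg = decision
            observations.append(actions[action_name](arg))
    return run
```

Usage follows the same pattern at every level: a parent policy that calls a child agent is indistinguishable, from the runtime's point of view, from one that calls a plain tool.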