Agentic and Multi-Agent Systems

What blocks scaling from language models to autonomous agents?

If large language models excel at next-token prediction, why do they struggle with long-horizon, goal-oriented tasks? This note explores whether the bottleneck is model capacity or the environments used to train them.

Note · 2026-05-03 · sourced from Action Models

Nex-N1's diagnosis is that the LLM-to-agent transition is blocked by a misalignment between LLM pretraining (myopic next-token prediction) and the long-horizon, goal-oriented nature of agentic tasks, and that bridging this gap requires not better models but a new scale of interactive environments. Scarcity of diverse environments leaves models as "System 1" responders without "System 2" rigor; lack of realistic grounding produces hallucinated tool use and brittle error recovery.

The structural claim is that environments must scale on three orthogonal dimensions, and a deficit on any one ruins the resulting policy. Complexity comes from agent hierarchies — NexAU is a lightweight high-throughput runtime that decouples agent definition from execution, treating sub-agents and tools as interchangeable functional units in a recursive ReAct-like architecture. Diversity comes from automated synthesis — NexA4A generates diverse agent architectures and workflows from natural-language specifications rather than human-designed templates, breaking the dependency on hand-built environments. Fidelity comes from grounding — NexGAP integrates real Model Context Protocol (MCP) tools and information fusion, generating trajectories rooted in authentic latency, stochasticity, and feedback loops.
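The "interchangeable functional units" idea behind the NexAU-style runtime can be made concrete with a minimal sketch: if tools and sub-agents expose the same callable interface, an agent's action space is just a name-to-unit map, and hierarchies nest recursively for free. All names and the keyword-routing logic below are invented for illustration; a real runtime would let an LLM choose the unit inside a proper ReAct loop.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# One shared interface: both tools and sub-agents map a task string to a result.
Unit = Callable[[str], str]

@dataclass
class Agent:
    name: str
    units: Dict[str, Unit]  # tools and sub-agents, indistinguishable to the caller

    def __call__(self, task: str) -> str:
        # Toy ReAct-like step: select a unit, act, return the observation.
        # (Keyword routing stands in for an LLM's action choice.)
        for trigger, unit in self.units.items():
            if trigger in task:
                observation = unit(task)  # act on the environment or delegate
                return f"{self.name} observed: {observation}"
        return f"{self.name}: no applicable unit"

# A plain tool...
def search(query: str) -> str:
    return f"results for '{query}'"

# ...and a sub-agent, wrapped to the same Unit signature, so nesting is recursive:
researcher = Agent("researcher", {"search": search})
lead = Agent("lead", {"search": researcher})  # sub-agent used exactly like a tool

print(lead("please search for MCP"))
# -> lead observed: researcher observed: results for 'please search for MCP'
```

Because `Agent.__call__` satisfies the same `Unit` signature as any tool, the lead agent delegates to the researcher without knowing it is an agent at all, which is the decoupling of agent definition from execution that the runtime description points at.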

The orthogonality matters because earlier frameworks fail in characteristic ways: rigid graph-based orchestrators provide reliability but limit diversity; pure synthetic environments provide diversity but break on real execution. Treating environments as generative language specifications rather than static code is the move that lets all three axes scale together. The empirical signal — Nex-N1 outperforms SOTA open-source models and approaches frontier proprietary models on SWE-bench and τ2 — supports the thesis that the limiting reagent has been environments, not parameters.
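The move from static code to generative specifications can be sketched as follows: a spec names the three axes, and a generator samples concrete task instances from it, so complexity, diversity, and fidelity become knobs on one object rather than properties of hand-built environments. The field names and the generator are hypothetical, not the NexA4A/NexGAP APIs.

```python
import random
from dataclasses import dataclass

@dataclass
class EnvSpec:
    description: str   # natural-language task family, e.g. "book a flight"
    depth: int         # complexity axis: max levels of agent/tool hierarchy
    variants: int      # diversity axis: distinct workflows to synthesize
    grounded: bool     # fidelity axis: execute against real tools vs. simulate

def generate_tasks(spec: EnvSpec, seed: int = 0) -> list:
    """Sample concrete task instances from a generative spec (toy version)."""
    rng = random.Random(seed)  # seeded for reproducible synthesis
    return [
        {
            "goal": f"{spec.description} (variant {i})",
            "hierarchy_depth": rng.randint(1, spec.depth),
            "grounded": spec.grounded,
        }
        for i in range(spec.variants)
    ]

spec = EnvSpec("book a multi-leg flight", depth=3, variants=4, grounded=True)
for task in generate_tasks(spec):
    print(task["goal"], task["hierarchy_depth"])
```

The point of the sketch is that scaling any axis is a one-field change to the spec rather than new environment code, which is why the three axes can scale together instead of trading off.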

This stands in productive tension with Can 78 demonstrations teach agency better than 10000?, which argues that strategic data curation beats environment scale; the likely resolution is that environment richness sets a ceiling that curated data then exploits, so the two are complements rather than substitutes.




agentic training requires environment scaling along three orthogonal dimensions: complexity, diversity, and real-world fidelity must scale together