Agentic and Multi-Agent Systems · LLM Reasoning and Architecture

Can AI systems discover better neural architectures than humans?

Can multi-agent LLM systems, when structured with genetic programming, discover novel neural network designs that outperform human-engineered architectures? This matters because it could automate a critical bottleneck in AI research.

Note · 2026-02-23 · sourced from Novel Architectures

Genesys models the conventional stages of research — ideation, literature search, code generation, pretraining, evaluation — as a multi-agent LLM system. The key innovation is the Ladder of Scales approach: new designs are proposed, adversarially reviewed, implemented, and selectively verified at increasingly larger model scales (14M→350M parameters) with a narrowing budget at each scale.
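The narrowing-budget selection described above can be sketched as a simple loop. This is a hypothetical illustration (the function names, scoring stand-in, and exact budgets are assumptions), using the 14M→350M scale range mentioned in the note:

```python
# Sketch of a Ladder of Scales selection loop (hypothetical names; the
# scales and narrowing budgets are illustrative, not from the paper).

import random

def evaluate(design, scale):
    """Stand-in for pretraining + benchmark evaluation at a given scale.
    Here: a pseudo-score so the sketch is runnable end to end."""
    random.seed(hash((design, scale)))
    return random.random()

def ladder_of_scales(designs, scales=(14, 70, 350), budgets=(100, 20, 5)):
    """Verify candidates at increasing model scales (millions of params),
    keeping only the top `budget` survivors at each rung."""
    survivors = list(designs)
    for scale, budget in zip(scales, budgets):
        scored = sorted(survivors, key=lambda d: evaluate(d, scale), reverse=True)
        survivors = scored[:budget]  # budget narrows as scale grows
    return survivors

finalists = ladder_of_scales([f"design-{i}" for i in range(200)])
print(len(finalists))  # 5 designs survive to the largest scale
```

The design choice this captures: cheap evaluations at small scale filter aggressively, so the expensive large-scale verification budget is spent only on the most promising designs.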

The genetic programming (GP) backbone is critical. Rather than having LLMs generate architectures directly from prompts (an approach with an ~86% failure rate), Genesys represents architectures as Generalized Autoregressive Blocks (GABs): a code construct that factorizes into discrete tree representations. GP-style operations (crossover, mutation) on these trees produce meaningful architectural variations far more reliably than direct generation.
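A minimal sketch of what crossover and mutation on tree-encoded blocks look like. The node vocabulary and tree shape here are illustrative assumptions, not the actual GAB grammar:

```python
# GP-style crossover and mutation on tree-encoded architecture blocks.
# The op vocabulary ("attention", "conv", ...) is illustrative only.

import copy
import random

OPS = ["attention", "conv", "mlp", "gate", "norm"]

def random_tree(depth=2):
    """Build a random architecture tree: internal nodes compose sub-blocks."""
    if depth == 0:
        return random.choice(OPS)
    return ["seq", random_tree(depth - 1), random_tree(depth - 1)]

def mutate(tree, p=0.3):
    """Replace each leaf op with a random primitive with probability p."""
    if isinstance(tree, str):
        return random.choice(OPS) if random.random() < p else tree
    return [tree[0]] + [mutate(child, p) for child in tree[1:]]

def crossover(a, b):
    """Graft a random subtree of `b` into a copy of `a`."""
    child = copy.deepcopy(a)
    if isinstance(child, str) or isinstance(b, str):
        return copy.deepcopy(b)  # degenerate case: whole-tree swap
    idx = random.randrange(1, len(child))
    child[idx] = copy.deepcopy(random.choice(b[1:]))
    return child

random.seed(0)
parent_a, parent_b = random_tree(), random_tree()
offspring = mutate(crossover(parent_a, parent_b))
print(offspring)
```

Because every operation maps valid trees to valid trees, each offspring decodes back to runnable block code, which is what makes these variations so much more reliable than freeform generation.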

Results: 1,162 newly discovered designs (1,062 fully verified through pretraining). The best designs outperform GPT-2, Mamba-2, and other known architectures on 6/9 common benchmarks. This is achieved through a principled search process, not brute-force sampling.

The system architecture mirrors human research: ideation, literature search, code generation, pretraining, and evaluation, each handled as a distinct stage.

Unlike traditional Neural Architecture Search (NAS), which searches within human-defined operation spaces (attention heads, convolution kernels), Genesys explores a broader space of operations and architectures while also modeling the surrounding scientific discovery process.

The factorization into GP-representable trees is the insight that makes this practical: it imposes structure on the search space that direct LLM generation lacks. The ~86% gain in successful design generation from GP over direct prompting suggests that current LLMs need structured representations to do creative design work reliably; they cannot yet generate novel working architectures from freeform description alone.


