SYNTHESIS NOTE

Can language models beat human venture capital experts?

Explores whether LLMs can outperform top investors at founder success prediction in a domain where even experts show only modest accuracy. Tests whether AI forecasting is competitive in sparse-signal, high-uncertainty settings.

Synthesis note · 2026-06-03 · sourced from Reasoning by Reflection

Venture capital is a clean testbed for expert forecasting under uncertainty: signals are sparse, outcomes are uncertain, and even top investors perform modestly in absolute terms. At inception, the market index achieves only 1.9% precision; Y Combinator reaches ~3.2% (1.7× the index) and tier-1 firms ~5.6% (2.9×). VCBench standardizes 9,000 anonymized founder profiles, with adversarial tests cutting re-identification risk by more than 90% while preserving predictive signal, and evaluates nine LLMs. Most of them beat the human baselines: DeepSeek-V3 delivers over six times the index precision, and GPT-4o achieves the highest F0.5.
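
The multiples quoted above are precision divided by the index precision, and F0.5 is the F-beta score with beta = 0.5, which weights precision more heavily than recall. A minimal sketch of both calculations follows. The function names are illustrative, not taken from VCBench, and the precision/recall pair passed to f_beta is a made-up example, not a reported result.

```python
def precision_lift(precision: float, base_rate: float) -> float:
    """Return how many times a precision exceeds the base-rate (index) precision."""
    if base_rate <= 0:
        raise ValueError("base_rate must be positive")
    return precision / base_rate


def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 weights precision more heavily than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Figures quoted in the note (the human baselines are approximate).
INDEX_PRECISION = 0.019
baselines = {"Y Combinator": 0.032, "tier-1 firms": 0.056}

for name, p in baselines.items():
    print(f"{name}: {precision_lift(p, INDEX_PRECISION):.1f}x the index")

# Hypothetical precision/recall pair for illustration only, NOT a VCBench result.
print(f"F0.5 = {f_beta(0.20, 0.10):.3f}")
```

With the note's figures, the lifts round to 1.7 and 2.9, matching the quoted multiples. Because beta is below 1, a model that trades recall for precision scores higher on F0.5 than it would on F1, which suits a setting where a false positive means a wasted investment.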

The keeper is the structural point about where LLMs win: in low-base-rate, sparse-signal forecasting, modest absolute accuracy can still beat expert humans because the human bar is itself modest. This reframes the question "can AI match experts?": in domains where expertise yields only a small edge over chance, the bar to exceed experts is correspondingly low, and anonymized profile features alone suffice to clear it.

This complements the vault's forecasting thread. Read next to "Can LLMs actually forecast time series better than we think?", VCBench supplies a domain where even raw model capability clears a low human bar. It also extends "Can retrieval-augmented language models forecast like human experts?": that result nears the competitive-crowd bar, whereas VCBench shows that where the human bar is modest, LLMs clear it outright.

Original note title

in domains where expert humans perform only modestly LLMs can surpass human-expert baselines — sparse-signal forecasting rewards modest absolute accuracy