Can language models beat human venture capital experts?

Explores whether LLMs can outperform top investors at founder success prediction in a domain where even experts show only modest accuracy. Tests whether AI forecasting is competitive in sparse-signal, high-uncertainty settings.

Synthesis note · 2026-06-03 · sourced from Reasoning by Reflection

Venture capital is a clean testbed for expert forecasting under uncertainty: signals are sparse, outcomes are uncertain, and even top investors perform modestly in absolute terms. At inception, the market index achieves only 1.9% precision; Y Combinator reaches ~3.2% (1.7× the index) and tier-1 firms ~5.6% (2.9×). VCBench standardizes 9,000 anonymized founder profiles (adversarial tests cut re-identification risk by over 90% while preserving predictive signal) and evaluates nine LLMs against these baselines. Most models beat the human benchmarks: DeepSeek-V3 delivers over six times the index precision, and GPT-4o achieves the highest F0.5.
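The precision multiples quoted above are lift over the index's base rate, and F0.5 is the F-beta score with β = 0.5, which weights precision more heavily than recall. The following is a minimal sketch of both metrics, assuming a plain binary "founder succeeds" label per profile. The names `Counts`, `f_beta`, and `lift_over_index` are hypothetical helpers, not VCBench's own code, and the confusion counts in the last lines are invented for illustration, not benchmark data.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Hypothetical confusion counts for a binary 'founder succeeds' prediction."""
    tp: int  # predicted success, actually succeeded
    fp: int  # predicted success, actually failed
    fn: int  # predicted failure, actually succeeded


def precision(c: Counts) -> float:
    predicted = c.tp + c.fp
    return c.tp / predicted if predicted else 0.0


def recall(c: Counts) -> float:
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def f_beta(c: Counts, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 weights precision above recall (beta=0.5 gives F0.5)."""
    p, r = precision(c), recall(c)
    b2 = beta * beta
    denom = b2 * p + r
    return (1 + b2) * p * r / denom if denom else 0.0


def lift_over_index(model_precision: float, index_precision: float = 0.019) -> float:
    """Precision as a multiple of the index base rate (1.9% in the note)."""
    return model_precision / index_precision


# The human baselines quoted in the note, expressed as lift over the 1.9% index.
for name, p in [("Y Combinator", 0.032), ("tier-1 firms", 0.056)]:
    print(f"{name}: {lift_over_index(p):.1f}x the index")  # 1.7x and 2.9x

# Invented counts (not VCBench data): 100 "invest" calls, 12 of them hits.
example = Counts(tp=12, fp=88, fn=38)
print(f"precision={precision(example):.3f} recall={recall(example):.3f} "
      f"F0.5={f_beta(example):.3f} lift={lift_over_index(precision(example)):.1f}x")
```

Because β = 0.5 puts more weight on precision, a selective predictor that makes few but accurate "invest" calls scores higher on F0.5 than one that casts a wide net, which fits a setting where false positives cost capital.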

The keeper is the structural point about where LLMs win: in low-base-rate, sparse-signal forecasting, modest absolute accuracy can still beat expert humans because the human bar is itself modest. This reframes "can AI match experts?": in domains where expertise yields only a small edge over chance, the bar to exceed experts is correspondingly low, and anonymized profile features alone are enough to clear it.

This complements the vault's forecasting thread. Alongside Can LLMs actually forecast time series better than we think?, VCBench supplies a domain where even raw model capability clears a low human bar. It also extends Can retrieval-augmented language models forecast like human experts?: that result nears the competitive-crowd bar, while VCBench shows that where the human bar is modest, LLMs clear it outright.

Related concepts in this collection

Can LLMs actually forecast time series better than we think? Explores whether language models possess stronger forecasting ability than current benchmarks suggest, and what role workflow design plays in revealing or hiding that capability.
VCBench is a domain where the modest human bar makes LLM forecasting competitive
Do automated benchmarks hide what frontier AI systems can really do? Benchmarks optimize for auto-gradable, short, cheap tasks. But real AI capability emerges in long-horizon, messy, open-ended work. How much capability are we missing—or wrongly inflating—by relying on benchmark scores alone?
VCBench is a real-stakes, privacy-preserving benchmark in the open-world spirit