
Can humans understand deep learning before AI does?

Explores whether investing in human-parseable deep learning theory remains valuable even if AI systems eventually develop their own self-understanding. Centers on why this matters for safety oversight.

Note · 2026-05-18 · sourced from Foundation Models

A common dismissal of deep learning theory: "AI will become powerful enough to understand itself before humans understand it, so investing in theory is a transitional concern at most." The argument in "There Will Be a Scientific Theory of Deep Learning" pushes back on this with a safety-grounded counterargument.

Theory is already useful at current capability levels and will become more useful as it develops. It seems unlikely that AI working in isolation will suddenly and separately "solve deep learning theory" without human scientists in the loop. The more realistic trajectory is breakthrough progress during a transitional period, driven by human scientists using or working with AI. During that period, the human side of the partnership needs frameworks it can reason about.

The safety argument is the load-bearing one. If the goal is AI safety, some human oversight of AI systems will be necessary. Human oversight requires a human-parseable theory: a framework in which experts can articulate concerns, identify failure modes, and reason about the dynamics of training runs they did not conduct themselves. Without that theory, oversight degenerates into either trust ("the model says it's safe") or empirical pattern-matching against past incidents. Neither is sufficient for novel deployments.

This positions deep learning theory as alignment infrastructure rather than as pure science. The question is not whether AI can eventually self-explain — it is whether humans have a framework to evaluate the explanation. A theory that lives only inside AI systems cannot serve as the basis for human-led safety review. The theory needs to live in humans, and it needs to live there before the systems are capable enough that the safety stakes become irreversible.

The implication for research funding and attention: learning mechanics is neither optional nor something to work out after the fact; it is a precondition for keeping humans in the AI development loop at scale.

Original note title

the field needs a human-parseable theory of deep learning regardless of AI self-understanding — for AI safety experts must remain in the loop