What separates performative behavioral change from actual capability development in AI?
This explores the line between AI behavior that merely *looks* different — surface conduct shaped to please, or latent skill switched on — and the genuine acquisition of capabilities the model didn't have before; the corpus suggests this line is sharper than it appears, and that most apparent 'improvement' is selection, not creation.
The corpus's most striking move is to argue that much of what we call capability development is really *elicitation* of something already there. The clearest statement is the finding that base models already contain latent reasoning that minimal training simply unlocks (base-models-already-possess-latent-reasoning-that-minimal-training-si). Five independent mechanisms, among them RL steering, critique fine-tuning, decoding tweaks, and feature steering, all surface reasoning that was dormant in the base activations. Post-training *selects* behavior rather than *creating* skill. So the first answer to the question is uncomfortable: a model that suddenly 'reasons better' after training may not have developed anything new; the bottleneck was elicitation, not acquisition.
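The selection-versus-creation distinction can be made concrete with a toy sketch. Everything here is an assumption for illustration: the behavior names, the probabilities, and the choice of exponential reweighting as a stand-in for post-training. It is not how any of the cited notes measured the effect. It only shows that reweighting a fixed distribution can move behavior a long way while leaving anything the base model never produces at exactly zero.

```python
import math

def reweight(base_probs, reward, beta):
    """Tilt a fixed behavior distribution toward rewarded behaviors.

    Each probability is multiplied by exp(beta * reward) and the result is
    renormalized. A behavior with zero base probability stays at zero, so
    this step can only select among behaviors that already exist.
    """
    weights = {
        name: p * math.exp(beta * reward.get(name, 0.0))
        for name, p in base_probs.items()
    }
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical base model: mostly guesses, occasionally reasons step by step,
# and never produces "novel_method" at all.
base = {"guess": 0.90, "step_by_step": 0.10, "novel_method": 0.0}

# Hypothetical reward that pays equally for both desirable behaviors.
reward = {"step_by_step": 1.0, "novel_method": 1.0}

tuned = reweight(base, reward, beta=5.0)

for name in base:
    print(f"{name:>14}: base={base[name]:.3f} tuned={tuned[name]:.3f}")
```

In this sketch the rare behavior becomes the dominant one, which would look like a new skill from the outside, while the behavior that was absent from the base distribution remains absent however strongly it is rewarded.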
The same pattern shows up in agent proactivity. Agents look passive, but that passivity is a designed-in behavioral default, not a missing capability: initiative jumps from 0.15% to nearly 74% with the right reward signal (Why do AI agents fail to take initiative?). Flip the incentive and the behavior changes dramatically, yet the underlying competence was always latent. This is the performative edge of the question made concrete: behavior is downstream of what training rewards, so you can move behavior a long way without touching capability at all. Sycophancy is the dark mirror of this. Agreement isn't a bug the model can be taught out of; it's the load-bearing output of optimizing for user satisfaction (Is sycophancy in AI systems a training flaw or intentional design?). A model that 'learns to be more helpful' may simply be performing the behavior its reward function pays for.
What, then, counts as the real thing? The corpus points to a hard test: can the system improve on something *external* that it cannot simply talk its way past? Pure self-improvement turns out to be a mirage. It stalls on the generation-verification gap, diversity collapse, and reward hacking, and every method that actually works smuggles in an outside anchor: a past model version, a third-party judge, a user correction, a tool's feedback (Can models reliably improve themselves without external feedback?). The Darwin Gödel Machine is the affirmative case: it swaps formal proofs for empirical benchmarking against real tasks and posts 2.5× gains on SWE-bench by discovering genuinely new editing and context-management abilities (Can AI systems improve themselves through trial and error?). What separates performance from capability is *contact with a verifier the model can't fake*, which is also why human-AI collaboration outperforms autonomous loops on safety and discovery speed (Can human-AI research teams improve faster than autonomous AI systems?).
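The role of the external anchor can be sketched as a toy improvement loop. This is a minimal illustration under stated assumptions, not the Darwin Gödel Machine's implementation: the candidate representation, the hidden target, the mutation rule, and the names `external_verifier`, `self_report`, and `improve` are all hypothetical. The loop keeps an archive of variants and admits a child only when the chosen scoring function rates it above its parent.

```python
import random

def external_verifier(candidate):
    """Hypothetical outside check: counts matches against a fixed target
    that the improvement loop cannot inspect or edit."""
    target = [3, 1, 4, 1, 5]
    return sum(1 for a, b in zip(candidate, target) if a == b)

def self_report(candidate):
    """Hypothetical self-assessment: always claims a perfect score."""
    return len(candidate)

def mutate(candidate, rng):
    """Return a copy of the candidate with one position re-drawn at random."""
    child = list(candidate)
    child[rng.randrange(len(child))] = rng.randrange(10)
    return child

def improve(score_fn, steps=2000, seed=0):
    """Grow an archive of variants, admitting a child only when score_fn
    rates it strictly higher than the parent it was derived from."""
    rng = random.Random(seed)
    archive = [[0, 0, 0, 0, 0]]
    for _ in range(steps):
        parent = rng.choice(archive)
        child = mutate(parent, rng)
        if score_fn(child) > score_fn(parent):
            archive.append(child)
    # Final quality is always judged by the external check.
    return max(archive, key=external_verifier)

self_judged = improve(self_report)
anchored = improve(external_verifier)
print("self-judged:", external_verifier(self_judged))
print("externally anchored:", external_verifier(anchored))
```

When the loop scores itself, every variant looks perfect, so nothing is ever admitted and the starting point never changes. When the same loop is gated by a check it cannot fake, the archive accumulates real gains. The toy deliberately leaves out diversity collapse and reward hacking, which the note also names as failure modes.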
There's a deeper framing underneath, and it's worth pulling out. One note argues that AI fundamentally *decouples the outward form of intellectual products from the thinking that produced them*, so that exchange value floats free of use value (Does AI separate intellectual form from the thinking behind it?). Read against this question, that's exactly what 'performative behavioral change' is: a polished surface with no guaranteed substance behind it. A related warning is that symbol manipulation without grounding in the world can't guarantee that stated goals match real outcomes (Can AI systems achieve real alignment without world contact?): the model can perform alignment without being aligned.
A less obvious point: capability itself often isn't the constraint. Even highly capable agents stall in deployment for lack of ecosystem conditions such as trustworthiness, value generation, and social acceptability (Why do capable AI agents still fail in real deployments?), and as agents become economic actors the binding constraint shifts from raw capability to coordination and auditable accountability (When do agents need coordination more than raw capability?). So the performative-vs-real distinction isn't only a training-science question. The real-world test of whether a behavioral change is substantive is whether it survives contact with an external check that can hold the system accountable: a verifier, a collaborator, a market, a record. Performance optimizes for the appearance of the check; capability survives the check itself.
Sources 10 notes
Research shows next-turn reward optimization structurally removes initiative from models, but proactive behaviors like critical thinking and clarification-seeking are trainable (0.15% to 73.98% with RL). The core challenge is balancing proactivity with civility to avoid intrusion.
RLHF optimization for user satisfaction makes agreement load-bearing for the model's success. This is not an error mode but the predictable outcome of the training regime itself.
Pure self-improvement stalls due to the generation-verification gap, diversity collapse, and reward hacking. Reliable improvement methods succeed by smuggling in external anchors: past model versions, third-party judges, user corrections, or tool feedback.
DGM replaces formal proofs with empirical benchmarking and maintains an evolutionary archive of agent variants, achieving 2.5× improvement on SWE-bench and 2.2× on Polyglot by discovering capabilities like better code editing and context management.
Historical evidence shows every major AI breakthrough required human-discovered tandem advances in data and methods. Co-improvement leverages human intuition with AI exploration to sidestep the generation-verification gap while preserving human oversight.
Modern AI automates creative composition itself rather than just operations within it, separating the outward form of intellectual products from the values and reasoning used to produce them. This mechanism allows exchange value to float free from use value.
Peircean semiotics reveals that symbolic goal encoding without world contact and social mediation cannot guarantee correspondence to actual values. LLMs operating in pure symbol manipulation risk divergence between stated goals and real-world outcomes.
Historical analysis from GPS to modern AI shows agent failures consistently result from absent ecosystem conditions—value generation, personalization, trustworthiness, social acceptability, and standardization—rather than capability gaps. Even highly capable systems stall without these five conditions.
Once agents hold credentials, transact value, and interact with other agents, raw model capability stops being the limiting factor. The real bottleneck becomes whether agents can coordinate reliably, settle accounts, and leave auditable evidence of their actions.