What knowledge can prompt optimization actually activate in trained models?
This explores the ceiling on what clever prompting can do — whether optimizing a prompt unlocks genuine capability in a model, or only reshuffles what training already put there.
When you optimize a prompt, are you teaching the model something new, or just calling up what it already knows? The corpus is unusually unanimous here: prompting reorganizes, it does not install. The cleanest statement is that prompt optimization retrieves existing knowledge but cannot inject knowledge the model never learned (Can prompt optimization teach models knowledge they lack?). No prompt strategy compensates for a missing foundation; if the domain knowledge wasn't in the training data, no phrasing conjures it. So the honest answer to the question is that prompting can only activate latent knowledge already inside the model's training distribution.
That sounds limiting, but the same corpus shows the latent space is enormous. There exists a single finite transformer that, given the right prompt, can compute any computable function; in that sense, prompts are effectively Turing-complete programs (Can a single transformer become universally programmable through prompts?). The catch comes in the same result: standard training rarely produces a model that has learned to run arbitrary programs this way. So the theoretical ceiling is 'anything,' but the practical ceiling is 'whatever capabilities training happened to encode and leave reachable.' Prompting finds the entry point to a skill that is already there; it does not build the skill.
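As a loose analogy only (this is not a transformer and not the construction from that result), the sketch below shows the shape of the idea: one fixed interpreter whose code never changes, computing different functions depending on the program it is handed. The tiny stack language (`dup`, `add`, `mul`, integer literals) and the name `run` are hypothetical, and the language has no loops, so it is far from Turing-complete.

```python
def run(program: str, x: int) -> int:
    """A fixed toy 'machine': a stack interpreter whose code never changes.

    Its behavior is set entirely by `program`, loosely analogous to a fixed
    model whose behavior is set by its prompt. Hypothetical toy language.
    """
    stack = [x]
    for token in program.split():
        if token == "dup":        # duplicate the top of the stack
            stack.append(stack[-1])
        elif token == "add":      # pop two values, push their sum
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif token == "mul":      # pop two values, push their product
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:                     # anything else is an integer literal
            stack.append(int(token))
    return stack[-1]

square = "dup mul"       # x -> x * x
affine = "3 mul 1 add"   # x -> 3x + 1
print(run(square, 7), run(affine, 7))  # 49 22
```

In the analogy, the program plays the role of the prompt; a model whose training never produced the "interpreter" offers the prompt nothing to program.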
What does 'reachable' depend on? Largely on the model's confidence, and on how the prompt is structured. Models that are confident on a task resist rephrasing and behave consistently; low-confidence skills swing widely with small prompt changes (Does model confidence predict robustness to prompt changes?). That reframes prompt optimization as a search for the framing that lands in a region where the model is already competent. Prompt quality itself also turns out to be a structured, measurable space rather than a matter of magic words, spanning six dimensions: communication, cognition, instruction, logic, hallucination, and responsibility (Can we measure prompt quality independent of model outputs?). There is also a subtle co-authorship effect worth knowing: as users iteratively refine prompts, they steer outputs toward what they already expected, so the 'activated knowledge' is partly a projection of the user's own priors back at them (How much does the user shape what a model generates?).
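One way to make the confidence and robustness claim measurable is to pose the same question under several paraphrased prompts and count how often the answers disagree. The sketch below is a simplified stand-in, not ProSA's own metric: `pairwise_disagreement` is a hypothetical helper that scores 0 when every paraphrase gets the same answer and approaches 1 as answers scatter. The example answers are made up for illustration.

```python
from itertools import combinations

def pairwise_disagreement(answers: list[str]) -> float:
    """Fraction of paraphrase pairs whose answers differ.

    0.0 means every paraphrase produced the same answer; values near 1.0
    mean the answer depends heavily on how the question was phrased.
    """
    pairs = list(combinations(answers, 2))
    if not pairs:  # fewer than two paraphrases: nothing to compare
        return 0.0
    return sum(a != b for a, b in pairs) / len(pairs)

# Hypothetical answers to one question under five paraphrased prompts.
stable = ["Paris", "Paris", "Paris", "Paris", "Paris"]
unstable = ["4", "5", "4", "6", "5"]

print(pairwise_disagreement(stable))    # 0.0
print(pairwise_disagreement(unstable))  # 0.8
```

Pairing this score with a per-task confidence signal (for example, the probability the model assigns to its top answer, an assumption here) would let you check whether the low-confidence skills are the ones that swing, as the corpus reports.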
The broader point is that even heavier interventions hit the same wall. If you assumed fine-tuning escapes this limit, the corpus pushes back: RL fine-tuning tends to sharpen memorized templates rather than install reasoning procedures, and performance drops sharply on out-of-distribution variants (Do fine-tuned language models actually learn optimization procedures?), while supervised fine-tuning makes answers look correct without making them feasible (Does supervised fine-tuning actually improve reasoning on optimization problems?). Models pattern-match optimization problems instead of actually executing the iterative method (Do large language models actually perform iterative optimization?). In other words, prompting's 'activate, don't install' limit is not a weakness unique to prompts; it is a property of how these models hold knowledge.
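The gap between looking correct and being feasible is easy to see with two separate checks. The sketch below assumes a hypothetical toy problem (allocate nonnegative amounts `x` and `y` with `x + y <= 10`) and a model answer in JSON; `looks_valid` and `is_feasible` are illustrative helpers, not part of any benchmark cited in the corpus.

```python
import json

REQUIRED_KEYS = {"x", "y"}  # hypothetical schema for the toy allocation problem

def looks_valid(raw: str) -> bool:
    """Surface check: parses as JSON, has exactly the expected keys, values are numbers."""
    try:
        sol = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(sol, dict)
        and set(sol) == REQUIRED_KEYS
        # bool is a subclass of int in Python, so exclude it explicitly
        and all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in sol.values())
    )

def is_feasible(raw: str, capacity: float = 10.0) -> bool:
    """Semantic check: well-formed, nonnegative, and within x + y <= capacity."""
    if not looks_valid(raw):
        return False
    sol = json.loads(raw)
    return sol["x"] >= 0 and sol["y"] >= 0 and sol["x"] + sol["y"] <= capacity

candidate = '{"x": 7, "y": 6}'  # well-formed, but 7 + 6 exceeds the capacity
print(looks_valid(candidate), is_feasible(candidate))  # True False
```

An evaluator that stops at `looks_valid` would score this answer as correct; only the constraint check exposes it, which is the failure mode the fine-tuning notes describe.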
Where does that leave someone trying to get more out of a model? The corpus points to making activation smarter rather than expecting prompts to do impossible work: jointly optimizing the prompt with the inference strategy (best-of-N, majority voting) instead of in isolation, which alone yields up to a 50% improvement across reasoning and generation tasks (Does prompt optimization without inference strategy fail?); spending more inference compute on hard prompts and less on easy ones (Can we allocate inference compute based on prompt difficulty?); treating prompts and agent wiring as one optimizable graph (Can we automatically optimize both prompts and agent coordination?); and, when you genuinely need new capability, composing expert weight vectors at inference rather than reaching for a better prompt (Can models dynamically activate expert skills at inference time?). The thing you didn't know you wanted to know: the best prompt engineers aren't just writing better instructions; they're locating where the model is already confident and competent, and routing compute there.
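Why the prompt should be chosen with the inference strategy in view can be shown with a toy calculation. The sketch below makes loud assumptions: each hypothetical prompt is summarized by its per-problem probability of producing a correct sample, samples are independent, and best-of-N uses an oracle verifier that keeps a correct sample whenever one exists. Under those assumptions best-of-N accuracy on a problem is 1 - (1 - p)^N, and the ranking of the two prompts flips as N grows. The numbers are invented; this is not the cited paper's method, and it does not model majority voting.

```python
# Hypothetical prompts, each summarized by its per-problem probability of a
# correct sample. Toy numbers for illustration, not measurements.
PROMPTS = {
    "prompt_A": [1.0, 1.0, 1.0, 0.0, 0.0],  # deterministic: same answer every sample
    "prompt_B": [0.4, 0.4, 0.4, 0.4, 0.4],  # diverse: each sample an independent draw
}

def best_of_n_accuracy(p_correct: list[float], n: int) -> float:
    """Expected accuracy with n independent samples and an oracle verifier
    that keeps any correct one: mean over problems of 1 - (1 - p)^n."""
    return sum(1 - (1 - p) ** n for p in p_correct) / len(p_correct)

def pick(prompts: dict[str, list[float]], n: int) -> str:
    """Choose the prompt that scores best under a best-of-n inference strategy."""
    return max(prompts, key=lambda name: best_of_n_accuracy(prompts[name], n))

print(pick(PROMPTS, n=1))  # prompt_A  (0.6 vs 0.4)
print(pick(PROMPTS, n=8))  # prompt_B  (0.6 vs about 0.983)
```

A prompt tuned for single-sample accuracy would keep `prompt_A`, even though `prompt_B` is far better once eight samples are drawn; optimizing the pair together avoids that trap.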
Sources (12 notes)
Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.
Research proves a single finite-size transformer exists that can compute any computable function given the right prompt, achieving complexity bounds nearly matching unbounded models. However, standard training rarely produces models that learn to implement arbitrary programs this way.
ProSA found that when models are highly confident, they resist prompt rephrasing; low confidence causes major output swings. Larger models, few-shot examples, and objective tasks all correlate with higher confidence and greater robustness.
Research identifies six evaluable dimensions—Communication, Cognition, Instruction, Logic, Hallucination, and Responsibility—with 20 sub-criteria based on Grice, cognitive load theory, and instructional design. Improvements in one dimension cascade to others, revealing prompt quality as a structured space rather than a flat checklist.
Foundation Priors research frames prompt engineering as divergence minimization between the synthetic output and the user's priors. The refinement process systematically steers generation toward what users already expect, making outputs co-productions of model and user subjectivity.
Even GRPO-trained models show sharp performance drops on out-of-distribution variants (N-1 test sets) compared to in-distribution problems, indicating RL optimizes template-matching rather than genuine problem-solving procedures.
Supervised fine-tuning makes model outputs look correct—proper JSON structure, valid identifiers, expected sections—without making them physically feasible. The model learns surface features of solutions, not the reasoning to construct valid ones.
Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.
Prompts optimized without knowledge of the inference strategy (best-of-N, majority voting) systematically underperform. Joint optimization of both prompt and inference strategy yields up to 50% improvement across reasoning and generation tasks.
Research shows inference effectiveness varies dramatically by prompt difficulty. Reallocating the same total compute adaptively—giving easy prompts less and hard ones more—substantially outperforms larger models under uniform budgets.
Language agents represented as computational graphs—where nodes are operations and edges define information flow—reveal that CoT, ToT, and Reflexion are formally equivalent structures. This unified view enables automatic optimization of both node prompts and edge connectivity without manual redesign.
Transformer² demonstrates that tuning only the singular values within weight matrices produces composable expert vectors that mix dynamically at inference without interference, outperforming LoRA with fewer parameters and enabling continual specialization.
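The singular-value idea in the Transformer² note can be sketched with NumPy under simplifying assumptions: one weight matrix, expert vectors that are made up rather than trained, and a plain weighted sum as the mixing rule. The helper names (`svd_factors`, `adapted_weight`, `mix_experts`) are hypothetical; the paper's training of the expert vectors and its inference-time strategies for choosing the mixing weights are not shown.

```python
import numpy as np

def svd_factors(W: np.ndarray):
    """Decompose W = U @ diag(s) @ Vt once; U and Vt are kept frozen."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt

def adapted_weight(U: np.ndarray, s: np.ndarray, Vt: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Rebuild the weight with each singular value rescaled elementwise by z."""
    return U @ np.diag(s * z) @ Vt

def mix_experts(experts: list[np.ndarray], alphas: list[float]) -> np.ndarray:
    """Weighted sum of expert vectors: z = sum_k alphas[k] * experts[k]."""
    return np.asarray(alphas) @ np.stack(experts)

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # stand-in for one layer's weight matrix
U, s, Vt = svd_factors(W)

# An all-ones vector leaves the layer unchanged.
print(np.allclose(adapted_weight(U, s, Vt, np.ones_like(s)), W))  # True

# Hypothetical expert vectors (in Transformer² these would be learned).
z_math = np.array([1.2, 1.0, 0.8])
z_code = np.array([0.9, 1.1, 1.0])
z = mix_experts([z_math, z_code], [0.5, 0.5])  # approximately [1.05, 1.05, 0.9]
W_mixed = adapted_weight(U, s, Vt, z)
```

Each expert here is a single vector of length min(m, n) per weight matrix, compared with r(m + n) parameters for a rank-r LoRA adapter on the same matrix, which is where the parameter savings come from.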