INQUIRING LINE

What specific information must be exported from the language system?

This explores a question the corpus answers in surprisingly different ways depending on the task: when you pull information *out* of a language model to do something with it — personalize, formalize, act, retrieve — which slice of information actually carries the signal, and which can be discarded.


The corpus's most striking finding is that the information a downstream job needs from a language system is rarely the obvious one. For personalization, you'd assume the model needs to export what a user *asked*: their queries, their inputs. The opposite is true: profiles built from a user's *outputs* alone match or beat complete profiles, while input-only profiles actively degrade performance (see: Do user outputs outperform inputs for LLM personalization?). The information that must be exported isn't semantic content at all; it's style and preference signal. What someone asks about carries less signal than how they write.
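
One way to picture an output-only profile is as a filter over a user's interaction history. The sketch below is hypothetical: the `Turn` record shape, the `build_profile` function, and the sample headlines are illustrative assumptions, not the format used in the cited study.

```python
from typing import Dict, List, Sequence

# Hypothetical record shape: one past interaction with what the user was
# given ("input") and what the user wrote ("output").
Turn = Dict[str, str]


def build_profile(
    history: List[Turn],
    fields: Sequence[str] = ("output",),
    max_examples: int = 3,
) -> str:
    """Render the most recent turns as profile text, keeping only `fields`."""
    lines = []
    for turn in history[-max_examples:]:
        for field in fields:
            lines.append(f"{field}: {turn[field]}")
    return "\n".join(lines)


# Made-up history for a headline-writing task.
history = [
    {"input": "Long article about budget cuts...",
     "output": "Budget axe falls on libraries"},
    {"input": "Long article about a new stadium...",
     "output": "Stadium plan clears final hurdle"},
]

output_only = build_profile(history, fields=("output",))
input_only = build_profile(history, fields=("input",))
complete = build_profile(history, fields=("input", "output"))
```

The three variants correspond to the three profile types the note compares. Only `output_only` exposes the user's own phrasing without the topical content of what they were responding to.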

When the export target is formal logic rather than a user model, the required information flips to pure semantics. LLMs can emit syntactically valid logical expressions all day, but they fail to carry across the parts that actually matter: scope, quantifier precision, predicate granularity (see: Can large language models translate natural language to logic faithfully?). So 'what must be exported' is exactly the thing the systems are worst at exporting: meaning, not form. Interestingly, at least one reasoning model, OpenAI's o1, *can* export explicit structural analysis of language when prompted to reason step by step, building syntactic trees and phonological generalizations (see: Can language models actually analyze language structure?). The information is in there; whether it comes out depends on the route you take to extract it.
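
A small worked example may make the scope problem concrete. The sentence "Every student read a book" is an illustration chosen here, not one taken from the cited note. Both formulas below are well-formed, yet they say different things, so a translation can be syntactically valid and still export the wrong meaning.

```latex
% Reading 1 (universal takes wide scope): each student read some book,
% possibly a different one per student.
\forall x \, \bigl( \mathrm{Student}(x) \rightarrow
    \exists y \, ( \mathrm{Book}(y) \land \mathrm{Read}(x, y) ) \bigr)

% Reading 2 (existential takes wide scope): there is one particular book
% that every student read.
\exists y \, \bigl( \mathrm{Book}(y) \land
    \forall x \, ( \mathrm{Student}(x) \rightarrow \mathrm{Read}(x, y) ) \bigr)
```

Reading 2 entails Reading 1 but not the reverse, so choosing the wrong one changes which conclusions follow.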

There's a deeper version of the question the corpus surfaces: before you can export information, the system has to know *which* information is missing. Models that ace complete reasoning problems collapse to 40–50% accuracy when asked which clarifying question would fill in a withheld variable (see: Can models identify what information they actually need?). Identifying the needed piece and producing it are separable skills, so exporting the right information presumes a capability the model may not have. DeepRAG frames this as a decision problem: at each step, learn whether the needed information should come from the model's own parameters or be retrieved externally. The reported accuracy gain of roughly 22% (21.99%) is attributed to better-targeted retrieval and less noise from unnecessary lookups (see: When should language models retrieve external knowledge versus use internal knowledge?).
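
The retrieve-or-not decision can be sketched as a per-step gate. This is not DeepRAG's method: DeepRAG learns the decision as a policy, whereas the sketch below swaps in a fixed confidence threshold to show the control flow. `answer_steps`, `StepResult`, the threshold value, and the toy stand-in functions are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class StepResult:
    subquery: str
    answer: str
    source: str  # "parametric" or "retrieved"


def answer_steps(
    subqueries: List[str],
    parametric_answer: Callable[[str], Tuple[str, float]],
    retrieve_answer: Callable[[str], str],
    confidence_threshold: float = 0.7,  # assumed value, not from the source
) -> List[StepResult]:
    """For each sub-query, use the model's own answer if it is confident
    enough; otherwise fall back to external retrieval."""
    results = []
    for q in subqueries:
        answer, confidence = parametric_answer(q)
        if confidence >= confidence_threshold:
            results.append(StepResult(q, answer, "parametric"))
        else:
            results.append(StepResult(q, retrieve_answer(q), "retrieved"))
    return results


# Toy stand-ins for a model call and a retriever.
def toy_parametric(q: str) -> Tuple[str, float]:
    known = {"capital of France": ("Paris", 0.95)}
    return known.get(q, ("unknown", 0.1))


def toy_retrieve(q: str) -> str:
    return f"[retrieved passage for: {q}]"


steps = answer_steps(
    ["capital of France", "population of Tuvalu in 2021"],
    toy_parametric,
    toy_retrieve,
)
```

In this toy run the first sub-query is answered from the stand-in's "parametric" knowledge and the second triggers retrieval. Skipping retrieval on the first step is the kind of avoided lookup the note credits for reducing noise.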

For agents, the exported information has to be *grounded*, meaning tied to real actions, environments, and tools, or the resulting actions are hallucinated. Turning an LLM into an action system isn't a matter of squeezing more out of the model; it requires curating action-environment-user datasets and an external harness, because the surrounding system, not the weights, determines whether an exported action is real or invented (see: Can you turn an LLM into an agent by just fine-tuning?). And there's a reason to care about what gets exported precisely: over long delegated workflows, frontier models silently corrupt ~25% of document content, with errors compounding and showing no plateau through 50 round-trips (see: Do frontier LLMs silently corrupt documents in long workflows?).
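
To get a feel for how small per-step losses add up, the arithmetic below backs out a per-round-trip loss rate. It assumes, purely for illustration, that the ~25% figure is reached at 50 round-trips and that each round-trip loses the same fraction of what remains. The source note states neither, so this is a back-of-envelope reading and not a reported result.

```python
# Assumed inputs (see the caveats above).
total_corrupted = 0.25   # fraction of content corrupted by the end
round_trips = 50         # number of relay round-trips

# If each trip retains a constant fraction r, then r ** round_trips
# equals the fraction still intact at the end.
total_retained = 1 - total_corrupted
per_trip_retained = total_retained ** (1 / round_trips)
per_trip_loss = 1 - per_trip_retained

print(f"per-trip loss: {per_trip_loss:.2%}")
```

Under these assumptions the per-trip loss comes out a little under 0.6%, small enough to go unnoticed on any single hand-off, which is consistent with the note's description of the corruption as silent.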

The thread across all of this, and the thing you didn't know you wanted to know, is that there's no single 'information' a language system must export. Language modeling can itself be read as lossless compression: Chinchilla models trained only on text compress images and audio better than PNG and FLAC (see: Can text-trained models compress images better than specialized tools?), which suggests the model holds far more than any one task needs. The real engineering question is never 'get the information out' but 'which projection of it': style for personalization, semantics for logic, grounded actions for agents, the missing variable for clarification. Pick wrong and the export degrades the task; the input-built user profile is the clearest cautionary case.


Sources (8 notes)

Do user outputs outperform inputs for LLM personalization?

Research shows that user profiles built from outputs alone match or exceed performance of complete profiles across multiple tasks, while input-only profiles degrade performance. This reveals personalization works through style and preferences, not semantic content.

Can large language models translate natural language to logic faithfully?

LLMs generate well-formed logical expressions that are semantically incorrect, with errors clustering at scope ambiguity, quantifier precision, and predicate granularity. The asymmetry suggests LLMs understand formal language better than they can generate it.

Can language models actually analyze language structure?

OpenAI's o1 model successfully constructs syntactic trees and phonological generalizations through explicit step-by-step reasoning, revealing that LLM linguistic capability extends far beyond behavioral language tasks to genuine language analysis.

Can models identify what information they actually need?

Models achieving high accuracy on complete reasoning tasks drop to 40–50% accuracy when identifying what clarifying question to ask after one variable is withheld. Information gathering and problem execution are separable cognitive operations.

When should language models retrieve external knowledge versus use internal knowledge?

DeepRAG models each reasoning step as a Markov Decision Process where the model learns when to retrieve versus rely on parametric knowledge. The 21.99% improvement comes from better-targeted retrieval and elimination of noise from unnecessary external knowledge.

Can you turn an LLM into an agent by just fine-tuning?

Converting LLMs to action-capable systems requires four distinct stages: curating action-environment-user datasets, training for action grounding, integrating agent infrastructure with memory and tools, and rigorous safety evaluation. The surrounding system and harness determine whether actions are grounded or hallucinated.

Do frontier LLMs silently corrupt documents in long workflows?

Testing 19 models across 52 domains shows even advanced systems degrade documents by ~25% over extended relay tasks, with errors compounding silently without plateauing through 50 round-trips.

Can text-trained models compress images better than specialized tools?

Chinchilla models trained exclusively on text achieve better compression rates on images and audio than FLAC and PNG by using their context window to adapt as task-specific compressors. This demonstrates that generalization operates through compression, not specialization.

Next inquiring lines