Can language models actually use graph structure information?
After fine-tuning on graph data, do LLMs learn to use actual connectivity patterns, or just recognize that graphs exist? This matters for understanding whether transformers can handle structured reasoning tasks.
Empirical analysis of how LLMs process graph-structured data through their attention mechanisms reveals a striking dissociation. After fine-tuning, attention shifts significantly toward node tokens, showing that the model recognizes graph data; yet when the topological connections are randomly shuffled, performance is almost unaffected. The model registers that graph data exists but never actually uses the structural relationships.
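A minimal sketch of the shuffled-connectivity ablation, in plain Python; `evaluate` is a hypothetical stand-in for querying the fine-tuned model on a structure-dependent task (e.g., connectivity or shortest path) and scoring its answer:

```python
import random

def serialize_graph(edges):
    """Flatten an edge list into the textual format the fine-tuned LLM sees."""
    return " ".join(f"({u}, {v})" for u, v in edges)

def shuffle_connectivity(edges, num_nodes, seed=0):
    """Destroy topology while preserving surface statistics: same number of
    edges, same node vocabulary, but random endpoints."""
    rng = random.Random(seed)
    return [(rng.randrange(num_nodes), rng.randrange(num_nodes)) for _ in edges]

def evaluate(model, prompt):
    """Hypothetical: query the model and score its answer on a task that is
    only solvable if the edges are actually used."""
    raise NotImplementedError

edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
print(serialize_graph(edges))
print(serialize_graph(shuffle_connectivity(edges, num_nodes=4)))
# The finding: evaluate(model, ...) scores roughly the same on both
# serializations, i.e. the fine-tuned model never relied on the real edges.
```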
Three specific findings:
Recognition without utilization: After fine-tuning, attention toward node tokens shifts significantly, but this recognition never translates into structural understanding. The model attends to nodes as a category without tracking their connections.
U-shaped attention distribution: When processing graph nodes, LLMs spread attention in a U-shaped or long-tailed pattern, concentrating on the first and last nodes, rather than in the structurally ideal pattern of focusing on high-centrality nodes with hierarchically diminishing weight. This is the same positional bias attention exhibits on plain text; the first sketch after this list shows one way to measure it.
Neither fully connected nor fixed connectivity is optimal: Both extremes, attending to everything equally and attending only along fixed graph edges, have specific limitations. Graph reasoning seems to require a middle ground that current attention mechanisms don't naturally produce; the second sketch after this list illustrates the two extremes and one candidate compromise.
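One way to surface the U-shaped pattern (a sketch, not the analysis's actual measurement code): sum the attention mass each node's token span receives, in sequence order. The attention tensor could come from, e.g., `output_attentions=True` in Hugging Face Transformers; `node_spans` is an assumed bookkeeping structure mapping each node to its token range.

```python
import numpy as np

def attention_per_node(attn, node_spans):
    """attn: (heads, seq, seq) attention weights from one layer.
    node_spans: [(start, end), ...] token spans of the nodes, in input order.
    Returns the total attention mass each node receives."""
    received = attn.mean(axis=0).sum(axis=0)  # mass arriving at each key position
    return np.array([received[s:e].sum() for s, e in node_spans])

# Synthetic demo: 10 nodes of 4 tokens each in a 40-token sequence.
rng = np.random.default_rng(0)
attn = rng.random((8, 40, 40))
attn /= attn.sum(axis=-1, keepdims=True)  # each row is a softmax distribution
spans = [(i * 4, (i + 1) * 4) for i in range(10)]
print(attention_per_node(attn, spans))
# On a real fine-tuned model, a U-shape here means the first and last nodes
# absorb most attention regardless of which nodes are structurally central.
```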
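And a sketch of the two extremes as attention masks, plus one possible middle ground: a soft bias that discourages, rather than forbids, attention off the graph edges. The `-2.0` penalty is an illustrative value, not anything the analysis prescribes.

```python
import numpy as np

def attention(scores, mask=None, bias=None):
    """Softmax attention over raw scores, with an optional hard mask or soft bias."""
    if bias is not None:
        scores = scores + bias
    if mask is not None:
        scores = np.where(mask, scores, -np.inf)
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

adj = np.array([[1, 1, 0, 0],       # path graph 0-1-2-3, self-loops included
                [1, 1, 1, 0],
                [0, 1, 1, 1],
                [0, 0, 1, 1]], dtype=bool)
scores = np.random.default_rng(1).standard_normal((4, 4))

full  = attention(scores)                                # extreme 1: everything equally visible
fixed = attention(scores, mask=adj)                      # extreme 2: only fixed edges
soft  = attention(scores, bias=np.where(adj, 0.0, -2.0)) # middle ground: penalize non-edges
```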
The implication: transformer attention is structurally biased toward sequential processing, and the bias persists even when the input is graph-structured data that demands topological reasoning. Message-passing mechanisms, as in GNNs, remain fundamentally better suited to modeling inter-node relationships.
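For contrast, a minimal message-passing update in the GNN sense (generic GCN-style mean aggregation, not any specific library's layer). Here connectivity is hard-wired into the computation itself, so shuffling the edges necessarily changes the output; a fine-tuned LLM instead reads the edges as tokens it is free to ignore.

```python
import numpy as np

def message_passing(h, adj, W):
    """One update: every node averages its neighbors' features, then transforms them."""
    deg = np.maximum(adj.sum(axis=1, keepdims=True), 1)
    msgs = (adj @ h) / deg   # mean over neighbors
    return np.tanh(msgs @ W)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)  # path 0-1-2
h, W = rng.random((3, 8)), rng.random((8, 8))
print(message_passing(h, adj, W).shape)  # (3, 8)
```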
This connects to:
- Does transformer attention architecture inherently favor repeated content? — the same positional/sequential bias that creates sycophancy also prevents graph topology processing
- Why do decoder-only models underperform as text encoders? — causal attention is doubly limited: it constrains both encoding quality and graph structure processing
- Can reasoning topologies be formally classified as graph types? — GoT assumes LLMs can reason in graph structures, but this evidence suggests the attention mechanism fundamentally resists graph-native processing
Original note title: LLM attention recognizes graph data after training but fails to model inter-node relationships — shuffled connectivity has no effect on performance