Attention Mechanisms Perspective: Exploring LLM Processing of Graph-Structured Data
Attention mechanisms are critical to the success of large language models (LLMs), driving significant advancements in multiple fields. However, for graph-structured data, which requires emphasis on topological connections, they fall short compared to message-passing mechanisms on fixed links, such as those employed by Graph Neural Networks (GNNs). This raises a question: “Does attention fail for graphs in natural language settings?” Motivated by these observations, we embarked on an empirical study from the perspective of attention mechanisms to explore how LLMs process graph-structured data. The goal is to gain deeper insights into the attention behavior of LLMs over graph structures. We uncovered unique phenomena regarding how LLMs apply attention to graph-structured data and analyzed these findings to improve the modeling of such data by LLMs. The primary findings of our research are: 1) While LLMs can recognize graph data and capture text-node interactions, they struggle to model inter-node relationships within graph structures due to inherent architectural constraints. 2) The attention distribution of LLMs across graph nodes does not align with ideal structural patterns, indicating a failure to adapt to graph topology nuances. 3) Neither fully connected attention nor fixed connectivity is optimal; each has specific limitations in its application scenarios.
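The contrast drawn above between fully connected attention and message passing over fixed links can be made concrete. The following is a minimal pure-Python sketch (not the paper's implementation) of a single row of scaled dot-product attention, with an optional boolean mask standing in for fixed graph connectivity: with no mask, every node attends to every other node; with a mask, attention is restricted to graph neighbors, mimicking a GNN-style fixed-link constraint.

```python
import math

def softmax(scores):
    """Numerically stable softmax; -inf entries get zero weight."""
    m = max(s for s in scores if s != float("-inf"))
    exps = [0.0 if s == float("-inf") else math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_row(query, keys, mask=None):
    """One row of scaled dot-product attention over node tokens.

    query: list[float]; keys: list of same-length vectors.
    mask: optional list[bool]; True = an edge exists (fixed connectivity).
    mask=None corresponds to fully connected attention.
    """
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    if mask is not None:
        # Non-neighbors are excluded before normalization, as in a GNN.
        scores = [s if keep else float("-inf")
                  for s, keep in zip(scores, mask)]
    return softmax(scores)

# Toy 3-node example: node 1 is not a neighbor of the query node.
q = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
full = attention_row(q, keys)                     # fully connected
fixed = attention_row(q, keys, mask=[True, False, True])  # fixed links
```

Either variant yields a valid probability distribution over nodes; the trade-off the paper points to is that the full variant ignores topology while the masked variant cannot look beyond the given edges.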
Q1: Does the attention distribution of LLMs change before and after fine-tuning? Can LLMs correctly utilize graph structures? We first compared the distribution curves of attention scores for node tokens and text tokens before and after training. We then conducted hypothesis tests to clarify three points: whether the attention on node tokens has shifted; whether the attention on text tokens has shifted; and whether the attention distributions of node tokens and text tokens are consistent with each other. The results showed that after training, the LLMs' attention towards node tokens underwent a significant shift, suggesting that the LLMs develop an initial capability to recognize graph-structured data. At the same time, our hypothesis tests revealed that the attention LLMs distribute within nodes is markedly skewed. However, in subsequent experiments where we perturbed the connectivity information, we found that even randomly shuffling the topological connections had almost no effect on the LLMs' performance, indicating that the LLMs did not effectively utilize the correct connectivity information.
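One standard way to test whether an attention distribution has shifted after training is a two-sample Kolmogorov–Smirnov comparison of the before/after score samples. The sketch below (an illustrative stand-in, not necessarily the exact test used in the paper) computes the KS statistic, the maximum gap between the two empirical CDFs; a large value indicates a shifted distribution.

```python
import bisect

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max |F_a(x) - F_b(x)|,
    where F is the empirical CDF of each sample."""
    a, b = sorted(a), sorted(b)

    def ecdf(sample, x):
        # Fraction of the sample that is <= x.
        return bisect.bisect_right(sample, x) / len(sample)

    xs = sorted(set(a + b))
    return max(abs(ecdf(a, x) - ecdf(b, x)) for x in xs)

# Hypothetical attention scores on node tokens before vs. after training.
before = [0.01, 0.02, 0.02, 0.03, 0.05]
after = [0.10, 0.15, 0.20, 0.25, 0.30]
d = ks_statistic(before, after)  # close to 1.0: distributions barely overlap
```

In practice one would convert the statistic to a p-value (e.g. with `scipy.stats.ks_2samp`) before claiming a significant shift; the statistic alone only measures the size of the gap.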
Q2: Can LLMs allocate attention to different types of graph nodes in a manner consistent with the structural properties of the graph? When processing graph data inputs, LLMs calculate attention scores between node tokens to weigh the importance of different nodes relative to each other. Through our visualization experiments, we found that the attention scores between different node tokens in LLMs do not adequately match the graph structure. Specifically, under sequential conditions, the attention distribution over node tokens exhibits a U-shaped or long-tailed pattern, which deviates from our idealized assumption: ideally, the model should focus more on central nodes and allocate attention in a hierarchically diminishing manner.
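The U-shaped pattern described above can be detected directly from an attention matrix. As a minimal sketch (with hypothetical helper names, not the paper's analysis code), the first function averages the attention each node token receives across all query positions, and the second applies a simple heuristic for U-shape: both endpoint positions receive more attention than the interior average.

```python
def mean_received_attention(attn):
    """Column means of a square attention matrix: the average attention
    each node token (column) receives across all query positions (rows)."""
    n = len(attn)
    return [sum(row[j] for row in attn) / n for j in range(n)]

def is_u_shaped(values, tol=0.0):
    """Heuristic check for the U-shaped pattern: the first and last
    positions both exceed the mean of the interior positions by tol."""
    interior = values[1:-1]
    mid = sum(interior) / len(interior)
    return values[0] > mid + tol and values[-1] > mid + tol

# Synthetic example: sequence-boundary node tokens soak up attention.
per_node = [0.30, 0.10, 0.10, 0.10, 0.30]
flag = is_u_shaped(per_node)  # True: attention piles up at both ends
```

A topology-aware allocation, by contrast, would correlate received attention with node centrality (e.g. degree) rather than with position in the serialized input, so comparing `mean_received_attention` against per-node degrees is one way to quantify the mismatch.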