Contenu du post
On the Connection Between MPNN and Graph Transformer Chen Cai, Truong Son Hy, Rose Yu, Yusu Wang from UCSD Invited post by Chen Cai Our work aims to understand the relationship between local GNN (MPNN) and global GNN (Graph Transformer). Here local GNN refers to MPNN that mix node features locally and global GNN refers to the graph transformer, which incorporates graph topology into positional embeddings (along with existing node/edge features) and send a set of feature vectors into a vanallia transformer. 1️⃣ Local MPNN and global Graph Transformer are two major paradigms of graph learning. MPNN comes early and encompasses several popular architectures such as GCN, GraphSage, and GAT. However, they are known to suffer from issues like limited expressive power, over-smoothing, and over-squashing. Graph Transformer, on the other hand, received a lot of attention recently and shows competitive performance on standard benchmarks like OGB and tasks that require long-ranging reasoning. Our goal is to gain a fine-grained understanding of the relationship between such two paradigms. 2️⃣ Previously, 1 showed that with specific positional embedding, GT can approximate 2-IGN Invariant Graph Networks, which is at least as expressive as MPNN. This work establishes an important link from global GNN to local GNN. 3️⃣ In our work, we tried to establish the link from the inverse direction: can MPNN approximate GT as well? We looked into a very simple way to incorporate global modeling into the local mixing of MPNN: virtual node (VN). MPNN + VN adds a virtual node and connects VN to all the graph nodes and then does (heterogenous) message passing on the modified graph. It is a heuristic proposed in the early days of GNN research and shows consistent improvement over MPNN. However, there is little theoretical understanding of MPNN + VN. 4️⃣ We have a set of approximation results of MPNN + VN on GT. 1) for constant depth and width MPNN + VN (O(1) depth and width), it can approximate self-attention layers of two “linear transformers”, Performer & Linear Transformer. 2) for wide MPNN + VN (O(1) depth O(n^d) width), we show it can approximate full GT via a link to equivariant Deepsets. 3) For deep MPNN + VN (O(n) depth O(1) width), it can approximate one self-attention layer. 5️⃣ On the experimental side, we pushed the limit of simple MPNN + VN and showed that 1) it works surprisingly well on LRGB (long-range graph benchmark) dataset where previously GT dominates. 2) leveraging the GraphGPS framework, our MPNN + VN improves over the previous implementation 3) MPNN + VN outperforms Linear Transformer and MPNN on the climate modeling task. See our paper and code for more details! Pure Transformers are Powerful Graph Learners