Post #754

@graphml

Graph Machine Learning

Views: 4,090

Published: 9 Feb 2023, 21:11


Attending To Graph Transformers
by Luis Müller, Michael Galkin, Christopher Morris, and Ladislav Rampasek
arXiv

Our new survey on Graph Transformers (GTs), accompanied by some “mythbusting”.

We categorize GTs along 4 main views: 🗺️ the Encodings used, 🌐 the expected Input Features (geometric or non-geometric), ✨ Tokenization (nodes, nodes+edges, subgraphs), and 🧮 Propagation (fully-connected, sparse, hybrid).

We investigate 4 common expectations and claims about GTs. Although the conclusions are more nuanced (see the paper), we label them with pretentious badges: ✅ Confirmed / ❌ Busted / 🤔 Plausible.

1️⃣ Are GTs theoretically more expressive than GNNs?
❌ Busted. There is no inherent property of GTs that makes them more expressive. Instead, their expressivity stems from their positional/structural encodings. (And making those maximally expressive is as hard as solving the graph isomorphism problem.)

2️⃣ Can graph structure be effectively incorporated into GTs?
✅ Confirmed. GTs can identify graph edges (easy task), count triangles (medium task), and distinguish regular graphs (hard task). But there is still room for improvement.

3️⃣ Does global attention reduce over-smoothing?
🤔 Plausible. On heterophilic graphs, GTs clearly outperform vanilla GNNs but still lag behind specialized SOTA models. Maybe we need a different structural bias?

4️⃣ Do GTs alleviate over-squashing better than GNN models?
🤔 Plausible. The Transformer perfectly solves NeighborsMatch, where GNNs struggle. However, this is a synthetic “retrieval” task that doesn’t test (sub)graph representation.

🎁 Bonus: Do attention matrices contain meaningful patterns and explain GT performance?
❌ Busted. We couldn’t find any strong interpretability of attention scores for downstream tasks. We suggest following BERTology in NLP, which moved from dissecting attention to designing benchmarks.
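
To make the point about encodings and regular graphs a bit more concrete, here is a minimal sketch (not taken from the paper). It compares two 2-regular graphs on six nodes, a hexagon and two disjoint triangles, using plain 1-WL colour refinement, the usual expressivity yardstick for message-passing GNNs. It then computes a per-node triangle count as a stand-in for a structural encoding. All function names are hypothetical, and the triangle count is only an illustrative choice of encoding, not the one used in the survey.

```python
from collections import Counter

def cycle_graph(n):
    """Adjacency lists of an n-node cycle."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}

def two_triangles():
    """Adjacency lists of two disjoint triangles on nodes 0..5."""
    return {0: [1, 2], 1: [0, 2], 2: [0, 1],
            3: [4, 5], 4: [3, 5], 5: [3, 4]}

def wl_histogram(adj, rounds=3):
    """1-WL colour refinement from a uniform start; returns the colour histogram.

    Colours are kept as nested tuples (own colour, sorted neighbour colours),
    so histograms of different graphs can be compared directly.
    """
    colors = {v: 0 for v in adj}
    for _ in range(rounds):
        colors = {v: (colors[v], tuple(sorted(colors[u] for u in adj[v])))
                  for v in adj}
    return Counter(colors.values())

def triangles_per_node(adj):
    """Number of triangles each node takes part in (illustrative structural encoding)."""
    counts = {}
    for v, nbrs in adj.items():
        # Count pairs of neighbours of v that are themselves connected.
        counts[v] = sum(1 for i, a in enumerate(nbrs)
                        for b in nbrs[i + 1:] if b in adj[a])
    return counts

hexagon, triangles = cycle_graph(6), two_triangles()

# 1-WL without any encoding cannot tell the two regular graphs apart.
same_under_wl = wl_histogram(hexagon) == wl_histogram(triangles)

# A triangle-count encoding separates them immediately.
enc_hexagon = sorted(triangles_per_node(hexagon).values())
enc_triangles = sorted(triangles_per_node(triangles).values())

print(same_under_wl, enc_hexagon, enc_triangles)
```

In this toy case the extra distinguishing power comes entirely from the encoding fed into the model, which is the flavour of argument behind the ❌ on claim 1️⃣.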