Contenu du post
Combining Label Propagation and Simple Models Out-performs Graph Neural Networks This paper by Cornell and Facebook made a lot of noise on Twitter recently. In short, it shows that GNNs can be outperformed by simpler models such as MLP + Label Propagation (LP) on several large datasets. They use LP (actually twice) to propagate the labels from training nodes to test nodes. LP has been used for two decades successfully (NIPS 2004 as well as this survey), it's just it was not directly compared to GNN. Unfortunately, LP does not use node features, so the authors propose first to use MLP on node features and then use LP on predictions of MLP and on labels. This work only applies for transductive node classification, but not on inductive node classification (applying trained model on new graphs), neither on link prediction nor graph classification. But for node classification it shows pretty good results in terms of speed and quality. Another detail is that LP usually works on homophilous graphs, i.e. graphs where nodes with the same labels have higher chance of being connected. While this assumption is reasonable, not all graphs have this type of connectivity, for example the mail that goes from a person to a post office to aggregator to the recipient may connect nodes of different classes together. Petar Veličković talks more in detail about this. I must add that it's not the first time we see that existing graph datasets can be outperformed by simple models. A year ago there were many works showing that MLP works better than GNN on many graph classification datasets (e.g. this paper). MLP don't work on OGB datasets really well, but MLP + LP does. Hopefully it will lead to more graph datasets and subsequently to more insights about which tools are the best for graph prediction problems.