Model distillation for GNNs

Model distillation is an approach for training a small neural network, called the student, from a large pretrained neural network, called the teacher. The motivation is that you want to reduce the number of parameters of your production model as much as possible while keeping the quality of your solution. One of the first approaches was by Geoffrey Hinton, Oriol Vinyals, and Jeff Dean (what a combo), who proposed training the student network on the logits of the teacher network. Since then, a huge number of losses have appeared that attempt to improve the performance of the student network, but the original approach by Hinton et al. still works reasonably well. A good survey is this recent one.

Surprisingly, there have not been many papers on model distillation for GNNs. Here are a few examples:

* Reliable Data Distillation on Graph Convolutional Network (SIGMOD 2020)
* Distilling Knowledge from Graph Convolutional Networks (CVPR 2020)
* Extract the Knowledge of Graph Neural Networks and Go Beyond it: An Effective Knowledge Distillation Framework (WWW 2021)

But these approaches were not convincing enough for me to say that knowledge distillation is solved for GNNs, so I'd say it's still an open research question.

I have also tried training an MLP on GNN logits to see if we can replace the GNN with an MLP at inference time. Apparently you can get an uplift over a vanilla MLP trained on the targets; however, the performance is still not as good as the GNN's. A good example of significantly reducing the number of parameters of GNNs is the recent work on label propagation (LP) for node classification: LP has 0 parameters, and with Correct and Smooth (C&S) it gains some MLP parameters, but not as many as a GNN has.
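To make the logit-based distillation mentioned above a bit more concrete, here is a minimal sketch of a Hinton-style loss for a single example (for instance, one node whose teacher logits come from a GNN and whose student logits come from an MLP). It is an illustration under assumptions, not the exact setup from any of the papers or from my experiments: the names `distillation_loss`, `temperature`, and `alpha` and their default values are hypothetical, and it uses plain Python so the arithmetic is visible.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log of the temperature-scaled softmax, computed stably."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max before exponentiating
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_sum_exp for z in scaled]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Soft-target loss plus hard-label loss for one example.

    alpha (assumed hyperparameter) weights the teacher-matching term;
    (1 - alpha) weights ordinary cross-entropy on the true label.
    """
    # Softened log-distributions of teacher and student.
    log_p_teacher = log_softmax(teacher_logits, temperature)
    log_p_student = log_softmax(student_logits, temperature)

    # KL(teacher || student) on the softened distributions.
    kl = sum(math.exp(lt) * (lt - ls)
             for lt, ls in zip(log_p_teacher, log_p_student))

    # Standard cross-entropy against the ground-truth class (temperature 1).
    ce = -log_softmax(student_logits)[label]

    # The T^2 factor keeps the soft term on a comparable gradient scale.
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce


# Hypothetical logits for a 3-class node classification problem.
teacher = [4.0, 1.0, -2.0]   # e.g. produced by a trained GNN
student = [1.5, 0.5, -0.5]   # e.g. produced by an MLP being trained
loss = distillation_loss(student, teacher, label=0)
```

With `alpha=0.0` this reduces to training on the targets only (the vanilla MLP baseline), and with `alpha=1.0` the student only tries to match the teacher, so sweeping `alpha` is one simple way to compare the two regimes.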