Post #684

@graphml

Graph Machine Learning

Vues3,100Nombre de vues

Publié27 juin27/06/2022 09:00

Contenu

Contenu du post

Monday Special: Learnable Neural Priority Queues in GNNs Message passing propagates messages to all nodes in a neighborhood. Messages might be weighted with a fixed normalization constant (like in GCNs), or with a learnable scalar (GAT), or with a composition function over node and edge features (MPNN). Still, you’d send messages to all neighboring nodes. Some neighbor samplers (like the one in classic GraphSAGE) allow to subsample K nodes in a neighborhood to send messages to, but, still, all those K subsampled nodes receive the message. Is there a way to somehow guide a GNN and send messages only to a fraction of neighbors and retain the same performance? This could also potentially save a lot of computation, e.g., when propagating through hub nodes. How do we select those important neighbors then? Looking at the classical graph search algorithms like Dijkstra and A*, we employ priority queues that essentially rank edges according to a certain heuristic, then we take the top ranked edge, add it to the path, and continue further. Can we use something similar for GNNs? Recently, a few fresh mid-2022 works proposed to learn priority queues explicitly or implicitly: - Learning to Efficiently Propagate for Reasoning on Knowledge Graphs by Zhu et al. propose A*Net and a neural priority function. Essentially, we construct an edge index dynamically at each layer of a GNN starting from the “root” node. The priority function takes representations of a current node and edge feature, and produces a sigmoid distribution over neighboring nodes and edges from which we select top-K, add to the edge index, and then perform message passing. The strategy brings 5-7x reductions in the number of computed messages and 2-5x reductions in the GPU RAM (depending on the graph). - Learning Adaptive Propagation for Knowledge Graph Reasoning by Zhang et al. propose AdaProp that builds the edge index dynamically as well. At each layer, AdaProp still computes all messages in the neighborhood, and then applies the differentiable Gumbel-TopK trick with the Straight-Through Estimator to select K edges. Those edges are added to the edge index for the next message passing layer. AdaProp does not save as many messages as A*Net but converges somewhat faster. (And should be more difficult to train due to noisy Gumbel-TopK + STE). - Learning heuristics for A* by Numeroso et al. approach the A* problem from the Neural Algorithmic Reasoning viewpoint with the Encoder-Process-Decoder architecture. Instead of dynamically building the edge index, they attach an additional decoder to predict heuristic values (along with the standard node state predictor), and add a regularization term for possibly unbounded predicted heuristic values. Here, we still compute all messages during training, but can perform inference faster based on the predicted heuristics. Check the papers for more details - definitely worth the time! Illustration: saved messages in the A*Net.