Post #895

@graphml

Graph Machine Learning

Published: 25 May 2025, 02:05

GraphML News (May 14th) - KumoRFM, Open Molecules 2025, TxPert
Lots of news over the past two weeks beyond the new Gemini and Claude models!
🏆 KumoAI presented KumoRFM - the first graph foundation model for relational databases capable of zero-shotting node regression, node classification, and link prediction. Take any set of relational tables with categorical or numerical features, transform them into a graph, and you can now zero-shot typical tasks like regression or classification. Perhaps the biggest difference between KumoRFM and other inductive models is its use of in-context learning: for each prediction task, the model mines not only an ego-graph around the target entity, but also ego-graphs around relevant nodes with similar labels. The backbone for encoding ego-graphs for node-related tasks is the Relational Graph Transformer (another new pre-print); the resulting graph vectors are then aggregated by attention pooling. KumoRFM also has a built-in GNN Explainer to give some transparency to its decisions. As is typical for AI labs, Kumo doesn’t disclose what data KumoRFM was trained on, but they claim to zero-shot the whole of RelBench, which is a great achievement (albeit with results slightly worse than their supervised ContextGNN).
⚛️ FAIR Chemistry, CMU, National Labs, Genentech, and a large scientific collab presented Open Molecules 2025 (OMol25) and the Universal Model for Atoms. OMol25 is the largest dataset of simulations: 100M molecules, biomolecules, complexes, and MOFs with a plethora of properties to predict, covering structures of up to 350 atoms (10x larger than any other dataset). It took 6 billion CPU hours to complete the simulations 👀 OMol25 will probably be the main dataset for training the next gen of ML potentials - you can’t find a larger open source dataset anywhere else. As a baseline, FAIR prepared the Universal Model for Atoms, based on the equivariant GNN (eSEN) with a mixture of experts (hello from the LLM world).
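To make the attention pooling step mentioned for KumoRFM more concrete, here is a minimal sketch of pooling several ego-graph vectors into one. This is not Kumo's implementation: the function names, the choice of the target's vector as the query, and the plain (unlearned) scaled dot-product scoring are all assumptions made for illustration. A real model would presumably use learned projections.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(query, vectors):
    """Pool equal-length vectors into one, weighted by softmax(query . v / sqrt(d)).

    `query` stands in for the target ego-graph embedding and `vectors` for the
    embeddings of the in-context ego-graphs (both hypothetical inputs).
    """
    if not vectors:
        raise ValueError("need at least one vector to pool")
    d = len(query)
    scores = [
        sum(q * x for q, x in zip(query, v)) / math.sqrt(d) for v in vectors
    ]
    weights = softmax(scores)
    pooled = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
    return pooled, weights


# Toy example: three context vectors, the first is most similar to the target.
target = [1.0, 0.0]
context = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
pooled, weights = attention_pool(target, context)
```

Context vectors that align more closely with the target get larger weights, so the pooled vector leans toward the most relevant examples.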
🦠 Advancing Drug Discovery Outcomes with Virtual Cells by Valence Labs (Recursion) - introduces the data pipeline and computational platform for building a “virtual cell”. As a proof of concept, Valence trained TxPert, a model for predicting transcriptional responses to combinatorial genetic perturbations that outperforms GEARS and scLAMBDA. Valence also put out a nice white paper on virtual cells with cool illustrations.
Weekend reading (before we get into all the fancy new NeurIPS submissions) - theory alert:
Covered Forest: Fine-grained generalization analysis of graph neural networks by Antonis Vasileiou et al - on the generalization power of MPNNs.
Graph Representational Learning: When Does More Expressivity Hurt Generalization? by Sohir Maskey et al - another relevant work arguing that fancy expressive GNN architectures might actually be pretty bad at OOD generalization. Some day GNN theory folks will discover that Attention is All You Need 🙂
Addressing the Scarcity of Benchmarks for Graph XAI by Michele Fontanesi et al - proposes a new method to automate the generation of Explainable AI benchmarks for graph classification, where at least one of the classes is explained by a specific sub-graph motif. It also bundles 15 new benchmarking tasks. Thanks to Domenico Tortorella for the pointer.