Graph ML News (Oct 7th) - FoldFlow, Iambic round, Google’s Graph Mining Library

Although ICLR submissions are still not available, October brings some other news!

🌊 Flow Matching is the generative modeling framework of 2023 (and you’ll hear it everywhere in 2024) that is taking over the Geometric DL world step by step. While diffusion models typically generate from a Gaussian prior, Flow Matching generative models can take any prior distribution. The seminal paper by Alex Tong et al. made huge advancements in Continuous Normalizing Flows, conditional flow matching, and optimal transport for flow matching (here is the LoGG reading group talk), and we’ll see a good bunch of generative models for molecules and proteins based on this framework.

A few days ago, the DreamFold team from Mila, led by Joey Bose and Tara Akhound-Sadegh together with Michael Bronstein (and with Alex Tong), released FoldFlow, an SE(3)-equivariant flow matching model for protein backbone generation. Perhaps the coolest result is in the attached figure - whereas AlphaFold 2 can only discover one energy state of the protein structure, FoldFlow captures all modes of the distribution, which increases the diversity of generated samples. Hope to hear more from the folks at DreamFold in the future!

🧬 Iambic Therapeutics (formerly Entos) raised a $100M Series B to advance their drug discovery platform. Iambic has identified 2 drug candidates (apparently the preliminary trials look OK) and is active in the academic environment, i.e., the team created OrbNet and the recent NeuralPlexer, an equivariant diffusion model for protein-ligand docking.

⛏️ Google open-sourced the Graph Mining library in C++ with scalable and parallel graph clustering algorithms, including the recent ParHAC from NeurIPS’22 that processed a graph with 154 billion edges in 3 hours. No graph is too large for Google.

🍧 Floris Geerts (University of Antwerp) gave a Richard M. Karp Distinguished Lecture at the Simons Institute on “The Power of Graph Learning”, focusing on theoretical aspects of GNN expressiveness and explaining the idea of the Graph Embedding Language (GEL) that bridges the gap between GNNs and databases. While the GEL paper is in the works, there is a nice slide deck about it.

Weekend reading:

SE(3)-Stochastic Flow Matching for Protein Backbone Generation - FoldFlow

Equivariant flow matching by Leon Klein, Andreas Krämer, Frank Noé - to complete the equivariant flow matching picture

SaProt: Protein Language Modeling with Structure-aware Vocabulary by Su et al. - pretty massive gains over ESM-2 in protein structure awareness, 650M params trained on 64 A100 GPUs for 3 months. The code is already available on GitHub

Cooperative Graph Neural Networks by Ben Finkelshtein et al. - a new look at the message passing procedure where nodes can “decide” whether to propagate neighbors’ messages, send their own messages, or remain silent.
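
To make the last item a bit more concrete, here is a toy sketch of message passing in which each node carries an action that controls whether it sends and whether it receives. Everything in it is an assumption for illustration and not the authors' implementation: the action names, the `cooperative_step` helper, the scalar features, and the mean aggregation are all hypothetical, and the actions are fixed by hand here, whereas in the paper the nodes make that choice themselves.

```python
from typing import Dict, List, Tuple

# Hypothetical action names for this sketch; each node picks exactly one.
STANDARD, LISTEN, BROADCAST, ISOLATE = "standard", "listen", "broadcast", "isolate"


def cooperative_step(
    features: Dict[int, float],
    edges: List[Tuple[int, int]],
    actions: Dict[int, str],
) -> Dict[int, float]:
    """One round of mean-aggregation message passing on an undirected graph.

    A message travels from u to v only if u is sending (STANDARD or BROADCAST)
    and v is receiving (STANDARD or LISTEN).
    """
    sends = {n: actions[n] in (STANDARD, BROADCAST) for n in features}
    receives = {n: actions[n] in (STANDARD, LISTEN) for n in features}

    inbox: Dict[int, List[float]] = {n: [] for n in features}
    for u, v in edges:
        # Treat each undirected edge as two directed ones.
        for src, dst in ((u, v), (v, u)):
            if sends[src] and receives[dst]:
                inbox[dst].append(features[src])

    updated: Dict[int, float] = {}
    for n, x in features.items():
        if inbox[n]:
            # Average the node's own feature with the mean of received messages.
            updated[n] = 0.5 * x + 0.5 * sum(inbox[n]) / len(inbox[n])
        else:
            # Nothing received: the feature stays unchanged.
            updated[n] = x
    return updated


# Path graph 0 - 1 - 2 with scalar node features.
features = {0: 1.0, 1: 3.0, 2: 5.0}
edges = [(0, 1), (1, 2)]

# Node 0 only sends, node 1 only receives, node 2 stays silent.
actions = {0: BROADCAST, 1: LISTEN, 2: ISOLATE}
result = cooperative_step(features, edges, actions)

# Ordinary message passing is the special case where every node is STANDARD.
all_standard = cooperative_step(features, edges, {n: STANDARD for n in features})
```

With these hand-picked actions only the edge from node 0 to node 1 carries a message, so node 1 is the only one whose feature changes. Setting every node to `STANDARD` recovers plain message passing over all edges.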