Graph Machine Learning
@graphml
Technologies
Everything about graph theory, computer science, machine learning, etc. If you have something worth sharing with the community, reach out @gimmeblues, @chaitjo. Admins: Sergey Ivanov; Michael Galkin; Chaitanya K. Joshi
Published May 13
Graph ML News (May 13th) - $100M Wall Street Edition 💸

Perhaps the biggest news of the month: Recursion, a major player in drug discovery, acquires two startups: Valence Discovery (Mila, Montreal) for $47.5M and Cyclica (Toronto) for $40M. Not pretending to wear a Wall Street market analyst hat, I’d speculate those are the biggest M&A deals of the past years in the Graph ML industry. Graph ML and Geometric Deep Learning are at the core of modern drug discovery, powering pretty much all stages of the pipeline and reducing the time to market from the standard 10 years by a factor of 2-3. I happen to know many smart folks from Valence Discovery including Prudencio Tossou, Therence Bois, and Dominique Beaini with whom we co-authored a few papers for NeurIPS’22. Valence also supports the most popular public reading groups on Graph ML: Learning on Graphs and Geometry (LOG2) and Molecular Modeling & Drug Discovery (M2D2) covering hot new papers with original authors. Big congratulations to the team and hope we’ll see more cool stuff in the future!

With the Wall Street hat on, I’d hypothesize the next big wave of investment rounds and huge M&As would be in the material discovery and AI4Science fields where Geometric DL is at the core as well.

Venues: ECML PKDD’23 in Torino published the list of accepted workshops - have a look at the Mining and Learning with Graphs (MLG) workshop featuring keynotes from Bastian Rieck and Giannis Nikolentzos. Bastian gives amazing talks on topology, I highly recommend attending if you are at ECML this year. Paper submission deadline is June 12th, consider submitting as well.

Weekend reading: Alex Barghi wrote a blogpost introducing the new cuGraph backend of PyG covering new accelerated primitives, feature store, and neighbor sampling using node classification on the MAG graph as an example. Zhaocheng Zhu posted a viral tweet with the Colab Notebook comparing PyTorch and JAX performance of common GNN operators. Key takeaways are: JAX with JIT is faster than PyTorch on homogeneous graphs, and much faster and more memory-efficient on larger heterogeneous graphs where PyTorch throws OOM; the new torch.compile() often makes the code 2x faster than vanilla torch, so make sure to update your envs to torch 2.0 🚀

New papers for the weekend reading:
- Language models can generate molecules, materials, and protein binding sites directly in three dimensions as XYZ, CIF, and PDB files by Daniel Flam-Shepherd and Alan Aspuru-Guzik - “In this work, we show how language models, without any architecture modifications, trained using next-token prediction - can generate novel and valid structures in three dimensions from various substantially different distributions of chemical structures.” Sparks of chemical intelligence 👀
- Advancing structural biology through breakthroughs in AI by Laksh Aithani and folks from Charm Therapeutics - a nice introductory survey on how (Geometric) DL transforms structural biology.
Published May 6
Graph ML News (May 6th)

ICLR’23 has finished this week, to those who travelled to Kigali - have a safe trip back 🙂 Meanwhile, you might have missed the ICLR Blogposts Track - a collection of insightful articles for which it is often more handy to express the content as a blog post rather than a full paper. Particularly interesting are On Universality of Neural Networks on Sets vs Graphs (by Fabian B. Fuchs and Petar Veličković), on Neural PDE Solvers (by Yolanne Lee), and Thinking Like Transformer (by Alexander Rush, Gail Weiss). I would generally recommend submitting there (my post was accepted at ICLR’22 Blog Post Track) - it was a pleasant experience, and writing about your research is also a service to the community.

A few upcoming events: LoG Paris Meetup on June 14th in Paris at CentraleSupélec, Université Paris-Saclay with the keynote from Michael Bronstein. Michael is going to be one of the keynote speakers at ECML PKDD 2023 in September in Torino - the list of accepted workshops should appear soon, so far we know about the Workshop on Learning and Mining with Blockchains. If you fancy Lisboa in September, you might want to submit to the Special Track on AI on Networks for Social Good, part of the ACM Conference on Information Technology for Social Good. Thanks to Manuel Dileo for the pointers 👏

For the weekend reading, have a look at:
- Language Models Enable Simple Systems for Generating Structured Views of Heterogeneous Data Lakes by Simran Arora and Christopher Ré’s lab
- When Do Graph Neural Networks Help with Node Classification: Investigating the Homophily Principle on Node Distinguishability by Sitao Luan feat. Jure Leskovec and Doina Precup
- An Exploration of Conditioning Methods in Graph Neural Networks by Yeskendir Koishekenov and Erik J. Bekkers
Published April 29
Graph ML News (April 29th) - Upcoming ICLR and Accepted ICML papers

ICLR in Kigali starts next week! There is going to be a flurry of materials and reviews prepared by small and big labs, for instance, A Guide to ICLR 2023: 10 Topics and 50 papers you shouldn't miss - so we’ll try to keep you updated. Meanwhile, the Machine Learning for Drug Discovery (MLDD) and ML4Materials workshops announced accepted papers - those are nice venues to see where the community moves and what the next major conference submissions would be.

More resources on topology:
- 🍩Database of Original & Non-Theoretical Uses of Topology (DONUT) - a collection of TDA applications beyond machine learning.
- TopoEmbedX - a python library for working with topological data, pretty much networkx for higher-order structures.

Following that, a fresh talk on the Curvature for Graph Learning by Bastian Rieck!

Finally, ICML acceptances have arrived - some particularly interesting preprints that made it to the conference include:
- Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure
- STRIDERNET: A Graph Reinforcement Learning Approach to Optimize Atomic Structures on Rough Energy Landscapes
- MoleculeSDE - A Group Symmetric Stochastic Differential Equation Model for Molecule Multi-modal Pretraining (a project website so far)
- GREAD: Graph Neural Reaction-Diffusion Networks
- On the Expressive Power of Geometric GNNs
- Improved Graph HyperNetwork (GHN-3)
Published April 23
GraphML News (April 23rd) - Topological Deep Learning, Scalable Molecular Simulations, Network Games

Architectures of Topological Deep Learning: A Survey on Topological Neural Networks by Mathilde Papillon, Sophia Sanborn, Mustafa Hajij, and Nina Miolane - a wonderful survey on Topological Deep Learning explaining basic concepts from sets and graphs to simplicial and cellular complexes using the message passing framework. The survey also covers prominent deep learning architectures employing topological features and tasks that benefit from them. Must read 👍

Scaling the leading accuracy of deep equivariant models to biomolecular simulations of realistic size by Albert Musaelian, Anders Johansson, Simon Batzner, Boris Kozinsky - the work introduces Allegro v2, an improved version of the SOTA equivariant model Allegro, probed at a humongous problem scale: nanoseconds of the full HIV capsid (44M atoms) and scaling up to 100M atom structures on 5120 A100 GPUs 👀.

New blogs: Michael Bronstein and Emanuele Rossi wrote an article on Learning Network Games - an intersection of game theory and Graph ML. The main task is to infer the network structure between the agents in a game based on the observations of actions and outcomes.

Not directly about graphs, but Shashank Prasanna wrote an intro to torch.compile() introduced in PyTorch 2.0 and what’s happening under the hood when you execute it on your model.
Published April 16
GraphML News, April 16th edition - Generalist Medical AI, more diffusion papers

No particularly outstanding Graph ML event or announcement (that we hadn’t covered before) happened this week, so here is a collection of fresh papers you might want to have a look at:
- Foundation models for generalist medical artificial intelligence - perhaps a landmark paper on using foundation models and their many exciting applications like generative models (e.g., text-to-molecule or text-to-protein) in real world medicine.
- DiffDock-PP: Rigid Protein-Protein Docking with Diffusion Models - extension of the famous DiffDock that translates and rotates unbound protein structures into their bound conformations.
- Graph Generation with Destination-Driven Diffusion Mixture - the next version of the score-matching GDSS generative model (ICML 2022). Here, the model learns to “keep in mind” the final destination of the diffusion process at each time step - this trick greatly improves the performance in 2D and 3D tasks.
- DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization - turns out discrete diffusion on graphs is able to generate very strong priors for combinatorial optimization tasks like Traveling Salesman or Maximum Independent Set when paired with a postprocessing solver.
- GraphGUIDE: interpretable and controllable conditional graph generation with discrete Bernoulli diffusion - another take on discrete diffusion on graphs where the authors define a Bernoulli noising process as adding/removing/flipping edges instead of marginal transition probabilities mined from data (like in DiGress). The strength of that approach is that any intermediate state with added noise is still a legit graph retaining its sparsity instead of adding direct noise to node features or adjacency matrix.
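To make the GraphGUIDE idea a bit more tangible, here is a minimal sketch of a Bernoulli edge-flip noising process in plain Python. It is not the authors' code: the function names, the uniform flip probability, and the toy 4-cycle are assumptions made for illustration, and only the flip kernel is shown (no separate add-only or remove-only variants). What it illustrates is that every intermediate state is still a discrete graph, i.e. a set of edges, not a real-valued adjacency matrix.

```python
import random


def flip_edges(edges, num_nodes, flip_prob, rng):
    """One noising step: flip each possible undirected edge independently.

    Edges are stored as (u, v) tuples with u < v. The kernel and the single
    flip_prob are illustrative assumptions, not the paper's exact schedule.
    """
    noisy = set()
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            present = (u, v) in edges
            if rng.random() < flip_prob:
                present = not present  # Bernoulli flip: add or remove the edge
            if present:
                noisy.add((u, v))
    return noisy


def forward_process(edges, num_nodes, flip_prob, steps, seed=0):
    """Return the list of graphs visited by the noising chain, clean graph first."""
    rng = random.Random(seed)
    states = [set(edges)]
    for _ in range(steps):
        states.append(flip_edges(states[-1], num_nodes, flip_prob, rng))
    return states


# Toy example: a 4-cycle noised for 5 steps.
graph = {(0, 1), (1, 2), (2, 3), (0, 3)}
trajectory = forward_process(graph, num_nodes=4, flip_prob=0.1, steps=5)
for step, state in enumerate(trajectory):
    print(step, sorted(state))
```

Two handy sanity checks: with flip_prob=0 the graph never changes, and with flip_prob=1 a single step yields the complement graph.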
Published April 8
Graph ML News, April 8th edition - MoML’23, GLB’23, and more

Molecular Machine Learning Conference (MoML) 2023 is going to take place at Mila in Montreal on May 29th. MoML is the premier venue for ML applications in drug discovery, quantum chemistry, molecular dynamics, and protein design. Confirmed speakers are Yoshua Bengio (Mila), Djork-Arné Clevert (Pfizer), Marinka Zitnik (Harvard), Gregory Bowman (UPenn), Mohammed AlQuraishi (Columbia), and Dominique Beaini (Mila, Valence Discovery). The poster submission deadline is April 24th. The ‘22 event was held at MIT and was a huge success!

In this context, University of Amsterdam (UvA) announced 4 open postdoc positions in the new program on AI 4 Molecules & Materials.

The Workshop on Graph Learning Benchmarks (GLB’23) will be held in conjunction with KDD 2023 in Long Beach (California) on Aug 7th. Submit your works on new graph datasets, benchmarks, and software until May 26th. The workshop is non-archival.

PyG expands the range of supported hardware to Graphcore IPUs with examples on training temporal GNNs, molecular property prediction GNNs, and inductive KG reasoning GNNs on IPUs. Following up on that, you might want to attend the GNN meetup organized by Graphcore and Kumo in London on April 13th next week.

For the weekend reading, check out EigenFold: Generative Protein Structure Prediction with Diffusion Models by Bowen Jing, Ezra Erives, Peter Pao-Huang, Gabriele Corso, Bonnie Berger, and Tommi Jaakkola. The take on protein tasks by the authors of DiffDock 😉
Published April 1
Graph ML News, April 1st edition

Apart from Neural Graph Databases and Twitter Algorithm (and SIGBOVIK), a few more things happened this week.

The Learning on Graphs Conference (LoG) 2023 has been announced! One of the premier graph learning venues is going to take place online on Nov 27-30th accompanied by local meetups, and you can actually volunteer to organize one at your place!

Baker Lab open-sourced RF Diffusion, a SOTA protein generation model, as part of ColabFold. We covered RF Diffusion a few months ago and its capabilities are quite astounding. Since the time of announcement, the authors further improved the quality and managed to test the properties of hundreds of generated proteins in the wet lab.

ICML 2023 announced accepted workshops - the graph learning audience might want to attend:
- Differentiable Almost Everything: Differentiable Relaxations, Algorithms, Operators, and Simulators
- Topology, Algebra, and Geometry in Machine Learning (TAG-ML)
- Knowledge and Logical Reasoning in the Era of Data-driven Learning
- Sampling and Optimization in Discrete Space
- The Synergy of Scientific and Machine Learning Modelling (SynS & ML)
- Workshop on Computational Biology
- Structured Probabilistic Inference and Generative Modeling

Rishi Puri and Matthias Fey published a post on accelerating Heterogeneous Graph Transformers in pyg-lib resulting in about 3x speed boost. Meanwhile, AWS Labs released GraphStorm, a Graph ML framework for enterprise use-cases based on DGL.

For the weekend reading, check out Machine Learning for Partial Differential Equations by Steven L. Brunton and J. Nathan Kutz - perhaps the best intro into ML with PDEs. Yes, it is from the author of awesome YouTube lectures on dynamical systems, physics-inspired ML, and control theory.
Published March 31
🐦Special: Graph algorithms behind The Twitter Algorithm

Twitter has recently published some details on their tweet recommendation algorithm (denoted as The Algorithm). Let’s dive into it from the graph learning perspective - it does have some interesting features spanning clustering, KG embeddings, ANN, and PageRank.

Data-wise, the GraphJet framework operates on the Twitter interaction graph (in-memory) supporting dynamic edge updates and lookup queries. Several algorithms prepare features:
- Graph clustering based on sparse binary factorization (SBF) to mine communities, and then the SimClusters approximate nearest neighbor search library to query for the most similar clusters. There are approximately 145k communities on Twitter and they are updated every few weeks.
- Twitter Heterogeneous Information Network (TwHIN) embedding - this is largely based on the classic TransE for knowledge graph embedding. The KG is a multi-relational graph among Users, Tweets, Ads, and Advertisers. TwHIN learns shallow embeddings for all nodes. For inductive capabilities (building embeddings for newly arrived tweets or users), the model simply aggregates embeddings of neighboring nodes (my 2 cents - NodePiece would fit pretty well into this setup).
- RealGraph models the user interactions graph and outputs the likelihood of two users’ interaction. There is a relatively straightforward logistic regression model for edge scoring on top of the RealGraph.
- TweepCred - a PageRank score for users, this is your “influencer” score.

In your feed, 50% of tweets come from your network (RealGraph features), 50% from out-of-network (SimCluster, TwHIN, and Social Graph traversals). 1500 candidates are sent to the ranking models: a lightweight logreg and a heavier 48M-param neural net based on MaskNet. Ranked candidates are subject to filtering and postprocessing.

Overall, the recommender pipeline runs about 5 billion times a day, so the latency requirements do play a major role in selecting shallow’ish graph models. Check the repos for more details. We’ll leave other peculiarities like “the Elon feature” for other researchers 🙂
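The graph features in this post map onto a few textbook building blocks. Below is a rough sketch, assuming plain Python lists as embeddings: a TransE-style score (the model TwHIN is said to be based on), mean aggregation of neighbor embeddings for unseen nodes, and power-iteration PageRank as a stand-in for TweepCred. None of this is taken from the Twitter repos; all function names, the toy vectors, and the damping factor 0.85 are assumptions for illustration.

```python
import math


def transe_score(head, relation, tail):
    """TransE plausibility: negative L2 distance between head + relation and tail."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def mean_of_neighbors(neighbor_embeddings):
    """Inductive embedding for an unseen node: the average of its neighbors' embeddings."""
    dim = len(neighbor_embeddings[0])
    count = len(neighbor_embeddings)
    return [sum(vec[i] for vec in neighbor_embeddings) / count for i in range(dim)]


def pagerank(out_links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict: node -> list of nodes it links to.

    Every link target must also appear as a key of out_links.
    """
    nodes = list(out_links)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in out_links.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling node: spread its rank uniformly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
        rank = new_rank
    return rank


# Hypothetical 2-d embeddings: author_a fits the (user, follows, ?) pattern exactly.
user, follows = [0.0, 1.0], [1.0, 0.0]
author_a, author_b = [1.0, 1.0], [-1.0, 0.0]
print(transe_score(user, follows, author_a), transe_score(user, follows, author_b))

# A new tweet gets the mean embedding of the nodes it is connected to.
print(mean_of_neighbors([[1.0, 3.0], [3.0, 5.0]]))

# Toy follow graph: "a" and "b" follow "c", "c" follows "a".
ranks = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(ranks.items(), key=lambda item: -item[1]))
```

All three pieces are cheap to evaluate at serving time, which is in line with the post's remark that latency pushes the pipeline toward shallow graph models.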
Published March 29
Neural Graph Reasoning: Complex Logical Query Answering Meets Graph Databases Hongyu Ren, Mikhail Galkin, Michael Cochez, Zhaocheng Zhu, Jure Leskovec Our new work (65-pager 👀) on rethinking graph databases in the era of GNNs and neural reasoners where we explore the concept of Neural Graph Databases (NGDBs). 1️⃣ Why do we need NGDBs and what do current graph DBs lack? The biggest motivation is incompleteness - symbolic SPARQL/Cypher-like engines can’t cope with incomplete graphs at scale. In fact, in some cases, SPARQL reasoners might run indefinitely. Neural graph reasoning, however, is already mature enough to work in large and noisy incomplete graphs. 2️⃣ What are NGDBs? While their architecture might look similar to traditional DBs, the essential difference is in ditching symbolic edge traversal and answering queries in the latent space (including logical operators). Broadly, NGDBs are equipped to answer both “what is there?” and “what is missing?” queries whereas standard graph DBs are limited to traversal-only scenarios assuming the graph is complete. 3️⃣ In the NGDB framework, we create a taxonomy and survey 40+ neural graph reasoning models that can potentially serve as Neural Query Engines under 3 main categories: Graphs (theory and expressiveness), Modeling (graph learning), and Queries (what can we answer). 4️⃣ Finally, we outline a handful of key challenges and open problems in the area of Graph ML + Databases and for NGDBs in particular. Lots of cool stuff to work on! (especially if you are in an existential crisis after GPT-4, eg, designing LLM interfaces for NGDBs and how to let NGDBs improve structure, compress and accelerate LLMs are also promising directions) There is much more to tell about this work so we prepared more resources to learn about NGDBs: 📚blog post with a gentle intro and images 📜arxiv preprint 🛠️github repo with the taxonomy and curated list of relevant papers
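A toy contrast between symbolic traversal and latent-space answering, as a sketch only. It assumes a hypothetical link predictor that returns a score in [0, 1] for every entity, and it combines scores with product fuzzy logic, which is one common choice among neural query engines and not necessarily what a given NGDB would use. Entity names, scores, and the 0.5 threshold are made up.

```python
# Observed (incomplete) edges: entities known to satisfy each atomic query.
known_treats = {"drug_a"}
known_targets = {"drug_a", "drug_b"}

# Hypothetical link-predictor scores in [0, 1] for the same two atomic queries.
score_treats = {"drug_a": 1.0, "drug_b": 0.9, "drug_c": 0.1}
score_targets = {"drug_a": 1.0, "drug_b": 1.0, "drug_c": 0.05}


def fuzzy_and(a, b):
    """Conjunction as the product t-norm, applied per entity."""
    return {entity: a[entity] * b[entity] for entity in a}


def fuzzy_or(a, b):
    """Disjunction as the probabilistic sum, applied per entity."""
    return {entity: a[entity] + b[entity] - a[entity] * b[entity] for entity in a}


def fuzzy_not(a):
    """Negation as the complement of the score."""
    return {entity: 1.0 - a[entity] for entity in a}


def answers(scores, threshold=0.5):
    """Entities whose score clears an (arbitrary) threshold."""
    return {entity for entity, score in scores.items() if score >= threshold}


# "What is there?": plain set intersection over observed edges.
symbolic = known_treats & known_targets

# "What is missing?": the same conjunctive query answered over scores.
neural = answers(fuzzy_and(score_treats, score_targets))

print(sorted(symbolic))  # only entities with both edges observed
print(sorted(neural))    # also includes drug_b, whose "treats" edge is missing
```

The symbolic answer contains only drug_a, while the score-based answer also surfaces drug_b, whose "treats" edge is absent from the observed graph.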
Published March 25
GraphML News, March 25th edition Some news you might have missed in the graph learning area after the week of massive AGI claims and GPT plugins announcement. ICLR 2023 announced Outstanding Papers - great to see two GNN papers there! One Outstanding Award went to Rethinking the Expressive Power of GNNs via Graph Biconnectivity, an honorable mention went to Conditional Antibody Design as 3D Equivariant Graph Translation. New releases of the main graph libraries: - PyG announced 2.3.0 with the full PyTorch 2.0 support where scatter and sparse APIs are now parts of the main torch, so you might expect less hassle installing PyG dependencies now. Besides, new torch.compile() brings 2-3x speed improvements for many common GNN architectures. - DGL presented a new version 1.0 at the recent LoGaG reading group, the video recording is already available. The new version introduces a new sparse API and further scalability improvements. New papers for the weekend reading: A Survey on Oversmoothing in Graph Neural Networks by T. Konstantin Rusch, Michael Bronstein, and Siddhartha Mishra - everything you wanted to know about known sources of oversmoothing and ways to alleviate it - including the recent Gradient Gating framework we reviewed a while ago. Zero-shot prediction of therapeutic use with geometric deep learning and clinician centered design by Kexin Huang, Payal Chandak, et al - introduces TxGNN, a pre-trained GNN for identifying therapeutic opportunities for diseases with limited treatment options (and completely new diseases in the zero-shot manner).
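For readers new to oversmoothing, here is a tiny numerical illustration, assuming the simplest possible "GNN layer": every node replaces its scalar feature with the mean over itself and its neighbors, with no weights or nonlinearities. This is a simplification made for illustration and is not taken from the survey; the path graph and the initial features are made up.

```python
def mean_aggregate(features, neighbors):
    """One parameter-free layer: each node averages itself and its neighbors."""
    return [
        (features[node] + sum(features[nbr] for nbr in neighbors[node]))
        / (1 + len(neighbors[node]))
        for node in range(len(features))
    ]


def spread(features):
    """Gap between the largest and smallest node feature."""
    return max(features) - min(features)


# Path graph 0 - 1 - 2 - 3 with two clearly separated groups of nodes.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
features = [0.0, 0.0, 1.0, 1.0]

spreads = [spread(features)]
for _ in range(50):
    features = mean_aggregate(features, neighbors)
    spreads.append(spread(features))

print(spreads[0], spreads[5], spreads[-1])
```

The spread between node features shrinks with every layer until the nodes are practically indistinguishable, which is the effect that techniques like Gradient Gating try to counteract.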
Published March 18
GraphML News

GPT-4 left the graph community scratching their heads as well (maybe not as much as academic NLP researchers) - look at the molecule search example at the very end of the technical report. Andrew White was among the few researchers working on this example; he compiled a thread on how GPT-4 empowered with external tools can do a very impressive job proposing new molecules.

Minkai Xu delivered a lecture “Geometric Graph Learning From Representation to Generation” as a part of the cs224w ML with Graphs course at Stanford (perhaps the most famous class about Graph ML). The lecture covers the basics of invariant and equivariant GNNs and introduces GeoDiff, a diffusion model for generating 3D molecules. Slides of the whole Winter’23 course are now available.

Weekend reading:
- The Descriptive Complexity of Graph Neural Networks - a massive 88-pager from Martin Grohe proving that GNNs fall into the TC0 complexity class. This is a potential breakthrough since many database query languages fall into AC0 and TC0.
- Zero-One Laws of Graph Neural Networks by Adam-Day et al. - shows an interesting result that GCN-like MPNNs with random features map final graph representations to zeros or ones with the growing size of graphs. GATs and GINs are not (yet) prone to this behavior.
- Allegro-Legato: Scalable, Fast, and Robust Neural-Network Quantum Molecular Dynamics via Sharpness-Aware Minimization by Ibayashi et al - an improved version of Allegro, current SOTA in molecular dynamics simulations, with faster convergence and better stability.
Published March 11
💐GraphML News🌷

Everything you wanted to know about Clifford Layers and their applications in PDE modeling and molecular dynamics is now collected on a single website, sprinkle with the recent LoGaG presentation (video) and add a little bit of Geometric Algebra intro from bivector for the best experience.

Some freshly arxived papers you might want to grab for the weekend reading:
- Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models? by Knyazev et al - introduces Graph HyperNetwork v3 for predicting the weights of neural network architectures. The previous version GHN-2 got massive recognition at NeurIPS’21 including an interview with Yannic Kilcher. Instead of training neural nets, you could use GHN to estimate model params in one forward pass and it demonstrated a non-trivial performance on ImageNet. In the new version, the authors apply a Graphormer on the model’s computation graph (a DAG) to frame the task as node regression where node parameters correspond to weight matrices in the target neural nets. You can also use GHN for better initialization of model weights instead of random init.
- SUREL+: Moving from Walks to Sets for Scalable Subgraph-based Graph Representation Learning by Yin et al - the next iteration of SUREL for link prediction (where subgraphs are replaced with random walks for better scalability), now moving from walks to sets as the title says.