Graph Machine Learning
@graphml
Technologies
Everything about graph theory, computer science, machine learning, etc. If you have something worth sharing with the community, reach out to @gimmeblues or @chaitjo. Admins: Sergey Ivanov; Michael Galkin; Chaitanya K. Joshi
Recent posts
Published Jan 10
Validated de novo generated antibodies & AI4Science talks
- Absci announced that their de novo, zero-shot generated therapeutic antibodies were validated in the wet lab. The pre-print is light on technical details, but what we can infer is that they combine many new geometric generative models with fast screening pipelines.
- A new series of talks on AI4Science is starting next week! The inaugural talk will be delivered by Simon Batzner (Harvard) on “E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials”.
Published Jan 7
ICLR 2023 Workshops
The list of workshops at the upcoming ICLR’23 has been announced! A broad Graph ML audience might be interested in:
- From Molecules to Materials: ICLR 2023 Workshop on Machine Learning for Materials (ML4Materials)
- Machine Learning for Drug Discovery (MLDD)
- Neurosymbolic Generative Models (NeSy-GeMs)
- Physics for Machine Learning
- Deep Learning for Code (DL4C)
Published Jan 5
New End of the Year Blog Posts
In the first week of a new year, many researchers summarize their thoughts about the past and future. In addition to the previous post reflecting on GraphML in 2022 and 2023, a few new ones appeared:
1. AI in Drug Discovery 2022 by Pat Walters (Relay Therapeutics) - on the most inspiring papers in molecular and protein ML.
2. The Batch #177 - includes predictions for 2023 by Yoshua Bengio (on reasoning), Alon Halevy (on personal data treatment), Douwe Kiela (on practical aspects of LLMs), Been Kim (on interpretability), and Reza Zadeh (on active learning).
3. Using Graph Learning for Personalization: How GNNs Solve Inherent Structural Issues with Recommender Systems by Dylan Sandfelder and Ivaylo Bahtchevanov (kumo.ai) - on applying GNNs in RecSys with examples from Spotify, Pinterest and UberEats.
4. Top Language AI research papers from Yi Tay (Google) - on large language models, the forefront of AI that does have an impact on Graph ML (remember protein language models like ESM-2 and ESMFold, for instance).
Published Jan 1
🎄 It's 2023! In a new post, we provide an overview of what happened in Graph ML and its subfields in 2022 (and hypothesize about potential breakthroughs in 2023), including Generative Models, Physics, PDEs, Graph Transformers, Theory, KGs, Algorithmic Reasoning, Hardware, and more! Brought to you by Michael Galkin, Hongyu Ren, Zhaocheng Zhu with the help of Christopher Morris and Johannes Brandstetter https://mgalkin.medium.com/graph-ml-in-2023-the-state-of-affairs-1ba920cb9232
Published Dec 23
Xmas Papers: Molecule Editing from Text, Protein Generation
It's the holiday season 🎄, and what better way to spend it than reading some new papers on molecule and protein generation! Here are a few cool papers published on arXiv this week:
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing by Shengchao Liu and the Mila/NVIDIA team proposes MoleculeSTM, a CLIP-like text-to-molecule model. MoleculeSTM can do 2 impressive things: (1) retrieve molecules by a text description like “triazole derivatives” and retrieve a text description for a given molecule in SMILES, (2) edit molecules from text prompts like “make the molecule soluble in water with low permeability” - and the model edits the molecular graph according to the description, mindblowing 🤯
Protein Sequence and Structure Co-Design with Equivariant Translation by Chence Shi and the Mila team proposes ProtSEED, a generative model that produces protein sequence and structure simultaneously (most existing diffusion models for proteins can do only one of those at a time). ProtSEED can be conditioned on residue features or pairs of residues. Model-wise, it is an equivariant iterative model (AlphaFold 2 vibes) with improved triangular attention. ProtSEED was evaluated on Antibody CDR co-design, Protein sequence-structure co-design, and Fixed backbone sequence design.
And 2 more papers from the ESM team, Meta AI, and BakerLab (check the Twitter thread by Alex Rives for more details)!
Language models generalize beyond natural proteins by Robert Verkuil et al. finds that ESM-2 can generate de novo protein sequences that can actually be synthesized in the lab and, more importantly, do not have any match among known natural proteins. A great result given that ESM-2 was only trained on sequences!
A high-level programming language for generative protein design by Brian Hie et al. proposes pretty much a new programming language for protein designers (think of it as a query language for ESMFold) - production rules organized in a syntax tree with constraint functions. Then, each program is “compiled” into an energy function that governs the generative process.
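To make the "compiled into an energy function" idea above more concrete, here is a toy sketch. It is not the language from the paper and does not use ESMFold: the constraint functions, the flat weighted list standing in for a syntax tree, the rough hydrophobic grouping, and the Metropolis sampler are all illustrative assumptions. It only shows the general pattern of turning constraints into one scalar energy and letting a sampler minimize it.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AVILMFWY")  # rough illustrative grouping, an assumption


def hydrophobic_fraction(target):
    """Hypothetical constraint: distance of the hydrophobic fraction from a target."""
    def constraint(seq):
        frac = sum(aa in HYDROPHOBIC for aa in seq) / len(seq)
        return abs(frac - target)
    return constraint


def no_adjacent_repeats(seq):
    """Hypothetical constraint: fraction of adjacent positions with equal residues."""
    return sum(a == b for a, b in zip(seq, seq[1:])) / (len(seq) - 1)


def compile_program(weighted_constraints):
    """'Compile' a flat list of (weight, constraint) pairs into one energy function."""
    def energy(seq):
        return sum(w * c(seq) for w, c in weighted_constraints)
    return energy


def sample(energy, length, steps=2000, temperature=0.05, seed=0):
    """Metropolis sampling over sequences; returns the lowest-energy sequence seen."""
    rng = random.Random(seed)
    seq = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current = energy(seq)
    best, best_energy = list(seq), current
    for _ in range(steps):
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # propose a single-site mutation
        proposed = energy(seq)
        accept = proposed <= current or rng.random() < math.exp(
            (current - proposed) / temperature
        )
        if accept:
            current = proposed
            if current < best_energy:
                best, best_energy = list(seq), current
        else:
            seq[pos] = old  # reject: undo the mutation
    return "".join(best), best_energy


program = [(1.0, hydrophobic_fraction(0.4)), (0.5, no_adjacent_repeats)]
energy = compile_program(program)
design, design_energy = sample(energy, length=30)
print(design, round(design_energy, 3))
```

In the paper the energy terms are evaluated on predicted structures; the sequence-only constraints here are just the cheapest stand-in that keeps the example runnable.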
Published Dec 16
Friday GraphML News
Not much news this week - it seems the community went for a break after consecutive NeurIPS and LOG. A few things came to our attention:
- IPAM organizes a workshop on Learning and Emergence in Molecular Systems at UCLA on Jan 23-27 with invited talks including Xavier Bresson, Kyunghyun Cho, Bruno Correia, Tommi Jaakkola, Frank Noe, Tess Smidt, and Max Welling
- Recordings of keynotes and orals at LOG have been published on YouTube; recordings of workshops and tutorials are expected soon
Published Dec 12
CASP 15 - MSAs Strike Back
CASP (Critical Assessment of Techniques for Protein Structure Prediction) is a biennial challenge on protein structure modeling. In 2020, AlphaFold 2 revolutionized the field of protein structure prediction, winning the CASP 14 challenge by a huge margin using Geometric Deep Learning. This weekend the results of CASP 15 were announced - what do we see there after glancing through the abstracts? In short, multiple sequence alignments (MSAs) are not going anywhere and are still the main component of winning approaches. Most of the top models are based on AlphaFold 2 (and its Multimer version) with many tweaks here and there. Protein LM-based folding like ESMFold (popular for not needing MSAs) seems to be far from the top. More reflections by Ezgi Karaca and Sergey Ovchinnikov
Published Dec 10
LOG 2022 and News
The Learning on Graphs conference started on Friday - join the talks, poster sessions, and recently announced tutorials on Saturday and Sunday! Other news of the week:
- DeepMind finally announced A Generalist Neural Algorithmic Learner (spotlight at LOG 2022) - an approach to train a single GNN processor network that can solve 30 diverse algorithmic reasoning tasks from the CLRS-30 benchmark. Don’t miss the 3-hour tutorial on Saturday.
- PDEArena by Microsoft Research is a PDE surrogate benchmarking framework with 20 models and 4 datasets on fluid dynamics and electrodynamics. Time to flex your PDE solvers 💪
- The OpenCatalyst team released AdsorbML: ML-based potentials deliver a whopping 1300x speedup compared to DFT while retaining 85% accuracy (and 4000x while retaining 75%) in the identification of low-energy adsorbate-surface configurations.
Published Dec 7
LOG 2022 In-Person Meetups
LOG 2022, the premier graph conference, starts this Friday! It is going to be fully remote, but the GraphML community all over the globe organizes physical local meetups you might want to join:
- Cambridge meetup
- Würzburg meetup of the DACH area
- Boston area meetup at MIT
- Montreal meetup at Mila
- Paris meetup at CentraleSupélec (Paris-Saclay)
Let us know if you organize a meetup in your area and we’ll update the post.
Published Dec 5
Weisfeiler and Leman Go Relational
by Pablo Barcelo (PUC Chile & IMFD & CENIA Chile), Mikhail Galkin (Mila), Christopher Morris (RWTH Aachen), Miguel Romero Orth (Universidad Adolfo Ibáñez & CENIA Chile)
arXiv
Multi-relational graphs have been surprisingly neglected by the GNN theory community for quite a while. In our fresh LOG 2022 paper, we bridge this gap and propose Relational WL (RWL), an extension of the classical Weisfeiler-Leman test for multi-relational graphs (such as molecular graphs or knowledge graphs). We prove several important theorems:
1) 1-RWL is strictly more powerful than 1-WL;
2) R-GCN and CompGCN, common multi-relational GNNs, are bounded by 1-RWL;
3) R-GCN and CompGCN are in fact equally expressive.
An even more interesting finding is that the most expressive message functions should capture vector scaling, e.g., multiplication or circular correlation. This result gives a solid foundation for GINE with a multiplicative message function (one of the most popular GNN encoders for molecular graphs) and CompGCN with DistMult.
Based on the theory for homogeneous higher-order GNNs, we show there exist higher-order relational networks, k-RNs, that are more expressive than 1-RWL. Similarly to local k-GNNs, there exist approximations that reduce their computational complexity.
---
Now we have theoretical mechanisms to explain the expressiveness of relational GNNs! In the next post, we’ll check the other places Weisfeiler and Leman visited in 2022 and what the results of their trips are 🚠
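As a rough illustration of why relation-aware refinement can separate graphs that plain 1-WL cannot, here is a minimal color-refinement sketch. It is a simplified reading of the idea, not the paper's formal definition of 1-RWL: the `refine` function, the `use_relations` switch, and the two toy "molecular" rings with single/double bonds are assumptions made for this example.

```python
from collections import Counter


def undirected(bonds):
    """Expand (u, relation, v) bonds into directed triples for both directions."""
    return [(u, r, v) for u, r, v in bonds] + [(v, r, u) for u, r, v in bonds]


def refine(graphs, use_relations, rounds=3):
    """Color refinement over several graphs with a shared color palette.

    Each graph is (nodes, edges) with edges as (source, relation, target).
    With use_relations=False the relation labels are ignored (plain 1-WL style);
    with use_relations=True each neighbor color is paired with its relation.
    Returns one color histogram per graph.
    """
    colors = [{v: 0 for v in nodes} for nodes, _ in graphs]
    for _ in range(rounds):
        signatures = []
        for (nodes, edges), col in zip(graphs, colors):
            sig = {}
            for v in nodes:
                incoming = sorted(
                    (col[u], r if use_relations else "")
                    for u, r, w in edges
                    if w == v
                )
                sig[v] = (col[v], tuple(incoming))
            signatures.append(sig)
        # Relabel signatures with small integers shared across all graphs.
        palette = {}
        for sig in signatures:
            for s in sig.values():
                palette.setdefault(s, len(palette))
        colors = [{v: palette[s] for v, s in sig.items()} for sig in signatures]
    return [Counter(c.values()) for c in colors]


nodes = [0, 1, 2, 3]
# Two 4-rings with the same shape but different bond patterns.
ring_a = (nodes, undirected([(0, "single", 1), (1, "double", 2),
                             (2, "single", 3), (3, "double", 0)]))
ring_b = (nodes, undirected([(0, "single", 1), (1, "single", 2),
                             (2, "double", 3), (3, "double", 0)]))

plain = refine([ring_a, ring_b], use_relations=False)
relational = refine([ring_a, ring_b], use_relations=True)
print(plain[0] == plain[1])            # True: indistinguishable without relations
print(relational[0] == relational[1])  # False: relation-aware colors differ
```

Ignoring bond types, both rings are plain 4-cycles and every node gets the same color; once relations are part of the neighbor signature, the second ring splits into three color classes.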
Published Dec 2
Friday News: PyG 2.2 and Protein Diffusion Models
For those who are at NeurIPS 2022, Saturday and Sunday are days of workshops on graph learning, structural biology, physics, and material discovery. Apart from that:
The PyG team has finally released PyG 2.2.0, the first version to feature the super-optimized pyg-lib that speeds up GNNs and sampling on both CPUs and GPUs (sometimes up to 20x!). The 2.2 update also includes the new FeatureStore and GraphStore, with which you can set up communication with large databases and graphs too big to store in memory. Time to update your envs ⏰
Generate Biomedicines releases Chroma, an equivariant conditional diffusion model for generating proteins. The conditional part is particularly cool as we usually want to generate proteins with certain properties and functions - Chroma allows imposing functional and geometric constraints, and even using natural language queries like “Generate a protein with CHAD domain” thanks to a small GPT-Neo trained on protein captioning. The 80-page paper is on the website, and you can have a look at the thread by Andrew Beam.
Simultaneously, the Baker Lab releases RoseTTAFold Diffusion (RF Diffusion), packed with similar functionality and also allowing for text prompts like “Generate a protein that binds to X”. Check out the Twitter thread by Joseph Watson, the 1st author. The 70-page preprint is available, so here is your casual weekend reading of two papers 🙂
Published Nov 30
GPS++ (OGB LSC’22 Winner) is Available on IPUs
GPS++, the model by Graphcore, Mila, and Valence Discovery that won the OGB Large-Scale Challenge 2022 in the PCQM4Mv2 track (graph regression), is now publicly available on Paperspace with simple training and inference examples in Jupyter Notebooks. You can try it on powerful IPUs - custom chips and servers built by Graphcore for optimized sparse operations. Raw checkpoints are also available in the official GitHub repo.