Graph Machine Learning

@graphml

Technologies

Everything about graph theory, computer science, machine learning, etc. If you have something worth sharing with the community, reach out @gimmeblues, @chaitjo. Admins: Sergey Ivanov; Michael Galkin; Chaitanya K. Joshi

Subscribers: 6,750 (current channel subscribers)
Indexed posts: 877
Recent reach: 67,530 (sum of recent views)
Recent posts


Published March 30

GraphML News (March 30th) - AlphaFold course, Upcoming Summer Schools

The first week of ICML rebuttals has passed, one week to go - good luck everyone 💪

EMBL-EBI together with Google DeepMind released a free entry-level course about the basics of protein folding and using AlphaFold for structure prediction. The course helps to understand the inputs and outputs of AlphaFold, how to interpret the metrics and predictions, and covers some more advanced usage.

A handful of summer schools covering lots of Graph and Geometric DL were announced recently:
- Eastern European ML Summer School | 15-20 July 2024, Novi Sad, Serbia
- ELLIS Summer School on Machine Learning for Healthcare and Biology | 11-13 June 2024, Manchester, UK
- Generative Modeling Summer School | 24-28 June 2024, Eindhoven, Netherlands
- The workshop on mining and learning with graphs (MLG) will be co-located with ECML PKDD in Vilnius, Lithuania in September 2024, featuring keynotes by Yllka Velaj and Haggai Maron.

Weekend reading:
- A new version of the Hitchhiker’s guide on Geometric GNNs featuring frame-based invariant GNNs and unconstrained GNNs (btw, the paper will be presented at the next LoGaG reading group on Monday, April 1st)
- Space Group Informed Transformer for Crystalline Materials Generation - autoregressive, transformer-based crystal generation that takes into account space groups and Wyckoff positions (a competing diffusion model DiffCSP++ was accepted at ICLR’24)
- Graphs Generalization under Distribution Shifts by Tian et al
- Addressing heterophily in node classification with graph echo state networks by Alessio Micheli and Domenico Tortorella - applies a reservoir computing approach, that is, randomly initializes the GNN weights so as to obtain a desired Lipschitz constant

5,100 views
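
A note on the reservoir computing item in the weekend reading above: one simple way to get random, untrained weights with a chosen Lipschitz bound is to rescale a random matrix so that its largest singular value equals the target. The sketch below is only an illustration under that assumption and is not taken from the paper; `random_reservoir`, `spectral_norm`, and `layer` are hypothetical names, and the spectral norm is estimated by plain power iteration.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def spectral_norm(W, iters=500):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    Wt = [list(col) for col in zip(*W)]
    v = [1.0] * len(W[0])
    for _ in range(iters):
        u = matvec(Wt, matvec(W, v))
        norm = math.sqrt(sum(x * x for x in u))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in u]
    return math.sqrt(sum(x * x for x in matvec(W, v)))


def random_reservoir(n, target_lipschitz, seed=0):
    """Random n x n weights, rescaled so the spectral norm matches the target."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    scale = target_lipschitz / spectral_norm(W)
    return [[scale * w for w in row] for row in W]


def layer(W, x):
    """Untrained layer x -> tanh(W x); tanh is 1-Lipschitz, so the layer's
    Lipschitz constant is bounded by the spectral norm of W."""
    return [math.tanh(h) for h in matvec(W, x)]


W = random_reservoir(8, target_lipschitz=0.9)
print(round(spectral_norm(W), 6))
```

The weights are never trained here; only the single scale factor controls how strongly the layer can stretch distances between inputs.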

Published March 23

GraphML News (March 23rd) - Profluent round, Biology 2.0, TacticAI

💸 Profluent, a Berkeley biotech startup founded in 2022, raises $35M (overall $44M so far). The company focuses on protein generation models in the context of CRISPR gene editing. VC funding in the biotech industry is on fire in 2024!

🧬 A huge blogpost The Road to Biology 2.0 Will Pass Through Black-Box Data by Michael Bronstein and Luca Naef offers a new perspective on the area of ML for biology and its common problem of lacking large amounts of labeled data. The idea is to leverage low-cost high-throughput data (e.g., obtained from experimental facilities), coined “black-box data”, that might not be directly understandable by humans (or experts) but can be used for training large-scale ML models even in the self-supervised regime. It is then hypothesized that the competitive edge would belong to the companies that manage to build such data pipelines and models. Time to convince old-school chemists about the benefits of black-box data.

⚽ Google DeepMind officially introduced TacticAI with the publication in Nature Communications (we wrote about it in the End-Of-The-Year post a few months ago at the preprint stage). TacticAI uses group-equivariant convnets and is designed for football games to give tactical insights for many practical cases such as corner kicks. Interestingly, experts prefer TacticAI outputs 90% of the time. Equivariance + ⚽ = 📈

Weekend reading:
- Atomically accurate de novo design of single-domain antibodies from the Baker Lab - RFDiffusion for antibodies
- Weisfeiler and Leman Go Loopy: A New Hierarchy for Graph Representational Learning by Raffaele Paolino, Sohir Maskey, Pascal Welke, and Gitta Kutyniok - WL visited one more location ✅

5,420 views

Published March 16

GraphML News (March 16th) - RelationRx round, Caduceus, Blogposts, WholeGraph

💸 Relation Therapeutics, the drug discovery company, raises $35M seed funding led by DCVC and NVentures (VC arm of NVIDIA) - making it $60M in total after factoring in the previous round in 2022. Relation is developing treatments for osteoporosis and other bone-related diseases.

⚕️ The race between Mamba and Hyena-like architectures for long-context DNA modeling is heating up: Caduceus by Yair Schiff featuring Tri Dao and Albert Gu is the first bi-directional Mamba equivariant to the reverse complement (RC) symmetry of DNA. Similarly to the recent Evo, it supports sequence lengths up to 131k. In turn, a new blog post by Hazy Research on Evo hinted at the new Mechanistic Architecture Design framework that employs synthetic probes to check long-range modeling capabilities.

💬 A new Medium blogpost by Xiaoxin He (NUS Singapore) on chatting with your graph - dedicated to the recent G-Retriever paper on graph-based RAG for question answering tasks. The post goes through the technical details (perhaps the most interesting part is the prize-collecting Steiner Tree for subgraph retrieval) and positions the work in the flurry of recent Graph + LLM approaches including Talk Like a Graph (highlighted in the recent Google Research blogpost) and Let the Graph do the Talking. Fun fact: now we have 2 different datasets named GraphQA with completely different contents and tasks (one from G-Retriever, another one from the Google papers).

💽 The WholeGraph Storage by NVIDIA for PyG and DGL - a handy way for distributed setups to keep a single graph in the shared storage accessible by the workers. WholeGraph comes in three flavors: continuous, chunked, and distributed.

Weekend reading:
- Personalized Audiobook Recommendations at Spotify Through Graph Neural Networks by Marco De Nadai, Francesco Fabbri, and the Spotify team - Heterogeneous GNNs + The Two (MLP) Towers for SOTA RecSys.
- Universal Representation of Permutation-Invariant Functions on Vectors and Tensors by Puoya Tabaghi and Yusu Wang (UCSD) - when encoding sets of N elements of D-dimensional vectors, DeepSets require a latent dimension of N^D. This cool work reduces this bound to 2ND 👀.
- Generalizing Denoising to Non-Equilibrium Structures Improves Equivariant Force Fields by Yi-Lun Liao, Tess Smidt, Abhishek Das - the success of a Noisy Nodes-like auxiliary denoising objective is extended to non-equilibrium structures thanks to encoding forces of non-equilibrium structures. Yields SOTA on OpenCatalyst (if you have 16-128 V100s though).

5,590 views
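
To make the reverse complement (RC) symmetry mentioned in the Caduceus item above concrete: a map f is RC-equivariant if feeding it the reverse complement of a sequence gives the RC-transformed output, f(RC(x)) = RC(f(x)). The sketch below is a toy illustration and not the Caduceus architecture; it turns an arbitrary local map `g` into an equivariant one by symmetrization, and all names (`make_toy_layer`, `symmetrize`, `rc_features`) are hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}
ALPHABET = "ACGT"  # ordered so that complementing equals reversing the channel axis


def reverse_complement(seq):
    """Reverse the sequence and swap each base with its complement."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def one_hot(seq):
    """Encode a DNA string as an L x 4 list of lists."""
    return [[1.0 if b == a else 0.0 for a in ALPHABET] for b in seq]


def rc_features(x):
    """RC acting on an L x 4 array: flip the positions and flip the channels."""
    return [row[::-1] for row in x[::-1]]


def make_toy_layer(seed=0, width=3):
    """An arbitrary local map g: L x 4 -> L x 4 (a tiny zero-padded convolution)."""
    rng = random.Random(seed)
    w = [[[rng.uniform(-1, 1) for _ in range(4)] for _ in range(4)]
         for _ in range(width)]

    def g(x):
        length = len(x)
        out = []
        for i in range(length):
            row = []
            for c in range(4):
                total = 0.0
                for k in range(width):
                    j = i + k - width // 2
                    if 0 <= j < length:
                        total += sum(w[k][c][d] * x[j][d] for d in range(4))
                row.append(total)
            out.append(row)
        return out

    return g


def symmetrize(g):
    """Build f(x) = g(x) + RC(g(RC(x))), which is RC-equivariant for any g."""
    def f(x):
        a = g(x)
        b = rc_features(g(rc_features(x)))
        return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]
    return f


x = one_hot("ACGTTGCAAC")
f = symmetrize(make_toy_layer())
lhs, rhs = f(rc_features(x)), rc_features(f(x))
print(all(abs(p - q) < 1e-12 for ra, rb in zip(lhs, rhs) for p, q in zip(ra, rb)))
```

The check relies only on RC being an involution, so the same symmetrization works for any choice of `g`.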

Published March 9

GraphML News (March 10th) - Protein Design Community Principles, RF All Atom weights, ICLR workshops

🤝 More than 100 prominent researchers in protein design, structural biology, and geometric deep learning committed to the principles of Responsible AI in Biodesign. Recognizing the increasing capabilities of deep learning models in designing functional biological molecules, the community came up with several core values and principles such as the benefit of society, safety and security, openness, equity, international collaboration, and responsibility. Particular commitments include more scrutiny of hazardous biomolecules before their manufacturing, and better evaluation and risk assessment of DL models. Good for the protein design community, let’s hope those will be implemented in practice!

🧬 Committing to the newly introduced principles, Baker’s lab released RoseTTAFold All-Atom and RFDiffusion All-Atom together with their model weights and several inference examples. Folks on Twitter who interpret the principles as “closed-source AI taking over” are obviously wrong 😛

📚 ICLR 2024 workshops started posting accepted papers - so far we see the papers from AI 4 Differential Equations, Representational Alignment, and Time Series for Health. ICLR workshop papers are usually good proxies for ICML and NeurIPS submissions, so you might want to check those in your domain.

Weekend reading:
- A Survey of Graph Neural Networks in Real world: Imbalance, Noise, Privacy and OOD Challenges by Wei Ju et al
- Graph neural network outputs are almost surely asymptotically constant by Sam Adam-Day et al. feat. Ismail Ilkan Ceylan
- Pairwise Alignment Improves Graph Domain Adaptation by Shikun Liu et al feat. Pan Li
- Understanding Biology in the Age of Artificial Intelligence by Elsa Lawrence, Adham El-Shazly, Srijit Seal feat. our own Chaitanya K. Joshi

5,480 views

Published March 2

GraphML News (March 2nd) - Categorical Deep Learning, Evo, and NeuralPlexer 2

🔀 A fresh look at deep learning from the category theory perspective: Categorical Deep Learning: An Algebraic Theory of Architectures by Bruno Gavranović, Paul Lessard, Andrew Dudzik, featuring Petar Veličković. The position paper attempts to generalize Geometric Deep Learning even further - by means of monad algebras that generalize invariance, equivariance, and symmetries (🍞 and 🧈 of GDL). The main part quickly ramps up to some advanced category theory concepts but the appendix covers the basics (still recommend Cats4AI as a prerequisite though).

🧬 Evo - a foundation model by Arc Institute for RNA/DNA/protein sequences based on the StripedHyena architecture (state space models and convolutions) with a context length of 131K tokens. Some applications include zero-shot function prediction for ncRNA and regulatory DNA, CRISPR system generation, generating whole genome sequences, and many more. Adepts of the church of scaling laws might be interested in the promising scaling capabilities of Evo, which seems to outperform Transformers and the recent Mamba.

🪢 NeuralPlexer 2, a generative model for protein-ligand docking from Iambic, Caltech, and NVIDIA, challenges AlphaFold-latest in several benchmarks: 75.4% RMSD <2Å on PoseBusters vs 73.6% for AlphaFold-latest without site specification, and up to 93.8% with site specification, while being about 50x faster than AlphaFold. The race in comp bio intensifies, moats are challenged, and for us it means we’ll see more cool results - at the cost of more proprietary models and closed data though.

Weekend reading:
- Graph Learning under Distribution Shifts: A Comprehensive Survey on Domain Adaptation, Out-of-distribution, and Continual Learning by Man Wu et al.
- TorchMD-Net 2.0: Fast Neural Network Potentials for Molecular Simulations by Raul P. Pelaez, Guillem Simeon, et al - the next version of the popular ML potential package, now up to 10x faster thanks to torch.compile! (from that perspective, a switch to JAX seems inevitable)
- Weisfeiler-Leman at the margin: When more expressivity matters by Billy Franks, Chris Morris, Ameya Velingker, and Floris Geerts - a new study on expressivity and generalization of MPNNs that continues WL meet VC

5,730 views
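
For readers unfamiliar with the "RMSD <2Å" docking metric quoted in the NeuralPlexer 2 item above, the sketch below shows the basic arithmetic: the root-mean-square deviation between predicted and reference ligand atom positions, and the fraction of poses under the threshold. It is a simplified illustration with hypothetical helper names (`rmsd`, `success_rate`); it assumes atoms are already matched one-to-one and in the same frame, and it omits the symmetry handling and validity checks that real benchmark pipelines apply.

```python
import math


def rmsd(pred, ref):
    """Root-mean-square deviation (in the input units, e.g. angstroms)
    between two equally long lists of (x, y, z) atom coordinates."""
    if len(pred) != len(ref) or not pred:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum((p - r) ** 2 for a, b in zip(pred, ref) for p, r in zip(a, b))
    return math.sqrt(squared / len(pred))


def success_rate(pairs, threshold=2.0):
    """Fraction of (predicted, reference) poses with RMSD below the threshold."""
    hits = sum(1 for pred, ref in pairs if rmsd(pred, ref) < threshold)
    return hits / len(pairs)


reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
close_pose = [(x + 1.0, y, z) for x, y, z in reference]  # shifted by 1 angstrom
far_pose = [(x, y, z + 3.0) for x, y, z in reference]    # shifted by 3 angstroms

print(rmsd(close_pose, reference))  # 1.0
print(success_rate([(close_pose, reference), (far_pose, reference)]))  # 0.5
```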

Published Feb 29

Learning on Graphs @ NYC meetup (Feb 29th - March 1st) online streaming

The 2-day LoG meetup taking place in Jersey City will be streamed online openly for everyone! The talks include the Google Research team (who will for sure talk like a graph), Ricky Chen and Brandon Amos from Meta AI, biotech presence with Matthew McPartlon, Luca Naef from VantAI and Samuel Stanton from Genentech, and many more (see the schedule attached).

7,930 views

Published Feb 24

GraphML News (Feb 24th) - Orbital Materials Round, GNNs at LinkedIn, MLX-graphs

⚛️ Orbital Materials (founded by ex-DeepMind researchers) raised $16M Series A led by Radical Ventures and Toyota Ventures. OM focuses on materials science and shed some light on LINUS - the in-house 3D foundation model for material design (apparently, an ML potential and a generative model) with the ambition to become the AlphaFold of materials science. GNNs = 💸

🏋️‍♀️ LinkedIn published some details of their GNN architecture and GNN-powered services in the KDD’24 paper LiGNN: Graph Neural Networks at LinkedIn. The main graph is heterogeneous, multi-relational, and contains about 100B nodes and a few hundred billion edges (rather sparse). The core GNN model is GraphSAGE trained on link prediction with various tweaks like temporal neighborhood sampling (from latest to older), PPR-based node sampling, and node ID embeddings. A few engineering tricks like multi-processing shared memory and smart node grouping made it possible to speed up training from 24h down to 3 hours. LiGNN boosts recommendations and ads CTR. The bottom line: GNNs = 💸

🍏 Apple presented MLX-graphs: the GNN library for the MLX framework specifically optimized for Apple Silicon. Since the CPU/GPU memory is shared on M1/M2/M3, you don’t have to worry about moving tensors around and at the same time you can enjoy the massive GPU memory of the latest M2/M3 chips (64 GB MBPs and Mac Minis are still much cheaper than an A100 80 GB). For starters, MLX-graphs includes GCN, GAT, GIN, GraphSAGE, and MPNN models and a few standard datasets.

🧬 The OpenFold consortium announced SoloSeq and OpenFold-Multimer, open source and open weights analogues of ESMFold and AlphaFold-Multimer, respectively. The OpenFold repo already showed some signs of new modules, and now there is a public release.

👨‍🏫 Steven L Brunton (U Washington) released a new lecture video series on Physics Informed ML covering AI 4 Science applications enabled by (mostly geometric) deep learning that respects the physical symmetries and invariances of the modeled system. This includes, for example, modeling fluid dynamics, PDEs, turbulence, and optimal control. A nice entrypoint into scientific applications!

Weekend reading:
- Proteus: pioneering protein structure generation for enhanced designability and efficiency by Chentong Wang feat. Longxing Cao from Westlake - finally, a new protein generation model that seems to beat RFDiffusion and Chroma!
- Universal Physics Transformers by Benedikt Alkin feat. Johannes Brandstetter
- Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation by Lingxiao Zhao, Xueying Ding, and Leman Akoglu (all CMU)

5,410 views
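
The "PPR-based node sampling" mentioned in the LiGNN item above can be illustrated with a small personalized PageRank (PPR) computation: run a random walk with restart from a seed node and keep the highest-scoring nodes as its sampled neighborhood. The sketch below is a generic power-iteration version on a toy graph and makes no claim about LinkedIn's actual sampler; the function names, the restart probability `alpha`, and the toy graph are all assumptions.

```python
def personalized_pagerank(adj, seed, alpha=0.15, iters=100):
    """Power iteration for random walk with restart.

    adj maps each node to a list of out-neighbors (every neighbor must also
    be a key of adj); alpha is the restart probability.
    """
    scores = {v: 0.0 for v in adj}
    scores[seed] = 1.0
    for _ in range(iters):
        nxt = {v: 0.0 for v in adj}
        nxt[seed] += alpha  # restart mass always returns to the seed
        for v, neighbors in adj.items():
            mass = (1.0 - alpha) * scores[v]
            if not neighbors:
                nxt[seed] += mass  # dangling node: send its mass to the seed
            else:
                share = mass / len(neighbors)
                for u in neighbors:
                    nxt[u] += share
        scores = nxt
    return scores


def sample_neighborhood(adj, seed, k, alpha=0.15):
    """Pick the k nodes (other than the seed) with the highest PPR score."""
    scores = personalized_pagerank(adj, seed, alpha)
    ranked = sorted((v for v in adj if v != seed), key=lambda v: -scores[v])
    return ranked[:k]


toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
    "e": ["f"],  # a separate component, unreachable from "a"
    "f": ["e"],
}
print(sample_neighborhood(toy_graph, "a", k=2))
```

Unlike uniform neighbor sampling, this ranks nodes by how often a short random walk from the seed visits them, so nodes outside the seed's component never get picked.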

Published Feb 17

GraphML News (Feb 17th) - PyG 2.5, VantAI deal, Discrete Flow Matching, Position papers

Sora and Gemini 1.5 took all the ML news feeds this week - let’s check what is there in graph learning beyond the main wave of AI anxiety and stress for grad students.

🔥 A fresh release PyG 2.5 features a new distributed training framework (co-authored by Intel engineers), RecSys support with easy retrieval techniques like MIPS over node embeddings, a new Edge Index representation instead of sparse tensors, and a rewritten Message Passing class for torch.compile. Lots of new cool stuff!

📚 Xavier Bresson (NUS Singapore) started publishing the slides and notebooks of his most recent 22/23 GraphML course - highly recommended to check it out. Hopefully, this initiative will encourage folks running Graph & Geometric DL courses at Oxbridge to publish their lectures as well 😉

💸 The $674M (in biobucks) deal was announced between VantAI and Bristol Myers Squibb for developing molecular glues. Besides publishing on generative models, VantAI runs open seminars on GenAI for drug discovery (the most recent talk on FoldFlow is already on YouTube).

📐 Two papers from the MIT team of Regina Barzilay and Tommi Jaakkola introduce flow matching for discrete variables (like atom types or DNA base pairs):
- Dirichlet Flow Matching with Applications to DNA Sequence Design by Hannes Stärk, Bowen Jing, feat. Gabriele Corso - by defining flows on a simplex where the prior is a uniform Dirichlet distribution. Also supports classifier-free guidance and Consistency models-like distillation to perform generation in one forward pass.
- Generative Flows on Discrete State-Spaces: Enabling Multimodal Flows with Applications to Protein Co-Design by Andrew Campbell, Jason Yim, et al - by using Continuous Time Markov Chains (CTMC) where the prior distribution is either a uniform or all-mask absorbed state (similar to discrete diffusion models). The resulting Multiflow model now has all necessary components of protein backbone generation implemented as flow matching (translation and rotation as continuous FM, and amino acids as discrete FM).

Position papers for the weekend reading:
- Future Directions in Foundations of Graph Machine Learning by Chris Morris feat. Haggai Maron, Michael Bronstein, Stefanie Jegelka and others - on expressive power, generalization, and optimization of GNNs.
- Position Paper: Challenges and Opportunities in Topological Deep Learning by Theodore Papamarkou feat. Bastian Rieck, Michael Schaub, Petar Veličković and a huge team of authors - on theoretical and practical challenges of TDL.
- Graph Foundation Models by Haitao Mao feat. Neil Shah, Michael Galkin, and Jiliang Tang - finally, a non-LLM discussion on designing foundation models on graphs and for all kinds of graph tasks. The authors hypothesize what could be the transferable and invariant graph vocabulary given the heterogeneity of graph structures and their feature spaces, and how Graph FMs might benefit from scaling laws (namely, what should be scaled and where it doesn’t bring benefits)

5,280 views
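
On "MIPS over node embeddings" from the PyG 2.5 item above: maximum inner product search (MIPS) retrieves the items whose embeddings have the largest dot product with a query embedding. The brute-force sketch below only illustrates the idea in plain Python; it does not use or mirror the PyG API, and `mips_top_k` and the toy embeddings are hypothetical.

```python
import heapq


def mips_top_k(query, embeddings, k=3):
    """Return the k node ids with the largest inner product with the query.

    embeddings maps a node id to its embedding (a list of floats of the
    same length as the query). This is an exhaustive scan over all nodes.
    """
    scored = (
        (sum(q * e for q, e in zip(query, emb)), node)
        for node, emb in embeddings.items()
    )
    return [node for _, node in heapq.nlargest(k, scored)]


item_embeddings = {
    "u1": [1.0, 0.0],
    "u2": [0.5, 0.5],
    "u3": [3.0, 0.0],  # same direction as u1 but a larger norm
    "u4": [0.0, 2.0],
}
user_embedding = [1.0, 0.1]

print(mips_top_k(user_embedding, item_embeddings, k=2))  # ['u3', 'u1']
```

Note that the inner product rewards large embedding norms (u3 ranks above u1 here), which is what distinguishes MIPS from cosine-similarity search. Production systems typically replace the exhaustive scan with an approximate index.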

Published Feb 12

The LoG meetup in New Jersey

The LoG meetup in the NYC area will happen on Feb 29th-March 1st at the New Jersey Institute of Technology with invited speakers including Bryan Perozzi and Anton Tsitsulin (both Google Research), Ricky Chen (Meta AI), Jie Gao (Rutgers), and many others. Come to NJIT@JerseyCity to learn from and connect with the local graph learning community! Register here, check the Twitter announcement

5,120 views

Published Feb 10

GraphML News (Feb 10th) - TensorFlow GNN 1.0, New ICML submissions

🔧 The official release of TensorFlow-GNN 1.0 by Google (after several road show presentations from the team at ICML and NeurIPS) - a production-level library for training GNNs on large graphs with first-class support for heterogeneous graphs. Check the blog post and GitHub repo for more practical examples and documentation.

⚛️ The Denoising force fields repository from Microsoft Research for diffusion models trained on coarse-grained protein dynamics data - you can use it for standard density modeling or extract force fields from coarse-grained structures to use in Langevin dynamics simulations. The repo contains several pre-trained models you can play around with.

The ICML deadline has passed and we saw a flurry of cool new preprints submitted to arXiv this week. Some notable mentions:

🐍 Graph-Mamba: Towards Long-Range Graph Sequence Modeling with Selective State Spaces by Chloe Wang et al: state space models like Mamba are all the rage these days in NLP and CV (although so far attention still rules), this is a nice adaptation of SSMs to graphs, tested on the LRGB!

🗣️ Let Your Graph Do the Talking: Encoding Structured Data for LLMs by Bryan Perozzi feat. Anton Tsitsulin presents GraphToken (extension of Talk Like a Graph, ICLR 2024): using trainable set or graph encoders to get soft prompt tokens improves the performance of frozen LLMs in answering natural language questions about basic graph properties. The last resort of hardcore graph mining teams jumps into LLMs 🗿

⏩ Link Prediction with Relational Hypergraphs by Xingyue Huang feat. Pablo Barcelo, Michael Bronstein, and Ismail Ceylan: extends conditional message passing models like NBFNet to relational hypergraphs (dubbed HC-MPNN) with nice theoretical guarantees and impressive inductive performance boosts.

📈 Neural Scaling Laws on Graphs by Jingzhe Liu feat. Neil Shah and Jiliang Tang: one of the first systematic studies of scaling laws for graph models (GNNs and Graph Transformers) and data (mostly OGB datasets) where the number of edges is selected as the universal size metric. Basically, scaling does happen but with certain nuances as to model depth and architecture (transformers seem to scale more monotonically). The church of scaling laws opens its doors to the graph learning crowd ⛪

📚 On the Completeness of Invariant Geometric Deep Learning Models by Zian Li feat. Muhan Zhang: theoretical study of DimeNet, GemNet, and SphereNet with the proofs of their E(3)-completeness through the nested GNN extension (Nested GNNs from NeurIPS’21)

📚 On dimensionality of feature vectors in MPNNs by Cesar Bravo et al - turns out the WL-MPNN equivalence holds even for 1-dimensional node features when using non-polynomial activations like sigmoid.

Next time, we’ll look into some new position papers.

5,060 views
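
Several items above refer to the WL-MPNN equivalence, meaning that message passing GNNs distinguish graphs at most as well as the 1-dimensional Weisfeiler-Leman (1-WL) color refinement test. As a reminder of what that test does, here is a minimal sketch; `wl_histograms` is a hypothetical helper, and the fixed number of rounds is an assumption that is sufficient for these tiny graphs.

```python
from collections import Counter


def wl_histograms(graphs, rounds=3):
    """Run 1-WL color refinement on several graphs with a shared palette.

    Each graph is a dict mapping a node to its list of neighbors. Returns one
    Counter of final colors per graph; different Counters mean that 1-WL
    tells the graphs apart.
    """
    colors = [{v: 0 for v in g} for g in graphs]
    for _ in range(rounds):
        palette = {}  # signature -> new color id, shared across all graphs
        refined = []
        for g, col in zip(graphs, colors):
            new = {}
            for v in g:
                # A node's signature: its own color plus the multiset of
                # its neighbors' colors.
                signature = (col[v], tuple(sorted(col[u] for u in g[v])))
                new[v] = palette.setdefault(signature, len(palette))
            refined.append(new)
        colors = refined
    return [Counter(c.values()) for c in colors]


cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
two_triangles = {0: [1, 2], 1: [0, 2], 2: [0, 1], 3: [4, 5], 4: [3, 5], 5: [3, 4]}
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star4 = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}

h = wl_histograms([cycle6, two_triangles, path4, star4])
print(h[0] == h[1])  # True: 1-WL cannot separate a 6-cycle from two triangles
print(h[2] == h[3])  # False: a path and a star are told apart
```

The 6-cycle versus two-triangles pair is the standard example of the expressivity limit that more powerful architectures try to overcome.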

Published Feb 3

GraphML News (Feb 3rd) - DGL 2.0

⚡ All ICML deadlines have passed - congratulations to all who made it through the sleepless nights over the last week! We will start seeing some fresh submissions relatively soon on social media (among 10k submitted papers and ~220 position papers).

Meanwhile, DGL 2.0 was released featuring GraphBolt - a new tool for streaming data loading and sampling offering around 30% speedups in node classification and up to 400% in link prediction 🚀 Besides that, the new version includes utilities for building graph transformers and a handful of new datasets - LRGB and a recent suite of heterophilic datasets.

The AppliedML Days @ EPFL will take place on March 25 and 26th - the call for the AI and Molecular world track is still open.

Weekend reading:
- Combinatorial prediction of therapeutic perturbations using causally-inspired neural networks by Guadalupe Gonzalez feat. Michael Bronstein and Marinka Zitnik - introduces PDGrapher, a causally-inspired GNN model to predict therapeutically useful perturbagens
- VC dimension of Graph Neural Networks with Pfaffian activation functions by D’Inverno et al - extension of the WL meets VC paper to new non-linearities like sigmoid and hyperbolic tangent
- NetInfoF Framework: Measuring and Exploiting Network Usable Information (still anonymous but accepted to ICLR’24) - introduces the “network usable information” and a fingerprint-like approach to quantify the gains brought by a GNN model compared to a non-GNN baseline.

5,670 views

Published Jan 27

GraphML News (Jan 27th) - New Blogs, LigandMPNN is available

Seems like everyone is grinding for the ICML’24 deadline next week so there isn’t much news these days. A few highlights:

Dimension Research published 2 of the 3 parts of their ML x Bio review of NeurIPS’23: on Generative Protein Design, and on Generative Molecular Design; the last one is going to be about drug target interaction prediction.

The blog post on Exphormer by Ameya Velingker and Balaji Venkatachalam from Google Research on the neat ICML’23 sparse graph transformer architecture that scales to graphs much larger than molecules. Glad to see GraphGPS and Long Range Graph Benchmark mentioned a few times 🙂

LigandMPNN was released on GitHub this week after appearing as a module in several recent protein generation papers. LigandMPNN significantly improves over ProteinMPNN in modeling non-protein components like small molecules, metals, and nucleotides.

Weekend reading:
- Equivariant Graph Neural Operator for Modeling 3D Dynamics by Minkai Xu, Jiaqi Han feat. Jure Leskovec and Stefano Ermon: equivariant GNNs 🤝 neural operators, also provides a nice condensed intro to the topic
- Towards Principled Graph Transformers by Luis Müller and Christopher Morris - study of the Edge Transformer with triangular attention applied to graph tasks. Edge Transformer has shown remarkable systematic generalization capabilities and it’s intriguing to see how it works on graphs (O(N^3) complexity for now though).
- Tweets to Citations: Unveiling the Impact of Social Media Influencers on AI Research Visibility - turns out that papers shared on X / Twitter by AK and Aran Komatsuzaki have significantly more citations. Time to revive your old sci-Twitter account

5,740 views