GraphML News (July 20th) - Pinder and Plinder, LAB-Bench, ICML 2024

🎙️ ICML 2024 starts next week - enjoy the conference and Vienna if you are participating this year! Besides the main program, Monday will feature the Graph learning tutorial, and Thursday and Friday have a handful of graph-related workshops.

🧬 VantAI, together with MIT, NVIDIA, UniBasel, and SIB, introduced two novel large-scale benchmarks: Pinder (Protein INteraction Dataset and Evaluation Resource) and Plinder (Protein-Ligand Interaction Dataset and Evaluation Resource). Pinder includes 500x more data than PPIRef, and Plinder is roughly 10x larger than DockGen, the previous largest datasets in the area susceptible to test set leakage. Re-training SOTA diffusion models on Pinder and Plinder yields much lower scores, indicating that saturation is far away (at least for the coming year). Besides, it is great to see an industrial company (from the highly competitive CompBio area) contributing to the field with open datasets. Pinder and Plinder will be the main datasets for the upcoming ML for Structural Bio challenge at NeurIPS 2024, so prepare your GPUs and diffusion models.

🔬 FutureHouse released LAB-Bench for studying LLMs in Biology and Chemistry. The benchmark includes 8 categories where LLMs have to deal with figures, images, scientific literature, databases, and protocol design. Recent LLMs and VLMs (GPT-4o, Claude, and Llama-3) all show rather underwhelming results on those tasks - it is finally a new unsaturated benchmark for the LLM crowd! The authors held out some data to check for training contamination in future models (e.g., when the training data for the next generation of such models includes the validation and test splits of the datasets).
Weekend reading:

Beyond Euclid: An Illustrated Guide to Modern Machine Learning with Geometric, Topological, and Algebraic Structures by Sophia Sanborn, Johan Mathe, Mathilde Papillon, et al. - a massive survey with amazing illustrations

PINDER: The protein interaction dataset and evaluation resource by Daniel Kovtun, Mehmet Akdel, and VantAI folks feat. Michael Bronstein

PLINDER: The protein-ligand interactions dataset and evaluation resource by Janani Durairaj, Yusuf Adeshina, and VantAI folks

LAB-Bench: Measuring Capabilities of Language Models for Biology Research by Jon M. Laurent, Joseph D. Janizek, et al. feat. Andrew White