GraphML News (July 7th) - Generative Chemistry, Temporal Graph Benchmark

Lots of news this week!

🔬 Starting with new blog posts, Charlie Harris wrote an article, "Diffusion Models in Generative Chemistry for Drug Design". It covers the basics of denoising diffusion and score-based generative modeling, moves on to molecular use cases with Equivariant Diffusion, DiffSBDD, and DiffDock, and raises questions about fair evaluation of generative models versus standard tools. Taking more of an industrial perspective, Leo Wossnig published a piece, "Where is generative design in drug discovery today". It discusses successes and failures of generative approaches and highlights the main obstacles for ML folks wishing to dive into drug discovery: (1) data scarcity (e.g., no data for new targets), (2) slow experimental pipelines for generating new data, and (3) end-to-end pipelines and the tech stack in general.

🔧 Graphium is the new library for molecular representation learning in the Datamol ecosystem. Graphium is packed with the latest algorithms (like Random Walk Structural Encodings) and ML models (like the recent GPS++, the winner of OGB LSC'22). It also scales to large compute: you can even spin up training on Graphcore IPUs.

📏 Temporal Graph Benchmark (TGB) is finally here! The graph learning community has long felt that OGB needed a temporal branch. TGB delivers dynamic link prediction and node property prediction datasets, standard loaders, evaluators, and, of course, leaderboards (the good old Temporal Graph Network is still a very strong baseline). As in OGB, there are small, medium, and large graphs (the largest have about 1M nodes, 50M edges, and 30M timesteps). More details can be found in the preprint by Shenyang Huang, Farimah Poursafaei, and the OGB gang.

AStarNet, the scalable GNN for KG reasoning, can now be integrated directly with ChatGPT to improve its factual correctness. Given a textual query, AStarNet also runs graph inference on the backbone graph (a Wikidata subset) and produces the top reasoning paths supporting the answer.

New foundation models: 1️⃣ NSQL, a Copilot-like LM for SQL queries by Numbers Station, is openly available in 350M / 2B / 6B versions and outperforms all existing open-source SQL models; 2️⃣ xTrimo, a 100B closed-source protein LM