GraphML News (July 4th 🦅) - Chai-2, SAIR dataset, UMA 1.1, Why flow matching generalizes

Some quick news before BBQ time and beating aliens over NYC.

🧬 Chai Discovery announced Chai-2, which excels at antibody design: it generated novel antibodies for 50+ protein targets and achieved a 16% binding rate in wet lab tests (that’s quite a lot). The tech report says the backbone is a modified Chai-1, but probably with a lot more new training data (which is a good sign: it’s 2025 and models don’t matter, data does). Chai-2 was announced just 2 weeks after Boltz-2 - both started as AlphaFold3 reproductions but are now moving in slightly different directions, e.g., Chai-2 is not open-source anymore. We’ll be keeping an eye on their successes.

🧬🧬 SandboxAQ released a new SAIR dataset (Structurally Augmented IC50 Repository) comprising 5M structures over 1M+ unique protein-ligand systems (folded with Boltz-1x). It’s just 2.5 TB, so you don’t have an excuse for not training the next protein-ligand generative model on SAIR 😉

⚛️ FAIR Chemistry updated their Universal Model for Atoms (UMA) to 1.1 (preprint) and significantly improved performance on catalysis and molecule tasks - scaling MoE transformers shows benefits and makes the devotees of equivariance unhappy 🙂

Weekend reading: On the Closed-Form of Flow Matching: Generalization Does Not Arise from Target Stochasticity by Quentin Bertrand et al. - a nice study of why flow matching generalizes and can generate data outside the training distribution. Turns out it happens thanks to neural nets failing to learn the velocity field exactly.
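
For the curious, here is a minimal NumPy sketch of what "learning the velocity field exactly" would look like on a toy problem. It is an illustration under stated assumptions, not the paper's code: it assumes the common linear path x_t = (1 - t) x0 + t x1 with a standard Gaussian source x0 and a target that is uniform over a tiny hypothetical 2D training set. Under those assumptions the optimal marginal velocity is a softmax-weighted average of the per-sample velocities (x1 - x) / (1 - t). The names `closed_form_velocity` and `sample` are made up for this example.

```python
import numpy as np


def closed_form_velocity(x, t, train):
    """Exact marginal velocity u*(x, t) for an empirical target.

    Assumption: x_t = (1 - t) * x0 + t * x1, x0 ~ N(0, I), x1 uniform over
    `train`, so that p_t(x | x1) = N(t * x1, (1 - t)^2 I).
    """
    # Difference between every sample and every scaled training point: (n, m, d).
    diff = x[:, None, :] - t * train[None, :, :]
    logits = -np.sum(diff**2, axis=-1) / (2.0 * (1.0 - t) ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # numerically stable softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)  # posterior over training points
    # Posterior-weighted mean of the conditional velocities (x1 - x) / (1 - t).
    return (w @ train - x) / (1.0 - t)


def sample(train, n_samples=64, n_steps=100, seed=0):
    """Integrate dx/dt = u*(x, t) from t = 0 to t = 1 with explicit Euler."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, train.shape[1]))
    dt = 1.0 / n_steps
    for k in range(n_steps):
        # t stays strictly below 1, so the division by (1 - t) is safe.
        x = x + dt * closed_form_velocity(x, k * dt, train)
    return x


# Hypothetical toy training set: four points in 2D.
train = np.array([[2.0, 0.0], [-2.0, 0.0], [0.0, 2.0], [0.0, -2.0]])
samples = sample(train)

# Distance from each generated sample to its nearest training point.
nearest = np.linalg.norm(
    samples[:, None, :] - train[None, :, :], axis=-1
).min(axis=1)
print("fraction of samples sitting on a training point:", (nearest < 1e-3).mean())
```

In this toy setting the exact field sends samples back onto the training points, so anything new a trained model generates has to come from the network's approximation error, which fits the claim above.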