GraphML News (Aug 30th) - OpenAI enters bio, AtomWorks, OrbMol, NeurIPS workshops

📈 The church of scale enters comp bio: OpenAI published its first results on protein design of Yamanaka factors (linked to cell aging) together with Retro Bio (where sama happens to be one of the investors). The backbone is GPT-4b micro, initialized from an existing 4o checkpoint, enriched with “tokenized 3D structure data” (remember ESM-3?), and fine-tuned on a specialized dataset. The claimed experimental results are quite solid: hit rates of 30-50% (typically it’s less than 10%), along with a bunch of other biochemistry markers. The argument between scalable non-equivariant models and bespoke geometric models got a new data point: will the raw compute of OpenAI + vanilla transformers conquer the biotech world too? We’ll keep you posted.

🧬 BakerLab released RoseTTAFold 3 and AtomWorks, the data processing framework used to train it. While you’ll certainly see general remarks about comparisons with AF3 and Boltz, I’d highlight that comp bio folks are starting to recognize the value of data as much as the model itself (something frontier labs recognized quite some time ago). Real engineering will start when they need to serve those protein design models to a few billion clients 😉

⚛️ Orbital Materials released OrbMol, a version of Orb-v3 for molecules (the others are for crystals) trained on Open Molecules 2025 (OMol25). Orb is still an MPNN, which makes it quite fast and useful for MD computations.

By the way, also check out the NeurIPS 2025 workshops: finally more diverse than just LLMs and reasoning, and featuring a handful of graph learning venues.

Weekend reading: Turning Tabular Foundation Models into Graph Foundation Models from Yandex Research, another interesting approach to GFMs via TabPFNv2 over original node features + mined structural features.
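
To make the "original node features + mined structural features" idea a bit more concrete, here is a minimal sketch of turning a graph into a flat table with one row per node. The choice of structural features (node degree and the mean of the neighbors' features) and the helper name `build_tabular_rows` are my own assumptions for illustration and are not taken from the paper; the resulting rows are what you would then hand to a tabular model such as TabPFNv2.

```python
from collections import defaultdict


def build_tabular_rows(features, edges):
    """Build one flat feature row per node (hypothetical helper).

    features: dict mapping node -> list of floats (original node features)
    edges:    iterable of undirected (u, v) pairs
    Row layout: original features + [degree] + mean of neighbor features.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    rows = {}
    for node, x in features.items():
        nbrs = sorted(neighbors.get(node, ()))
        degree = len(nbrs)
        if nbrs:
            # Average each original feature over the node's neighbors.
            agg = [sum(features[n][i] for n in nbrs) / degree
                   for i in range(len(x))]
        else:
            # Isolated node: assume an all-zero neighborhood summary.
            agg = [0.0] * len(x)
        rows[node] = list(x) + [float(degree)] + agg
    return rows


# Toy graph: "a" is connected to "b" and "c"; "d" is isolated.
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [0.5, 0.5]}
edges = [("a", "b"), ("a", "c")]
rows = build_tabular_rows(features, edges)
```

Each row is now an ordinary fixed-length vector, so node classification reduces to a plain tabular prediction problem over these rows.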