Xmas Papers: Molecule Editing from Text, Protein Generation

It's the holiday season 🎄, and what better way to spend it than reading some new papers on molecule and protein generation! Here are a few cool papers published on arXiv this week:

Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing by Shengchao Liu and the Mila/NVIDIA team proposes MoleculeSTM, a CLIP-like text-to-molecule model. MoleculeSTM can do 2 impressive things: (1) retrieve molecules from a text description like “triazole derivatives”, and retrieve text descriptions for a given molecule in SMILES, and (2) edit molecules from text prompts like “make the molecule soluble in water with low permeability”, where the model edits the molecular graph according to the description. Mindblowing 🤯

Protein Sequence and Structure Co-Design with Equivariant Translation by Chence Shi and the Mila team proposes ProtSEED, a generative model that produces protein sequence and structure simultaneously (most existing diffusion models for proteins can do only one of those at a time). ProtSEED can be conditioned on residue features or pairs of residues. Model-wise, it is an equivariant iterative model (AlphaFold 2 vibes) with improved triangular attention. ProtSEED was evaluated on antibody CDR co-design, protein sequence-structure co-design, and fixed-backbone sequence design.

And 2 more papers from the ESM team, Meta AI, and BakerLab (check the Twitter thread by Alex Rives for more details)!

Language models generalize beyond natural proteins by Robert Verkuil et al. finds that ESM2 can generate de novo protein sequences that can actually be synthesized in the lab and, more importantly, do not have any match among known natural proteins. A great result, given that ESM2 was trained only on sequences!

A high-level programming language for generative protein design by Brian Hie et al. proposes pretty much a new programming language for protein designers (think of it as a query language for ESMFold): production rules organized in a syntax tree with constraint functions. Then, each program is “compiled” into an energy function that governs the generative process.
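
To make the "compile a program into an energy function" idea a bit more concrete, here is a toy Python sketch. It is not the paper's language or API: the `Node` class, `compile_program`, the two sequence-only constraints, and the Metropolis sampler are all hypothetical stand-ins chosen for illustration, and the real system scores structures predicted by ESMFold rather than raw strings.

```python
import math
import random
from dataclasses import dataclass, field
from typing import Callable, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# A constraint maps a sequence to a non-negative penalty (0 means satisfied).
Constraint = Callable[[str], float]


@dataclass
class Node:
    """Hypothetical syntax-tree node: its own constraints plus child nodes."""
    constraints: List[Constraint] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def compile_program(root: Node) -> Callable[[str], float]:
    """Flatten the tree into one energy function: the sum of all penalties."""
    collected: List[Constraint] = []
    stack = [root]
    while stack:
        node = stack.pop()
        collected.extend(node.constraints)
        stack.extend(node.children)
    return lambda seq: sum(c(seq) for c in collected)


def hydrophobic_fraction(target: float) -> Constraint:
    """Toy constraint: distance of the hydrophobic fraction from a target."""
    hydrophobic = set("AILMFVW")
    return lambda seq: abs(sum(r in hydrophobic for r in seq) / len(seq) - target)


def no_residue(residue: str) -> Constraint:
    """Toy constraint: penalize every occurrence of one residue type."""
    return lambda seq: seq.count(residue) / len(seq)


def sample(energy, length=30, steps=2000, temperature=0.05, seed=0):
    """Metropolis sampling over single-residue mutations; returns the best hit."""
    rng = random.Random(seed)
    seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    current = energy(seq)
    best_seq, best = seq, current
    for _ in range(steps):
        pos = rng.randrange(length)
        candidate = seq[:pos] + rng.choice(AMINO_ACIDS) + seq[pos + 1:]
        e = energy(candidate)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if e <= current or rng.random() < math.exp((current - e) / temperature):
            seq, current = candidate, e
            if current < best:
                best_seq, best = seq, current
    return best_seq, best


# A two-level "program": half hydrophobic overall, and no cysteines.
program = Node(
    constraints=[hydrophobic_fraction(0.5)],
    children=[Node(constraints=[no_residue("C")])],
)
energy = compile_program(program)
designed, final_energy = sample(energy)
```

The point of the sketch is the separation of concerns: the designer only writes the tree of constraints, and the sampler only ever sees a single scalar energy, so constraints can be swapped without touching the generation loop.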