Halloween Paper Reading 🎃

We hope you managed to procure enough candy and carve spooky faces on a bunch of pumpkins these days, so now you can relax and read a few (not that spooky) papers.

Molecular dynamics is one of the booming Geometric DL areas where equivariant models show their best qualities. Two cool recent papers on the topic:

⚛️ Forces are not Enough: Benchmark and Critical Evaluation for Machine Learning Force Fields with Molecular Simulations by Fu et al. introduces a new benchmark for molecular dynamics - in addition to MD17, the authors add datasets on modeling liquids (Water), peptides (Alanine dipeptide), and solid-state materials (LiPS). More importantly, apart from energy and force errors as the usual metrics, the authors consider a wide range of physical properties like Stability, Diffusivity, and Radial Distribution Functions. Most SOTA molecular dynamics models were probed, including SchNet, ForceNet, DimeNet, GemNet (-T and -dT), and NequIP.

Density Functional Theory (DFT) calculations are one of the main workhorses of molecular dynamics (and account for a great deal of computing time on big clusters). DFT scales as O(n^3) in the input size, though, so can ML help here? Learned Force Fields Are Ready For Ground State Catalyst Discovery by Schaarschmidt et al. presents an experimental study of learned potentials - turns out GNNs can do a very good job in O(n) time! Easy Potentials (trained on Open Catalyst data) turn out to be quite good predictors, especially when paired with a subsequent postprocessing step. Model-wise, it is an MPNN with the NoisyNodes self-supervised objective that we covered a few weeks ago.

🪐 For astrophysics aficionados: Mangrove: Learning Galaxy Properties from Merger Trees by Jespersen et al. applies GraphSAGE to dark matter merger trees to predict a variety of galactic properties like stellar mass, cold gas mass, star formation rate, and even black hole mass.
The paper is heavy on astrophysics terminology but pretty easy in terms of GNN parameterization and training. Mangrove works 4-9 orders of magnitude faster than standard models (that is, 10,000 - 1,000,000,000 times faster). The experimental charts are pieces of art that you can hang on a wall.

🤖 Compositional Semantic Parsing with Large Language Models by Drozdov, Schärli et al. pretty much solves the compositional semantic parsing task (natural language query → structured query like SPARQL) using only the code-davinci-002 language model from OpenAI (which is InstructGPT fine-tuned on code). No need for hefty tailored semantic parsing models - turns out a smart extension of Chain-of-thought prompting (aka "let's think step by step") called Least-to-Most prompting (where we first answer easy subproblems before generating a full query) yields a whopping 95% accuracy even on the hardest Compositional Freebase Questions (CFQ) dataset. CFQ was introduced at ICLR 2020, and just two years later LMs have cracked the task - looks like it's time for a new, even more complex dataset.
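
To make the Least-to-Most idea a bit more concrete, here is a minimal sketch of the two-stage loop: first ask the model to decompose the question, then solve the subproblems in order while feeding earlier answers back into the prompt. This is not the authors' actual pipeline or prompt format. The `LM` type, the `least_to_most` function, the prompt wording, and the `toy_lm` stub (which returns canned text so the example runs offline) are all assumptions made up for illustration.

```python
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a completion.
LM = Callable[[str], str]


def least_to_most(question: str, lm: LM) -> str:
    """Two-stage prompting sketch: decompose, then solve subproblems in order."""
    # Stage 1: ask the model to split the question into simpler subproblems.
    plan = lm(f"Decompose into simpler subproblems, one per line:\n{question}")
    subproblems: List[str] = [s.strip() for s in plan.splitlines() if s.strip()]

    # Stage 2: solve each subproblem, then the full question, and append
    # every solved Q/A pair to the context used for the next prompt.
    context = ""
    answer = ""
    for sub in subproblems + [question]:
        answer = lm(f"{context}Q: {sub}\nA:")
        context += f"Q: {sub}\nA: {answer}\n"
    return answer


def toy_lm(prompt: str) -> str:
    """Stand-in for a real model (assumption): canned, deterministic replies."""
    if prompt.startswith("Decompose"):
        return "Who directed Inception?\nWhat did that person produce?"
    # Report how many earlier solved steps are visible in the prompt.
    return f"answer using {prompt.count('A: ')} earlier steps"


if __name__ == "__main__":
    print(least_to_most("What did the director of Inception produce?", toy_lm))
```

With a real model in place of `toy_lm`, the final call would see the solved subproblems in its prompt, which is the part that helps with compositional queries.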