Active Learning and Enhanced Sampling for Developing Interatomic Neural Network Potentials

Viktor Zaverkin

The impact of data-driven interatomic potential models on computational chemistry and materials science is tremendous. They extend the accessible time and length scales when modeling and predicting physical and chemical phenomena with first-principles accuracy. Generating a sufficiently expressive training set is the most time-consuming task when developing accurate and transferable potential models. The contributions of this talk are threefold. First, this work presents active learning (AL) methods that adaptively select batches of atomic structures for which labels, i.e., energies and forces, are computed and included in the training set. These methods use gradient features specific to atomistic neural networks to evaluate the model's uncertainty on queried samples. Second, this work extends the previously defined AL methods into an automated framework for generating expressive training sets on the fly. This framework combines a physically motivated sampler (e.g., molecular dynamics), a differentiable model uncertainty, and a batch selection method. Finally, we benchmark our methods by generating training sets for various material systems and demonstrate their excellent accuracy and data efficiency.
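To make the abstract's core idea concrete, the following is a minimal, hypothetical sketch of gradient-feature-based batch active learning, not the speaker's actual implementation. It assumes a model whose energy output is linear in its last-layer weights, so the gradient features reduce to the last-layer representation; the posterior-variance formula, the Gaussian noise level, and the greedy conditioning strategy are all illustrative assumptions.

```python
import numpy as np

def posterior_variance(train_feats, query_feats, noise=1e-2):
    # Laplace/GP-style predictive variance built from gradient features g(x):
    # var(x*) = g(x*)^T (G^T G + noise * I)^{-1} g(x*),
    # where the rows of G are the gradient features of the labeled structures.
    d = train_feats.shape[1]
    cov = np.linalg.inv(train_feats.T @ train_feats + noise * np.eye(d))
    # Quadratic form per query row: sum_{d,e} q[i,d] * cov[d,e] * q[i,e].
    return np.einsum("id,de,ie->i", query_feats, cov, query_feats)

def select_batch(train_feats, pool_feats, batch_size, noise=1e-2):
    # Greedy batch selection: repeatedly pick the pool structure with the
    # largest posterior variance, then condition on it, so the batch is
    # diverse rather than a set of near-duplicate high-uncertainty points.
    selected = []
    feats = train_feats.copy()
    for _ in range(batch_size):
        var = posterior_variance(feats, pool_feats, noise)
        var[selected] = -np.inf  # never pick the same structure twice
        idx = int(np.argmax(var))
        selected.append(idx)
        feats = np.vstack([feats, pool_feats[idx]])
    return selected

# Toy data standing in for gradient features of atomic structures:
# 5 labeled structures, 50 unlabeled candidates, 8-dimensional features.
rng = np.random.default_rng(0)
train = rng.normal(size=(5, 8))
pool = rng.normal(size=(50, 8))
batch = select_batch(train, pool, batch_size=4)
print(batch)
```

The selected structures would then be labeled with first-principles energies and forces and added to the training set before the next AL iteration.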
