Molecular: Algorithms for molecular dynamics on heterogeneous architectures

With molecular dynamics, molecular processes can be followed in atomistic detail, something that is difficult or impossible to achieve with experimental techniques. But obtaining this information comes at a high computational cost. The equations of motion of hundreds of thousands of atoms need to be integrated for billions of time steps, which can mean months of computing time. Gains in computational efficiency are therefore highly beneficial for the molecular simulation community, not only within SeRC but worldwide.
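To make "integrating the equations of motion" concrete, the following is a minimal sketch of one common integrator, velocity Verlet, applied to a single particle on a spring. The choice of integrator, the one-dimensional test system, and all names and parameter values are assumptions for illustration; a production code applies the same kind of update to every atom in three dimensions.

```python
import math


def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Advance position x and velocity v by n_steps steps of size dt."""
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass   # first half-step velocity update
        x = x + dt * v_half                # full-step position update
        f = force(x)                       # force at the new position
        v = v_half + 0.5 * dt * f / mass   # second half-step velocity update
    return x, v


def spring_force(x, k=1.0):
    """Force of a harmonic spring with (assumed) force constant k."""
    return -k * x


def total_energy(x, v, mass=1.0, k=1.0):
    """Kinetic plus potential energy of the particle on the spring."""
    return 0.5 * mass * v * v + 0.5 * k * x * x


x0, v0 = 1.0, 0.0
x1, v1 = velocity_verlet(x0, v0, spring_force, mass=1.0, dt=0.01, n_steps=10_000)
exact_x1 = math.cos(0.01 * 10_000)  # analytic solution for these initial conditions
energy_drift = abs(total_energy(x1, v1) - total_energy(x0, v0))
```

Each step needs one new force evaluation, which is where nearly all the time goes in a real simulation. The energy drift stays small over the 10,000 steps, which is the property that makes this family of integrators suitable for very long runs.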

For some years now, CPU cores have no longer been getting (much) faster; instead, the number of cores per CPU keeps increasing. In addition, dedicated accelerators, such as graphics processing units (GPUs), are being used for computing. This trend towards more heterogeneous architectures requires a different approach to the algorithms used for computation. Especially in molecular dynamics, where an integration step now takes less than a millisecond, this means that algorithms, and parallelization in particular, need to be rethought and redesigned. In this project, new algorithms are developed and implemented in the molecular simulation package GROMACS.
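A short calculation shows what a sub-millisecond step means in practice. The sketch below converts the wall-clock time per step into simulated nanoseconds per day and into the wall-clock time needed for a given number of steps. The 2 fs time step and the function names are assumptions for illustration; only the 1 ms upper bound on the step time comes from the text above.

```python
SECONDS_PER_DAY = 86_400


def ns_per_day(step_wall_time_s, time_step_fs):
    """Simulated nanoseconds per day of wall-clock time."""
    steps_per_day = SECONDS_PER_DAY / step_wall_time_s
    return steps_per_day * time_step_fs * 1e-6  # 1 fs = 1e-6 ns


def wall_clock_days(n_steps, step_wall_time_s):
    """Wall-clock days needed to run n_steps integration steps."""
    return n_steps * step_wall_time_s / SECONDS_PER_DAY


rate = ns_per_day(step_wall_time_s=1e-3, time_step_fs=2.0)  # assumed 2 fs time step
days = wall_clock_days(n_steps=1e9, step_wall_time_s=1e-3)
```

Under these assumptions, 1 ms per step gives roughly 173 ns per day, and a billion steps take about 11.6 days. With so little time per step, any fixed communication or synchronization cost per step is a significant fraction of the total, which is why parallelization needs particular attention.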

In 2013 we released the first version of our hybrid acceleration and hybrid parallelization algorithms to the whole community through the 4.6 release of the GROMACS package. The hybrid acceleration uses a new algorithm for non-bonded pair forces, which can be used, in slightly adapted versions, on CPUs using SIMD (AVX-256/512) intrinsics and on GPUs using CUDA. This has brought a significant performance improvement. The hybrid parallelization, based on MPI and OpenMP, allows scaling to more cores. The scaling limit is now down from around 400 atoms per core to fewer than 100 atoms per core. This allows for significantly longer simulated time scales. In addition, adding GPUs provides a speed-up of a factor of 3, which allows for more cost- and power-efficient computing.
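The text does not spell out how the pair-force algorithm works. One approach that suits both SIMD units and GPUs is to group atoms into small fixed-size clusters and to treat all atom pairs of a cluster pair together, so that the inner work has a regular, fixed-width shape. The sketch below illustrates that idea in plain Python and checks it against a plain loop over atom pairs. It is not GROMACS code: the cluster size of 4, the Lennard-Jones interaction with unit parameters, the cut-off value, and all function names are assumptions for illustration.

```python
import itertools
import random

CLUSTER_SIZE = 4  # assumed width, chosen to resemble a 4-wide SIMD register


def lj_pair_force(ri, rj, cutoff):
    """Lennard-Jones force on atom i from atom j (epsilon = sigma = 1), zero beyond the cut-off."""
    d = [a - b for a, b in zip(ri, rj)]
    r2 = sum(c * c for c in d)
    if r2 >= cutoff * cutoff:
        return [0.0, 0.0, 0.0]
    inv_r6 = 1.0 / r2 ** 3
    scale = 24.0 * (2.0 * inv_r6 * inv_r6 - inv_r6) / r2
    return [scale * c for c in d]


def forces_all_pairs(pos, cutoff):
    """Reference: loop over every atom pair individually."""
    forces = [[0.0] * 3 for _ in pos]
    for i, j in itertools.combinations(range(len(pos)), 2):
        f = lj_pair_force(pos[i], pos[j], cutoff)
        for k in range(3):
            forces[i][k] += f[k]
            forces[j][k] -= f[k]  # Newton's third law
    return forces


def forces_cluster_pairs(pos, cutoff):
    """Same forces, organised as interactions between fixed-size clusters."""
    n = len(pos)
    clusters = [range(s, min(s + CLUSTER_SIZE, n)) for s in range(0, n, CLUSTER_SIZE)]
    forces = [[0.0] * 3 for _ in pos]
    for a, b in itertools.combinations_with_replacement(range(len(clusters)), 2):
        # The block below is the regular, fixed-width work that a SIMD or GPU
        # kernel would execute in lockstep, masking pairs instead of branching.
        for i in clusters[a]:
            for j in clusters[b]:
                if a == b and j <= i:
                    continue  # skip self-interaction and double counting
                f = lj_pair_force(pos[i], pos[j], cutoff)
                for k in range(3):
                    forces[i][k] += f[k]
                    forces[j][k] -= f[k]
    return forces


rng = random.Random(0)
# Atoms on a jittered grid, so that no two atoms are unrealistically close.
positions = [[x + rng.uniform(-0.1, 0.1), y + rng.uniform(-0.1, 0.1), z + rng.uniform(-0.1, 0.1)]
             for x in range(3) for y in range(3) for z in range(2)]
reference = forces_all_pairs(positions, cutoff=2.5)
clustered = forces_cluster_pairs(positions, cutoff=2.5)
max_diff = max(abs(p - q) for fr, fc in zip(reference, clustered) for p, q in zip(fr, fc))
```

Both routines visit exactly the same atom pairs, so the forces agree up to floating-point rounding. The point of the cluster layout is only that the work arrives in uniform blocks, which is what lets essentially the same scheme be adapted to different SIMD widths and to GPUs.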

Currently we are extending the GPU acceleration to the particle-mesh Ewald (PME) electrostatics method, which is needed to fully utilize modern hardware, where GPUs continue to outpace CPUs. The GROMACS 2017 release will have a single-GPU implementation of PME, and we are working on several parallel options.
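As background on why PME is a separate piece of work: Ewald methods split the Coulomb interaction 1/r into a short-range part, which can be summed directly within a cut-off alongside the other pair forces, and a smooth long-range part, which PME evaluates on a grid using Fourier transforms. The sketch below shows only this splitting, not the grid part. The value of the splitting parameter beta and the function name are assumptions for illustration.

```python
import math


def coulomb_split(r, beta):
    """Split 1/r into a short-range and a long-range part with splitting parameter beta."""
    short_range = math.erfc(beta * r) / r  # decays quickly; summed directly within a cut-off
    long_range = math.erf(beta * r) / r    # smooth; handled on a grid in PME
    return short_range, long_range


beta = 3.0  # assumed value, in inverse length units, for illustration only
splits = {r: coulomb_split(r, beta) for r in (0.2, 0.5, 1.0)}
# The two parts always add up to the full 1/r interaction.
residuals = [abs(sr + lr - 1.0 / r) for r, (sr, lr) in splits.items()]
```

At r = 1.0 the short-range part is already below 1e-4 of the full interaction for this beta, which is why it can be truncated at a cut-off. The long-range part is the one that requires the grid and the Fourier transforms, and hence a dedicated GPU implementation.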

The code, including these improvements, is available from the GROMACS website. GROMACS is used by thousands of users worldwide.