GROMACS

Molecular simulation has evolved into a standard technique employed in virtually all high-impact publications on, for example, new protein structures. The main bottleneck for scaling in GROMACS is the 3D-FFT used in the particle-mesh Ewald (PME) electrostatics. Since PME is very fast and used by MD codes worldwide, it is worth investigating whether its communication overhead can be lowered. This is done in collaboration with PDSE (see the 3D-FFT sub-project), as well as in a co-design effort with Nvidia on parallelizing the 3D-FFT over GPUs. For extreme scaling, we will also investigate the fast multipole method (FMM), since it has better scaling complexity. Energy conservation has long been a problem for FMM; this has now been solved in collaboration with the numerical analysis community, and we will integrate the ExaFMM code of Rio Yokota (Tokyo Tech) into GROMACS.
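To make the communication cost of the PME 3D-FFT concrete, the sketch below shows the pattern that limits strong scaling: the 1D FFT passes along locally held grid directions are cheap, but the pass along the distributed direction requires an all-to-all transpose among all ranks, repeated on the inverse transform every MD step. This is a minimal illustration, not GROMACS code; the grid dimensions, slab decomposition, and buffer names are assumptions chosen for clarity.

```cpp
// Minimal sketch (not GROMACS code) of why a distributed 3D FFT is communication-bound.
// Assumes a hypothetical NX*NY*NZ PME grid, slab-decomposed along x over the ranks.
#include <mpi.h>
#include <complex>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int nranks;
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    // Illustrative grid dimensions; a real PME grid is set by the cut-off and grid spacing.
    const int NX = 64, NY = 64, NZ = 64;
    const int slabX = NX / nranks;  // x-slab owned by this rank (assume it divides evenly)

    std::vector<std::complex<double>> slab(static_cast<size_t>(slabX) * NY * NZ);
    std::vector<std::complex<double>> transposed(slab.size());

    // Stage 1: 1D FFTs along y and z are local to the slab (no communication).
    // ... per-line 1D FFT calls would go here ...

    // Stage 2: the FFT along x needs data owned by other ranks, so every rank
    // exchanges a block with every other rank. This all-to-all transpose is the
    // step whose cost grows with rank count and limits PME strong scaling.
    const int blockCount = 2 * slabX * (NY / nranks) * NZ;  // doubles per rank-pair block
    MPI_Alltoall(slab.data(),       blockCount, MPI_DOUBLE,
                 transposed.data(), blockCount, MPI_DOUBLE, MPI_COMM_WORLD);

    // Stage 3: 1D FFTs along x on the transposed layout; the inverse transform
    // repeats the same transposes, giving several all-to-alls per MD step.

    MPI_Finalize();
    return 0;
}
```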

We also need to rethink the MPI communication setup to improve strong scaling, including different communication patterns, non-blocking collectives, and persistent communication (as it becomes available in MPI); a sketch of both collective flavours follows below. Given modern hardware developments, we will also spend effort on improving performance on very “fat” nodes that, e.g., have multiple accelerators and high-end CPUs/networking, using task-based parallelism. We are investigating and integrating new tasking frameworks, such as CUDA graphs, in collaboration with Nvidia. Finally, we will work on ensemble-level parallelism, where e.g. Markov state models and enhanced sampling are used to loosely couple simulations to sample complex dynamics. Here, performant APIs are needed so that users can compose their own massively parallel ensemble calculations.
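The sketch below illustrates the two collective flavours mentioned above, applied to a transpose-style exchange: a non-blocking MPI_Ialltoall whose progress can overlap with independent local work, and a persistent MPI_Alltoall_init (standardized in MPI 4.0) that is set up once and restarted every step. This is an illustration under assumed buffer names and sizes, not the GROMACS communication code.

```cpp
// Minimal sketch (not GROMACS code) of non-blocking and persistent all-to-all collectives.
#include <mpi.h>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int nranks;
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int blockCount = 1024;  // doubles exchanged per rank pair (illustrative)
    std::vector<double> sendBuf(static_cast<size_t>(blockCount) * nranks);
    std::vector<double> recvBuf(sendBuf.size());

    // Non-blocking collective: start the exchange, do independent local work
    // (e.g. short-range non-bonded kernels), then wait just before the data is needed.
    MPI_Request nbReq;
    MPI_Ialltoall(sendBuf.data(), blockCount, MPI_DOUBLE,
                  recvBuf.data(), blockCount, MPI_DOUBLE, MPI_COMM_WORLD, &nbReq);
    // ... overlapped local computation goes here ...
    MPI_Wait(&nbReq, MPI_STATUS_IGNORE);

    // Persistent collective (MPI 4.0): argument matching and setup costs are paid once,
    // and each MD step only starts and completes the pre-built operation.
    MPI_Request persReq;
    MPI_Alltoall_init(sendBuf.data(), blockCount, MPI_DOUBLE,
                      recvBuf.data(), blockCount, MPI_DOUBLE,
                      MPI_COMM_WORLD, MPI_INFO_NULL, &persReq);
    for (int step = 0; step < 100; ++step)  // stand-in for the MD loop
    {
        MPI_Start(&persReq);
        // ... overlapped local computation goes here ...
        MPI_Wait(&persReq, MPI_STATUS_IGNORE);
    }
    MPI_Request_free(&persReq);

    MPI_Finalize();
    return 0;
}
```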