Developing Communication-Avoiding Schemes for Exascale Spectral Element Codes
We investigated a communication-bound solver at the core of many high-performance applications: the Conjugate Gradient (CG) method. To reduce communication, we derived lower bounds on the vertical data movement in CG, that is, the traffic between levels of the memory hierarchy. Guided by this analysis, we applied our CG solver to the spectral element method (SEM), a high-order discretization used in practice. For the Poisson equation on modern GPUs, we show that performance improves by 30% when we both rematerialize the discrete system and reformulate it to work on unique degrees of freedom. To investigate how horizontal communication between processes can be reduced, we compare standard CG with two communication-reducing variants: communication-avoiding CG and pipelined CG. Strong scaling up to 4096 CPU cores, we observe performance improvements of upwards of 70% for pipelined CG over standard CG when applied to SEM at scale.
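To make "working on unique degrees of freedom" concrete, here is a minimal 1D sketch under stated assumptions. It is not the authors' implementation. SEM codes such as Nek5000 commonly store vectors in element-local form, where nodes shared by neighbouring elements are duplicated. Applying the assembled operator then needs a gather followed by a scatter (direct stiffness summation, Q Q^T), and dot products must be weighted by each node's inverse multiplicity. A unique-DOF layout stores each shared node once and applies Q^T A_L Q instead. The function names, the mesh size (E = 4, N = 7) and the random symmetric element matrix `A_e` are all hypothetical stand-ins. A real element matrix would come from GLL quadrature and geometric factors.

```python
import numpy as np

# Hypothetical 1D layout: E elements of polynomial degree N, so N+1 nodes per
# element; neighbouring elements share their end nodes.
def local_to_global(E, N):
    """l2g[e, j] = global (unique) index of local node j in element e."""
    return np.arange(E)[:, None] * N + np.arange(N + 1)[None, :]

def apply_unique(u, A_e, l2g):
    """w = Q^T A_L Q u with u stored once per unique DOF."""
    u_loc = u[l2g]                    # Q u: scatter to element-local copies
    w_loc = u_loc @ A_e.T             # apply A_e to every element vector
    w = np.zeros_like(u)
    np.add.at(w, l2g, w_loc)          # Q^T: sum contributions at shared nodes
    return w

def apply_local(u_loc, A_e, l2g, n_glob):
    """Same operator with vectors kept in element-local (duplicated) form."""
    w_loc = u_loc @ A_e.T
    w = np.zeros(n_glob)
    np.add.at(w, l2g, w_loc)          # gather (Q^T) ...
    return w[l2g]                     # ... then scatter back (Q)

def dot_local(a_loc, b_loc, l2g, n_glob):
    """Dot product on duplicated data: weight by inverse node multiplicity."""
    mult = np.zeros(n_glob)
    np.add.at(mult, l2g, 1.0)
    return float(np.sum(a_loc * b_loc / mult[l2g]))

E, N = 4, 7
l2g = local_to_global(E, N)
n_glob = E * N + 1                    # number of unique DOFs
rng = np.random.default_rng(0)
M = rng.standard_normal((N + 1, N + 1))
A_e = M + M.T                         # stand-in symmetric element matrix
u, v = rng.standard_normal(n_glob), rng.standard_normal(n_glob)

w_unique = apply_unique(u, A_e, l2g)
w_local = apply_local(u[l2g], A_e, l2g, n_glob)
print(l2g.size, n_glob)               # prints "32 29": stored values per vector
```

Both layouts give the same operator and inner product, but the unique layout moves fewer values per vector. In 1D the saving is small (32 vs. 29 values). On a large 3D hexahedral mesh, the ratio grows to roughly ((N+1)/N)^3, about 1.5x for N = 7.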
Figure: Strong scaling performance of different communication-avoiding schemes on supercomputers with Intel Haswell processors.
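At scale, the CG variants compared here differ mainly in how many global reductions each iteration needs. In a distributed run, each reduction is one `MPI_Allreduce`. The following serial NumPy sketch illustrates this. A hypothetical `ReductionCounter` stands in for MPI and counts reductions.

The sketch contrasts textbook CG, which needs two dependent reductions per iteration, with unpreconditioned pipelined CG in the formulation of Ghysels and Vanroose. Pipelined CG fuses its dot products into a single reduction, which a real implementation could overlap with the next matrix-vector product (for example via the nonblocking `MPI_Iallreduce`). The 1D Poisson test matrix and the tolerance are assumptions. Communication-avoiding (s-step) CG is not shown, and none of this is the authors' code.

```python
import numpy as np

class ReductionCounter:
    """Hypothetical stand-in for MPI: each call models one global reduction."""
    def __init__(self):
        self.count = 0

    def dots(self, *pairs):
        self.count += 1               # all pairs share a single Allreduce
        return [float(a @ b) for a, b in pairs]

def standard_cg(A, b, tol=1e-10, maxiter=1000):
    comm, x = ReductionCounter(), np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    (rr,) = comm.dots((r, r))
    bnorm, iters = np.linalg.norm(b), 0
    while np.sqrt(rr) > tol * bnorm and iters < maxiter:
        q = A @ p
        (pq,) = comm.dots((p, q))     # reduction 1: needs q = A p first
        alpha = rr / pq
        x += alpha * p
        r -= alpha * q
        (rr_new,) = comm.dots((r, r)) # reduction 2: needs the updated r
        p = r + (rr_new / rr) * p
        rr, iters = rr_new, iters + 1
    return x, iters, comm.count

def pipelined_cg(A, b, tol=1e-10, maxiter=1000):
    comm, x = ReductionCounter(), np.zeros_like(b)
    r = b - A @ x
    w = A @ r
    z, s, p = (np.zeros_like(b) for _ in range(3))
    bnorm, iters = np.linalg.norm(b), 0
    gamma_old = alpha_old = 1.0
    while iters < maxiter:
        gamma, delta = comm.dots((r, r), (w, r))  # the only reduction
        if np.sqrt(gamma) <= tol * bnorm:
            break
        n = A @ w                     # could overlap the reduction under MPI
        if iters == 0:
            beta, alpha = 0.0, gamma / delta
        else:
            beta = gamma / gamma_old
            alpha = gamma / (delta - beta * gamma / alpha_old)
        z = n + beta * z              # invariant: z = A s
        s = w + beta * s              # invariant: s = A p
        p = r + beta * p
        x += alpha * p
        r -= alpha * s
        w -= alpha * z                # w = A r, kept by recurrence
        gamma_old, alpha_old, iters = gamma, alpha, iters + 1
    return x, iters, comm.count

n = 50                                # assumed 1D Poisson test problem
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x_std, it_std, red_std = standard_cg(A, b)
x_pip, it_pip, red_pip = pipelined_cg(A, b)
print(it_std, red_std, it_pip, red_pip)
```

Both solvers take about the same number of iterations, but the pipelined variant needs roughly half as many reductions and can hide their latency. The price is extra vector recurrences and somewhat weaker numerical stability, because `r` and `w` are updated by recurrence rather than recomputed from `A`.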
Reference: Karp, M., Jansson, N., Podobas, A., Schlatter, P. and Markidis, S., 2022, June. Reducing communication in the conjugate gradient method: a case study on high-order finite elements. In Proceedings of the Platform for Advanced Scientific Computing Conference (pp. 1–11).