Abstract
A novel implementation of the self-consistent field (SCF) procedure specifically designed for high-performance execution on multiple graphics processing units (GPUs) is presented. The algorithm offloads to GPUs the three major computational stages of the SCF, namely, the calculation of one-electron integrals, the calculation and digestion of electron repulsion integrals, and the diagonalization of the Fock matrix, including SCF acceleration via DIIS. Performance results for a variety of test molecules and basis sets show remarkable speedups relative to both the state-of-the-art parallel GAMESS CPU code and other widely used GPU codes, for single- and multi-GPU execution. The new code outperforms all existing multi-GPU implementations when using eight V100 GPUs, with speedups relative to TeraChem ranging from 1.2× to 3.3× and speedups over QUICK of up to 28× on one GPU and 15× on eight GPUs. Strong scaling calculations show nearly ideal scalability up to eight GPUs while retaining high parallel efficiency for up to 18 GPUs.
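The speedup and strong-scaling figures quoted above follow the usual definitions: speedup is a ratio of wall times, and parallel efficiency is that speedup divided by the increase in device count. The sketch below illustrates the arithmetic only. The function names and the timings are hypothetical placeholders, not measurements from the paper.

```python
def speedup(t_ref: float, t_new: float) -> float:
    """Speedup of a run taking t_new seconds over a reference taking t_ref."""
    return t_ref / t_new


def parallel_efficiency(t_ref: float, n_ref: int, t_n: float, n: int) -> float:
    """Strong-scaling efficiency for a fixed problem size.

    t_ref is the wall time on n_ref devices and t_n the wall time on n devices.
    A value of 1.0 corresponds to ideal scaling.
    """
    return (t_ref * n_ref) / (t_n * n)


# Hypothetical wall times in seconds, keyed by GPU count (illustrative only).
timings = {1: 800.0, 2: 400.0, 4: 200.0, 8: 100.0, 16: 80.0}

for n_gpus, t in timings.items():
    s = speedup(timings[1], t)
    e = parallel_efficiency(timings[1], 1, t, n_gpus)
    print(f"{n_gpus:2d} GPUs: speedup {s:.2f}x, efficiency {e:.3f}")
```

With these placeholder timings the efficiency stays at 1.0 through eight devices and drops at sixteen, which is the general shape a phrase such as "nearly ideal scalability up to eight GPUs" describes.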
Original language | English |
---|---|
Pages (from-to) | 7486-7503 |
Number of pages | 18 |
Journal | Journal of Chemical Theory and Computation |
Volume | 17 |
Issue number | 12 |
Early online date | 15 Nov 2021 |
Publication status | Published - 14 Dec 2021 |
Keywords
- graphics processing units (GPUs)
- self-consistent field (SCF)
- one-electron integrals
- electron repulsion integrals
- Fock matrix