Fig. 4: Performance of 2-GPU versus 2-CPU server. | npj Computational Materials


From: Kohn–Sham time-dependent density functional theory with Tamm–Dancoff approximation on massively parallel GPUs


a Average wall-time for a single iteration in DFT and TDDFT for the 5TCzBN molecule. Horizontal dotted lines indicate the ideal wall-times that would have been obtained if our GPU code had exhibited the same FLOP performance as the CPU code. b Roofline analysis of GPU kernel performance for the Fock build on a single Nvidia A100 GPU. The 21 compute kernels shown correspond to the distinct combinations of angular momenta. The theoretical peak performance was limited by the profiling conditions of the tool employed.
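For readers unfamiliar with roofline analysis as used in panel b, the basic model can be sketched as follows. It caps a kernel's attainable throughput at the smaller of the compute peak and the product of memory bandwidth and arithmetic intensity (FLOPs per byte moved). The function names and the numerical values below are illustrative assumptions, not figures taken from the article or from the A100 specification.

```python
def roofline(peak_flops, mem_bandwidth, arithmetic_intensity):
    """Attainable FLOP/s under the basic roofline model.

    peak_flops           -- compute ceiling in FLOP/s
    mem_bandwidth        -- memory ceiling in bytes/s
    arithmetic_intensity -- FLOPs performed per byte moved
    """
    return min(peak_flops, mem_bandwidth * arithmetic_intensity)


def ridge_point(peak_flops, mem_bandwidth):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_flops / mem_bandwidth


# Hypothetical ceilings, chosen only to make the arithmetic easy to follow.
PEAK = 10.0e12   # assumed compute peak, FLOP/s
BW = 2.0e12      # assumed memory bandwidth, bytes/s

low_ai = roofline(PEAK, BW, 1.0)    # below the ridge: limited by bandwidth
high_ai = roofline(PEAK, BW, 20.0)  # above the ridge: limited by compute
ridge = ridge_point(PEAK, BW)
```

Under these assumed ceilings, a kernel whose arithmetic intensity falls left of the ridge point is memory-bound and one to the right is compute-bound. This is the distinction a roofline plot draws for each kernel.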
