Fig. 9: Profiled timelines for a single time step of unbiased and biased execution with HOOMD-blue and OpenMM.
From: PySAGES: flexible, advanced sampling methods accelerated with GPUs

The profiles were recorded with NVIDIA Nsight Systems on an NVIDIA V100 GPU. Light blue represents GPU activity, while dark blue represents individual CUDA compute kernels. The numbered lines mark the same compute steps in all simulations: (1) start of the integration step, (2) computation of bond forces, (3) computation of pair-wise forces, (4) calculation of the CV, (5) addition of the harmonic biasing force to the backend, and (6) end of the integration step. Steps (4) and (5) occur only with PySAGES and are executed on the GPU. We observe GPU idle time while PySAGES coordinates in Python with JAX/CuPy on the GPU (green bar), but note that there are no memory copies, not even within GPU memory.

Panel a shows 1.8 ms of recorded HOOMD-blue execution. The top row shows a vanilla HOOMD-blue simulation step, while the bottom row shows a PySAGES/HOOMD-blue simulation with harmonic biasing of a center-of-mass CV. The additional time for CV biasing is 247 μs per time step.

Panel b shows a 1.6 ms recorded execution timeline of an OpenMM OPLS simulation of 40,981 particles as polymers, with PME summation for the long-range Coulomb forces. OpenMM uses asynchronous GPU kernel execution, which leads to less linearly ordered timelines. Overall, the performance degradation is more pronounced with OpenMM than with HOOMD-blue.
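Steps (4) and (5) can be illustrated with a minimal sketch of a harmonic bias acting on a center-of-mass CV. This is not PySAGES code: PySAGES evaluates the CV and its gradient with JAX on the GPU, whereas the example below uses plain Python so that it runs without extra dependencies. The function names `center_of_mass` and `harmonic_bias`, the spring constant `k`, and the two-particle system are assumptions made for illustration only. The bias energy is taken as U = (k/2)|ξ − ξ₀|², so the force on particle i is −k (mᵢ/M)(ξ − ξ₀).

```python
def center_of_mass(positions, masses):
    """Step (4): evaluate the CV, here the mass-weighted mean position."""
    total = sum(masses)
    return [
        sum(m * r[d] for m, r in zip(masses, positions)) / total
        for d in range(3)
    ]


def harmonic_bias(positions, masses, center, k):
    """Step (5): bias energy and per-particle bias forces.

    Assumes U = 0.5 * k * |xi - center|^2, with xi the center of mass.
    The chain rule gives d(xi)/d(r_i) = m_i / M for each particle i.
    """
    total = sum(masses)
    xi = center_of_mass(positions, masses)
    delta = [x - c for x, c in zip(xi, center)]
    energy = 0.5 * k * sum(d * d for d in delta)
    forces = [[-k * (m / total) * d for d in delta] for m in masses]
    return energy, forces


# Hypothetical two-particle system, chosen only to make the numbers easy to follow.
positions = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
masses = [1.0, 3.0]
energy, forces = harmonic_bias(positions, masses, center=[1.0, 0.0, 0.0], k=10.0)
```

In a biased simulation, forces of this kind would be added to the backend's force array in every time step, which corresponds to the extra kernels between markers (4) and (5) in the biased timelines.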