Fig. 7: Schematic flowchart of the link-cell parallelism algorithm for a machine-learning energy model.

The outer loop over index iA is executed sequentially, while the inner loop over index iC is executed in parallel as a CUDA kernel, because the MC updates of atoms with the same iA index but different iC indices are independent.
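The loop structure described above can be illustrated with a minimal Python sketch. It is not the implementation behind the figure. It assumes that iA labels an atom's position within a cell and iC labels the cell, it replaces the machine-learning energy model with a toy nearest-neighbor Ising energy on a periodic 1D chain, and it uses a thread pool as a stand-in for the CUDA kernel. All names (`CELL_SIZE`, `N_CELLS`, `local_energy_change`, `sweep`) are hypothetical.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

CELL_SIZE = 4   # atoms per cell (hypothetical value)
N_CELLS = 8     # number of link cells (hypothetical value)
N = CELL_SIZE * N_CELLS


def local_energy_change(spins, i):
    """Toy stand-in for the ML energy model: energy change of flipping
    site i in a periodic nearest-neighbor Ising chain."""
    left = spins[(i - 1) % N]
    right = spins[(i + 1) % N]
    return 2.0 * spins[i] * (left + right)


def update_site(spins, i, beta, u):
    """Metropolis update of one site, using a pre-drawn uniform number u."""
    d_e = local_energy_change(spins, i)
    if d_e <= 0.0 or u < math.exp(-beta * d_e):
        spins[i] = -spins[i]


def sweep(spins, beta, rng, pool=None):
    """One MC sweep: sequential over iA, parallel over iC if a pool is given."""
    for i_a in range(CELL_SIZE):                      # outer loop: sequential
        sites = [i_c * CELL_SIZE + i_a for i_c in range(N_CELLS)]
        draws = [rng.random() for _ in sites]         # drawn in a fixed order

        def task(i, u):
            update_site(spins, i, beta, u)

        if pool is None:
            for i, u in zip(sites, draws):            # serial reference
                task(i, u)
        else:
            # Inner loop over iC: sites with the same iA are at least
            # CELL_SIZE apart, so no task reads a site another task writes.
            list(pool.map(task, sites, draws))


if __name__ == "__main__":
    rng = random.Random(0)
    spins = [1] * N
    with ThreadPoolExecutor(max_workers=4) as pool:
        for _ in range(10):
            sweep(spins, beta=0.5, rng=rng, pool=pool)
    print(sum(spins) / N)  # magnetization per site
```

Because the sites updated in one pass share neither a site nor a neighbor, the parallel pass gives the same configuration as a serial pass with the same random numbers. This is the independence property that lets the inner loop be launched as a single kernel.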