Fig. 6: Schematic of the calculation time under the pipeline design when (a) one processing core or (b) multiple processing cores are used in the MPU.
From: Accurate and efficient molecular dynamics based on machine learning and non von Neumann architecture

tS1, tS2, tS3, tS4, tS5, tS6, and tS7 are the calculation times of S1, S2, S3, S4, S5, S6, and S7, respectively, in Fig. 5. Here, tSPU = tS3 + tS4 + tS5; tWT is the waiting time; ‘#A(B)’ denotes sub-domain B processed by core #A of the MPU; Np is the number of cores in the MPU; and NSD is the number of sub-domains obtained by decomposition according to the SPU capacity. Enable signals coordinate the calls to the SPU.
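
The timing picture in this caption can be explored with a small toy model. The sketch below is an illustration under stated assumptions, not the authors' scheduler. It assumes that S1 and S2 run on an MPU core before the SPU call, that S6 and S7 run on the same core afterwards, that a single SPU serves one sub-domain at a time, and that sub-domains are assigned to cores round-robin. The function name `pipeline_time` and its parameters are hypothetical.

```python
def pipeline_time(t_pre, t_spu, t_post, n_sd, n_p):
    """Toy model of the pipeline timing (assumptions, not from the paper).

    t_pre  : assumed MPU time before the SPU call (tS1 + tS2)
    t_spu  : SPU time per sub-domain (tSPU = tS3 + tS4 + tS5)
    t_post : assumed MPU time after the SPU call (tS6 + tS7)
    n_sd   : number of sub-domains (NSD)
    n_p    : number of MPU cores (Np)

    Returns (total time, accumulated waiting time tWT).
    """
    core_free = [0.0] * n_p   # time at which each MPU core becomes idle
    spu_free = 0.0            # time at which the single shared SPU becomes idle
    total_wait = 0.0

    for sd in range(n_sd):
        core = sd % n_p                     # assumed round-robin assignment
        ready = core_free[core] + t_pre     # core finished its pre-SPU work
        start = max(ready, spu_free)        # wait if the SPU is still busy
        total_wait += start - ready         # contribution to tWT
        spu_free = start + t_spu            # SPU is occupied for tSPU
        core_free[core] = spu_free + t_post # core finishes the post-SPU work

    return max(core_free), total_wait


if __name__ == "__main__":
    # Illustrative numbers only, in arbitrary time units.
    print(pipeline_time(2, 3, 1, n_sd=4, n_p=1))
    print(pipeline_time(2, 3, 1, n_sd=4, n_p=2))
```

In this toy model a single core never waits for the SPU, so the total time is simply NSD times the sum of all stage times. With several cores, the MPU work on one sub-domain overlaps with the SPU work on another, at the cost of some waiting time whenever two cores request the SPU at once.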