Fig. 1: Massively parallel GPU computing.

a Hierarchical structure of compute servers. Theoretical bidirectional data transfer rates within the system are listed. b Hybrid MPI-OpenMP-CUDA parallel model for distributed GPU computing. Each process is color-coded according to the hardware used, as illustrated in a.