Table 6 Perf results of \(MPDMSort_{Lomuto}\), \(MPDMSort_{Hoare}\), BQSort, and MWSort on the Intel i7-11700 machine for n = 1000 million Uint64 elements.
From: Parallel Multi-Deque Partition Dual-Deque Merge sorting algorithm using OpenMP
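| Algorithms | b | Cutoff | Cache misses | Branch load misses |
|---|---|---|---|---|
| \(MPDMSort_{Lomuto}\) | 2 MB | 16 MB | 2.73E+09 | 1.09E+10 |
| \(MPDMSort_{Lomuto}\) | 2 MB | 32 MB | 2.64E+09 | 1.18E+10 |
| \(MPDMSort_{Lomuto}\) | 2 MB | 64 MB | 2.69E+09 | 1.19E+10 |
| \(MPDMSort_{Lomuto}\) | 4 MB | 16 MB | 2.84E+09 | 1.12E+10 |
| \(MPDMSort_{Lomuto}\) | 4 MB | 32 MB | 2.86E+09 | 1.14E+10 |
| \(MPDMSort_{Lomuto}\) | 4 MB | 64 MB | 2.69E+09 | 1.23E+10 |
| \(MPDMSort_{Lomuto}\) | 8 MB | 16 MB | 3.07E+09 | 1.12E+10 |
| \(MPDMSort_{Lomuto}\) | 8 MB | 32 MB | 2.99E+09 | 1.16E+10 |
| \(MPDMSort_{Lomuto}\) | 8 MB | 64 MB | 2.92E+09 | 1.20E+10 |
| \(MPDMSort_{Hoare}\) | 2 MB | 16 MB | 2.40E+09 | 1.13E+10 |
| \(MPDMSort_{Hoare}\) | 2 MB | 32 MB | 2.52E+09 | 1.14E+10 |
| \(MPDMSort_{Hoare}\) | 2 MB | 64 MB | 2.42E+09 | 1.20E+10 |
| \(MPDMSort_{Hoare}\) | 4 MB | 16 MB | 2.46E+09 | 1.09E+10 |
| \(MPDMSort_{Hoare}\) | 4 MB | 32 MB | 2.47E+09 | 1.14E+10 |
| \(MPDMSort_{Hoare}\) | 4 MB | 64 MB | 2.57E+09 | 1.16E+10 |
| \(MPDMSort_{Hoare}\) | 8 MB | 16 MB | 2.58E+09 | 1.09E+10 |
| \(MPDMSort_{Hoare}\) | 8 MB | 32 MB | 2.52E+09 | 1.13E+10 |
| \(MPDMSort_{Hoare}\) | 8 MB | 64 MB | 2.43E+09 | 1.20E+10 |
| BQSort | - | - | 2.12E+09 | 1.26E+10 |
| MWSort | - | - | 1.90E+09 | 8.61E+09 |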