Table 6 Perf results of \(MPDMSort_{Lomuto}\), \(MPDMSort_{Hoare}\), BQSort, and MWSort on the Intel i7-11700 machine for n = 1000 million Uint64 elements.

From: Parallel Multi-Deque Partition Dual-Deque Merge sorting algorithm using OpenMP

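| Algorithms | b | Cutoff | Cache misses | Branch load misses |
| --- | --- | --- | --- | --- |
| \(MPDMSort_{Lomuto}\) | 2 MB | 16 MB | 2.73E+09 | 1.09E+10 |
| \(MPDMSort_{Lomuto}\) | 2 MB | 32 MB | 2.64E+09 | 1.18E+10 |
| \(MPDMSort_{Lomuto}\) | 2 MB | 64 MB | 2.69E+09 | 1.19E+10 |
| \(MPDMSort_{Lomuto}\) | 4 MB | 16 MB | 2.84E+09 | 1.12E+10 |
| \(MPDMSort_{Lomuto}\) | 4 MB | 32 MB | 2.86E+09 | 1.14E+10 |
| \(MPDMSort_{Lomuto}\) | 4 MB | 64 MB | 2.69E+09 | 1.23E+10 |
| \(MPDMSort_{Lomuto}\) | 8 MB | 16 MB | 3.07E+09 | 1.12E+10 |
| \(MPDMSort_{Lomuto}\) | 8 MB | 32 MB | 2.99E+09 | 1.16E+10 |
| \(MPDMSort_{Lomuto}\) | 8 MB | 64 MB | 2.92E+09 | 1.20E+10 |
| \(MPDMSort_{Hoare}\) | 2 MB | 16 MB | 2.40E+09 | 1.13E+10 |
| \(MPDMSort_{Hoare}\) | 2 MB | 32 MB | 2.52E+09 | 1.14E+10 |
| \(MPDMSort_{Hoare}\) | 2 MB | 64 MB | 2.42E+09 | 1.20E+10 |
| \(MPDMSort_{Hoare}\) | 4 MB | 16 MB | 2.46E+09 | 1.09E+10 |
| \(MPDMSort_{Hoare}\) | 4 MB | 32 MB | 2.47E+09 | 1.14E+10 |
| \(MPDMSort_{Hoare}\) | 4 MB | 64 MB | 2.57E+09 | 1.16E+10 |
| \(MPDMSort_{Hoare}\) | 8 MB | 16 MB | 2.58E+09 | 1.09E+10 |
| \(MPDMSort_{Hoare}\) | 8 MB | 32 MB | 2.52E+09 | 1.13E+10 |
| \(MPDMSort_{Hoare}\) | 8 MB | 64 MB | 2.43E+09 | 1.20E+10 |
| BQSort | | | 2.12E+09 | 1.26E+10 |
| MWSort | | | 1.90E+09 | 8.61E+09 |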