Table 4 Time taken to perform left joins on an artificial dataset.

From: Accessible data curation and analytics for international-scale citizen science datasets

Artificial left join time (seconds)

| Right row count | ExeTera | Pandas | Dask | PostgreSQL |
|---|---|---|---|---|
| 1,000,000 | 0.594 | 0.174 | 0.488 | 0.697 |
| 2,000,000 | 0.662 | 0.278 | 1.22 | 1.47 |
| 3,000,000 | 0.744 | 0.362 | 1.76 | 2.22 |
| 4,000,000 | 0.805 | 0.450 | 2.11 | 2.89 |
| 6,000,000 | 0.946 | 0.675 | 3.02 | 4.51 |
| 8,000,000 | 1.09 | 0.872 | 4.29 | 6.05 |
| 10,000,000 | 1.21 | 1.06 | 5.28 | 7.40 |
| 20,000,000 | 1.88 | 2.09 | 12.71 | 14.41 |
| 30,000,000 | 2.54 | 3.24 | 20.92 | 21.81 |
| 40,000,000 | 3.25 | 4.36 | 30.62 | 32.61 |
| 60,000,000 | 4.55 | 6.56 | 54.62 | 48.09 |
| 80,000,000 | 5.88 | 9.19 | Failed | 70.83 |
| 100,000,000 | 7.01 | 11.93 | 113.76 | 87.04 |
| 200,000,000 | 13.60 | 24.66 | Failed | 198.87 |
| 300,000,000 | 19.68 | Memory | Failed | 280.83 |
| 400,000,000 | 27.52 | Memory | Failed | 379.4 |
| 600,000,000 | 39.86 | Memory | Failed | 578.19 |
| 800,000,000 | 53.74 | Memory | Failed | 767.43 |
| 1,000,000,000 | 69.19 | Memory | Failed | 964.67 |

  1. Row counts shown are for the right table, which has 10 times the row count of the left table (e.g. 100,000,000 rows in the left table when the right table has 1,000,000,000 rows). Memory denotes that the join could not complete because it required more than 32 GB of memory. Failed denotes that the operation could not complete for reasons other than memory. Figures in bold indicate the best join time.
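
To make the measurement concrete, the following is a minimal sketch of how a single Pandas timing of this kind might be taken. It is not the benchmark code behind the table: the helper names `make_tables` and `time_left_join`, the column names, the random key distribution, and the small row count are all assumptions chosen for illustration. Only the 10:1 ratio between right and left row counts comes from the note above.

```python
import time

import numpy as np
import pandas as pd


def make_tables(right_rows, ratio=10, seed=0):
    """Build a hypothetical left/right pair; the right table has `ratio` times the rows."""
    rng = np.random.default_rng(seed)
    left_rows = right_rows // ratio
    # Left table: one row per unique key.
    left = pd.DataFrame({
        "id": np.arange(left_rows, dtype=np.int64),
        "left_value": rng.random(left_rows),
    })
    # Right table: keys drawn at random from the left table's key range (an assumption).
    right = pd.DataFrame({
        "id": rng.integers(0, left_rows, size=right_rows, dtype=np.int64),
        "right_value": rng.random(right_rows),
    })
    return left, right


def time_left_join(left, right):
    """Time one left join of `right` onto `left` using the shared `id` key."""
    start = time.perf_counter()
    joined = left.merge(right, on="id", how="left")
    elapsed = time.perf_counter() - start
    return joined, elapsed


# Illustrative size only; the table above starts at 1,000,000 right rows.
left, right = make_tables(right_rows=100_000)
joined, seconds = time_left_join(left, right)
print(f"{len(right):,} right rows joined in {seconds:.3f} s")
```

Timings from a sketch like this depend on hardware, library versions, and key distribution, so they should not be expected to reproduce the figures in the table.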