Table 2 Comparison of datasets in the top vs. bottom popularity lists

From: Mapping global dynamics of benchmark creation and saturation in artificial intelligence

 

| Attribute | Top datasets (n = 20) | Bottom datasets (n = 20) | p |
| --- | --- | --- | --- |
| Number of associated publications | 14 (9–22) | 2 (1–3) | <0.001 |
| Number of task types | 2 (1–5) | 1 (1–2) | 0.007 |
| Number of sub-benchmarks | 2 (1–8) | 1 (1–1) | 0.015 |
| Dedicated leaderboard | 35% | 0% | 0.002 |
| Proposed as part of competition | 10% | 15% | 0.322 |
| Number of institutions | 2 (1–8) | 1 (1–6) | 0.310 |
| First/last author affiliated with top company/university | 50% | 20% | 0.024 |

  1. Datasets were sampled from NLP and computer vision datasets whose first reported results appeared in Papers With Code in 2018. Popularity was measured as the number of publications captured in the Papers With Code repository that report benchmark results on a dataset. Binary attributes are reported as the percentage of datasets in each group; numeric attributes are reported as medians, with minimum and maximum values in parentheses.
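
The summary format used in the table (a median with the minimum and maximum in parentheses for numeric attributes, and a percentage for yes/no attributes) can be reproduced in a few lines of Python. This is a minimal sketch for illustration only: the function names `summarize_numeric` and `summarize_binary` are hypothetical, and the toy values are invented for the example rather than taken from the study's data. It does not compute p values, since the statistical test is not named in this table.

```python
from statistics import median


def summarize_numeric(values):
    """Format a numeric attribute as 'median (min–max)'."""
    med = median(values)
    # Print whole-number medians without a trailing '.0'.
    med_text = str(int(med)) if float(med).is_integer() else f"{med:g}"
    return f"{med_text} ({min(values)}–{max(values)})"


def summarize_binary(flags):
    """Format a yes/no attribute as the percentage of datasets where it holds."""
    return f"{100 * sum(flags) / len(flags):.0f}%"


# Hypothetical toy values, NOT the data behind Table 2.
toy_task_counts = [1, 2, 2, 3, 5]
toy_has_leaderboard = [True, False, False, True, False]

print(summarize_numeric(toy_task_counts))     # 2 (1–5)
print(summarize_binary(toy_has_leaderboard))  # 40%
```

Reporting the range alongside the median shows how widely a group varies as well as where its typical dataset sits.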