Table 2 Comparison of datasets in the top vs. bottom popularity lists

Attribute	Top datasets (n = 20)	Bottom datasets (n = 20)	p value
Number of associated publications	14 (9–22)	2 (1–3)	<0.001
Number of task types	2 (1–5)	1 (1–2)	0.007
Number of sub-benchmarks	2 (1–8)	1 (1–1)	0.015
Dedicated leaderboard	35%	0%	0.002
Proposed as part of competition	10%	15%	0.322
Number of institutions	2 (1–8)	1 (1–6)	0.310
First/last author affiliated with top company/university	50%	20%	0.024

Datasets were sampled from the NLP and computer vision datasets whose first reported results appeared in Papers With Code in 2018. Popularity was measured as the number of publications captured in the Papers With Code repository that report benchmark results on a dataset. Attributes are reported either as the percentage of datasets or as the median value; for medians, the minimum and maximum values are shown in parentheses.
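
The reporting convention described above can be illustrated with a minimal Python sketch. The function names `summarize_numeric` and `summarize_binary` and the toy input values are hypothetical and are not taken from the study; the sketch only shows how a "median (min–max)" cell and a percentage cell could be produced from raw per-dataset values.

```python
from statistics import median


def summarize_numeric(values):
    """Format a numeric attribute as 'median (min–max)'."""
    med = median(values)
    # Print whole-number medians without a trailing '.0'.
    med_text = str(int(med)) if float(med).is_integer() else str(med)
    return f"{med_text} ({min(values)}–{max(values)})"


def summarize_binary(flags):
    """Format a yes/no attribute as the percentage of datasets where it holds."""
    return f"{round(100 * sum(flags) / len(flags))}%"


# Hypothetical toy data for illustration only (not values from the study).
publication_counts = [3, 5, 8, 10, 21]
has_leaderboard = [True, False, False, True, False]

print(summarize_numeric(publication_counts))  # 8 (3–21)
print(summarize_binary(has_leaderboard))      # 40%
```

The sketch covers only the descriptive columns; the statistical test behind the p values is not stated in this table note and is therefore not reproduced here.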
