Table 3 Population model validation performance for (top) manually and automatically extracted features, along with (bottom) test performance of existing maps. See section “Results” for metric definitions. IQR is the interquartile range of absolute errors. Bold indicates best in category. FT: fine-tuning.

Features used	\(R^2\)	MeAPE	MeAE (%)	IQR	AggPE (%)
Hand-crafted features
Public only	−0.22	57.8	5.05	8.60	21.5
Footprint only	0.47	44.5	3.75	5.86	2.2
Public + Footprint	0.46	48.8	4.63	5.11	7.6
Representation learning
Supervised	0.20	54.7	5.36	7.97	2.0
Supervised (FT)	0.33	52.9	4.72	6.23	5.5
SWAV	0.34	51.6	6.60	4.33	0.8
SWAV (FT)	0.41	46.9	5.83	4.35	3.5
DeepCluster	0.26	50.3	4.60	6.03	6.7
DeepCluster (FT)	0.13	62.5	5.98	8.32	6.8
Barlow Twins	0.27	51.9	5.40	6.65	2.8
Barlow Twins (FT)	0.39	44.0	3.91	6.32	1.1
Null model
None	−0.12	76.45	7.57	10.0	1.7
Existing maps
GRID3	0.22	51.7	4.25	7.11	26.7
HRSL	−0.12	70.7	5.04	7.94	46.8
WorldPop	−0.41	86.8	5.85	8.18	77.9
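
The column metrics can be illustrated with a minimal sketch. The definitions below are assumptions based on the usual meaning of each abbreviation (median absolute percentage error, median absolute error, interquartile range of absolute errors, aggregated percentage error); the authoritative definitions are those given in section “Results”. The function name `population_metrics` and the toy numbers are hypothetical and do not come from the study data.

```python
import numpy as np


def population_metrics(y_true, y_pred):
    """Compute table-style metrics under ASSUMED definitions (see lead-in)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    abs_err = np.abs(y_true - y_pred)

    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)

    # Interquartile range of the absolute errors.
    q75, q25 = np.percentile(abs_err, [75, 25])

    return {
        "R2": 1.0 - ss_res / ss_tot,
        # Median absolute percentage error (assumes y_true > 0).
        "MeAPE": 100.0 * np.median(abs_err / y_true),
        # Median absolute error, in the units of y_true.
        "MeAE": np.median(abs_err),
        "IQR": q75 - q25,
        # Aggregated percentage error: relative error of the summed totals.
        "AggPE": 100.0 * abs(y_pred.sum() - y_true.sum()) / y_true.sum(),
    }


# Toy example with made-up values (not from the paper).
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 36.0]
metrics = population_metrics(observed, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this definition of \(R^2\), a negative value means the squared error of the predictions exceeds that of always predicting the mean of the observed values. A low AggPE can also coexist with a high MeAPE, because over- and under-estimates cancel when totals are summed.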
