Table 3 Quantitative comparison of METEOR with several state-of-the-art methods (rows) across different heterogeneous Earth observation datasets (columns).

From: Meta-learning to address diverse Earth observation problems across resolutions

 

| Dataset | AnthPr.43 | DENETHOR42 | DFC2020-KR39 | EuroSAT40 | fl. obj.6 | NWPU-Urban41 |
| --- | --- | --- | --- | --- | --- | --- |
| 5-shot problem | Human influence | Crop type mapping | Land cover classification | Land cover classification | Marine debris | Urban scenes |
| Spatial res. | 10 m | 3 m | 10 m | 10 m | 10 m | <1 m |
| Spectral res. | 10 bands | 4 bands | 13 bands | 13 bands | 12 bands | 3 bands |
| No. of classes | 2 | 3 | 5 | 10 | 2 | 5 |
| No. of training images | 10 | 15 | 25 | 50 | 10 | 25 |

Accuracy (↑) per dataset and average rank (↓) across datasets; best values in bold:

| Model | Rank (↓) | AnthPr.43 | DENETHOR42 | DFC2020-KR39 | EuroSAT40 | fl. obj.6 | NWPU-Urban41 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| METEOR | **3.6** | 83.7 | 75.6 | **87.7** | 60.9 | **90.8** | 57.4 |
| SWAV36 | 4.2 | **96.7** | 69.8 | 54.2 | **67.7** | 65.4 | 70.4 |
| MOSAIKS29 | 4.3 | 86.4 | **76.4** | 82.3 | 57.9 | 88.8 | 54.0 |
| DINO37 | 5.0 | 91.2 | 66.2 | 56.6 | 61.3 | 65.1 | **70.6** |
| SECO35 | 4.7 | 91.4 | 61.7 | 67.6 | 62.7 | 65.9 | 67.4 |
| SSLTRANSRS16 | 5.3 | 90.7 | 65.5 | 76.3 | 59.7 | 78.9 | 52.1 |
| SSL4EO34 | 5.5 | 96.2 | 58.0 | 80.2 | 59.1 | 82.4 | 49.9 |
| BASELINE | 6.8* | 89.0 | 60.8 | 87.4 | 39.8 | 69.8 | 36.7 |
| PROTO17 | 8.3** | 59.7 | 56.2 | 76.9 | 46.1 | 67.3 | 39.1 |
| IMAGENET | 8.8* | 83.7 | 59.7 | 50.8 | 42.7 | 64.1 | 60.5 |
| SCRATCH | 9.5** | 64.8 | 61.1 | 66.5 | 25.7 | 64.4 | 32.3 |

  1. This heterogeneous setting is the most challenging, as each evaluated task is characterized by a different number of spectral bands, number of classes, and spatial resolution. Here, METEOR achieves the best average rank of 3.6, closely followed by SWAV with 4.2 and MOSAIKS with 4.3 across the evaluated datasets. Different models are optimal for different tasks, and no model dominates all of them. This is reflected in the Wilcoxon signed-rank test, which shows that the performance of METEOR is significantly different (indicated by * and **) only from the BASELINE, PROTO, IMAGENET, and SCRATCH models. Best values are highlighted in bold face.
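
The average ranks reported above can be recomputed directly from the accuracy values in the table. The sketch below (an illustration, not the authors' code) ranks the models within each dataset column with rank 1 given to the highest accuracy, averages the ranks across the six datasets, and runs a paired Wilcoxon signed-rank test between METEOR and each competitor using the six per-dataset accuracies. The tie handling and test settings are SciPy defaults by assumption, and the paper's significance markers may be based on more underlying samples than the six aggregate values shown here, so the p-values are illustrative only.

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

# Per-dataset accuracies copied from Table 3 (columns: AnthPr., DENETHOR,
# DFC2020-KR, EuroSAT, fl. obj., NWPU-Urban).
accuracy = {
    "METEOR":     [83.7, 75.6, 87.7, 60.9, 90.8, 57.4],
    "SWAV":       [96.7, 69.8, 54.2, 67.7, 65.4, 70.4],
    "MOSAIKS":    [86.4, 76.4, 82.3, 57.9, 88.8, 54.0],
    "DINO":       [91.2, 66.2, 56.6, 61.3, 65.1, 70.6],
    "SECO":       [91.4, 61.7, 67.6, 62.7, 65.9, 67.4],
    "SSLTRANSRS": [90.7, 65.5, 76.3, 59.7, 78.9, 52.1],
    "SSL4EO":     [96.2, 58.0, 80.2, 59.1, 82.4, 49.9],
    "BASELINE":   [89.0, 60.8, 87.4, 39.8, 69.8, 36.7],
    "PROTO":      [59.7, 56.2, 76.9, 46.1, 67.3, 39.1],
    "IMAGENET":   [83.7, 59.7, 50.8, 42.7, 64.1, 60.5],
    "SCRATCH":    [64.8, 61.1, 66.5, 25.7, 64.4, 32.3],
}

models = list(accuracy)
acc = np.array([accuracy[m] for m in models])  # shape: (n_models, n_datasets)

# Rank models within each dataset column (rank 1 = highest accuracy,
# ties receive the average rank), then average across datasets.
ranks = rankdata(-acc, axis=0)
avg_rank = ranks.mean(axis=1)
for model, r in sorted(zip(models, avg_rank), key=lambda t: t[1]):
    print(f"{model:<12s} average rank {r:.1f}")

# Paired Wilcoxon signed-rank test: METEOR vs. every other model, pairing
# the six per-dataset accuracies (two-sided alternative assumed).
for model in models:
    if model == "METEOR":
        continue
    result = wilcoxon(accuracy["METEOR"], accuracy[model])
    print(f"METEOR vs {model:<12s} p = {result.pvalue:.3f}")
```

Running this reproduces the reported average ranks (e.g., 3.6 for METEOR and 4.2 for SWAV); the significance test between any two models uses only the six paired dataset-level values shown in the table.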