Table 3 Quantitative comparison of METEOR with several state-of-the-art methods (rows) across different heterogeneous Earth observation datasets (columns).

From: Meta-learning to address diverse Earth observation problems across resolutions

 

| Dataset | AnthPr.43 | DENETHOR42 | DFC2020-KR39 | EuroSAT40 | fl. obj.6 | NWPU-Urban41 |
| --- | --- | --- | --- | --- | --- | --- |
| 5-shot problem | Human influence | Crop type mapping | Land cover classification | Land cover classification | Marine debris | Urban scenes |
| Spatial res. | 10 m | 3 m | 10 m | 10 m | 10 m | <1 m |
| Spectral res. | 10 bands | 4 bands | 13 bands | 13 bands | 12 bands | 3 bands |
| No. of classes | 2 | 3 | 5 | 10 | 2 | 5 |
| No. of training images | 10 | 15 | 25 | 50 | 10 | 25 |

Accuracy (↑) per dataset and average rank (↓) across datasets; best values in bold:

| Model | Rank (↓) | AnthPr.43 | DENETHOR42 | DFC2020-KR39 | EuroSAT40 | fl. obj.6 | NWPU-Urban41 |
| --- | --- | --- | --- | --- | --- | --- | --- |
| METEOR | **3.6** | 83.7 | 75.6 | **87.7** | 60.9 | **90.8** | 57.4 |
| SWAV36 | 4.2 | **96.7** | 69.8 | 54.2 | **67.7** | 65.4 | 70.4 |
| MOSAIKS29 | 4.3 | 86.4 | **76.4** | 82.3 | 57.9 | 88.8 | 54.0 |
| DINO37 | 5.0 | 91.2 | 66.2 | 56.6 | 61.3 | 65.1 | **70.6** |
| SECO35 | 4.7 | 91.4 | 61.7 | 67.6 | 62.7 | 65.9 | 67.4 |
| SSLTRANSRS16 | 5.3 | 90.7 | 65.5 | 76.3 | 59.7 | 78.9 | 52.1 |
| SSL4EO34 | 5.5 | 96.2 | 58.0 | 80.2 | 59.1 | 82.4 | 49.9 |
| BASELINE | 6.8* | 89.0 | 60.8 | 87.4 | 39.8 | 69.8 | 36.7 |
| PROTO17 | 8.3** | 59.7 | 56.2 | 76.9 | 46.1 | 67.3 | 39.1 |
| IMAGENET | 8.8* | 83.7 | 59.7 | 50.8 | 42.7 | 64.1 | 60.5 |
| SCRATCH | 9.5** | 64.8 | 61.1 | 66.5 | 25.7 | 64.4 | 32.3 |

  1. This heterogeneous setting is the most challenging, as each evaluated task is characterized by a different number of spectral bands, number of classes, and spatial resolution. Here, METEOR achieves the best average rank of 3.6, closely followed by SWAV with 4.2 and MOSAIKS with 4.3 across the evaluated datasets. Different models are optimal for different tasks, and no model dominates all of them. This is reflected in the Wilcoxon signed-rank test, which shows that the performance of METEOR is significantly different (indicated by * and **) only from the BASELINE, PROTO, IMAGENET, and SCRATCH models. Best values are highlighted in bold face.
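
The average ranks reported above can be recomputed directly from the accuracy values in the table. The sketch below (an illustration, not the authors' code) ranks the models within each dataset column with rank 1 given to the highest accuracy, averages the ranks across the six datasets, and runs a paired Wilcoxon signed-rank test between METEOR and each competitor using the six per-dataset accuracies. The tie handling and test settings are SciPy defaults by assumption, and the paper's significance markers may be based on more underlying samples than the six aggregate values shown here, so the p-values are illustrative only.

```python
import numpy as np
from scipy.stats import rankdata, wilcoxon

# Per-dataset accuracies copied from Table 3 (columns: AnthPr., DENETHOR,
# DFC2020-KR, EuroSAT, fl. obj., NWPU-Urban).
accuracy = {
    "METEOR":     [83.7, 75.6, 87.7, 60.9, 90.8, 57.4],
    "SWAV":       [96.7, 69.8, 54.2, 67.7, 65.4, 70.4],
    "MOSAIKS":    [86.4, 76.4, 82.3, 57.9, 88.8, 54.0],
    "DINO":       [91.2, 66.2, 56.6, 61.3, 65.1, 70.6],
    "SECO":       [91.4, 61.7, 67.6, 62.7, 65.9, 67.4],
    "SSLTRANSRS": [90.7, 65.5, 76.3, 59.7, 78.9, 52.1],
    "SSL4EO":     [96.2, 58.0, 80.2, 59.1, 82.4, 49.9],
    "BASELINE":   [89.0, 60.8, 87.4, 39.8, 69.8, 36.7],
    "PROTO":      [59.7, 56.2, 76.9, 46.1, 67.3, 39.1],
    "IMAGENET":   [83.7, 59.7, 50.8, 42.7, 64.1, 60.5],
    "SCRATCH":    [64.8, 61.1, 66.5, 25.7, 64.4, 32.3],
}

models = list(accuracy)
acc = np.array([accuracy[m] for m in models])  # shape: (n_models, n_datasets)

# Rank models within each dataset column (rank 1 = highest accuracy,
# ties receive the average rank), then average across datasets.
ranks = rankdata(-acc, axis=0)
avg_rank = ranks.mean(axis=1)
for model, r in sorted(zip(models, avg_rank), key=lambda t: t[1]):
    print(f"{model:<12s} average rank {r:.1f}")

# Paired Wilcoxon signed-rank test: METEOR vs. every other model, pairing
# the six per-dataset accuracies (two-sided alternative assumed).
for model in models:
    if model == "METEOR":
        continue
    result = wilcoxon(accuracy["METEOR"], accuracy[model])
    print(f"METEOR vs {model:<12s} p = {result.pvalue:.3f}")
```

Running this reproduces the reported average ranks (e.g., 3.6 for METEOR and 4.2 for SWAV); the significance test between any two models uses only the six paired dataset-level values shown in the table.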