Table 3. Test results (%) obtained with ResNet50, Swin Transformer, and EfficientNetV2-M backbones at data volumes of 1/2, 1/4, 1/8, 1/12, and 1/16.

From: Grounded situation recognition under data scarcity

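Column groups: Top-1 predicted verb (Verb, Value, Value-all, Grnd value, Grnd value-all), Top-5 predicted verbs (same five metrics), and Ground-truth verb (Value, Value-all, Grnd value, Grnd value-all).

| Dataset | Backbone | Top-1 Verb | Top-1 Value | Top-1 Value-all | Top-1 Grnd value | Top-1 Grnd value-all | Top-5 Verb | Top-5 Value | Top-5 Value-all | Top-5 Grnd value | Top-5 Grnd value-all | Ground-truth Value | Ground-truth Value-all | Ground-truth Grnd value | Ground-truth Grnd value-all |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1/2 | ResNet50 | 37.71 | 29.52 | 17.43 | 23.55 | 9.42 | 66.15 | 50.48 | 28.07 | 39.97 | 14.78 | 72.74 | 36.92 | 56.87 | 19.04 |
| 1/2 | Swin Transformer | 41.88 | 33.44 | 20.44 | 27.09 | 11.4 | 70.31 | 54.96 | 32.18 | 44.28 | 17.73 | 75.09 | 40.6 | 59.72 | 21.85 |
| 1/2 | EfficientNetV2-M | 42.38 | 33.19 | 19.42 | 23.51 | 7.71 | 70.82 | 54.22 | 30.23 | 37.96 | 11.73 | 73.49 | 37.79 | 50.94 | 14.44 |
| 1/4 | ResNet50 | 30.49 | 23.07 | 12.72 | 17.79 | 6.26 | 57.75 | 42.63 | 22.24 | 32.59 | 10.67 | 69.19 | 31.74 | 52.07 | 14.79 |
| 1/4 | Swin Transformer | 36.07 | 27.75 | 15.92 | 21.98 | 8.41 | 63.09 | 47.4 | 25.61 | 37.06 | 13.1 | 71.32 | 34.75 | 55.02 | 17.39 |
| 1/4 | EfficientNetV2-M | 37.65 | 29.03 | 16.58 | 20.45 | 6.6 | 64.92 | 49.13 | 26.81 | 34.13 | 10.18 | 71.98 | 35.63 | 49.43 | 13.36 |
| 1/8 | ResNet50 | 23.77 | 17.43 | 9.11 | 12.92 | 4.04 | 48.08 | 34.03 | 16.49 | 24.92 | 7.08 | 65.22 | 26.77 | 47.23 | 11.39 |
| 1/8 | Swin Transformer | 29.28 | 21.99 | 11.72 | 16.31 | 5.18 | 54.66 | 39.54 | 19.8 | 29.07 | 8.52 | 67.48 | 29.58 | 49.07 | 12.59 |
| 1/8 | EfficientNetV2-M | 30.51 | 22.93 | 12.27 | 15.8 | 4.85 | 56.14 | 40.83 | 20.6 | 27.73 | 7.8 | 68.11 | 30.11 | 47.86 | 11.56 |
| 1/12 | ResNet50 | 19.8 | 14.23 | 7.33 | 9.59 | 2.62 | 42.06 | 28.88 | 13.42 | 19.17 | 4.66 | 62.55 | 23.98 | 41.1 | 8.6 |
| 1/12 | Swin Transformer | 25.23 | 18.44 | 9.54 | 12.35 | 3.38 | 48.57 | 34.25 | 16.41 | 22.66 | 5.69 | 64.94 | 26.72 | 42.78 | 9.65 |
| 1/12 | EfficientNetV2-M | 26.54 | 19.69 | 10.4 | 13.23 | 3.75 | 50.19 | 35.88 | 17.58 | 24 | 6.25 | 65.8 | 27.38 | 43.61 | 9.82 |
| 1/16 | ResNet50 | 16.85 | 12.01 | 6.13 | 8.02 | 2.16 | 37.64 | 25.44 | 11.45 | 16.75 | 3.93 | 60.82 | 21.87 | 39.49 | 7.82 |
| 1/16 | Swin Transformer | 22.57 | 16.51 | 8.65 | 10.87 | 2.79 | 45 | 31.31 | 14.69 | 20.52 | 4.88 | 63.33 | 24.31 | 41.4 | 8.57 |
| 1/16 | EfficientNetV2-M | 23.5 | 17.15 | 8.97 | 11.4 | 3.13 | 45.75 | 32.1 | 15.29 | 21.12 | 5.37 | 63.91 | 25.14 | 41.69 | 9.02 |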
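To compare backbones metric by metric, the table values can be loaded into a small data structure. The Python sketch below transcribes two Top-1 columns (Verb and Grnd value-all) from Table 3 by hand and reports the highest-scoring backbone at each data volume. The names `TOP1`, `METRICS`, and `best_backbone` are illustrative and do not come from the paper.

```python
# Two Top-1 predicted-verb columns from Table 3, transcribed by hand.
# Each entry is (Verb, Grnd value-all), in percent.
TOP1 = {
    "1/2":  {"ResNet50": (37.71, 9.42), "Swin Transformer": (41.88, 11.40), "EfficientNetV2-M": (42.38, 7.71)},
    "1/4":  {"ResNet50": (30.49, 6.26), "Swin Transformer": (36.07, 8.41), "EfficientNetV2-M": (37.65, 6.60)},
    "1/8":  {"ResNet50": (23.77, 4.04), "Swin Transformer": (29.28, 5.18), "EfficientNetV2-M": (30.51, 4.85)},
    "1/12": {"ResNet50": (19.80, 2.62), "Swin Transformer": (25.23, 3.38), "EfficientNetV2-M": (26.54, 3.75)},
    "1/16": {"ResNet50": (16.85, 2.16), "Swin Transformer": (22.57, 2.79), "EfficientNetV2-M": (23.50, 3.13)},
}

# Position of each metric inside the per-backbone tuples above.
METRICS = {"Verb": 0, "Grnd value-all": 1}


def best_backbone(metric: str) -> dict[str, str]:
    """Return the highest-scoring backbone for `metric` at each data volume."""
    idx = METRICS[metric]
    return {
        volume: max(scores, key=lambda backbone: scores[backbone][idx])
        for volume, scores in TOP1.items()
    }


if __name__ == "__main__":
    for metric in METRICS:
        print(metric, best_backbone(metric))
```

On these two columns, EfficientNetV2-M leads Top-1 Verb at every data volume, while Swin Transformer leads Top-1 Grnd value-all at 1/2, 1/4, and 1/8 and EfficientNetV2-M leads it at 1/12 and 1/16.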