Table 1 Comparison with advanced techniques on the LLVIP dataset.

From: Multimodal fusion transformer network for multispectral pedestrian detection in low-light condition

| Methods | Data | Backbone | mAP50↑ | mAP↑ |
| --- | --- | --- | --- | --- |
| Halfwayfusion [41] | RGB + IR | VGG16 | 91.4 | 55.1 |
| GAFF [42] | RGB + IR | ResNet18 | 94.0 | 55.8 |
| ProbEn [43] | RGB + IR | ResNet50 | 93.4 | 51.5 |
| CSAA [44] | RGB + IR | ResNet50 | 94.3 | 59.2 |
| RSDet [40] | RGB + IR | ResNet50 | 95.8 | 61.3 |
| FusionGAN [45] | RGB + IR | GAN | 83.8 | 48.1 |
| GANMcC [46] | RGB + IR | GAN | 87.8 | 49.8 |
| NestFuse [47] | RGB + IR | Encoder–decoder | 86.9 | 49.7 |
| DenseFuse [48] | RGB + IR | Encoder–decoder | 88.2 | 50.4 |
| SDNet [27] | RGB + IR | | 86.6 | 50.8 |
| U2Fusion [49] | RGB + IR | VGG | 87.1 | 47.6 |
| DIVFusion [50] | RGB + IR | Encoder–decoder | 89.8 | 52.0 |
| Ours | RGB + IR | CSPDarknet53 | 96.4 | 62.7 |
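To read the comparison at a glance, the short Python sketch below transcribes the mAP50 and mAP values from Table 1 and reports how far the "Ours" row sits above the strongest competing method on each metric. The helper name `margin_over_best_baseline` and the tuple layout are illustrative assumptions and are not taken from the paper. Gains are given in the same units as the table.

```python
# Rows transcribed from Table 1: (method, mAP50, mAP) on the LLVIP dataset.
rows = [
    ("Halfwayfusion", 91.4, 55.1),
    ("GAFF", 94.0, 55.8),
    ("ProbEn", 93.4, 51.5),
    ("CSAA", 94.3, 59.2),
    ("RSDet", 95.8, 61.3),
    ("FusionGAN", 83.8, 48.1),
    ("GANMcC", 87.8, 49.8),
    ("NestFuse", 86.9, 49.7),
    ("DenseFuse", 88.2, 50.4),
    ("SDNet", 86.6, 50.8),
    ("U2Fusion", 87.1, 47.6),
    ("DIVFusion", 89.8, 52.0),
    ("Ours", 96.4, 62.7),
]


def margin_over_best_baseline(rows, column):
    """Hypothetical helper: return (best baseline name, gain of 'Ours' over it).

    `column` is the tuple index of the metric: 1 for mAP50, 2 for mAP.
    """
    ours = next(r for r in rows if r[0] == "Ours")
    baselines = [r for r in rows if r[0] != "Ours"]
    best = max(baselines, key=lambda r: r[column])
    # Round to one decimal to match the table's precision.
    return best[0], round(ours[column] - best[column], 1)


print(margin_over_best_baseline(rows, 1))  # mAP50 -> ('RSDet', 0.6)
print(margin_over_best_baseline(rows, 2))  # mAP   -> ('RSDet', 1.4)
```

On both metrics the closest competitor in the table is RSDet, which the proposed method exceeds by 0.6 on mAP50 and 1.4 on mAP.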