Table 6 Ablation study on core components (zero-shot setting).

From: Visual information extraction from documents via classification-guided large vision-language models

| Method | F1-score (%) | NED |
| --- | --- | --- |
| Full framework (classification + ICL) | 86.43 | 0.9012 |
| w/o classification (single universal prompt) | 68.42 | 0.7312 |
| w/o ICL (only task definition + format) | 79.65 | 0.8431 |
| w/o post-processing | 84.97 | 0.8876 |
| w/o image rotation preprocessing | 83.21 | 0.8723 |
| Classification Option 1 (feature matching) | 85.79 | 0.8956 |
| Classification Option 2 (trained ConvNeXt) | 86.43 | 0.9012 |
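
To make the relative contribution of each component easier to read off, the drops with respect to the full framework can be computed directly from the reported values. The following is a minimal sketch, not part of the original study: the names `FULL`, `ABLATIONS`, and `drops` are illustrative, and the numbers are simply transcribed from Table 6.

```python
# Values transcribed from Table 6 (zero-shot setting): (F1-score in %, NED).
# All identifiers here are hypothetical helpers, not taken from the paper.
FULL = (86.43, 0.9012)
ABLATIONS = {
    "w/o classification": (68.42, 0.7312),
    "w/o ICL": (79.65, 0.8431),
    "w/o post-processing": (84.97, 0.8876),
    "w/o image rotation preprocessing": (83.21, 0.8723),
}


def drops(full, ablations):
    """Return {name: (F1 drop in points, NED drop)} relative to the full framework."""
    return {
        name: (round(full[0] - f1, 2), round(full[1] - ned, 4))
        for name, (f1, ned) in ablations.items()
    }


result = drops(FULL, ABLATIONS)

# Print the ablations ordered from largest to smallest F1 drop.
for name, (d_f1, d_ned) in sorted(result.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:35s} F1 drop {d_f1:5.2f}  NED drop {d_ned:.4f}")
```

Under these values, removing classification gives the largest drop (18.01 F1 points), followed by removing ICL (6.78), image rotation preprocessing (3.22), and post-processing (1.46).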