Table 6 Ablation study on core components (zero-shot setting).
From: Visual information extraction from documents via classification-guided large vision-language models
| Method | F1-score (%) | NED |
|---|---|---|
| Full framework (classification + ICL) | 86.43 | 0.9012 |
| w/o classification (single universal prompt) | 68.42 | 0.7312 |
| w/o ICL (only task definition + format) | 79.65 | 0.8431 |
| w/o post-processing | 84.97 | 0.8876 |
| w/o image rotation preprocessing | 83.21 | 0.8723 |
| Classification Option 1 (feature matching) | 85.79 | 0.8956 |
| Classification Option 2 (trained ConvNeXt) | 86.43 | 0.9012 |
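
To make the relative contribution of each component easier to read, the drops against the full framework can be computed directly from the table. The following is a minimal sketch, not part of the original study: the values are copied from Table 6, while the dictionary keys and the helper `drops` are hypothetical names introduced only for illustration.

```python
# Values copied from Table 6 as (F1-score in %, NED).
# The labels and the helper below are illustrative assumptions, not from the paper.
FULL = (86.43, 0.9012)

ABLATIONS = {
    "w/o classification": (68.42, 0.7312),
    "w/o ICL": (79.65, 0.8431),
    "w/o post-processing": (84.97, 0.8876),
    "w/o image rotation": (83.21, 0.8723),
    "feature-matching classifier": (85.79, 0.8956),
}


def drops(full, variants):
    """Return {label: (F1 drop in points, NED drop)} relative to the full framework."""
    return {
        name: (round(full[0] - f1, 2), round(full[1] - ned, 4))
        for name, (f1, ned) in variants.items()
    }


if __name__ == "__main__":
    # List the variants from the largest F1 drop to the smallest.
    ranked = sorted(drops(FULL, ABLATIONS).items(), key=lambda kv: -kv[1][0])
    for name, (d_f1, d_ned) in ranked:
        print(f"{name:30s} F1 drop {d_f1:5.2f}  NED drop {d_ned:.4f}")
```

Read this way, removing classification costs 18.01 F1 points, the largest drop in the table, followed by removing ICL at 6.78 points; swapping the trained ConvNeXt classifier for feature matching costs 0.64 points.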