Table 1. Comparison of HisDoc1B with existing Chinese historical document datasets.
From: A large-scale dataset for Chinese historical document recognition and analysis
Dataset | #Books | #Document images | #Characters | #Character categories | Text punctuation |
---|---|---|---|---|---|
MTHv1 [4] | — | 1,500 | 521,370 | 4,058 | × |
MTHv2 [5] | — | 3,199 | 1,081,678 | 6,733 | × |
IC19 HDRC [6] | — | 11,715 | 2,482,994 | 8,353 | × |
M5HisDoc [7] | — | 8,000 | 4,367,360 | 16,151 | × |
CASIA-AHCDB [3] | — | — | 2,276,740 | 10,350 | × |
HisDoc1B [8] (Ours) | 40,281 | 3,163,330 (270×) | 1,082,544,808 (248×) | 30,615 (1.9×) | ✓ |
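
The table does not say what baseline the multipliers in the last row use. A minimal sketch, assuming each multiplier is the HisDoc1B value divided by the largest value reported in the same column by any earlier dataset, reproduces the rounded figures. The function name `ratios_vs_largest_prior` and the data layout are illustrative only and are not taken from the paper.

```python
# Values copied from Table 1 as (document images, characters, character categories).
# None marks a cell the table leaves blank.
prior = {
    "MTHv1": (1_500, 521_370, 4_058),
    "MTHv2": (3_199, 1_081_678, 6_733),
    "IC19 HDRC": (11_715, 2_482_994, 8_353),
    "M5HisDoc": (8_000, 4_367_360, 16_151),
    "CASIA-AHCDB": (None, 2_276_740, 10_350),
}
hisdoc1b = (3_163_330, 1_082_544_808, 30_615)
columns = ("document images", "characters", "character categories")


def ratios_vs_largest_prior(prior, ours):
    """Divide each of our values by the largest prior value in that column.

    Assumption: this is the baseline behind the multipliers in the table.
    """
    ratios = []
    for i, value in enumerate(ours):
        # Skip blank cells when looking for the largest earlier dataset.
        best = max(row[i] for row in prior.values() if row[i] is not None)
        ratios.append(value / best)
    return ratios


ratios = ratios_vs_largest_prior(prior, hisdoc1b)
for name, ratio in zip(columns, ratios):
    print(f"{name}: {ratio:.1f}x")
```

Under this assumption the baselines are IC19 HDRC for document images and M5HisDoc for characters and character categories, and the ratios round to the 270×, 248×, and 1.9× shown in the table.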