Table 3 Comparison between 36 OBI datasets for labels, utilities, strengths, and limitations
From: Oracle bone inscriptions information processing: a comprehensive survey
| Task | Dataset | Annotation / Label | Utility | Strengths | Limitations |
|---|---|---|---|---|---|
| Recognition | *YinQiWenYuandetection | Bounding box; Binary OBI label | Multiple-character recognition on rubbings | Manually annotated; Widely used | Noise-contaminated |
| | OracleBone-8000 [33] | Bounding box | Multiple-character recognition on rubbings | Manually annotated | Noise-contaminated; Imbalanced; Closed-source |
| | ACCID [52] | Bounding box; Class label; Structural relation | Radical and single-character recognition | Fine-grained radical-level annotation | Focus on single characters; Closed-source |
| | O2BR [2] | Bounding box | Multiple-character recognition on real OBI images | Focus on original OBI; Open-source | Relatively small scale |
| Rejoining | OB-Rejoin [34] | Fragment-level outlines | Training data-driven rejoining models | Expert-level annotations | Low image quality; Sparse; Closed-source |
| | COBD [54] | Fragment-level outlines | Training and testing rejoining models | Includes curve trajectory sequences | Relatively small scale; Closed-source |
| | OBI-rejoin [2] | Adjacency label | Testing rejoining models | Includes real OBI and rubbings; Open-source | Relatively small scale; Coarse-grained label |
| | OBFI [35] | Binary rejoinable label | Training and testing rejoining models | Large-scale; Multi-source; Open-source | Coarse-grained label |
| Classification & Retrieval | Oracle-20k [43] | Class label | General character-level classification and retrieval | Early representative work | Severe long-tail distribution; Only handprinted |
| | OBC306 [69] | Class label | General character-level classification and retrieval; Variant analysis | Largest sample size for classification; Realistic noise | Severe long-tail distribution; Low image quality; Labeling errors |
| | Oracle-AYNU [44] | Class label | General character-level classification and retrieval | Larger than Oracle-20k | Severe long-tail distribution; Low resolution; Closed-source; Only handprinted OBI |
| | HWOBC [70] | Class label | General character-level classification and retrieval | Large-scale; Widely used; Intra-class variants | Only handprinted OBI |
| | Oracle-50k [71] | Class label | General character-level classification and retrieval | Large-scale; Widely used | Severe long-tail distribution; Only handprinted; Low resolution |
| | *OBI-IJDH | Class label | General character-level classification and retrieval | Open-source | Small scale |
| | Oracle-250 [72] | Class label | General character-level classification and retrieval; Variant analysis | Diverse handprinted variants | Focus only on the top-250 most frequent characters; Closed-source |
| | Radical-148 [72] | Radical label | General radical-level classification and retrieval; Variant analysis | Diverse handprinted variants | Closed-source |
| | OBI125 [73] | Class label | General character-level classification and retrieval; Variant analysis | Focus on rubbings | Severe long-tail distribution; Relatively small scale |
| | OBI-100 [30] | Class label | General character-level classification and retrieval; Variant analysis | Open-source | Limited coverage; Only handprinted; Small scale |
| | Oracle-241 [75] | Class label | Character-level classification; OBI generation; Style transfer | Multi-type; Open-source | Non-pixel-level alignment |
| | ORCD [76] | Radical label; Bounding box | OBI radical recognition and classification | Diverse handprinted radicals | Relatively small scale; Closed-source |
| | OCCD [76] | Combined character structure | OBI radical detection | First oracle combined-character dataset | Closed-source |
| | OracleRC [51] | Radical- and stroke-level label | OBI classification | Fine-grained character decomposition | Long-tail distribution; Closed-source |
| | Oracle-MNIST [78] | Class label | General character-level classification and retrieval; Variant analysis | Open-source; Even distribution | Low resolution; Noise-contaminated |
| | OBI component 20 [79] | Component-level label | OBI retrieval | First work on component-level annotation | Relatively small scale; Long-tail distribution |
| Deciphering | OBI-ECC [80] | OBI character evolution process | Modern-Chinese-based deciphering | Open-source | Relatively small scale; Only handprinted |
| | EVOBC [81] | OBI character evolution process | Modern-Chinese-based deciphering; OBI generation | Multi-source; Open-source; Large-scale; Widely used | No explicit limitations |
| | HUST-OBC [82] | Aligned modern Chinese | Modern-Chinese-based deciphering | Multi-source; Open-source; Large-scale; Widely used | Long-tail distribution; Only handprinted |
| | ACCP [83] | OBI character evolution process; Class label; Radical sequence | Modern-Chinese-based deciphering | Fine-grained OBI decomposition; Multi-source; Open-source; Large-scale | Expands upon HUST-OBC and EVOBC |
| | OracleSem [60] | Pictographic and semantic caption | Pictographic interpretations; Component analysis; Modern Chinese mapping | Fine-grained text interpretation | Closed-source; Only handprinted |
| | GEVOBC [84] | Graph-based evolution process; Class label | Character evolutionary stage alignment | Novel graph data structure | Relatively small scale; Only handprinted |
| | PD-OBS [62] | Modern Chinese; Ancient form; Radical annotation | Pictographic interpretations; Radical analysis | Fine-grained OBI decomposition with interpretations | Only handprinted |
| | PictOBI-20k [5] | Class label; OBC-Object pair; Reference points; Attention map | Visual decipherment; Attention consistency; LMM evaluation | LMM-oriented; Perception mechanism | Pictographic OBI only |
| Emerging | RCRN [85] | Synthetic noise variants | OBI denoising | Early representative work; Original-distorted image pairs | Relatively small scale; Lack of real-world noise |
| | OBIMD [86] | Bounding box; Class label; Transcriptions; Inscription groups; Reading sequences | Multiple-character recognition; Rubbing denoising; Character matching; Completion; Reading sequence prediction | Multimodal; Good applicability; Open-source | Relatively small scale |
| | RMOBS [61] | Radical-level bounding box; Semantic concept; Key component | Semantic understanding; Visual grounding | Fine-grained character annotation | Closed-source; Only handprinted OBI |
| | Oracle-P15k [3] | Class label; Image-glyph pair | OBI generation; Long-tail distribution; Style transfer | Structure-aligned pairs for generative AI and denoising | Relatively small scale compared to recognition sets |