Table 3 Comparison of 36 OBI datasets in terms of labels, utilities, strengths, and limitations

From: Oracle bone inscriptions information processing: a comprehensive survey

| Task | Dataset | Annotation/Label | Utility | Strengths | Limitations |
|---|---|---|---|---|---|
| Recognition | *YinQiWenYuandetection | Bounding box; Binary OBI label | Multiple-character recognition on rubbings | Manually annotated; Widely used | Noise-contaminated |
| | OracleBone-8000 [33] | Bounding box | Multiple-character recognition on rubbings | Manually annotated | Noise-contaminated; Imbalanced; Closed-source |
| | ACCID [52] | Bounding box; Class label; Structural relation | Radical and single-character recognition | Fine-grained radical-level annotation | Focus on single characters; Closed-source |
| | O2BR [2] | Bounding box | Multiple-character recognition on real OBI images | Focus on original OBI; Open-source | Relatively small scale |
| Rejoining | OB-Rejoin [34] | Fragment-level outlines | Training data-driven rejoining models | Expert-level annotations | Low image quality; Sparse; Closed-source |
| | COBD [54] | Fragment-level outlines | Training and testing rejoining models | Includes curve trajectory sequences | Relatively small scale; Closed-source |
| | OBI-rejoin [2] | Adjacency label | Testing rejoining models | Includes real OBI and rubbings; Open-source | Relatively small scale; Coarse-grained label |
| | OBFI [35] | Binary rejoinable label | Training and testing rejoining models | Large-scale; Multi-source; Open-source | Coarse-grained label |
| Classification & Retrieval | Oracle-20k [43] | Class label | General character-level classification and retrieval | Early representative work | Severe long-tail distribution; Only handprinted OBI |
| | OBC306 [69] | Class label | General character-level classification and retrieval; Variant analysis | Largest sample size for classification; Realistic noise | Severe long-tail distribution; Low image quality; Labeling errors |
| | Oracle-AYNU [44] | Class label | General character-level classification and retrieval | Larger than Oracle-20k | Severe long-tail distribution; Low resolution; Closed-source; Only handprinted OBI |
| | HWOBC [70] | Class label | General character-level classification and retrieval | Large-scale; Widely used; Intra-class variants | Only handprinted OBI |
| | Oracle-50k [71] | Class label | General character-level classification and retrieval | Large-scale; Widely used | Severe long-tail distribution; Only handprinted OBI; Low resolution |
| | *OBI-IJDH | Class label | General character-level classification and retrieval | Open-source | Small scale |
| | Oracle-250 [72] | Class label | General character-level classification and retrieval; Variant analysis | Diverse handprinted variants | Focuses only on the 250 most frequent characters; Closed-source |
| | Radical-148 [72] | Radical label | General radical-level classification and retrieval; Variant analysis | Diverse handprinted variants | Closed-source |
| | OBI125 [73] | Class label | General character-level classification and retrieval; Variant analysis | Focus on rubbings | Severe long-tail distribution; Relatively small scale |
| | OBI-100 [30] | Class label | General character-level classification and retrieval; Variant analysis | Open-source | Limited coverage; Only handprinted OBI; Small scale |
| | Oracle-241 [75] | Class label | Character-level classification; OBI generation; Style transfer | Multi-type; Open-source | Non-pixel-level alignment |
| | ORCD [76] | Radical label; Bounding box | OBI radical recognition and classification | Diverse handprinted radicals | Relatively small scale; Closed-source |
| | OCCD [76] | Combined-character structure | OBI radical detection | First oracle bone combined-character dataset | Closed-source |
| | OracleRC [51] | Radical- and stroke-level labels | OBI classification | Fine-grained character decomposition | Long-tail distribution; Closed-source |
| | Oracle-MNIST [78] | Class label | General character-level classification and retrieval; Variant analysis | Open-source; Even distribution | Low resolution; Noise-contaminated |
| | OBI component 20 [79] | Component-level label | OBI retrieval | First work with component-level annotation | Relatively small scale; Long-tail distribution |
| Deciphering | OBI-ECC [80] | OBI character evolution process | Modern-Chinese-based deciphering | Open-source | Relatively small scale; Only handprinted OBI |
| | EVOBC [81] | OBI character evolution process | Modern-Chinese-based deciphering; OBI generation | Multi-source; Open-source; Large-scale; Widely used | No explicit limitations |
| | HUST-OBC [82] | Aligned modern Chinese | Modern-Chinese-based deciphering | Multi-source; Open-source; Large-scale; Widely used | Long-tail distribution; Only handprinted OBI |
| | ACCP [83] | OBI character evolution process; Class label; Radical sequence | Modern-Chinese-based deciphering | Fine-grained OBI decomposition; Multi-source; Open-source; Large-scale | Expands upon HUST-OBC and EVOBC |
| | OracleSem [60] | Pictographic and semantic captions | Pictographic interpretations; Component analysis; Modern Chinese mapping | Fine-grained text interpretation | Closed-source; Only handprinted OBI |
| | GEVOBC [84] | Graph-based evolution process; Class label | Character evolutionary-stage alignment | Novel graph data structure | Relatively small scale; Only handprinted OBI |
| | PD-OBS [62] | Modern Chinese; Ancient form; Radical annotation | Pictographic interpretations; Radical analysis | Fine-grained OBI decomposition with interpretations | Only handprinted OBI |
| | PictOBI-20k [5] | Class label; OBC-object pair; Reference points; Attention map | Visual decipherment; Attention consistency; LMM evaluation | LMM-oriented; Perception mechanism | Pictographic OBI only |
| Emerging | RCRN [85] | Synthetic noise variants | OBI denoising | Early representative work; Original-distorted image pairs | Relatively small scale; Lacks real-world noise |
| | OBIMD [86] | Bounding box; Class label; Transcriptions; Inscription groups; Reading sequences | Multiple-character recognition; Rubbing denoising; Character matching; Completion; Reading-sequence prediction | Multimodal; Good applicability; Open-source | Relatively small scale |
| | RMOBS [61] | Radical-level bounding box; Semantic concept; Key component | Semantic understanding; Visual grounding | Fine-grained character annotation | Closed-source; Only handprinted OBI |
| | Oracle-P15k [3] | Class label; Image-glyph pair | OBI generation; Long-tail distribution; Style transfer | Structure-aligned pairs for generative AI and denoising | Relatively small scale compared with recognition datasets |

*The website references for YinQiWenYuandetection and OBI-IJDH are https://jgw.aynu.edu.cn/home/down/detail/index.html?sysid=3 and http://www.ihpc.se.ritsumei.ac.jp/OBIdataseIJDH.zip, respectively.
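Many classification datasets in this table list a (severe) long-tail distribution as a limitation, whereas Oracle-MNIST is credited with an even distribution. The sketch below is not taken from any of the cited works. It assumes one common way to quantify this property: the imbalance ratio (largest class count divided by smallest) together with the share of all samples held by the most frequent 20% of classes. The function name `imbalance_stats`, the `head_fraction` parameter, and the label lists are hypothetical placeholders, not statistics of any dataset listed here.

```python
from collections import Counter


def imbalance_stats(labels, head_fraction=0.2):
    """Summarize how long-tailed a list of class labels is (hypothetical helper).

    Returns (imbalance_ratio, head_share):
      imbalance_ratio = count of most frequent class / count of least frequent class
      head_share      = fraction of all samples in the top `head_fraction` of classes
    """
    # Per-class sample counts, sorted from most to least frequent.
    counts = sorted(Counter(labels).values(), reverse=True)
    if not counts:
        raise ValueError("labels must be non-empty")
    ratio = counts[0] / counts[-1]
    # Number of "head" classes; always at least one.
    n_head = max(1, round(len(counts) * head_fraction))
    head_share = sum(counts[:n_head]) / sum(counts)
    return ratio, head_share


if __name__ == "__main__":
    # Hypothetical label lists for illustration only (NOT real dataset statistics).
    balanced = ["a"] * 50 + ["b"] * 50 + ["c"] * 50 + ["d"] * 50 + ["e"] * 50
    long_tail = ["a"] * 400 + ["b"] * 60 + ["c"] * 20 + ["d"] * 5 + ["e"] * 1
    for name, labels in [("balanced", balanced), ("long-tail", long_tail)]:
        ratio, share = imbalance_stats(labels)
        print(f"{name}: imbalance ratio={ratio:.1f}, top-20% class share={share:.2f}")
```

An evenly distributed set gives a ratio of 1.0, and its head classes hold exactly their proportional share. In a long-tailed set, a few head classes dominate the samples while many tail classes have only a handful of examples, which is why classifiers trained on such data tend to underperform on rare characters.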