Table 5 Results of incremental pre-training of the language model.

From: A large-scale dataset for Chinese historical document recognition and analysis

Training set       Before fine-tuning    Fine-tuned
DaiZhiGe           31.78                 38.77
HisDoc1B (Ours)    34.05                 40.76