Introduction

The Qin and Han bamboo slips refer to a form of writing medium and document format used during the Qin and Han dynasties (221 BCE to 220 CE)1,2. Made primarily of narrow strips of bamboo, they served as critical tools for recording the text and information of the time. These artifacts reflect various aspects of society, including politics, economy, culture, and legal systems, making them invaluable physical evidence for studying ancient Chinese history as well as an important part of the world’s cultural heritage (Fig. 1).

Fig. 1: Qin and Han Dynasty bamboo slips.

Typical Qin and Han Dynasty bamboo slips under different degradation conditions.

A crucial and effective approach to understanding and preserving these documents is digitization, which systematically transforms the textual and graphical symbols from physical manuscripts into digital records. The quality of this process heavily depends on the accuracy of character restoration from the slips3,4. However, the digitization and faithful restoration of Qin and Han bamboo slips pose significant challenges. These challenges primarily stem from varying degrees of degradation and the inherent complexity of ancient characters. Factors such as diverse types and levels of noise, fading, stains, and intensity variations complicate information extraction, making the restoration process both demanding and intricate5.

Although over 200,000 bamboo slips have been unearthed in China to date, only a small proportion of them have been digitized. Moreover, these digitization efforts are largely limited to photographic archiving, with virtually no systematic organization or publicly available datasets for use by researchers in character recognition. As a result, studies focusing on the restoration and recognition of characters on bamboo slips remain scarce.

The restoration of Qin and Han bamboo slip characters poses greater challenges than that of other ancient Chinese texts written after the invention of paper. These challenges are primarily reflected in two aspects:

Lack of suitable datasets for automatic character recognition and the difficulty of dataset construction. While some datasets exist for use in humanities and historical research, they are not applicable to automatic character recognition or deep learning tasks. Furthermore, constructing datasets that meet the requirements for such applications demands a high level of expertise in ancient Chinese characters to ensure the accuracy of restored text within the datasets6,7. This requirement further exacerbates the scarcity of usable databases.

Severe and highly variable text degradation due to time and material constraints. Qin and Han bamboo slip texts were inscribed approximately 2000 years ago using ink on bamboo or wooden surfaces. By the time these artifacts were unearthed in modern times, most had suffered extensive damage. The characters are often incomplete, and the backgrounds vary significantly due to material corrosion and other degradation factors, making restoration exceedingly complex.

To address the aforementioned challenges, this study, with the assistance of experts in ancient Chinese scripts, constructed a dataset suitable for automatic character recognition. Building on this dataset, a restoration method for Qin and Han bamboo slip characters is proposed, utilizing an improved conditional generative adversarial network (GAN) combined with an effective character contour length (ECCL) constraint. This approach achieves effective restoration of characters with complex backgrounds. The main contributions of this study are as follows:

Construction of the first paired dataset of Qin and Han bamboo slip characters. A dataset was created consisting of original bamboo slip characters paired with corresponding ground-truth characters handwritten by ancient script experts. This paired structure supports training across a range of machine learning and deep learning methods.

Proposal of an improved conditional GAN model for Qin and Han bamboo slip character restoration. An improved pixel-to-pixel conditional GAN model is proposed to enable effective restoration of degraded bamboo slip characters.

Introduction of the ECCL constraint in the loss function. By incorporating the ECCL constraint into the loss function, the issue of excessive small-area noise in generated images is addressed, enhancing the authenticity of the restored characters.

Methods

Related work

There has been notable progress in the restoration and recognition of similar ancient manuscripts and handwritten texts by scholars worldwide. Some of these methods provide valuable insights for this research.

Image denoising and restoration techniques of historical documents aim to remove noise from images and accurately fill missing regions with appropriate content. These methods can be broadly categorized into two types: traditional image processing approaches and learning-based approaches.

Traditional image processing approaches: ancient document denoising and restoration methods built on traditional image processing can be divided into two categories: those based on character segmentation and those based on preprocessing and binarization.

Methods based on character segmentation: Baig et al.8 presented a segmentation approach based on the response of Harris corner detectors for the task of digital paleography. Panichkriangkrai et al.9 proposed a system that segments characters in Japanese historical books by applying region labeling of connected components, followed by rule-based integration. Tian et al.10 proposed a method based on interval-valued hesitant fuzzy sets to address the over-segmentation that arises when separating character components in ancient Chinese books.

Methods based on preprocessing and binarization: Chen and Wang11 extended the non-local means method to remove noise from broken and degraded document images during preprocessing. Shirai et al.12 proposed a method that performs anisotropic morphological dilation via implicit smoothing to restore the degraded character shapes in binarized images of Mokkan, Japanese wooden tablets similar to bamboo slips. Huang et al.13 proposed a historical document image thresholding method based on a combination of Bradley’s algorithm and K-means. Bhat et al.14 presented a new binarization and post-processing technique to efficiently extract and restore foreground characters from heavily degraded documents. Samosir15 used a morphological approach to improve image quality after restoration with a shift filtering algorithm. To deal with occluded character restoration, Cohen et al.16 used the active contour of broken Hebrew characters as a prior model to restore Hebrew characters occluded by Arabic characters in palimpsests. Huang et al.17 presented a comprehensive comparative study of denoising techniques for Chinese calligraphy images, covering the anisotropic diffusion filter, the Wiener filter, total variation, non-local means, and bilateral filtering. For the binarization of degraded documents, Jia18 utilized structural symmetric pixels (SSPs) to calculate local thresholds in a neighborhood, with the voting result of multiple thresholds determining whether a pixel belongs to the foreground. Our previous work19 proposed an auto-focus effective character contour length (AF-ECCL) method to binarize Qin and Han bamboo slip characters.

Most traditional image processing methods are only capable of handling denoising and restoration for ancient manuscripts with relatively simple backgrounds and minimal text degradation.

Deep learning-based approaches: most deep learning approaches used in ancient manuscript restoration are based on Generative Adversarial Networks (GANs), and many scholars have proposed improved GAN-based algorithms for different ancient manuscripts with promising results. Watanabe et al.20 proposed a character segmentation method using a fully convolutional network (FCN) and a post-processing phase to segment documents of the office of the Governor-General of Taiwan (GGT) as a sample of modern Japanese official documents. Gonwirat et al.21 proposed a deblur generative adversarial network (DeblurGAN) for the denoising and recognition of handwritten characters. Zhang et al.22 proposed a GAN-based denoising method (CID-GAN) to denoise and restore historical Chinese calligraphic images. Wang et al.23 proposed a similar GAN-based model to denoise Chinese stele and rubbing calligraphic images. Su et al. suggested a new method for restoring ancient Chinese characters with dual generative adversarial networks. Souibgui and Kessentini24 suggested the Document Enhancement Generative Adversarial Network (DE-GAN), an efficient end-to-end system that uses conditional GANs (cGANs) to restore badly damaged document images; however, this method is mainly intended for image binarization and enhancement and is not effective for restoring degraded images. Zheng et al.25 used an example attention generative adversarial network (EA-GAN), a two-branch character restoration network that fuses reference instances, to correctly repair damaged characters. Nguyen et al.26 proposed a character attention generative adversarial network (CAGAN) for restoring heavily damaged character patterns in old Japanese Kanji documents. Neji et al.27 proposed a Doc-Attentive GAN for historical document denoising and tested it on DIBCO datasets and Arabic historical documents. In the field of image-to-image generation, CycleGAN28 and pix2pix29 are two widely applied algorithms that have been successfully used to generate text in many other languages. Dutta et al.30 used CycleGAN for terahertz image denoising in non-destructive historical document analysis. Although deep learning-based approaches, particularly those leveraging GANs, have been explored to some extent for restoring various ancient manuscript texts, no research to date has specifically focused on the restoration of Chinese Qin and Han bamboo slip characters.

Transformer31 has also been introduced for image restoration in recent years. Chen et al.32 proposed IPT, a Transformer-based backbone model for various restoration problems. Cao et al.33 proposed VSR-Transformer, which uses self-attention for better feature fusion in video super-resolution and can also be applied to image restoration. Liang et al.34 proposed SwinIR, a strong baseline model for image restoration based on the Swin Transformer. Most Transformer-based methods are applied to natural color image restoration, and very few studies have used them to restore ancient manuscripts.

Proposed method

The overall framework of the proposed method is illustrated in Fig. 2. The model consists of three main components: the generator, the discriminator, and the loss function calculation. Specifically, the generator takes a degraded image of a bamboo slip character as input and outputs the restored character. The discriminator is responsible for distinguishing between real and generated data pairs, while the loss function defines the objective and guides the optimization direction of the model.

Fig. 2: Overview of the proposed method.

The structure and workflow of the proposed character restoration method based on improved conditional generative adversarial networks.

We treat the denoising and restoration of bamboo slip characters as an image-to-image translation task, where the objective is to generate clean and complete text images from degraded and noisy inputs. Since GANs outperform autoencoders in generating high-fidelity samples, and since paired data is used in this study, diffusion models are not a suitable choice; a model based on cGANs is a more appropriate option.

GANs consist of two neural networks: a generator G and a discriminator D. The generator aims to learn the mapping from a random noise vector \(z\) to an image \(y\), while the discriminator’s role is to distinguish between images generated by G and real images. Given \(y\), D outputs a probability to indicate whether \(y\) is real or fake. The generator’s goal is to deceive the discriminator by producing images that closely resemble the real data, while the discriminator continuously improves its ability to predict whether an image is fake. This adversarial learning framework drives the optimization process.

cGANs follow the same adversarial learning process but introduce an additional parameter x, which serves as a conditional input. This makes the generation process controllable. In the context of this study, the conditional input x corresponds to the degraded original bamboo slip character image, guiding the generator to produce restored outputs conditioned on the degraded input.
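For reference, the standard cGAN objective (a well-known formulation from the cGAN and pix2pix literature, included here only as background rather than reproduced from this paper) can be written as:

$$\mathop{\min }\limits_{G}\mathop{\max }\limits_{D}\,{{\mathbb{E}}}_{x,y}\left[\log D\left(x,y\right)\right]+{{\mathbb{E}}}_{x,z}\left[\log \left(1-D\left(x,G\left(x,z\right)\right)\right)\right]$$

where \(x\) is the conditional input (the degraded character image), \(y\) is the real image, and \(z\) is the noise vector; the generator minimizes this objective while the discriminator maximizes it.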

The generator performs the image-to-image translation task. Typically, autoencoder-based models are employed for this type of problem. These models consist primarily of a series of convolutional layers, referred to as the encoder, which progressively down-sample the input until a bottleneck layer. The process is then reversed through a series of up-sampling and convolutional layers, known as the decoder.

In this study, the encoder comprises an input convolutional layer followed by two down-sampling convolutional layers, while the decoder consists of two up-sampling convolutional layers and one output convolutional layer. The intermediate bottleneck layer employs nine ResNet-based modules to facilitate more effective feature extraction, as illustrated in Fig. 3.

Fig. 3: Generator of the proposed method.

The structure and workflow of the generator of our method, which consists of three parts: encoder, bottleneck and decoder.

This architecture is similar to UNet but does not include skip connections between the encoder and decoder. This design choice stems from the observation that bamboo slip text images are small in size, with relatively simple content and uniform colors. Incorporating skip connections in this context led to a slight degradation in the quality of the generated images.
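A minimal PyTorch sketch of such a generator is given below. The layer counts follow the description above (one input convolution, two down-sampling convolutions, nine residual blocks, two up-sampling convolutions, and one output convolution, with no skip connections); the kernel sizes, channel widths, normalization layers, and activations are assumptions, since they are not specified in the text.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Residual block used in the bottleneck (channel width is an assumption)."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.InstanceNorm2d(ch),
        )

    def forward(self, x):
        return x + self.body(x)

class Generator(nn.Module):
    """Encoder (1 input conv + 2 down-sampling convs), 9-ResBlock bottleneck,
    decoder (2 up-sampling convs + 1 output conv); no skip connections.
    RGB input/output is assumed."""
    def __init__(self, in_ch=3, out_ch=3, base=64):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_ch, base, 7, padding=3), nn.ReLU(inplace=True),                   # input conv
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU(inplace=True),      # down-sample 1
            nn.Conv2d(base * 2, base * 4, 3, stride=2, padding=1), nn.ReLU(inplace=True),  # down-sample 2
        )
        self.bottleneck = nn.Sequential(*[ResBlock(base * 4) for _ in range(9)])
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(base * 4, base * 2, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),  # up-sample 1
            nn.ConvTranspose2d(base * 2, base, 3, stride=2, padding=1, output_padding=1), nn.ReLU(inplace=True),      # up-sample 2
            nn.Conv2d(base, out_ch, 7, padding=3), nn.Sigmoid(),  # output conv; Sigmoid keeps values in [0, 1] (an assumption)
        )

    def forward(self, x):  # x: degraded character image, e.g. 3 x 64 x 64
        return self.decoder(self.bottleneck(self.encoder(x)))
```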

The discriminator is a simple fully convolutional network. It takes a pair of image data as input and outputs a two-dimensional probability matrix, where each value represents the likelihood of each pixel in the image belonging to a real image or a generated (fake) image. The input data pair can either consist of the original noisy image paired with the ground truth image, or the original noisy image paired with the generated image, as seen in Fig. 4. In the former case (noisy image and ground truth), the output probability matrix is compared against a label matrix filled with ones (indicating all real) to compute the loss. In the latter case (noisy image and generated image), the output is compared against a label matrix filled with zeros (indicating all fake) to compute the loss.

Fig. 4: Discriminator of the proposed method.

The structure and workflow of the discriminator of our method, which takes a pair of image data as input and outputs a two-dimensional probability matrix.

Since the objective is to perform pixel-level discrimination, the convolutional layers use a kernel size of 1 × 1, enabling more effective fusion of pixel-level information.
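A sketch of such a per-pixel discriminator is shown below, continuing the PyTorch example above. The number of layers and channel widths are assumptions; what the sketch illustrates is that all convolutions use 1 × 1 kernels and that the two images of a pair are concatenated along the channel dimension before being classified pixel by pixel.

```python
import torch
import torch.nn as nn

class PixelDiscriminator(nn.Module):
    """Pixel-level discriminator: the two images of a pair are concatenated along
    the channel axis and classified pixel by pixel with 1x1 convolutions."""
    def __init__(self, in_ch=6, base=64):  # 6 channels = two RGB images (an assumption)
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, base, kernel_size=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base, base * 2, kernel_size=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(base * 2, 1, kernel_size=1),  # one logit per pixel
        )

    def forward(self, cond, img):
        # cond: degraded input image; img: ground-truth image or generated image
        return self.net(torch.cat([cond, img], dim=1))
```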

The loss function for image generation in GAN models typically consists of two components: the generator loss (G loss) and the discriminator loss (D loss). The G loss evaluates the difference between the generated images and the real images, while the D loss measures the discriminator’s effectiveness in distinguishing between generated and real images.

In this study, the generator loss comprises three components: the classical GAN generator loss \({\mathrm{Loss}}_{{G\_gen}}\); the L1 loss \({\mathrm{Loss}}_{{G\_L}1}\), which measures the similarity between the generated and ground truth images; and the effective character contour length loss \({\mathrm{Loss}}_{{G\_ECCL}}\), which evaluates the quality of character contour preservation.

The L1 loss is defined in Eq. (1); it quantifies the average pixel-level discrepancy between the generated image and the ground truth image, where \({R}_{i}\) and \({G}_{i}\) denote the \(i\)-th pixel values of the restored image and the ground truth image, respectively, and \(N\) is the number of pixels.

$${\mathrm{Loss}}_{{G}_{L1}}=\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}|{R}_{i}-{G}_{i}|$$
(1)

Most GAN-based image generation methods typically consider only the \({\mathrm{Loss}}_{{G\_gen}}\) and \({\mathrm{Loss}}_{{G\_L}1}\) components in their loss functions. However, this is insufficient for tasks like the restoration of bamboo slip text. Consider the scenario shown in Fig. 5: the black text represents the ground truth image, and the red dots indicate errors introduced by the generator (these dots are black in actual outputs but are displayed as red here for illustration purposes). In the two cases shown, (a) and (b), the number of erroneous points is identical, but their positions differ. In case (a), the errors are located in the blank areas outside the text, while in case (b), the errors are adjacent to the target text. From the perspective of L1 loss, the calculated loss for these two scenarios would be identical. However, from a human perception standpoint, the two are clearly different: (a) appears to add extraneous noise, while (b) merely results in slightly thicker strokes at certain points, which has minimal impact on the overall structure of the text.

Fig. 5: Examples where using L1 loss is not suitable.

A concrete example of why L1 loss is inappropriate, where the black text represents the ground truth image, and the red dots indicate errors introduced by the generator. a The errors are located in the blank areas outside the character, b the errors are adjacent to the target character.

Therefore, for the restoration of bamboo slip text, an appropriate evaluation function to characterize “whether the character has been clearly restored” is essential. To restore characters under uncertain background noise, two important factors need to be considered: first, restoring the contours of the character as accurately as possible, and second, minimizing background noise. Based on these objectives, we adopted the Effective Character Contour Length (ECCL) evaluation function proposed in our previous work, which is expressed by the following formulas:

$$ECCL=\frac{CL}{NU{M}_{total}}$$
(2)
$$CL=\mathop{\sum }\limits_{i=1}^{N}{C}_{i},\quad {\rm{if}}\;{C}_{i} > {L}_{T}$$
(3)
$${C}_{i}=\mathop{\sum }\limits_{j=1}^{N-1}\sqrt{{({x}_{j+1}-{x}_{j})}^{2}+{({y}_{j+1}-{y}_{j})}^{2}}$$
(4)
$$NU{M}_{total}=NU{M}_{long}+NU{M}_{short}$$
(5)

\(\mathrm{CL}\) represents the total length of all long contours, where contours longer than \({L}_{T}\) are considered long contours, and \(({x}_{j},{y}_{j})\) and \(({x}_{j+1},{y}_{j+1})\) are consecutive points along contour \(i\). In our method, contours are defined by joining all continuous points along the boundary of a binarized image; as a result, the contour length corresponds to the accumulated distance between consecutive points of a single contour. \({\mathrm{NUM}}_{\mathrm{total}}\) represents the total number of contours, including both long and short contours. Under this metric, an image containing many small noise points will have a lower ECCL value, while an image with longer, complete contours will have a higher ECCL value.
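A possible implementation of the ECCL metric using OpenCV contour extraction is sketched below; the function name is hypothetical, and the use of cv2.arcLength as the contour-length measure is an assumption made to match Eq. (4).

```python
import cv2
import numpy as np

def eccl(binary_img: np.ndarray, length_threshold: float = 150) -> float:
    """Effective Character Contour Length, Eq. (2): total length of long contours
    divided by the total number of contours. Expects a single-channel uint8 binary image."""
    contours, _ = cv2.findContours(binary_img, cv2.RETR_LIST, cv2.CHAIN_APPROX_NONE)
    if not contours:
        return 0.0
    lengths = [cv2.arcLength(c, closed=False) for c in contours]  # per-contour length, Eq. (4)
    cl = sum(l for l in lengths if l > length_threshold)          # total long-contour length, Eq. (3)
    return cl / len(lengths)                                      # NUM_total = len(lengths), Eq. (5)
```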

The following example provides a more intuitive understanding of the correlation between ECCL and text integrity, as seen in Fig. 6. By performing binarization on an input bamboo slip text image at different threshold values, a set of images is generated. The ECCL values of these images are then calculated, and it can be observed that higher ECCL values correlate with better restoration of the text.

Fig. 6: Example of different ECCL value and corresponding character.

Different character forms of the same character under different ECCL values, and the corresponding ECCL curves.

Based on the above discussion, in the generator loss of this study, the ECCL loss is incorporated, as shown in Eq. (6):

$$Los{s}_{{G}_{{ECCL}}}=\frac{|ECC{L}_{R}-ECC{L}_{G}|}{ECC{L}_{R}}$$
(6)

Where \({\mathrm{ECCL}}_{R}\) represents the ECCL value of the restored image, and \({\mathrm{ECCL}}_{G}\) represents the ECCL value of the ground truth image. A smaller \({\mathrm{Loss}}_{{G\_ECCL}}\) indicates that the contour integrity of the generated image is more similar to that of the ground truth image.

Therefore, the complete generator loss can be expressed as:

$$Los{s}_{G}={{\lambda }_{1}Loss}_{{G}_{{gen}}}+{\lambda }_{2}Los{s}_{{G}_{L1}}+{{\lambda }_{3}Loss}_{{G}_{{ECCL}}}$$
(7)

Where \({\lambda }_{1},{\lambda }_{2}\), and \({\lambda }_{3}\) are coefficients used to adjust the weights of the three losses in the overall generator loss.
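The sketch below shows how these three terms could be combined in PyTorch, following Eqs. (1), (6), and (7) and reusing the eccl() function from the sketch above. The binary-cross-entropy form of the adversarial term, the to_binary helper, and the treatment of the ECCL term as a non-differentiable scalar penalty are assumptions; the paper does not detail these choices.

```python
import numpy as np
import torch
import torch.nn as nn

bce = nn.BCEWithLogitsLoss()  # adversarial criterion (assumed form of the "classical GAN generator loss")
l1 = nn.L1Loss()              # pixel-wise L1 term, Eq. (1)

def to_binary(img: torch.Tensor, thresh: float = 0.5) -> np.ndarray:
    # Hypothetical helper: average a single image over channels, then threshold to a
    # uint8 binary map. Pixel values in [0, 1] are assumed.
    gray = img.detach().float().mean(dim=-3).squeeze().cpu().numpy()
    return ((gray > thresh) * 255).astype(np.uint8)

def generator_loss(d_fake_logits, fake_img, real_img,
                   lambda1=1.0, lambda2=1.0, lambda3=0.2, l_t=150):
    """Total generator loss, Eq. (7)."""
    loss_gen = bce(d_fake_logits, torch.ones_like(d_fake_logits))  # generator tries to look "real"
    loss_l1 = l1(fake_img, real_img)
    # ECCL term, Eq. (6). Contour extraction is not differentiable, so this term
    # enters the total as a constant penalty here (an assumption).
    eccl_r = eccl(to_binary(fake_img), l_t)   # restored (generated) image
    eccl_g = eccl(to_binary(real_img), l_t)   # ground truth image
    loss_eccl = abs(eccl_r - eccl_g) / max(eccl_r, 1e-8)
    return lambda1 * loss_gen + lambda2 * loss_l1 + lambda3 * loss_eccl
```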

The discriminator loss consists of two components: \({\mathrm{Loss}}_{{D}_{{real}}}\) and \({\mathrm{Loss}}_{{D}_{{gen}}}\). \({\mathrm{Loss}}_{{D}_{{real}}}\) measures how close the discriminator’s output for a real pair is to the “real” label; by optimizing this term, the discriminator learns to recognize real images and correctly output 1. \({\mathrm{Loss}}_{{D}_{{gen}}}\) measures how close the discriminator’s output for a generated pair is to the “fake” label; by optimizing this term, the discriminator learns to recognize generated images and correctly output 0. The total discriminator loss is the average of these two terms.

$$Los{s}_{D}=\frac{Los{s}_{{D}_{{real}}}+Los{s}_{{D}_{{gen}}}}{2}$$
(8)

Finally, the total loss of the whole model is the sum of generator loss and discriminator loss:

$$Loss=Los{s}_{{G}}+Los{s}_{D}$$
(9)
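A corresponding sketch of the discriminator loss of Eq. (8) is shown below, reusing the bce criterion and torch from the loss sketch above; the binary-cross-entropy form is again an assumption. Per Eq. (9), the total model loss is simply the sum of the generator and discriminator losses.

```python
def discriminator_loss(d_real_logits, d_fake_logits):
    """Discriminator loss, Eq. (8): real pairs are labelled 1, generated pairs 0."""
    loss_real = bce(d_real_logits, torch.ones_like(d_real_logits))
    loss_gen = bce(d_fake_logits, torch.zeros_like(d_fake_logits))
    return (loss_real + loss_gen) / 2

# Eq. (9): total loss = generator_loss(...) + discriminator_loss(...)
```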

Results

Dataset

Due to the current lack of datasets specifically for the automatic restoration and recognition of bamboo slip characters, the authors of this study, in collaboration with experts in ancient Chinese script, have established a Qin-Han bamboo slip character dataset. The original characters in this dataset are derived from the Shuihudi Qin Slips (睡虎地秦简), which were written between the late Warring States period and the reign of Emperor Qin Shi Huang (around 220 BCE). The dataset currently contains 500 original images and their corresponding 500 ground truth images: 400 image pairs are used as the training set, 50 pairs as the validation set, and the remaining 50 pairs as the test set. Each image contains a single character at a resolution of 64 × 64 pixels, as shown in Fig. 7. The ground truth images were manually restored, pixel by pixel, under the guidance of experts in ancient scripts. This process is time-consuming, and the dataset is still gradually expanding.

Fig. 7: Examples of the dataset.

Ten typical characters from the dataset used in this paper; the “Original” row shows characters under different degradation conditions, and the “Ground Truth” row shows the corresponding ground truth characters.

As shown in Fig. 7, there are significant differences between the original images, mainly in terms of the degree of text degradation, background color, and background noise. These variations pose a significant challenge for training and restoration.
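A minimal sketch of how such paired data could be loaded for training is shown below; the directory layout (matching file names under original/ and ground_truth/) and the use of torchvision transforms are hypothetical, not the dataset's actual organization.

```python
import os
from PIL import Image
from torch.utils.data import Dataset
import torchvision.transforms as T

class BambooSlipPairs(Dataset):
    """Paired degraded / ground-truth character images (64 x 64).
    Hypothetical layout: <root>/original/<name>.png and <root>/ground_truth/<name>.png."""
    def __init__(self, root):
        self.root = root
        self.names = sorted(os.listdir(os.path.join(root, "original")))
        self.to_tensor = T.Compose([T.Resize((64, 64)), T.ToTensor()])

    def __len__(self):
        return len(self.names)

    def __getitem__(self, i):
        name = self.names[i]
        degraded = Image.open(os.path.join(self.root, "original", name)).convert("RGB")
        target = Image.open(os.path.join(self.root, "ground_truth", name)).convert("RGB")
        return self.to_tensor(degraded), self.to_tensor(target)
```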

Evaluation metrics

To accurately and objectively evaluate the performance of various methods for bamboo slip text restoration, this study uses three evaluation metrics: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Metric (SSIM), and Effective Character Contour Length Ratio (ECCL Ratio). The ECCL Ratio has been discussed earlier, in Eq. (6); it indicates how close the ECCL value of the generated character is to that of the ground truth.

The peak signal-to-noise ratio (PSNR) between two images is used to compare the original picture quality to the transformed image quality. The PSNR increases with the quality of the compressed or reconstructed image.

$$PSNR=10\,{\log }_{10}\left(\frac{Ma{x}^{2}}{MSE}\right)$$
(10)

Where \(\mathrm{Max}\) represents the maximum value of the image pixels, and MSE refers to the Mean Squared Error between the generated image and the ground truth image:

$$MSE\left(R,G\right)=\frac{1}{{mn}}\mathop{\sum }\limits_{i=1}^{m}\mathop{\sum }\limits_{j=1}^{n}{\left(R\left(i,j\right)-G\left(i,j\right)\right)}^{2}$$
(11)

The structural similarity index metric is used to measure the similarity between two digital images or videos. The SSIM index provides precise information on how close the generated image is to its ground truth.

$$SSIM\left(R,G\right)=\frac{\left(2{\mu }_{R}{\mu }_{G}+{c}_{1}\right)\left(2{\sigma }_{RG}+{c}_{2}\right)}{\left({\mu }_{R}^{2}+{\mu }_{G}^{2}+{c}_{1}\right)\left({\sigma }_{R}^{2}+{\sigma }_{G}^{2}+{c}_{2}\right)}$$
(12)

where \({\mu }_{R}\) and \({\mu }_{G}\) are the pixel mean values of images R and G, \({\sigma }_{R}^{2}\) and \({\sigma }_{G}^{2}\) are their respective variances, and \({\sigma }_{RG}\) is their covariance. Two constants, c1 and c2, are used to stabilize the division when the denominators are weak.
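A sketch of the three evaluation metrics is given below. PSNR follows Eqs. (10) and (11) (assuming a base-10 logarithm), SSIM is delegated to scikit-image's implementation rather than re-derived from Eq. (12), and the ECCL Ratio reuses the eccl() sketch from the Methods section; grayscale inputs are assumed throughout.

```python
import numpy as np
from skimage.metrics import structural_similarity

def psnr(restored: np.ndarray, ground_truth: np.ndarray, max_val: float = 255.0) -> float:
    """PSNR, Eqs. (10)-(11)."""
    mse = np.mean((restored.astype(np.float64) - ground_truth.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

def ssim(restored: np.ndarray, ground_truth: np.ndarray) -> float:
    """SSIM, Eq. (12), computed with scikit-image on grayscale images."""
    return structural_similarity(restored, ground_truth, data_range=255)

def eccl_ratio(restored_bin: np.ndarray, ground_truth_bin: np.ndarray, l_t: float = 150) -> float:
    """ECCL Ratio, Eq. (6), on binarized images."""
    e_r, e_g = eccl(restored_bin, l_t), eccl(ground_truth_bin, l_t)
    return abs(e_r - e_g) / max(e_r, 1e-8)
```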

Experimental setup

The training and testing of the proposed method were conducted on a GeForce RTX 4060 Ti GPU. After training for 700 epochs, the model converged, with a total training time of approximately 2 h. During training, the threshold parameter \({L}_{T}\) for ECCL was set to 150, and the weights for the generator loss were set to \({\lambda }_{1}=1\), \({\lambda }_{2}=1\), and \({\lambda }_{3}=0.2\). These parameters were determined through experimental comparison.
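Putting the pieces together, a single training iteration might look as follows, wiring up the Generator, PixelDiscriminator, generator_loss, and discriminator_loss sketches defined above. The optimizer type and learning-rate settings are assumptions (pix2pix-style defaults), not values reported in the paper; only the loss weights match the settings stated here.

```python
import torch

g, d = Generator(), PixelDiscriminator()
opt_g = torch.optim.Adam(g.parameters(), lr=2e-4, betas=(0.5, 0.999))
opt_d = torch.optim.Adam(d.parameters(), lr=2e-4, betas=(0.5, 0.999))

def train_step(degraded, target):
    """One iteration: update the discriminator on real and generated pairs, then the generator."""
    fake = g(degraded)

    # Discriminator update: real pair vs. generated pair (generator output detached).
    opt_d.zero_grad()
    loss_d = discriminator_loss(d(degraded, target), d(degraded, fake.detach()))
    loss_d.backward()
    opt_d.step()

    # Generator update: adversarial + L1 + ECCL terms, Eq. (7).
    opt_g.zero_grad()
    loss_g = generator_loss(d(degraded, fake), fake, target)
    loss_g.backward()
    opt_g.step()
    return loss_d.item(), loss_g.item()
```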

To validate the effectiveness of the proposed method, a total of five other methods were compared in the experiments: Auto Focus ECCL (AF-ECCL19), CycleGAN28, pix2pix29, CID-GAN22, and Swin Transformer34.

Qualitative evaluation and comparison

For the same test dataset, the restoration results of the original image, the ground truth image, the proposed method, and the five comparison methods are shown in Figs. 8 and 9. From the comparison results, it can be observed that AF-ECCL, being fundamentally a traditional image processing-based denoising restoration method, can only achieve a certain degree of background denoising but is almost ineffective at restoring missing parts. The remaining four methods are all deep learning-based. CycleGAN and Swin Transformer, while producing fewer small noise points, exhibit some distortion in the restoration of the text. Pix2pix and CID-GAN both offer good overall restoration results, but they suffer from issues such as small noise points and a lack of continuity in the character contours. In contrast, the proposed method outperforms the others in terms of restoring missing parts, reducing small noise, and maintaining the overall integrity of character contours.

Fig. 8: Experimental results of different methods.

Comparison of the first set of experimental results between the proposed method and five other different methods.

Fig. 9: Experimental results of different methods.

Comparison of the second set of experimental results between the proposed method and five other different methods.

By carefully examining the images generated by various methods, it becomes evident that the proposed method, with the added ECCL constraint, plays a significant role in suppressing small background noise points and enhancing the continuity of the contours. As shown in Fig. 10, the red boxes highlight small spots or disconnected areas in the images generated by other methods, which do not belong to the character itself. These areas may not have a noticeable negative impact on general image generation tasks, but for character restoration, a task that demands accuracy and completeness in the integrity of the character, these issues greatly affect subjective evaluations. In contrast, the proposed method shows noticeable improvements in these details.

Fig. 10: Detail comparison of two characters.

Comparison of experimental results of the proposed method and five other different methods on two typical characters. The red boxes indicate where errors exist.

In order to verify the effectiveness of the proposed method in restoring bamboo slips under different degradation levels, we selected 19 images of the same character under different degradation conditions as test samples and chose 5 representative images to show the restoration results of the 6 algorithms, as shown in Fig. 11. The degradation of the original image increases gradually from left to right. The restoration quality of AF-ECCL worsens accordingly; CycleGAN’s restorations are better overall but contain more colorful noise patches in the background; CID-GAN and SwinTransformer produce similar results, both with many small discontinuous strokes; pix2pix and the proposed method produce similar results and can generally restore the character well under different degradation conditions, which is also verified by the quantitative results presented later.

Fig. 11: Experimental results of one character with varying degrees of degradation.

Comparison of experimental results between the proposed method and five other methods for the same character in different degradation states.

Quantitative evaluation and comparison

To provide a more objective comparison of the generation results from different methods, this study uses three quantitative metrics: PSNR, SSIM, and ECCL Ratio, as shown in Table 1. The values in the table represent the average results across all test data. In this context, higher PSNR and SSIM values indicate better performance, while a lower ECCL Ratio is preferred.

Table 1 Quantitative evaluation results

From the results in the table, it is evident that the proposed method outperforms all others in terms of PSNR, SSIM, and ECCL Ratio, which aligns with the qualitative results observed earlier. Additionally, it can be noted that the AF-ECCL method, due to its reliance on traditional image processing techniques, performs the worst across all three metrics. CycleGAN and CID-GAN yield very similar results, as these methods share the most similarities in their underlying models. Pix2pix performs better than the other three methods, but still slightly falls short compared to the method proposed in this paper.

The results mentioned above represent the average values across all images. However, the average may not fully reflect the variability of a method’s performance across different test images. To further examine the differences in the results for each test image, the study also presents, as shown in Figs. 12–14, the evaluation results for each test image under each of the three metrics.

Fig. 12: PSNR results.

The PSNR experimental results of this method and five other different methods.

Fig. 13: SSIM results.

The SSIM experimental results of this method and five other different methods.

Fig. 14: ECCL results.

The ECCL experimental results of this method and five other different methods.

From the result curves, it can be observed that the PSNR and SSIM curves of the proposed method (in red) are generally above those of the other methods, except for a few individual images, indicating better overall performance. Similarly, for the ECCL Ratio, the curve of the proposed method (in red) is below those of the other methods, except for two images, which also reflects generally better results.

Ablation studies

To further verify the effectiveness of the proposed ECCL constraint, ablation experiments were designed, as shown in Table 2. The first group of experiments uses the complete method of this paper, in which the ECCL constraint is included in the generator loss function; the second group removes the ECCL constraint, so the generator loss includes only the classical GAN generator loss and the L1 loss. The experimental results show that after removing the ECCL constraint, both PSNR and SSIM drop significantly, with results close to those of CID-GAN and SwinTransformer, indicating that the ECCL constraint plays a positive role in the restoration of bamboo slip characters.

Table 2 Ablation results with and without ECCL in the proposed method

Discussion

In this paper, to address the suboptimal denoising and restoration of Qin-Han bamboo slip characters, a Qin-Han bamboo slip dataset suitable for automatic text recognition was constructed with the assistance of ancient text experts. Based on this dataset, an improved Conditional Generative Adversarial Network (cGAN) model with an Effective Character Contour Length (ECCL) constraint was proposed to effectively restore bamboo slip characters with complex backgrounds. Through experimental comparisons with AF-ECCL, CycleGAN, CID-GAN, Swin Transformer, and pix2pix, the proposed method achieved an average PSNR of 12.8, an average SSIM of 0.70, and an average ECCL Ratio of 0.22. These three metrics were the best among all methods, validating the effectiveness of the proposed approach.

Although the proposed method has achieved good restoration results on the existing bamboo slip character dataset, there are still some limitations. (1) Due to the limited number of dataset samples, the generalization performance of the model still needs to be improved. Producing bamboo slip character datasets differs from producing ordinary image datasets: it is a tedious and highly specialized task. Further expanding the dataset is therefore the focus of future work. (2) The method is designed specifically for bamboo slip text, so it may not perform well on the restoration of other, non-Chinese ancient characters.