Introduction

Oracle bone inscriptions, as the earliest known mature writing system in China, provide irreplaceable primary evidence for studying the history, culture, and script evolution of the Shang and Zhou dynasties1. These inscriptions have largely been preserved in the form of ink rubbings, which constitute the main corpus for contemporary research. However, due to the considerable age of these artefacts, rubbing images often suffer from significant degradation, such as ink bleeding, surface damage, and character blurring, which severely limits their usability in digital archiving, intelligent recognition, and in-depth scholarly analysis. Enhancing their clarity is therefore crucial for restoring structural details and stylistic features, improving recognition accuracy, and advancing the digital preservation and intelligent study of oracle bone inscriptions.

Manual restoration methods, such as hand-tracing, depend heavily on the subjective judgment of experts, are time-consuming, and often struggle to balance fidelity with stylistic consistency. In recent years, super-resolution (SR) techniques, which aim to reconstruct high-resolution (HR) images from low-resolution (LR) inputs, have demonstrated substantial potential in domains such as remote sensing2,3,4, medical imaging5,6,7, and cultural heritage conservation8,9,10. Yet, in practice, oracle bone rubbing images often contain substantial noise, which can cause models to learn noise patterns rather than meaningful features. Moreover, oracle bone datasets typically contain only LR images, with no paired HR counterparts, which restricts the applicability of conventional supervised SR approaches11,12,13,14,15.

To overcome the paired-data limitation, research on real-world SR for unpaired data can be broadly divided into two categories: unpaired SR based on degradation modelling and complex-degradation SR using only HR data. The first category leverages independently collected HR and LR images to explicitly model the degradation process. Inspired by unpaired image-to-image translation methods such as CycleGAN16, early works used adversarial learning with cycle consistency to bridge the domain gap. Building on this, SCGAN17 introduced a semi-cyclic architecture with a dedicated degradation network to generate pseudo-LR images, thereby improving reconstruction consistency18,19,20,21. Wang et al.22 further developed a method leveraging a compact proxy space for unpaired image matching in SR. Other approaches23,24,25 trained stand-alone degradation models to synthesize realistic LR data for supervised training in the absence of genuine HR-LR pairs. The second category bypasses the need for LR data entirely by synthesizing degraded images directly from HR inputs using handcrafted or statistically defined degradation pipelines9,26,27,28,29. For example, HiFaceGAN26 employs a multi-stage adaptive SR architecture for perceptual enhancement; Real-ESRGAN27 uses a tailored degradation pipeline with GAN training for realistic artifacts; MM-RealSR28 introduces interactive modulation for controllable restoration; WFEN29 applies wavelet-based frequency fusion to reduce distortion; ConvSRGAN9 cascades enhanced residual modules to capture multi-scale textures. While these approaches offer valuable strategies for handling unpaired data and complex degradations, they remain suboptimal for oracle bone rubbings, particularly in suppressing noise and preserving the fine-grained texture of ancient characters. They also often introduce artifacts or structural distortions in character regions, thereby degrading recognition and analytical performance. These limitations motivate the development of domain-specific solutions.

To address the aforementioned challenges, we propose a super-resolution method specifically designed for oracle bone rubbing images, called OBISR, which effectively tackles the practical limitation that paired HR and LR samples are unavailable for training. We introduce a two-stage reconstruction generator DTG based on the Transformer architecture30, and further develop an enhanced variant DTGEMA to improve the stability of the generation process by incorporating the EMA technique. To more accurately restore the finer structural details of oracle bone inscriptions, a custom local discriminator is devised, specifically tailored to the varying sizes and textural characteristics of rubbing images. In addition, an artifact loss function is introduced31 to measure the discrepancy between the outputs of DTG and DTGEMA and the HR reference images. Comparison with state-of-the-art super-resolution methods demonstrates that our OBISR achieves significantly superior performance in oracle bone image restoration. It not only outperforms existing approaches on quantitative evaluation metrics, including PSNR, SSIM32 and LPIPS33, but also delivers clearer and more structurally accurate visual results.

The main contributions of this work are as follows:

  • We propose OBISR, a super-resolution approach specifically designed for oracle bone rubbings, which effectively circumvents the need for paired training data while preserving intricate inscription features.

  • A two-stage Transformer-based generator is introduced, along with an enhanced variant developed using the EMA technique. Additionally, a custom local discriminator is designed to more accurately capture the structural and textural variations of rubbing images.

  • An artifact-aware loss function is further introduced, which suppresses character artifacts and distortions by comparing the outputs of the two generators with HR reference images.

  • Extensive experiments demonstrate that OBISR significantly outperforms existing state-of-the-art methods in oracle bone image restoration, achieving superior performance in both quantitative evaluation metrics and visual quality.

Methods

Model architecture

Inspired by the SCGAN17 framework, we propose a novel super-resolution model for oracle bone rubbing images, named OBISR. As illustrated in Fig. 1, the model adopts a dual half-cycle architecture, where two LR degradation generators (GHL and GSL), two super-resolution reconstruction generators (DTG and DTGEMA), and four discriminators (DP1, DP2, DP3, DP4) are jointly integrated into a multi-branch adversarial optimization framework.

Fig. 1: Network architecture of our proposed method.
figure 1

The dual half-cycle architecture employs generators GHL/GSL to degrade images and DTG/DTGEMA to reconstruct images, under the adversarial supervision of a set of discriminators.

In the first half-cycle, OBISR takes an HR oracle bone rubbing image IHR as the input to the degradation generator GHL. By incorporating a random noise vector z, GHL simulates real-world degradation patterns and generates a 4× downsampled LR image ISL. This degraded image ISL is then fed into both reconstruction generators DTG and DTGEMA, resulting in two reconstructed HR images ISS and ISSEMA, respectively. To suppress artifacts and distortions around character regions, an artifact loss \({{\mathcal{L}}}_{{\rm{M}}}\) is computed based on the differences among IHR, ISS, and ISSEMA.

In the second half-cycle, a real LR oracle bone image ILR is input into the DTG generator, producing a reconstructed HR image ISR. This image is subsequently passed through the degradation generator GSL to yield a synthetic LR image ISSL.

These two half-cycle reconstruction sub-networks are connected via the DTG reconstruction generator, which helps mitigate the adverse effects caused by the domain gap between synthetic LR rubbing images and real LR rubbing images.
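For concreteness, the two half-cycles can be sketched as the following PyTorch-style pseudo-training step. This is a minimal sketch of the data flow only: the module internals, the noise dimension, and whether GSL also receives a noise vector are assumptions, since they are not specified above.

```python
import torch

def first_half_cycle(G_HL, DTG, DTG_EMA, I_HR):
    # HR -> synthetic LR (conditioned on random noise z) -> two HR reconstructions
    z = torch.randn(I_HR.size(0), 128)        # noise dimension is an assumption
    I_SL = G_HL(I_HR, z)                      # 4x-downsampled pseudo-LR image
    I_SS = DTG(I_SL)                          # reconstruction by DTG
    I_SS_EMA = DTG_EMA(I_SL)                  # reconstruction by the EMA variant
    return I_SL, I_SS, I_SS_EMA               # feed cycle, adversarial and artifact losses

def second_half_cycle(DTG, G_SL, I_LR):
    # real LR -> reconstructed HR -> re-degraded synthetic LR
    I_SR = DTG(I_LR)                          # super-resolved output
    I_SSL = G_SL(I_SR)                        # whether G_SL also takes noise is unspecified
    return I_SR, I_SSL                        # I_SSL is compared against I_LR
```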

Low-resolution degradation generator

Both our degradation generators GHL and GSL adopt the same LR generator architecture. As shown in Fig. 2, the generator consists of an encoder-decoder structure. The encoder begins with a spectral normalization34 (SN) layer, followed by a 3 × 3 convolutional layer, a global average pooling35 (GAP) layer and six residual blocks. These residual blocks are responsible for extracting meaningful and representative features from the HR input. The decoder also includes six residual blocks. After the 2nd and 4th residual blocks, pixel unshuffling operations are applied, each reducing the spatial resolution of the feature maps by a factor of 2 to achieve the overall 4× downsampling. Finally, the output is activated by either a ReLU36 or Tanh function, resulting in a degraded LR oracle bone rubbing image.

Fig. 2: Architecture of the low-resolution degradation generator.
figure 2

This structure is shared by both GHL and GSL in our proposed model.

This design enables the generator to simulate realistic degradation patterns observed in ancient oracle bone images.
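As an illustration, a condensed PyTorch sketch of this encoder-decoder layout follows. Only the overall topology is specified above, so the channel widths, the interpretation of the GAP layer as a channel gate, the omission of the noise input, and the residual-block internals are all assumptions.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

class ResBlock(nn.Module):
    """Plain residual block; the internal layout is an assumption."""
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1))
    def forward(self, x):
        return x + self.body(x)

class DegradationGenerator(nn.Module):
    """Shared G_HL / G_SL layout: SN conv + GAP-based gate + residual encoder,
    residual decoder with two PixelUnshuffle stages (4x total downsampling)."""
    def __init__(self, ch=64):
        super().__init__()
        self.head = spectral_norm(nn.Conv2d(3, ch, 3, padding=1))
        # GAP interpreted here as a squeeze-and-excitation style channel gate;
        # its exact role in the original network is an assumption
        self.gap_gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Conv2d(ch, ch, 1), nn.Sigmoid())
        self.encoder = nn.Sequential(*[ResBlock(ch) for _ in range(6)])
        # decoder: PixelUnshuffle after the 2nd and 4th residual blocks
        self.dec = nn.ModuleList([ResBlock(ch), ResBlock(ch),
                                  ResBlock(ch * 4), ResBlock(ch * 4),
                                  ResBlock(ch * 16), ResBlock(ch * 16)])
        self.unshuffle = nn.PixelUnshuffle(2)   # halves H and W, x4 channels
        self.tail = nn.Sequential(nn.Conv2d(ch * 16, 3, 3, padding=1), nn.Tanh())

    def forward(self, x):
        f = self.head(x)
        f = f * self.gap_gate(f)
        f = self.encoder(f)
        for i, blk in enumerate(self.dec):
            f = blk(f)
            if i in (1, 3):                     # after the 2nd and 4th blocks
                f = self.unshuffle(f)
        return self.tail(f)

print(DegradationGenerator()(torch.randn(1, 3, 256, 256)).shape)  # [1, 3, 64, 64]
```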

Super-resolution reconstruction generator

As shown in Fig. 3, the proposed reconstruction generator is specifically designed for the oracle rubbing image super-resolution task. It consists of two modules, DTG and DTGEMA, which share an identical architecture; the main difference is that the EMA strategy is applied to the DTGEMA generator’s parameters, resulting in more consistent and stable performance.

Fig. 3: Architecture of the super-resolution reconstruction generator.
figure 3

This shared network structure is used by both DTG and DTGEMA, with the latter employing an EMA update mechanism for its parameters for stabilized training.

EMA assigns greater weight to recent observations in order to smooth time-series data, effectively reducing short-term fluctuations and highlighting long-term trends. During training, the parameters of the DTG generator are dynamically updated using EMA to obtain a smoothed version, denoted as DTGEMA. Specifically, EMA performs an exponentially weighted average over the parameter trajectory of the generator, which significantly suppresses noise and parameter oscillations, thereby improving training stability. The update rule of EMA is defined as:

$${\theta }_{t}^{DT{G}_{EMA}}=\alpha {\theta }_{t}^{DTG}+\left(1-\alpha \right){\theta }_{t-1}^{DT{G}_{EMA}},$$
(1)

Here, α ∈ (0, 1) is the smoothing coefficient and \({\theta }_{t}\) denotes the generator parameters at training step t. Compared to the DTG generator, DTGEMA demonstrates greater robustness in reducing the occurrence of random artifacts and improving image fidelity. Since DTGEMA is a smoothed variant of DTG’s parameters, it shares the same loss function. The outputs of both generators are fed into the artifact map module Mrefine31 to compute the artifact loss \({{\mathcal{L}}}_{{\rm{M}}}\), which further alleviates the loss of fine details during optimization. In the inference phase, both generators are executed separately, producing two distinct outputs. This dual-generator inference strategy allows for a comprehensive evaluation of each generator’s performance.
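A minimal sketch of the EMA update in Eq. (1) is given below; the value of α and the initialization of DTGEMA as a frozen copy of DTG are assumptions.

```python
import copy
import torch
import torch.nn as nn

@torch.no_grad()
def ema_update(dtg, dtg_ema, alpha=0.001):
    """Eq. (1): theta_ema_t = alpha * theta_dtg_t + (1 - alpha) * theta_ema_{t-1}.
    alpha = 0.001 is an assumed value; the text does not specify it."""
    for p_ema, p in zip(dtg_ema.parameters(), dtg.parameters()):
        p_ema.mul_(1.0 - alpha).add_(p, alpha=alpha)

# usage: the EMA generator starts as a frozen copy of DTG and is
# refreshed after every optimizer step of DTG
dtg = nn.Linear(8, 8)                           # stand-in for the real DTG generator
dtg_ema = copy.deepcopy(dtg).requires_grad_(False)
ema_update(dtg, dtg_ema)
```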

To effectively extract both global structures and local textures, the input ILR is first upsampled to 128 × 128. Then, a Laplacian decomposition module is applied to perform frequency separation, resulting in a low-frequency component ILR_LF of size 64 × 64 and a high-frequency component ILR_HF of size 128 × 128. As a near-lossless transformation, the Laplacian decomposition preserves structural integrity during this process. Specifically, the low-frequency component captures global contours and stroke layouts, while the high-frequency component retains fine textures and edges. This separation allows the network to process complementary information in a targeted manner. As shown in Fig. 4, both components clearly preserve their respective characteristics, which are later fused to produce structurally coherent and detail-rich reconstructions.

Fig. 4: Input and outputs of the Laplacian decomposition.
figure 4

Shows the pre-decomposition input (ILR*) alongside the resulting low-frequency (ILR_LF) and high-frequency (ILR_HF) components. The images have been resized for optimal visualization.
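A one-level decomposition matching the stated sizes (128 × 128 input, 64 × 64 low-frequency band, 128 × 128 high-frequency residual) can be sketched as follows; the bilinear resampling kernel is an assumption. By construction, upsampling ILR_LF and adding ILR_HF recovers the input exactly.

```python
import torch
import torch.nn.functional as F

def laplacian_decompose(i_lr_star, scale=2):
    """Split a 128x128 input into a 64x64 low-frequency band (global contours,
    stroke layout) and a 128x128 high-frequency residual (fine textures, edges)."""
    lf = F.interpolate(i_lr_star, scale_factor=1 / scale,
                       mode='bilinear', align_corners=False)
    hf = i_lr_star - F.interpolate(lf, scale_factor=scale,
                                   mode='bilinear', align_corners=False)
    return lf, hf

x = torch.randn(1, 3, 128, 128)
lf, hf = laplacian_decompose(x)                 # lf: 64x64, hf: 128x128
```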

Subsequently, the low-frequency component is processed by a structure enhancement block to enhance the global representation and produce an initial super-resolved image \({I}_{{\text{SR}}^{* }}\). Then, the high-frequency component ILR_HF and \({I}_{{\text{SR}}^{* }}\) are jointly fed into a recurrent detail supplement module, which progressively injects high-frequency details in a recursive manner.

Finally, a convolutional fusion module combines and refines the dual-path features to generate the final HR output ISR. By leveraging frequency separation and dual-branch design, this generator effectively enhances both structural restoration and detail fidelity, achieving high-quality super-resolution reconstruction for oracle bone rubbing images.

Discriminator

We design a discriminator architecture based on PatchGAN37, which judges the authenticity of local regions within the image instead of evaluating the entire image globally. Specifically, the discriminator divides the input image into multiple N × N receptive field patches and outputs the real/fake probability of each patch, enabling more precise perception of artifacts and local details. To accommodate different resolution inputs, as shown in Fig. 5, we design different network depths for 64 × 64 and 16 × 16 images, respectively.

Fig. 5: Architecture of the PatchD discriminator.
figure 5

The design features distinct network depths tailored for processing 64×64 and 16×16 input images, respectively.

For 64 × 64 input images, the discriminator consists of five feature extraction modules and one output layer: The first layer is a convolutional layer with a stride of 2, followed by a LeakyReLU activation function; The second to fourth layers adopt a structure of convolution → Batch Normalization → LeakyReLU, with the number of channels set to 128, 256 and 512, respectively; The fifth layer maintains 512 channels and uses a stride-1 convolution to preserve spatial resolution; The output layer uses a 4 × 4 convolutional kernel to produce a one-channel discrimination map, and a Sigmoid function is applied to obtain the realness probability of each local region.

For 16 × 16 input images, the network is relatively shallower and contains four feature extraction modules and one output layer: The first four layers share the same structure as in the 64 × 64 case, with the number of channels set to 64, 128, 256, and 512, respectively; The final layer uses a 2 × 2 convolutional kernel to output the discrimination map, followed by a Sigmoid function to get the probability values.

In implementation, we use 4 × 4 convolutional kernels as basic building blocks, apply stride-2 convolutions for spatial downsampling, and use stride-1 in the last two layers to maintain receptive field precision. LeakyReLU is used as the activation function, and Batch Normalization is adopted in all layers except the first.

The entire discriminator generates a “discrimination map” composed of outputs from multiple receptive field areas, thereby enhancing its ability to recognize local structures and artifacts in the image and improving both discrimination performance and stability.
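Following these specifications, the 64 × 64 branch can be sketched as below; the LeakyReLU slope and the use of padding 1 in every convolution are assumptions not stated in the text.

```python
import torch
import torch.nn as nn

def _block(cin, cout, stride, norm=True):
    # 4x4 conv -> (BatchNorm) -> LeakyReLU; slope 0.2 is an assumption
    layers = [nn.Conv2d(cin, cout, kernel_size=4, stride=stride, padding=1)]
    if norm:
        layers.append(nn.BatchNorm2d(cout))
    layers.append(nn.LeakyReLU(0.2, inplace=True))
    return layers

class PatchD64(nn.Module):
    """PatchD branch for 64x64 inputs: five feature modules plus an output layer."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            *_block(3, 64, 2, norm=False),   # layer 1: no BatchNorm, per the text
            *_block(64, 128, 2),
            *_block(128, 256, 2),
            *_block(256, 512, 2),
            *_block(512, 512, 1),            # stride-1 layer preserving resolution
            nn.Conv2d(512, 1, kernel_size=4, stride=1, padding=1),
            nn.Sigmoid())                    # per-patch realness probabilities

    def forward(self, x):
        return self.net(x)                   # a small "discrimination map"

print(PatchD64()(torch.randn(1, 3, 64, 64)).shape)  # torch.Size([1, 1, 2, 2])
```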

GHL generator loss

This module takes HR images and a random noise vector z as input, aiming to generate LR oracle rubbing images with realistic degradation characteristics. To approximate the degradation found in real LR oracle rubbing images, we combine adversarial loss38 and pixel loss to formulate the loss function for the GHL degradation generator. The loss is defined as:

$${{\mathcal{L}}}_{{G}_{HL}}=\alpha {{\mathcal{L}}}_{pix}^{{I}_{SL}}+\beta {{\mathcal{L}}}_{adv}^{{D}_{P1}},$$
(2)

where α and β are the weighting coefficients. The adversarial loss38 \({{\mathcal{L}}}_{adv}^{{D}_{P1}}\) is defined as:

$$\begin{array}{lll}{{\mathcal{L}}}_{adv}^{{D}_{P1}}\;=\;{\mathbb{E}}\left[min\left(0,{D}_{P1}({I}_{LR})-1\right)\right]\\\qquad\quad\;\;\;+\,{\mathbb{E}}\left[min\left(0,-1-{D}_{P1}({I}_{SL})\right)\right],\end{array}$$
(3)

where the discriminator DP1 is trained to classify the real LR rubbing image ILR as 1 and the synthesized LR image ISL as 0. Although the training data is unpaired, the proposed dual half-cycle architecture constructs pseudo-paired paths, thereby establishing an internally consistent learning framework. The pixel loss \({{\mathcal{L}}}_{pix}^{{I}_{SL}}\) is defined as the \({\ell }_{1}\) distance between the synthesized LR image ISL and the downsampled version of its corresponding input IHR, where average pooling is used to match the resolution of ISL. This design ensures internal consistency, as both ISL and its optimization target are derived from the same HR image.
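The two terms of Eq. (2), with the hinge-style adversarial objective of Eq. (3) rewritten as a quantity to minimize, can be sketched as follows; the weighting coefficients are placeholders, not the values used in the paper.

```python
import torch.nn.functional as F

def d_p1_hinge_loss(d_p1, i_lr, i_sl):
    # Discriminator side of Eq. (3): push D(I_LR) above 1 and D(I_SL) below -1
    return (F.relu(1.0 - d_p1(i_lr)).mean()
            + F.relu(1.0 + d_p1(i_sl.detach())).mean())

def g_hl_loss(d_p1, i_hr, i_sl, alpha=1.0, beta=0.05):
    # Pixel term of Eq. (2): l1 between I_SL and the average-pooled I_HR
    target = F.avg_pool2d(i_hr, kernel_size=4)     # match the 4x-downsampled size
    l_pix = F.l1_loss(i_sl, target)
    # Generator side of the adversarial term: make D_P1 rate I_SL as real
    l_adv = -d_p1(i_sl).mean()
    return alpha * l_pix + beta * l_adv            # alpha, beta are placeholders
```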

DTG generator loss

To distinguish between GAN-generated pseudo-textures and real image details, we introduce the artifact mapping Mrefine31 to compute the artifact loss, which helps regulate and stabilize the training process. The method seeks to learn an artifact probability map \(M\in {[0,1]}^{H\times W\times 1}\), indicating the likelihood of each pixel ISR(i, j) being an artifact.

First, we compute the residual between the real HR image IHR and the generated image ISS from the DTG generator to extract high-frequency components:

$$R={I}_{HR}-{I}_{SS}.$$
(4)

Then, the local variance of the residual R is used to obtain the preliminary artifact map M:

$$\begin{array}{c}M\left(i,j\right)=var\left(R\left(i-\frac{n-1}{2}:i+\frac{n-1}{2},\right.\right.\\ \left.\left.j-\frac{n-1}{2}:j+\frac{n-1}{2}\right)\right),\end{array}$$
(5)

where var denotes the variance operator and n is the local window size. To prevent misclassification of local pixels, we further compute a stabilization coefficient σ from the residual R:

$$\sigma ={\left(var\left(R\right)\right)}^{\frac{1}{a}},$$
(6)

where a = 5. Residual maps R1 and R2 are then generated from the outputs of the DTG and DTGEMA generators respectively:

$${R}_{1}={I}_{HR}-DTG\left({I}_{SL}\right),$$
(7)
$${R}_{2}={I}_{HR}-DT{G}_{EMA}\left({I}_{SL}\right).$$
(8)

Finally, the preliminary artifact map M is refined using both residuals R1 and R2 to enable more accurate localization of artifact-prone regions. This refinement does not assume that the EMA-based generator consistently produces fewer artifacts. Instead, it employs a pixel-wise comparison: penalization is applied only when |R1(i, j)| ≥ |R2(i, j)|, suggesting that ISS (generated by DTG) yields a higher reconstruction error at that location. When |R1(i, j)| < |R2(i, j)|, it implies that ISS is already close to the target value at that pixel, and therefore no additional penalty is applied. This selective strategy helps suppress artifacts while preserving high-frequency details.

$$\begin{array}{c}{M}_{{\rm{refine}}}(i,j)=\\ \left\{\begin{array}{l}0,\quad \,\text{if}\,| {R}_{1}(i,j)| < | {R}_{2}(i,j)| \quad \\ \sigma \cdot M(i,j),\quad \,\text{if}\,| {R}_{1}(i,j)| \ge | {R}_{2}(i,j)| .\quad \end{array}\right.\end{array}$$
(9)

As shown in Fig. 6, the last two columns visualize the mask of |R1| < |R2| and the refined artifact map Mrefine, where well-preserved textures and sharp edges are effectively excluded from penalization.

Fig. 6: Visualization of the Artifact Map Generation Process.
figure 6

From left to right: Ground Truth (GT), DTG output (ISS), DTGEMA output (ISSEMA), residual R1 (Eq. 7), residual R2 (Eq. 8), the mask of |R1| < |R2|, and the final refined artifact map Mrefine (Eq. 9).
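Equations (4)-(9) can be sketched compactly as below; the window size n and the reduction of the refined map to a scalar loss are assumptions (a = 5 follows the text).

```python
import torch.nn.functional as F

def local_variance(x, n=7):
    # Eq. (5): per-pixel variance over an n x n window (n = 7 is an assumption)
    mean = F.avg_pool2d(x, n, stride=1, padding=n // 2)
    mean_sq = F.avg_pool2d(x * x, n, stride=1, padding=n // 2)
    return (mean_sq - mean * mean).clamp_min(0.0)

def refined_artifact_map(i_hr, i_ss, i_ss_ema, n=7, a=5.0):
    r = i_hr - i_ss                               # Eq. (4): high-frequency residual
    m = local_variance(r, n)                      # Eq. (5): preliminary artifact map
    sigma = r.var().pow(1.0 / a)                  # Eq. (6): stabilization coefficient
    r1 = (i_hr - i_ss).abs()                      # Eq. (7)
    r2 = (i_hr - i_ss_ema).abs()                  # Eq. (8)
    keep = (r1 >= r2).float()                     # Eq. (9): penalize only where DTG is worse
    return sigma * m * keep

# Artifact loss: reducing M_refine to its mean is an assumption (cf. Eq. 15)
# l_m = refined_artifact_map(i_hr, i_ss, i_ss_ema).mean()
```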

The DTG generator is shared across the forward and backward cycle processes to reconstruct HR images from LR rubbings. In the forward cycle, adversarial loss and cycle-consistency loss are used; in the backward cycle, adversarial loss and pixel loss are adopted. The total loss for the DTG generator is defined as:

$${{\mathcal{L}}}_{DTG}=\theta {{\mathcal{L}}}_{DTG}^{{I}_{SS}}+\gamma {{\mathcal{L}}}_{DTG}^{{I}_{SR}},$$
(10)

where θ and γ are the corresponding weights, and the expressions of \({{\mathcal{L}}}_{DTG}^{{I}_{SS}}\) and \({{\mathcal{L}}}_{DTG}^{{I}_{SR}}\) are defined as follows:

$${{\mathcal{L}}}_{DTG}^{{I}_{SS}}=\alpha \left({{\mathcal{L}}}_{cyc}^{{I}_{SS}}+{{\mathcal{L}}}_{M}\right)+\beta {{\mathcal{L}}}_{adv}^{{D}_{P3}},$$
(11)
$${{\mathcal{L}}}_{DTG}^{{I}_{SR}}=\alpha {{\mathcal{L}}}_{pix}^{{I}_{SR}}+\beta {{\mathcal{L}}}_{adv}^{{D}_{P4}}.$$
(12)

Here, the adversarial losses38 \({{\mathcal{L}}}_{adv}^{{D}_{P3}}\) and \({{\mathcal{L}}}_{adv}^{{D}_{P4}}\) are defined as:

$$\begin{array}{lll}{{\mathcal{L}}}_{adv}^{{D}_{P3}}\;=\;{\mathbb{E}}\left[min\left(0,{D}_{P3}({I}_{HR})-1\right)\right]\\\qquad\quad\;\;\;+\,{\mathbb{E}}\left[min\left(0,-1-{D}_{P3}({I}_{SS})\right)\right],\end{array}$$
(13)
$$\begin{array}{lll}{{\mathcal{L}}}_{adv}^{{D}_{P4}}\;=\;{\mathbb{E}}\left[min\left(0,{D}_{P4}({I}_{HR})-1\right)\right]\\\qquad\quad\;\;\;+\,{\mathbb{E}}\left[min\left(0,-1-{D}_{P4}({I}_{SR})\right)\right].\end{array}$$
(14)

The cycle-consistency loss \({{\mathcal{L}}}_{cyc}^{{I}_{SS}}\)16 ensures the generator can preserve semantic information and accurately restore character details. The pixel loss \({{\mathcal{L}}}_{pix}^{{I}_{SR}}\) is computed between the recovered SR image ISR and the upsampled version of its corresponding input ILR, where upsampling is performed via bicubic interpolation. Although the training data is unpaired, both ISR and the optimization target originate from the same LR image, ensuring consistency within the LR → HR reconstruction path.
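The pixel term of Eq. (12) then reduces to a single comparison, sketched below; the 4× scale factor follows the experimental setting.

```python
import torch.nn.functional as F

def pix_loss_sr(i_sr, i_lr, scale=4):
    # l1 distance between I_SR and the bicubic-upsampled I_LR (Eq. 12, first term)
    target = F.interpolate(i_lr, scale_factor=scale, mode='bicubic',
                           align_corners=False)
    return F.l1_loss(i_sr, target)
```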

The artifact loss \({{\mathcal{L}}}_{{\rm{M}}}\) is derived from the artifact map Mrefine31, which is expressed as:

$${{\mathcal{L}}}_{M}={M}_{refine}.$$
(15)

DTGEMA generator loss

The EMA technique performs temporal smoothing on the parameters of the DTG generator to produce a more stable generator, DTGEMA. Compared with the DTG generator, the DTGEMA generator is more robust in suppressing stochastic artifacts, thereby improving the quality of generated images.

Since DTGEMA is optimized from DTG via parameter smoothing, it adopts the same loss function as the DTG generator. The output images from the generator are passed into the artifact mapping Mrefine31 to compute the artifact loss, which further reduces the loss of real details during optimization. For a detailed description of the loss design, please refer to the “DTG generator loss” subsection above.

GSL generator loss

The reconstructed rubbing image ISR produced by DTG is degraded by GSL into an LR rubbing image ISSL. The loss function of GSL is defined as:

$${{\mathcal{L}}}_{{G}_{SL}}=\alpha {{\mathcal{L}}}_{cyc}^{{I}_{SSL}}+\beta {{\mathcal{L}}}_{adv}^{{D}_{P2}}.$$
(16)

The adversarial loss38 \({{\mathcal{L}}}_{adv}^{{D}_{P2}}\) is defined as:

$$\begin{array}{lll}{{\mathcal{L}}}_{adv}^{{D}_{P2}}\;=\;{\mathbb{E}}\left[min\left(0,{D}_{P2}({I}_{LR})-1\right)\right]\\\qquad\quad\;\;\;+\,{\mathbb{E}}\left[min\left(0,-1-{D}_{P2}({I}_{SSL})\right)\right],\end{array}$$
(17)

where the discriminator DP2 is trained to predict 1 for the real LR rubbing image ILR and 0 for the synthesized LR rubbing image ISSL.

The cycle-consistency loss16 \({{\mathcal{L}}}_{cyc}^{{I}_{SSL}}\) is an \({\ell }_{1}\) loss function that penalizes the difference between the degraded image ISSL and the real LR image ILR.

Results

Experimental settings

Our model is implemented in PyTorch39 version 1.11.0, using Python 3.7 and an NVIDIA L40S GPU. The optimizer is Adam40 with parameters β1 = 0.9 and β2 = 0.999. The initial learning rate is set to 1 × 10−4 and is decayed to 1 × 10−5 by a cosine annealing schedule with a 10-epoch period. The batch size is set to 64.
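The corresponding optimizer setup can be written as the sketch below; stepping the scheduler once per epoch and the stand-in module are assumptions.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 8)   # stand-in for the OBISR generators and discriminators
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=10, eta_min=1e-5)   # anneal 1e-4 -> 1e-5 over a 10-epoch period

for epoch in range(10):
    # ... one epoch of adversarial training with batch size 64 ...
    optimizer.step()                      # placeholder for the real update loop
    scheduler.step()
```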

The training and testing data used in our study are respectively sourced from two oracle bone image datasets: the OBC306 dataset41, released by Yinque Wenyuan, and the OBI41957 dataset, provided by the research team of the Key Laboratory of Oracle Bone Inscriptions Information Processing, Ministry of Education, Henan Province, China.

For training, we randomly select 3000 clean images from the OBI41957 dataset as HR reference images, and 3000 low-noise images from the OBC306 dataset as LR inputs.

To comprehensively assess model generalization, we construct three representative test sets:

  • OBI_BI (with noise): 100 clear images are randomly selected from the OBI41957 dataset, and LR images are generated via bicubic downsampling.

  • OBI_AR (without noise): 1000 hand-drawn oracle character images are selected from OBI41957, and LR images are produced through black-white inversion followed by region-based interpolation.

  • OBITrue (with real degradation): 1440 real oracle bone rubbing images are randomly sampled from the OBC306 dataset and used as low-resolution images with authentic degradation characteristics.

We employ both pixel-level and perceptual-level metrics to evaluate the quality of generated results. Specifically, we use PSNR, SSIM32, and VIF as fidelity-oriented metrics. Additionally, to approximate human visual perception, we incorporate two deep-feature-based perceptual similarity metrics: LPIPS33 and DISTS.

Based on these five metrics, we construct a normalized radar chart and additionally propose a “radar area” metric as an intuitive indicator for the overall performance of each method. This metric reflects both the balance and effectiveness of the model across multiple dimensions, facilitating a holistic comparison among different approaches. Since LPIPS and DISTS are inverse metrics, we apply a transformation of 1 − normalized value during normalization to ensure that all metrics are mapped to the [0, 1] range, and follow the principle that higher values indicate better performance. Finally, the area of the pentagonal radar chart is calculated using the vector cross product method.
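The radar-area metric can be reproduced with the shoelace (cross product) formula, assuming the five normalized, higher-is-better scores are placed at equal angular spacing.

```python
import math

def radar_area(scores):
    """Pentagon area from five normalized, higher-is-better scores (shoelace)."""
    n = len(scores)
    pts = [(r * math.cos(2 * math.pi * k / n), r * math.sin(2 * math.pi * k / n))
           for k, r in enumerate(scores)]
    area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        area += x1 * y2 - x2 * y1            # cross product of consecutive vertices
    return abs(area) / 2.0

print(radar_area([1.0] * 5))                 # upper bound: (5/2)*sin(72 deg) ~ 2.378
```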

To evaluate the perceptual quality of super-resolved images on the OBITrue dataset, which lacks paired high-resolution ground truth, we conducted a Mean Opinion Score (MOS) evaluation. Given the poor suitability of general no-reference quality metrics (e.g., NIQE, BRISQUE) for assessing the visual characteristics of ancient script imagery, we performed a blind subjective assessment involving 16 participants, all with specialized knowledge of oracle bone epigraphy, comprising domain experts and graduate students. They were asked to rate the visual outputs of different methods based on three key criteria: image clarity, structural consistency, and semantic interpretability. Ratings were collected using a five-point Likert scale (1 = Very Poor, 5 = Excellent). The final MOS for each method was obtained by averaging scores across all participants and test samples.

Comparative study

To comprehensively evaluate the performance of the proposed OBISR model, we conduct comparative experiments against several representative state-of-the-art methods, including Bicubic interpolation, CycleGAN16, LRGAN23, HiFaceGAN26, MM-RealSR28, SCGAN17, WFEN29 and NeXtSRGAN14. For a fair comparison, we retrain all comparative methods rigorously to ensure their optimal performance on the target datasets.

Table 1 presents a quantitative comparison between our proposed OBISR model and various state-of-the-art methods on the OBI_BI and OBI_AR datasets, while Fig. 7 presents a comprehensive performance comparison of different methods on these two datasets using radar charts. On both OBI_BI and OBI_AR datasets, OBISR-DTGEMA achieved the largest radar chart area, demonstrating its superior comprehensive performance in structural reconstruction, detail restoration, and perceptual quality. Particularly on the OBI_AR dataset, OBISR-DTGEMA attained a radar area of 2.1888, outperforming OBISR-DTG across all metrics. Furthermore, it demonstrated significant advantages in key metrics such as VIF and PSNR over mainstream methods including NeXtSRGAN14 and WFEN29. These results demonstrate that our method exhibits enhanced generalization capability and robustness in the oracle bone rubbing super-resolution task.

Table 1 Quantitative results of comparative experiments on OBI_BI and OBI_AR datasets
Fig. 7: Normalized radar charts with integrated “radar area” metric.
figure 7

The charts on (a) OBI_BI and (b) OBI_AR datasets compare overall model performance, quantified by the pentagon area, after unifying all metrics (including transformed LPIPS/DISTS) to a “higher-is-better” scale.

Figure 8 shows the qualitative visual results of different methods on the OBI_BI dataset. It can be observed that methods such as Bicubic, CycleGAN16, LRGAN23, HiFaceGAN26, and WFEN29 exhibit significant defects in reconstructing stroke continuity and structural outlines. LRGAN23, MM-RealSR28, and SCGAN17 produce varying degrees of character distortion and even introduce non-existent noise. NeXtSRGAN14 is capable of generating generally clear character structures; however, it still exhibits certain shortcomings in stroke-level details, such as subtle gaps or unnatural protrusions, which compromise the overall structural fidelity. In contrast, the images generated by the DTG generator in our proposed OBISR method exhibit superior structural consistency and detail preservation, resulting in visual outputs that more closely resemble the ground truth high-resolution images.

Fig. 8: Qualitative results of different methods on the OBI_BI dataset.
figure 8

In the figure, OBISR-DTG and OBISR-DTGEMA denote the outputs generated by the DTG and DTGEMA generators within the OBISR model.

Figure 9 further presents the qualitative results on the OBI_AR and OBITrue datasets. Since the OBITrue dataset consists of real-world LR oracle rubbings and is an unpaired dataset, the corresponding ground truth images are not provided. We observe that both Bicubic and HiFaceGAN26 generate overly smooth and blurry results. CycleGAN16 suffers from stroke disconnection on OBI_AR and severe character deformation on OBITrue. Moreover, LRGAN23 introduces varying levels of noise and character artifacts. MM-RealSR28 tends to produce unnatural character structures, and its outputs on the OBI_AR dataset are only partially clear and largely blurred. Both SCGAN17 and NeXtSRGAN14 demonstrate substantial character shape distortion across both datasets. On the OBI_AR dataset, WFEN29 produces visually competitive results. However, evaluations on the OBITrue dataset indicate that its generated images exhibit slight blurring and white noise artifacts, which negatively affect the overall visual perception. In contrast, the proposed OBISR model demonstrates more stable reconstruction performance across both datasets, achieving superior overall quality in terms of image clarity, stroke continuity, and structural integrity.

Fig. 9: Qualitative results of different methods on the OBI_AR and OBITrue datasets.
figure 9

In the figure, OBISR-DTG and OBISR-DTGEMA denote the outputs generated by the DTG and DTGEMA generators within the OBISR model.

To further validate visual quality, Table 2 reports the Mean Opinion Scores (MOS) on the OBITrue dataset. OBISR achieves superior mean MOS scores of 4.53 and 4.36 with the DTG and DTGEMA generators, respectively, exceeding those of all other methods. This evidence verifies that OBISR successfully enhances image sharpness while maintaining the inscriptions’ structural and semantic integrity, demonstrating high effectiveness and robustness for reconstructing real-world rubbing images.

Table 2 Mean Opinion Scores (MOS) of different methods on the OBITrue dataset

Beyond comparative experiments, a series of ablation studies are conducted to further analyze the contribution of individual components in the proposed OBISR model, as described in the following subsections.

Impact of PatchD discriminator count on OBISR performance

To ensure the validity and isolation of each proposed module, this section and the subsequent ablation studies follow a unified experimental principle. The primary objective is to evaluate the individual and joint contributions of the key architectural components, including the DTG generator. To maintain consistency and ensure a fair comparison, all experimental variants are evaluated using a unified DTG generator, with the DTG module either activated or deactivated according to the specific configuration. When the DTG generator is disabled, the generator from SCGAN is employed to produce the inference results. The DTGEMA generator is excluded from this analysis to avoid introducing confounding variables, since its performance has already been validated in the comparative study.

As clearly shown in Table 3, our proposed OBISR model, configured with four PatchD discriminators (DP1, DP2, DP3 and DP4), achieves the top performance in PSNR, SSIM, and VIF. Compared to any other variant, OBISR demonstrates global superiority, attaining comprehensive advantages in all three dimensions. This validates the strong synergy and generalization ability of our designed PatchD discriminator architecture.

Table 3 The quantitative evaluation results of OBISR and its 15 variants with different discriminator configurations on the OBI_BI test set

Although certain variants come close to optimal performance in individual metrics, they generally fail to maintain consistent superiority across all metrics. In contrast, OBISR achieves top scores in all three metrics, indicating stable and overall optimal performance. This further corroborates the significant benefits of employing four PatchD discriminators in restoring image structures and details.

It is worth noting that, from the perspective of sub-module combinations, using individual PatchD discriminators or partial combinations may yield some improvements. However, their overall performance still lags behind that of OBISR. This suggests that localized improvements are insufficient for achieving comprehensive performance gains, and only the full integration of all four discriminators can fully unleash the model’s potential.

In summary, OBISR demonstrates outstanding performance in terms of image quality, structural consistency and visual fidelity, making it the optimal solution among all configurations.

Effect of discriminator depth on reconstruction performance

The use of 16 × 16 and 64 × 64 resolutions is based on the classical 4 × super-resolution setting, which aligns well with the characteristics of oracle bone rubbings, where character regions are generally small and localized. Accordingly, the discriminator depth in our framework is determined based on the spatial characteristics of the input resolution.

For 64 × 64 images, employing a deeper discriminator with six convolutional layers enables the network to capture finer local details and high-frequency artifacts more effectively. Increasing the depth beyond this point, however, would cause the feature maps to shrink below 1 × 1, making it infeasible to retain spatially localized discriminative capability.

For 16 × 16 images, further increasing depth beyond four would cause the feature maps to shrink excessively, leading to unstable training or ineffective convolution operations. To strike a balance between discriminative capacity and architectural feasibility, we thus adopt a six-layer PatchD discriminator for 64 × 64 inputs and a five-layer version for 16 × 16 inputs.

To evaluate the impact of discriminator depth on image quality, we conduct ablation studies across multiple depth configurations. As shown in Table 4, increasing the number of convolutional layers consistently improves PSNR, SSIM, and VIF scores across both the OBI_BI and OBI_AR datasets. These results validate the effectiveness of our resolution-aware depth design in enhancing reconstruction performance.

Table 4 Reconstruction performance comparison of different discriminator depth configurations in OBISR variants

Comparison of upsampling and downsampling strategies in the reconstruction generator

To systematically evaluate the individual contributions of each component in our model, we conduct ablation studies by progressively removing specific modules and analyzing their impact. Given the significant role of the PatchD discriminators in enhancing discriminative capability, which may obscure the effects of other design factors, we choose to base our analysis on the simplified version OBISR-K, where PatchD discriminators are not included. This allows us to focus on the impact of different upsampling and downsampling strategies on reconstruction performance.

Specifically, we design three structural variants based on OBISR-K by adopting different combinations of bicubic and bilinear interpolation for upsampling and downsampling, resulting in four configurations in total. We then conduct quantitative evaluations on two datasets: OBI_BI and OBI_AR. The results are presented in Table 5.

Table 5 Reconstruction performance comparison of different upsampling and downsampling strategies in OBISR-K variants (without PatchD discriminators)

Experimental results show that, on the OBI_BI dataset, the OBISR-K variant using bilinear interpolation for both upsampling and downsampling achieves the best performance in terms of PSNR, SSIM and VIF, indicating that this configuration is more effective in enhancing reconstruction quality under complex degradation with noise. On the OBI_AR dataset, this variant also leads in PSNR (17.5109) and VIF (0.2421), demonstrating consistent and stable superiority.

In summary, although bicubic interpolation may retain structural information more effectively in certain scenarios, bilinear exhibits better overall reconstruction accuracy and visual fidelity, especially in severely degraded images affected by noise.

Performance comparison of structural module designs

Table 6 presents a comprehensive ablation study evaluating the individual and joint contributions of four core modules: EMA, \({{\mathcal{L}}}_{{\rm{M}}}\), DTG, and PatchD. Results on the OBI_BI dataset demonstrate that each module provides distinct improvements to image reconstruction quality. Specifically, EMA alone leads to consistent performance gains, while the artifact loss proves effective even in the absence of EMA, highlighting its independent ability to suppress visual artifacts. The DTG module enhances the recovery of structural details, and PatchD contributes positively when used individually.

Table 6 Ablation study of OBISR with extensive combinations of EMA, LM, DTG, and PatchD

Furthermore, combining these modules yields stronger performance than using them in isolation. In particular, PatchD demonstrates more significant improvements when paired with DTG, suggesting a strong synergy between structural guidance and local discrimination. The full OBISR model, which integrates all four components, achieves the best overall results on both OBI_BI and OBI_AR datasets. These findings confirm not only the effectiveness of each individual module but also the complementarity among them in improving visual fidelity and structural consistency.

Discussion

In this work, we propose OBISR, a generative adversarial network designed for super-resolution reconstruction of oracle rubbing images. Without relying on paired low- and high-resolution data, OBISR enhances structural and detail recovery through a Transformer-based two-stage generator DTG and its improved variant DTGEMA. By incorporating the artifact loss \({{\mathcal{L}}}_{{\rm{M}}}\) and the PatchD discriminator, our model effectively suppresses artifacts and character distortions. Experimental results demonstrate that OBISR outperforms existing methods in both quantitative metrics and qualitative visual assessments, showcasing its superior performance in oracle rubbing image super-resolution tasks. Future research could further optimize the model architecture to enhance its adaptability to real-world degradation patterns, thereby better supporting the digital preservation and scholarly study of oracle bone inscriptions.