Abstract
Ancient murals, vital cultural heritage, suffer from damage due to natural erosion and human activities. Traditional restoration methods, relying on manual repair, have limitations, making virtual restoration an innovative solution. This paper proposes a virtual restoration method based on a diffusion model. Using a lossless image-guided algorithm, we adapt a diffusion model designed for image synthesis to restoration. Instead of feeding damaged images into the network, we use them to adjust the network’s outputs directly, achieving unsupervised training. We also use random seeds to generate diverse outputs from a single image. The proposed similarity function ensures that undamaged areas remain aligned with the guiding image, and an interrupt sampling strategy removes subtle, dense degradations. Experiments on simulated and real damaged murals show that our method yields results comparable to or better than other advanced methods for simple cases. For complex and severely damaged murals, it excels, outperforming others in both objective and subjective evaluations.
Introduction
From the first spark that ignited the flames of civilization in ancient times to the industrial machines that reshape the world today, humankind has had a magnificent journey. Reflecting on the past, we possess precious treasures—chief among them is our history. Murals, as one of the earliest means of recording information, carry significant historical and cultural value. However, due to natural erosion and human activities, many ancient murals have suffered from issues such as cracks, flaking, mold, and fading. As a non-renewable resource, the protection and restoration of these murals have become urgent tasks.
Traditional mural restoration primarily relies on manual, physical methods, requiring specialists with comprehensive knowledge of history, humanities, arts, and archeology. Moreover, these methods may cause irreversible damage to the murals. As an alternative, the digital restoration of mural images has emerged as a novel approach in the field of ancient mural preservation. Without affecting the original artwork, this method uses computer technology to virtually reconstruct the murals. The digitally restored mural images can not only serve as references for physical restoration but also help build a replicable database, offering a more reliable means for the preservation and transmission of these cultural artifacts.
Using computer technology to restore digital images of ancient murals aims to virtually reconstruct the missing or degraded areas as accurately as possible. In recent years, many outstanding image restoration techniques have emerged, which can generally be categorized into two types: conventional image restoration methods and deep learning-based methods.
Conventional image restoration methods are primarily based on two techniques: patch-based and diffusion-based (not to be confused with the diffusion model discussed in this paper). The core idea behind patch-based methods is to find image patches similar to the damaged areas and use them to replace the missing regions. Yang et al.1 proposed a method for restoring complex damaged areas in the Dunhuang murals using Dempster-Shafer evidence theory and its data fusion. This algorithm outperforms the Criminisi2 algorithm and its improvements in visual quality, though it requires more processing time. Li et al.3 introduced an automatic recognition and virtual restoration method for mud-spot damage in Tang dynasty tomb murals. This method automatically analyzes images to obtain mud-spot area masks, which are then used in a patch-based approach for restoration. Jiao et al.4 proposed an improved patch-based algorithm to restore the Wutai Mountain murals. Compared to other methods, their algorithm shows better restoration results for textures, edges, and smooth areas, though it depends on the relative position and connectivity of the mask regions. Wang et al.5 tackled the issue of large-scale structural damage in the murals of Yulin Caves and Mogao Caves in Gansu. They proposed using manually drawn lines to assist with restoration and combining global and local feature weighting based on structural guidance for restoration. The method surpasses other restoration methods for the targeted murals but heavily relies on artist guidance and line drawings. Cao et al.6 developed an algorithm for restoring ancient temple murals that employs a local search algorithm of an adaptive sample block. Compared to the Criminisi algorithm, the approach improves both the efficiency and effectiveness of restoration.
Diffusion-based methods fill missing areas by simulating the spread of pixels within an image, typically using gradient information and local features to guide the restoration process. Shen et al.7 proposed a mural restoration algorithm based on Morphological Component Analysis (MCA), inspired by the manual restoration approach for Tang dynasty tomb murals, where the global structures are addressed first, followed by local textures. The algorithm shows significant results in restoring cracks in Tang tomb murals but does not extend to murals of other styles or damage types. Jaidilert et al.8 introduced a computer-aided semi-automatic repair framework. The framework involves user-provided seeds, followed by region-growing and morphological operations to identify the location of scratches, and finally applies various conventional image restoration methods such as Total Variation (TV)9, Curvature-Driven Diffusion (CDD)10, Lattice Boltzmann Method (LBM)11, and Group Sparsity Regularization (GSR)12. Chen et al.13 proposed an improved adaptive curvature-driven model for restoring Dunhuang murals. The approach addresses issues such as false edges and staircase effects in the restoration results, and uses adaptive control strategies to reduce restoration time, but is less effective when dealing with extensive irregular damage and specific mural diseases.
Despite many years of development, conventional image restoration methods still struggle to meet the demands of digital restoration for ancient murals, mainly in two respects: (1) poor performance in restoring murals with large areas of damage, and (2) inability to generate new information based on existing content. Consequently, with the rise of deep learning techniques in the field of image processing, researchers have increasingly shifted their focus to deep learning-based methods. Yu et al.14 proposed an end-to-end network based on U-Net for restoring Dunhuang murals, which demonstrates excellent performance in repairing highly non-rigid and irregular deteriorated regions. Cao et al.15 introduced dilated convolutions to improve GANs16 for the restoration of ancient Chinese murals. However, when dealing with complex structures and large missing areas, issues like blurriness and loss of structural information may arise. Wang et al.17 proposed a Thanka mural restoration method based on multi-scale adaptive partial convolutions and stroke-like masks, but it also struggles with handling complex structures and large damaged areas. Ciortan et al.18, inspired by the modus operandi of an artist—edges first, then the color palette, and color tones at last—proposed a GAN-based restoration algorithm with two generators: one for edges and the other for color. The approach achieves satisfactory results in the restoration of Dunhuang murals. Li et al.19 proposed a line drawing guided progressive mural inpainting method, dividing the inpainting process into two steps: structural reconstruction and color correction. The approach shows superior performance in mural image restoration, but it relies on line drawings manually created by professional artists as guidance. Schmidt et al.20 proposed an image restoration method combining CAR21 and HINet22 for the restoration of Dunhuang murals. The method performs well even with large damaged areas but may result in some loss of detail. Ge et al.23 proposed a virtual restoration network for ancient murals based on global-local feature extraction and structural information guidance. The method achieves excellent results in terms of structural continuity, color harmony, and visual rationality in the restoration of mural images. Ren et al.24 proposed a generative adversarial network model that combines a parallel dual convolutional feature extraction depth generator and a ternary heterogeneous joint discriminator, achieving effective restoration of Dunhuang murals.
Although current deep learning methods can handle some cases of large-scale damage in murals, several challenges remain: (1) Existing methods, mostly based on supervised learning, are highly influenced by the types of masks used during training. This leads to inconsistent performance when dealing with diverse types of damage. (2) Restoration of large-scale damage often results in blurriness or loss of fine details, especially in murals with intricate structures. This is a limitation common to existing methods. (3) The trained models in existing methods typically produce deterministic outputs. If a specific mural cannot be well-restored, repeated attempts will yield the same unsatisfactory results. (4) Existing methods require masks to be obtained through marking and cannot handle subtle, dense degradation patterns that cannot be explicitly marked.
To address these challenges, we propose a mural image restoration method based on a conditional diffusion model25. The main contributions of this paper are twofold: (1) We propose a similarity function that adapts the diffusion model—originally designed for image synthesis tasks—to the image restoration task in an indirect manner. The image synthesis capabilities of the diffusion model surpass those of existing deep learning frameworks, such as GANs. This contribution can result in extraordinary performance when applied to restoration tasks, particularly for images with large areas of damage and complex details (addressing challenge 2). Moreover, the proposed similarity function is applied to the sampling algorithm rather than the backbone network, enabling unsupervised training and eliminating reliance on masks (addressing challenge 1). Additionally, since the similarity function does not impose direct constraints on the unknown (damaged) areas during the sampling process, it allows the inherent randomness of the diffusion model in image synthesis tasks to be leveraged for restoration, leading to more diverse restoration outcomes (addressing challenge 3). (2) We also propose an interrupted sampling strategy to address subtle, dense degradation types that cannot be explicitly marked, without requiring any masks throughout the whole process (addressing challenge 4).
Methods
Diffusion model
Ancient mural images usually contain intricate structures, rich textures, and diverse color information, which place high demands on the generative capabilities of restoration models, especially when dealing with large areas of damage. Existing research25 has demonstrated that diffusion models have surpassed GANs, becoming the leading models for high-quality image synthesis. Therefore, we introduce a diffusion model and propose a lossless image-guided algorithm to better address the challenges of mural restoration tasks.
The diffusion model is a generative model that, like other generative models, learns the distribution of the training set and generates similar samples. However, unlike other generative models, the training of a diffusion model is not an end-to-end process. Instead, noise is progressively added to the original images until they become pure noise, while a neural network is trained to predict the noise introduced at each step. During sampling, this noising process is reversed—i.e., denoising—allowing the model to generate a target sample from random noise. The framework of the model is illustrated in Fig. 1.
Let the original image be represented as x0, the image at step t as xt, and the image at the final step as xT. In the training of DDPM26, the diffusion process converts the image x0 into Gaussian white noise \({{\boldsymbol{x}}}_{T} \sim {\mathcal{N}}(0,1)\) over T steps. Each step is defined as
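In the standard DDPM notation, this forward step takes the form \(q({{\boldsymbol{x}}}_{t}| {{\boldsymbol{x}}}_{t-1})={\mathcal{N}}({{\boldsymbol{x}}}_{t};\sqrt{1-{\beta }_{t}}\,{{\boldsymbol{x}}}_{t-1},{\beta }_{t}{\boldsymbol{I}})\).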
That is, xt is obtained by scaling xt−1 by a factor of \(\sqrt{1-{\beta }_{t}}\) and then adding Gaussian noise with mean 0 and variance βtI. Here, βt is a predefined hyperparameter, and \(\sqrt{1-{\beta }_{t}}\) decreases as t increases.
By the Markov assumption, the joint probability of all the latent variables can be factorized as a product of conditional probabilities over all the steps. This allows us to rewrite the forward diffusion process as
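In the standard DDPM notation, the closed form is \(q({{\boldsymbol{x}}}_{t}| {{\boldsymbol{x}}}_{0})={\mathcal{N}}({{\boldsymbol{x}}}_{t};\sqrt{{\bar{\alpha }}_{t}}\,{{\boldsymbol{x}}}_{0},(1-{\bar{\alpha }}_{t}){\boldsymbol{I}})\), i.e., \({{\boldsymbol{x}}}_{t}=\sqrt{{\bar{\alpha }}_{t}}\,{{\boldsymbol{x}}}_{0}+\sqrt{1-{\bar{\alpha }}_{t}}\,{\boldsymbol{\epsilon }}\) with \({\boldsymbol{\epsilon }} \sim {\mathcal{N}}(0,{\boldsymbol{I}})\),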
where αt = 1 − βt and \({\bar{\alpha }}_{t}=\mathop{\prod }\nolimits_{i = 1}^{t}{\alpha }_{i}\). Equation (2) shows that we can sample xt at any timestep t directly from x0 without going step by step, effectively scaling and adding noise in one shot.
Using a neural network to obtain the mean μθ(xt, t) and variance Σθ(xt, t), we can define the reverse denoising process, which generates an image by progressively removing noise. The reverse process is modeled as a parameterized Gaussian distribution:
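In the usual notation, \({p}_{\theta }({{\boldsymbol{x}}}_{t-1}| {{\boldsymbol{x}}}_{t})={\mathcal{N}}({{\boldsymbol{x}}}_{t-1};{{\boldsymbol{\mu }}}_{\theta }({{\boldsymbol{x}}}_{t},t),{{\mathbf{\Sigma }}}_{\theta }({{\boldsymbol{x}}}_{t},t))\),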
where μθ(xt, t) and Σθ(xt, t) are the mean and variance obtained from the neural network, conditioned on the noisy image xt and the timestep t.
Combining the forward process defined by Equation (2) and the reverse process defined by Equation (3), the mean μθ(xt, t) can be obtained as
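Following the standard DDPM derivation, \({{\boldsymbol{\mu }}}_{\theta }({{\boldsymbol{x}}}_{t},t)=\frac{1}{\sqrt{{\alpha }_{t}}}\left({{\boldsymbol{x}}}_{t}-\frac{{\beta }_{t}}{\sqrt{1-{\bar{\alpha }}_{t}}}{{\boldsymbol{\epsilon }}}_{\theta }({{\boldsymbol{x}}}_{t},t)\right)\),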
where ϵθ(xt, t) is the noise predicted by the neural network for timestep t.
To train the target network, we need to derive the optimization objective based on the Variational Lower Bound (VLB) of the log-likelihood of the data:
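In the standard derivation, this bound decomposes into per-timestep terms, \({L}_{{\rm{VLB}}}={L}_{T}+{\sum }_{t=2}^{T}{L}_{t-1}+{L}_{0}\), with \({L}_{t-1}={D}_{{\rm{KL}}}\left(q({{\boldsymbol{x}}}_{t-1}| {{\boldsymbol{x}}}_{t},{{\boldsymbol{x}}}_{0})\parallel {p}_{\theta }({{\boldsymbol{x}}}_{t-1}| {{\boldsymbol{x}}}_{t})\right)\).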
By simplifying the Lt−1 term, we can derive the final optimization objective for training the diffusion model:
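In its widely used simplified form, \({L}_{{\rm{simple}}}={{\mathbb{E}}}_{t,{{\boldsymbol{x}}}_{0},{\boldsymbol{\epsilon }}}\left[{\left\Vert {\boldsymbol{\epsilon }}-{{\boldsymbol{\epsilon }}}_{\theta }({{\boldsymbol{x}}}_{t},t)\right\Vert }^{2}\right]\).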
The resulting simplified objective is the mean squared error (MSE) between the true noise ϵ and the noise predicted by the neural network ϵθ(xt, t).
Proposed lossless image-guided sampling algorithm
The original diffusion model generates samples directly from noise. To apply it to image inpainting or restoration tasks, we need to introduce control conditions. Dhariwal et al.25 modified the model’s sampling process to achieve classifier-guided generation. Liu et al.27 further expanded the classifier to various guiding forms, such as text guidance and image guidance. With the introduction of control conditions, the forward process remains unaffected and is still defined by Eq. (1), while the reverse process is defined as
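Consistent with the guided update in Algorithm 1, this conditioned reverse step can be written as \({p}_{\theta }({{\boldsymbol{x}}}_{t-1}| {{\boldsymbol{x}}}_{t},{\boldsymbol{y}})={\mathcal{N}}\left({{\boldsymbol{x}}}_{t-1};{{\boldsymbol{\mu }}}_{\theta }({{\boldsymbol{x}}}_{t},t)+\gamma {{\mathbf{\Sigma }}}_{\theta }({{\boldsymbol{x}}}_{t},t){\nabla }_{{{\boldsymbol{x}}}_{t}}F({{\boldsymbol{x}}}_{t},{\boldsymbol{y}},t),{{\mathbf{\Sigma }}}_{\theta }({{\boldsymbol{x}}}_{t},t)\right)\),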
where y represents the condition, F(xt, y, t) is a similarity measure that quantifies the relationship between the sample xt and the condition y. Equation (7) can be interpreted as using the gradient of F(xt, y, t) to adjust the mean μθ(xt, t) (including both known and unknown regions), causing it to shift towards areas with higher values of F(xt, y, t). The coefficient γ controls how much the gradient influences the mean.
The specific similarity measure F(xt, y, t) depends on the task. For example, in their image content guidance task, Liu et al.27 passed xt and y through a pre-trained image encoder for noisy images to obtain feature embeddings, then calculated the distance between these feature embeddings. However, an important issue arises in image inpainting: when an image is used to guide the restoration, the lossy compression involved in feature embedding can cause a significant mismatch between the details of the known regions in the original image and the model’s output. To address this problem, we propose a new method for calculating the similarity measure F(xt, y, t). By examining Equation (2), we find that reversing it yields
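As written in step 3 of Algorithm 1, \({{\boldsymbol{x}}}_{0}({{\boldsymbol{x}}}_{t},t)=\frac{1}{\sqrt{{\bar{\alpha }}_{t}}}\left({{\boldsymbol{x}}}_{t}-\sqrt{1-{\bar{\alpha }}_{t}}\,{{\boldsymbol{\epsilon }}}_{\theta }({{\boldsymbol{x}}}_{t},t)\right)\).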
In other words, at each step of the sampling process, we can obtain a rough estimate of x0(xt, t). Therefore, we define F(xt, y, t) as the distance between x0(xt, t) and y. Let m denote the mask, y the conditional image, and (1 − m) ⊙ y the known region; the function F(xt, y, t) we define can then be expressed as
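As in step 4 of Algorithm 1, \(F({{\boldsymbol{x}}}_{t},{\boldsymbol{y}},t)={\left\Vert (1-{\boldsymbol{m}})\odot {{\boldsymbol{x}}}_{0}({{\boldsymbol{x}}}_{t},t)-(1-{\boldsymbol{m}})\odot {\boldsymbol{y}}\right\Vert }^{2}\).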
Since we directly use y for calculations without any lossy compression, the generated results can achieve a high level of detail consistency with the reference image in the known regions. Our algorithm is outlined in Fig. 2 and Algorithm 1.
Algorithm 1
Proposed Lossless Image-Guided Sampling Algorithm
1: \({{\boldsymbol{x}}}_{T} \sim {\mathcal{N}}(0,1)\)
2: for t = T, …, 1 do
3: \({{\boldsymbol{x}}}_{0}({{\boldsymbol{x}}}_{t},t)=\frac{1}{\sqrt{{\bar{\alpha }}_{t}}}({{\boldsymbol{x}}}_{t}-\sqrt{1-{\bar{\alpha }}_{t}}{{\boldsymbol{\epsilon }}}_{\theta }({{\boldsymbol{x}}}_{t},t))\)
4: \({\nabla }_{{{\boldsymbol{x}}}_{t}}F({{\boldsymbol{x}}}_{t},{\boldsymbol{y}},t)={\nabla }_{{{\boldsymbol{x}}}_{t}}{\left\Vert (1-{\boldsymbol{m}})\odot {{\boldsymbol{x}}}_{0}({{\boldsymbol{x}}}_{t},t)-(1-{\boldsymbol{m}})\odot {\boldsymbol{y}}\right\Vert }^{2}\)
5: \({\boldsymbol{\epsilon }} \sim {\mathcal{N}}(0,1)\)
6: \({{\boldsymbol{\mu }}}_{\theta }({{\boldsymbol{x}}}_{t},t)=\frac{1}{\sqrt{{\alpha }_{t}}}({{\boldsymbol{x}}}_{t}-\frac{{\beta }_{t}}{\sqrt{1-{\bar{\alpha }}_{t}}}{{\boldsymbol{\epsilon }}}_{\theta }({{\boldsymbol{x}}}_{t},t))\)
7: \({{\boldsymbol{x}}}_{t-1}({{\boldsymbol{x}}}_{t},t)={{\boldsymbol{\mu }}}_{\theta }({{\boldsymbol{x}}}_{t},t)+\gamma {{\mathbf{\Sigma }}}_{\theta }({{\boldsymbol{x}}}_{t},t){\nabla }_{{{\boldsymbol{x}}}_{t}}F({{\boldsymbol{x}}}_{t},{\boldsymbol{y}},t)+\sqrt{{{\mathbf{\Sigma }}}_{\theta }({{\boldsymbol{x}}}_{t},t)}{\boldsymbol{\epsilon }}\)
8: end for
9: x = (1 − m) ⊙ y + m ⊙ x0(x1, 1)
10: return x
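To make Algorithm 1 concrete, the following is a minimal PyTorch-style sketch of the sampling loop. The function and argument names (eps_model, betas, the mask convention) are our assumptions, the learnable variance is replaced by a fixed βt for brevity, and the guidance step descends the masked distance so that the mean moves toward the reference image, in line with the description of F as a similarity measure; this is an illustrative sketch rather than the authors’ implementation.

```python
import torch

def guided_sample(eps_model, y, mask, betas, gamma=0.001):
    """Minimal sketch of the lossless image-guided sampling loop (Algorithm 1).

    Assumed conventions (not taken from the paper's code): eps_model(x_t, t)
    returns the predicted noise; y is the reference image; mask is 1 in the
    damaged region and 0 in the intact region; the variance is fixed to beta_t
    instead of being learned.
    """
    alphas = 1.0 - betas                        # alpha_t
    alpha_bars = torch.cumprod(alphas, dim=0)   # \bar{alpha}_t
    T = betas.shape[0]

    x_t = torch.randn_like(y)                   # x_T ~ N(0, I)
    known = (1.0 - mask) * y                    # (1 - m) ⊙ y
    x0_hat = x_t

    for t in reversed(range(T)):                # code t = T-1..0 is paper t = T..1
        t_batch = torch.full((y.shape[0],), t, device=y.device, dtype=torch.long)

        # Differentiate the masked distance through eps_model to get the guidance gradient.
        with torch.enable_grad():
            x_in = x_t.detach().requires_grad_(True)
            eps = eps_model(x_in, t_batch)
            x0_hat = (x_in - torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alpha_bars[t])
            dist = (((1.0 - mask) * x0_hat - known) ** 2).sum()
            grad = torch.autograd.grad(dist, x_in)[0]

        # Unconditional posterior mean, then shift it so the known region moves
        # toward the reference (descending the distance plays the role of ascending F).
        mu = (x_t - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps.detach()) / torch.sqrt(alphas[t])
        sigma2 = betas[t]
        noise = torch.randn_like(x_t) if t > 0 else torch.zeros_like(x_t)
        x_t = mu - gamma * sigma2 * grad + torch.sqrt(sigma2) * noise

    # Step 9: paste the known region back so it stays losslessly consistent with y.
    return known + mask * x0_hat.detach()
```

Calling such a routine repeatedly with different random seeds yields the diverse outputs discussed in the Results section.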
Results
Dataset
We collect a number of ancient mural images with minimal or no damage to construct a mural dataset. The sources of these murals are the Dunhuang murals from Gansu and the Baisha murals from Yunnan, which encompass different periods, styles, and subjects. The Dunhuang murals span a long period, beginning in the Northern Wei dynasty (386–534) and lasting until the Qing dynasty (1644–1911), with each period exhibiting its own distinct style. The Baisha murals are from the Ming (1368–1644) and Qing (1644–1911) dynasties, representing a fusion of various religious cultures, including Taoism, Tibetan Buddhism, Han Buddhism, and Dongbaism. The original images are cropped into small sub-images with minimal overlap and augmented to produce 10,314 images of size 256 × 256, with 8251 images used for training and 2063 for simulated damage testing. Additionally, we select some images from severely damaged murals that cannot be used for training or simulated testing. These images are cropped into 105 sub-images of size 256 × 256, which are used for a real damage restoration experiment.
Simulated data experiment
In this section, we present our experiments on simulated damaged data. We compare our method with three advanced image restoration approaches—AOT28, EdgeConnect29, and LaMa30—using the Structural Similarity Index Measure (SSIM) and Learned Perceptual Image Patch Similarity (LPIPS) as quantitative evaluation metrics. In our method, the backbone network for noise prediction is based on the network proposed by Dhariwal et al.25. Our model is trained with 1000 diffusion steps, and the sampling steps are also set to 1000 during restoration, with a guidance weight γ of 0.001 and a learnable variance. For AOT, EdgeConnect, and LaMa, apart from resizing the images to 256 × 256, we keep the default hyperparameters as defined in their respective open-source implementations.
Our method, being unsupervised, does not rely on masks during training. For LaMa, as its mask generation strategy is integral to its algorithm, we use its own mask generation strategy. For AOT and EdgeConnect, we use irregular masks from Liu et al.31. To ensure fairness and evaluate the models’ generalization to different mask types, we test with stroke-like masks from Yu et al.32, which none of the models have encountered during training.
Figure 3 shows a visual comparison of six examples from the four methods, while Table 1 provides the objective evaluation metrics for the images in Fig. 3. In Fig. 3, the first column shows the ground truth, the second column the masked image, the third column AOT’s results, the fourth column EdgeConnect’s results, the fifth column LaMa’s results, and the sixth column our method’s results. In Table 1, the numbers in the first column correspond to the image numbers in Fig. 3. Each row displays the SSIM and LPIPS restoration results of our method and three comparison approaches.
Real data experiment
In this subsection, we conduct restoration experiments on real damaged murals. It should be noted that real damaged murals have no ground truth for quantitative evaluation, and any feasible evaluation method inherently involves subjectivity. We evaluate the model’s performance through voting, collecting votes from 100 volunteers on the Internet. The voting follows an exclusive system: for each group of restoration results, participants select the single result they consider best. Figure 4 shows visual comparisons of restoration results for six real damaged murals, and Fig. 5 presents the corresponding voting results.
Diverse outputs
In this subsection, we present a unique feature that previous methods lack: non-deterministic restoration results. In existing approaches, once a model is trained, the restoration outcome for a particular mural image is fixed; no matter how many times the restoration process is repeated, the result remains the same. This implies that if the restoration fails once, the model inherently struggles with that image, and adjustments to the dataset and training process, followed by retraining, are required—an expensive and time-consuming task. Our method, however, can produce different restoration results by simply setting different random seeds, without the need for dataset adjustments or retraining. If one restoration attempt does not achieve the desired outcome, multiple attempts can be made, and the best result can be selected from these outputs. Additionally, the diverse outputs generated by our method can offer restoration experts valuable references and inspiration, thereby aiding in the manual restoration of murals.
The idea behind this functionality is that, although our method is applied to a restoration task, it essentially performs image synthesis. The generated results remain consistent with the reference image in the known (intact) regions, achieving restoration indirectly. However, in the unknown (damaged) regions, no direct constraints are applied, resulting in varied outcomes.
Figure 6 illustrates restoration results with two different random seeds using our method. The first column shows the ground truth, the second column shows the masked image, and the third and fourth columns display restoration results with different random seeds. Table 2 presents the statistical information of the metrics for 100 restoration attempts corresponding to the four examples shown in Fig. 6. The “Max” represents the maximum value, the “Min” represents the minimum value, the “Avg” denotes the average value, and the “Var” refers to the variance.
Unmarkable degradations
Murals exhibit not only continuous damage but also subtle, dense degradation types that cannot be explicitly marked. Existing methods require marking the damaged area to create a mask for restoration, which limits their capability in unmarkable cases. Our method, however, can eliminate this type of degradation without requiring any marking.
As described in Section “Proposed Lossless Image-Guided Sampling Algorithm”, our approach allows obtaining a coarse x0(xt, t) at each step of the restoration process. By interrupting the sampling process at an intermediate step—rather than completing the full sampling procedure—we can directly generate x0(xt, t) at this stage, achieving degradation removal. The underlying idea is that x0(xt, t) is obtained by guiding the known region. This guiding process first restores the overall outline and then recovers the local details. By interrupting the process at an appropriate step, we obtain a result with only partial guidance, discarding certain details. For these degraded mural images, the details discarded by this interruption are exactly those representing the degradation.
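As an illustration, a sketch of this interrupted strategy (following the same assumed conventions as the sampling sketch above) runs the guided reverse loop only from T down to an intermediate step t_stop and returns the coarse estimate x0(xt, t) at that point; guiding with the whole degraded image, without any mask, is our reading of the text rather than a stated implementation detail.

```python
import torch

def interrupted_restore(eps_model, y, betas, t_stop=249, gamma=0.001):
    """Sketch of the interrupted sampling strategy for unmarkable degradations.

    The reverse loop stops at t_stop instead of running to t = 0, and the
    coarse estimate x0(x_t, t) is returned; the whole degraded image y serves
    as the guide (an assumption, since no mask is available).
    """
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    T = betas.shape[0]
    x_t = torch.randn_like(y)
    x0_hat = x_t

    for t in reversed(range(t_stop, T)):        # interrupt: stop at t_stop, not 0
        t_batch = torch.full((y.shape[0],), t, device=y.device, dtype=torch.long)
        with torch.enable_grad():
            x_in = x_t.detach().requires_grad_(True)
            eps = eps_model(x_in, t_batch)
            x0_hat = (x_in - torch.sqrt(1.0 - alpha_bars[t]) * eps) / torch.sqrt(alpha_bars[t])
            dist = ((x0_hat - y) ** 2).sum()    # guide with the full image
            grad = torch.autograd.grad(dist, x_in)[0]
        mu = (x_t - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps.detach()) / torch.sqrt(alphas[t])
        x_t = mu - gamma * betas[t] * grad + torch.sqrt(betas[t]) * torch.randn_like(x_t)

    # Only coarse structure has been guided in; the fine, dense "degradation"
    # details that would be recovered in later steps are simply never generated.
    return x0_hat.detach()
```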
Figure 7 shows the effect of interrupting the sampling process at t = 249. In the figure, the first row displays the original images, while the second row shows the modified outputs. The modified results appear smoother and clearer than the original, with fewer irrelevant details, highlighting our method’s capability to eliminate such degradation without a mask.
Time consumption
Our method offers superior restoration quality compared to existing approaches, at the cost of higher time consumption. Unlike existing deep learning approaches, our method is not an end-to-end restoration process; it requires multiple iterations to produce the final result, leading to higher computational costs. Moreover, for images with unsatisfactory restoration results, our method can use different random seeds to generate multiple restorations, which further increases time consumption. We attempted to reduce time consumption by decreasing the number of sampling steps; however, this leads to insufficient guidance and makes the output inconsistent with the reference image in the known (intact) regions, so it is not an ideal solution. Table 3 presents the average time cost per image for restoring 100 images of size 256 × 256 on an NVIDIA RTX 3090.
Discussion
In Fig. 3, rows 1 and 2 feature simple structures and smaller damaged areas. In the first row, all four methods effectively restore the left part of the rectangular region, where color transitions are simple. However, in the middle part of the two black curves on the right, AOT and EdgeConnect suffer from color and edge blurring, with AOT particularly affected. LaMa and our method restore this area well, which is reflected in superior evaluation metrics. In the second row, all four models perform well in the lower rectangular region, but AOT and EdgeConnect fail to recover the red curve in the top rectangle, while LaMa and our method do so effectively. The metrics in Table 1 likewise show LaMa and our method outperforming the others. Rows 3 and 4 represent architectural images with complex lines and extensive damage. The third row shows that AOT and EdgeConnect approximate the outline of the damaged region but suffer from significant blurring and detail loss. LaMa does not experience this level of blurring, but its ability to recover detail is inferior to ours, and our method also outperforms it on the evaluation metrics. In the fourth row, AOT and EdgeConnect fail entirely to restore a basic outline. LaMa restores a rough outline but fails to reconstruct finer details, whereas our method successfully restores fine details, again achieving superior performance on the evaluation metrics. Rows 5 and 6 represent intricate structures with complex curves and extensive damage; both AOT and EdgeConnect fail to restore these examples. LaMa’s results display blurred edges, causing the restored patterns to blend and lose definition, which compromises the finer details. In contrast, our method achieves superior detail restoration, preserving the complex structures without blurring or blending issues. The corresponding evaluation metrics also show that our method performs best.
In Fig. 4, rows 1 to 3 represent cases with smaller damaged areas. In the first row, the damage consists of thin, elongated scratches. Visually, all four methods perform well in restoration, but in the voting results shown in Fig. 5, our method wins with 43 votes. The second and third rows represent typical small-area damage. Both AOT and EdgeConnect exhibit color and edge blurring, with EdgeConnect performing better. LaMa and our method produce more satisfactory restoration results. In the second row, our method wins with 49 votes, while in the third row, LaMa slightly leads with 43 votes, compared to our method’s 42 votes. Rows 4 to 6 represent cases with larger damaged areas. In the fourth row, the damage consists of a large crack. AOT and EdgeConnect do not exhibit blurring in this case, but noticeable artifacts remain in the restored regions. LaMa and our method produce better restoration results, with our method leading the vote with 47 votes, while LaMa follows with 32 votes. The fifth and sixth rows show common large-scale damage, where AOT and EdgeConnect fail to restore the image, resulting in severe blurring of edges and colors. LaMa’s restoration is also unsatisfactory, with noticeable artifacts. In the voting, our method takes a clear lead, with 70 votes in the fifth row and 66 votes in the sixth.
In Table 2, since the variance of SSIM is on the order of 10−5 to 10−6 and the variance of LPIPS is on the order of 10−4 to 10−5, the stability of restoration outcomes across different random seeds is acceptable. Note that these metrics indicate restoration quality but are not directly related to diversity. Figure 6 also demonstrates that, even with low variance, the restoration results still exhibit diversity.
Table 3 shows that our method requires far more time than the other three approaches: 162 s to restore an image, whereas the other three methods take less than one second. Nonetheless, in a task of cultural heritage preservation, restoration quality typically outweighs time constraints.
Overall, for simpler structures and smaller damaged areas, all models achieve satisfactory restoration, with LaMa and our method performing better than AOT and EdgeConnect. For more extensive and structurally complex damage, AOT and EdgeConnect struggle to restore the image, often failing entirely in highly detailed areas. LaMa performs better by reconstructing basic shapes but falls short in fine details. Our method, however, stands out by effectively recovering intricate details and structures, leading to the best performance across these challenging cases. The experimental results also show that the stability of the outputs is acceptable. Although our method takes more time, quality outweighs time constraints in cultural heritage preservation, so the longer runtime is acceptable.
Data availability
The datasets used and/or analyzed in the current study are available from the corresponding author upon reasonable request.
Code availability
The code used and/or analyzed in the current study is available from the corresponding author upon reasonable request.
References
Yang, X. & Wang, S. Dunhuang mural inpainting in intricate disrepaired region based on improvement of priority algorithm. J. Comput. Aided Des. Comput. Graph. 23, 284–289 (2011).
Criminisi, A., Pérez, P. & Toyama, K. Region filling and object removal by exemplar-based image inpainting. IEEE Trans. Image Process. 13, 1200–1212 (2004).
Li, C., Wang, H., Wu, M. & Pan, S. Automatic recognition and virtual restoration of mud spot disease of tang dynasty tomb murals image. Comput. Eng. Appl. 52, 233–236 (2016).
Jiao, L., Wang, W., Li, B. & Zhao, Q. Wutai mountain mural inpainting based on improved block matching algorithm. J. Comput. Aided Des. Comput. Graph. 31, 119–125 (2019).
Wang, H., Li, Q. & Jia, S. A global and local feature weighted method for ancient murals inpainting. Int. J. Mach. Learn. Cybern. 11, 1197–1216 (2020).
Cao, J., Li, Y., Zhang, Q. & Cui, H. Restoration of an ancient temple mural by a local search algorithm of an adaptive sample block. Herit. Sci. 7, 39 (2019).
Jingni, S., Huiqin, W., Meng, W. & Wenzong, Y. Tang dynasty tomb murals inpainting algorithm of mca decomposition. J. Front. Comput. Sci. Technol. 11, 1826 (2017).
Jaidilert, S. & Farooque, G. Crack detection and images inpainting method for Thai mural painting images. In: 2018 IEEE 3rd International Conference on Image, Vision and Computing (ICIVC), pp. 143–148 (IEEE, 2018).
Shen, J. & Chan, T. F. Mathematical models for local nontexture inpaintings. SIAM J. Appl. Math. 62, 1019–1043 (2002).
Chan, T. F. & Shen, J. Nontexture inpainting by curvature-driven diffusions. J. Vis. Commun. Image Represent. 12, 436–449 (2001).
Chen, Y. A lattice-boltzmann method for image inpainting. In: 2010 3rd International Congress on Image and Signal Processing, vol. 3, pp. 1222–1225 (IEEE, 2010).
Zhang, J., Zhao, D. & Gao, W. Group-based sparse representation for image restoration. IEEE Trans. Image Process. 23, 3336–3351 (2014).
Chen, Y., Ai, Y. & Guo, H. Inpainting algorithm for Dunhuang mural based on improved curvature-driven diffusion model. J. Comput. Aided Des. Comput. Graph. 32, 787–796 (2020).
Yu, T. et al. End-to-end partial convolutions neural networks for Dunhuang grottoes wall-painting restoration. In: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 0–0 (2019).
Cao, J., Zhang, Z., Zhao, A., Cui, H. & Zhang, Q. Ancient mural restoration based on a modified generative adversarial network. Herit. Sci. 8, 1–14 (2020).
Goodfellow, I. et al. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 27 (2014).
Wang, N., Wang, W., Hu, W., Fenster, A. & Li, S. Thanka mural inpainting based on multi-scale adaptive partial convolution and stroke-like mask. IEEE Trans. Image Process. 30, 3720–3733 (2021).
Ciortan, I.-M., George, S. & Hardeberg, J. Y. Colour-balanced edge-guided digital inpainting: applications on artworks. Sensors 21, 2091 (2021).
Li, L. et al. Line drawing guided progressive inpainting of mural damages. arXiv preprint arXiv:2211.06649 (2022).
Schmidt, A., Madhu, P., Maier, A., Christlein, V. & Kosti, R. Arin: adaptive resampling and instance normalization for robust blind inpainting of Dunhuang cave paintings. In: 2022 Eleventh International Conference on Image Processing Theory, Tools and Applications (IPTA), pp. 1–6 (IEEE, 2022).
Sun, W. & Chen, Z. Learned image downscaling for upscaling using content adaptive resampler. IEEE Trans. Image Process. 29, 4027–4040 (2020).
Chen, L., Lu, X., Zhang, J., Chu, X. & Chen, C. Hinet: Half instance normalization network for image restoration. In: Proc. of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 182–192 (2021).
Ge, H., Yu, Y. & Zhang, L. A virtual restoration network of ancient murals via global–local feature extraction and structural information guidance. Herit. Sci. 11, 264 (2023).
Ren, H., Sun, K., Zhao, F. & Zhu, X. Dunhuang murals image restoration method based on generative adversarial network. Herit. Sci. 12, 39 (2024).
Dhariwal, P. & Nichol, A. Diffusion models beat GANs on image synthesis. Adv. Neural Inf. Process. Syst. 34, 8780–8794 (2021).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 33, 6840–6851 (2020).
Liu, X. et al. More control for free! image synthesis with semantic diffusion guidance. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pp. 289–299 (2023).
Zeng, Y., Fu, J., Chao, H. & Guo, B. Aggregated contextual transformations for high-resolution image inpainting. IEEE Trans. Vis. Comput. Graph. 29, 3266–3280 (2022).
Nazeri, K., Ng, E., Joseph, T., Qureshi, F. & Ebrahimi, M. Edgeconnect: Structure guided image inpainting using edge prediction. In: Proc. of the IEEE/CVF International Conference on Computer Vision Workshops, pp. 0–0 (2019).
Suvorov, R. et al. Resolution-robust large mask inpainting with Fourier convolutions. In: Proc. of the IEEE/CVF Winter Conference on Applications of Computer Vision. pp. 2149–2159 (2022).
Liu, G. et al. Image inpainting for irregular holes using partial convolutions. In: Proc. of the European Conference on Computer Vision (ECCV). pp. 85–100 (2018).
Yu, J. et al. Free-form image inpainting with gated convolution. In: Proc. of the IEEE/CVF International Conference on Computer Vision. pp. 4471–4480 (2019).
Acknowledgements
This research was supported by the National Natural Science Foundation of China (Grant Nos. 62166048, 61263048), by the Applied Basic Research Project of Yunnan Province (Grant No. 2018FB102).
Author information
Authors and Affiliations
Contributions
All authors listed have made a substantial, direct, and intellectual contribution to the work and approved it for publication.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.