Low-rank structure guided diffusion for Shaanxi temple mural restoration

Lei, Tong; Li, Yixuan; Cheng, Zhengyun; Zhou, Wen

doi:10.1038/s40494-025-01827-3

Article
Open access
Published: 25 June 2025

Tong Lei^1,2,
Yixuan Li¹,
Zhengyun Cheng³ &
…
Wen Zhou^2,4

npj Heritage Science volume 13, Article number: 295 (2025)

Abstract

The murals in Shaanxi temples and monasteries, with their long history and diverse styles, are invaluable yet non-renewable cultural heritage. However, prolonged environmental exposure has led to severe damage, including cracking, mold growth, and large-scale detachment, creating an urgent need for restoration. Traditional restoration methods struggle with reconstructing complex structures and patterns due to their neglect of the murals’ global structure. To address this, we propose a novel diffusion model guided by global low-rank structure for mural restoration. By leveraging the inherent low-rank prior of mural images, our model explicitly captures non-local similarities within murals. To enhance computational efficiency, we incorporate orthogonal Tucker decomposition, reducing the complexity of low-rank solutions. Comprehensive experiments and ablation studies validate the effectiveness of the low-rank prior, demonstrating that our model achieves state-of-the-art performance and provides significant advancements in digital restoration of ancient murals.

Introduction

The murals of temples and monasteries in Shaanxi have a long history and are diverse in type. As non-renewable cultural heritage, they hold significant historical and artistic value. These murals not only incorporate elements of Buddhism, Taoism, and folk beliefs, reflecting the distinctive features of social life, religious beliefs, and aesthetic values of their time, but also represent the development of ancient Chinese painting techniques. Particularly during the Yuan, Ming, and Qing dynasties, they preserved many original works by renowned artists, establishing their irreplaceable importance¹.

Owing to the long-term impact of natural weathering and human activity, these murals have suffered many kinds of deterioration, including cracking, detachment, hollowing, pulverization, fading, discoloration, mildew, smudging, and scratches. There is therefore an urgent need to restore the murals in a manner that accounts for their environment and painting materials. Manual restoration of Shaanxi murals, however, is arduous and complex. Advances in image processing and deep learning have made digital restoration of mural images an active research topic².

With the development of artificial intelligence, neural networks are widely used in computer vision tasks^3,4. As a widely used paradigm, Generative Adversarial Networks (GANs) have been at the forefront of mural restoration, with numerous studies proposing GAN-based improvements for image restoration tasks^5,6. These GAN architectures directly learn the data distribution, enabling the generation of highly realistic restorations^7,8. However, the recently popularized diffusion models⁹ have emerged as a powerful alternative, often outperforming traditional GAN approaches in various image generation and restoration tasks^10,11,12. Unlike GANs, diffusion models focus on modeling the gradual change of the data distribution over time, a process analogous to physical diffusion.

In the context of mural restoration, diffusion models offer several advantages. They provide more stable training dynamics than GANs, whose learned distributions are well known to suffer from mode collapse and catastrophic forgetting^13,14. Furthermore, diffusion models have demonstrated superior performance in capturing fine details and textures¹⁵, which is crucial for preserving the aesthetic integrity of historical murals, and their ability to generate diverse, high-quality samples addresses the mode-collapse criticism often directed at GANs. However, although diffusion models perform remarkably well in general image restoration tasks by leveraging only local information, mural restoration presents unique challenges for which an exclusive focus on local texture may prove insufficient. To achieve good results, it is imperative to incorporate prior knowledge about the global low-rank structure inherent in mural paintings¹⁶.

This crucial prior is called low-rankness: a signal often contains many repetitive local patterns, so a given local pattern has many similar counterparts across the whole signal¹⁷. The low-rank principle forms the cornerstone of many techniques, e.g., image restoration/inpainting and video saliency detection. In mural painting, low-rankness is generally expressed in three aspects. First, mural paintings typically exhibit large-scale structural coherence that extends beyond local neighborhoods¹⁶; global low-rankness effectively captures these long-range dependencies, thereby ensuring that the restored image maintains overall compositional integrity. Second, despite their complexity, mural paintings can often be represented by a relatively small number of basic elements, analogous to the concept of sparsity in signal processing¹⁸; the low-rank assumption aligns with this sparse representation, facilitating more computationally efficient and accurate restoration¹⁹. Third, while local patterns play a crucial role, global low-rankness ensures consistency in texture across the entire mural²⁰, mitigating the risk of localized over-fitting and preserving the overall artistic style. In summary, it is necessary to embed global low-rankness in mural restoration.

It is worth noting that the efficacy of low-rank minimization in tensor completion has been corroborated by numerous empirical studies^21,22. However, the size of murals is usually arbitrary, and when faced with the restoration of large-scale murals, the time consumption is a major concern. To address the aforementioned challenges, this paper makes the following contributions:

1. We propose a novel framework called Low-rank Structure Guided Diffusion (LRDiff) that seamlessly integrates global low-rank priors into diffusion models for mural restoration, thereby addressing the performance limitations of existing approaches.
2. We formulate a rigorous low-rank constraint using Tucker decomposition, imposing the low-rank property on the smaller Tucker core tensor, which effectively reduces the time complexity of restoring murals of arbitrary scale.
3. We introduce an adaptive mechanism that balances the influence of local and global information during the restoration process, ensuring optimal preservation of both fine details and overall compositional structure.
4. To demonstrate the effectiveness of our method, we conduct extensive experiments on a real-world mural dataset, showing the superior performance of our proposed method compared with state-of-the-art techniques.

Methods

Figure 1 presents an overview of our proposed LRDiff method, which addresses three key aspects: 1) leveraging the low-rank structure of data to mitigate the semantic discrepancy between low-quality and high-quality murals; 2) adaptively balancing global low-rank structure and pixel-wise similarity during the reverse denoising process; 3) obtaining low-rank approximate solutions within a reduced subspace of smaller core tensors. This approach effectively integrates low-rank priors into the diffusion model framework, enabling more efficient and accurate mural restoration while maintaining structural coherence across the image. All the mathematical symbols used are summarized in Table 1. Before detailing our technique, we review DDPMs²³ for image inpainting.

Table 1 Basic matrix/tensor notation and symbols

Denoising diffusion probabilistic models with increasing rank

Consider a high-quality image ${{\mathcal{X}}}_{0}$ and its degraded low-quality counterpart μ, where the paired images have the same size, i.e., ${{\mathcal{X}}}_{0},\mu \in {{\mathbb{R}}}^{3\times H\times W}$. Typical DDPMs for inpainting²⁴ involve a forward diffusion process and a reverse denoising process.

Forward diffusion process

In contrast to Generative Adversarial Networks²⁵, which directly learn data distributions, diffusion models focus on learning the evolution of data distributions over time. Let p₀ denote the initial distribution representing the data, and t ∈ [0, T] represent a continuous time variable. We consider a diffusion process ${\{{{\mathcal{X}}}_{t}\}}_{t = 0}^{T}$ governed by a mean-reverting stochastic differential equation²⁶, defined as:

$${\rm{d}}{\mathcal{X}}={\theta }_{t}(\mu -{\mathcal{X}}){\rm{d}}t+{\sigma }_{t}{\rm{d}}{\mathcal{W}},\quad {{\mathcal{X}}}_{0} \sim {p}_{0}({\mathcal{X}}),$$

(1)

where ${\mathcal{W}}$ denotes a standard Wiener process, introducing stochasticity to the differential equation. The time-dependent positive parameters θ_t and σ_t characterize the mean-reversion speed and stochastic volatility of the diffusion process, respectively.

These parameters are constrained by the relation σ_t²/θ_t = 2λ², which ensures a stationary variance, where λ represents the fixed noise level applied to ${{\mathcal{X}}}_{T}$. This constraint maintains the stability of the diffusion process over time, allowing for consistent stochastic behavior throughout the evolution. Taking ${{\mathcal{X}}}_{0}$ as the initial condition, the SDE in Eq. (1) has a closed-form solution:

$${{\mathcal{X}}}_{t}=\mu +({{\mathcal{X}}}_{0}-\mu ){{\rm{e}}}^{-{\bar{\theta }}_{t}}+\mathop{\int}\nolimits_{0}^{t}{\sigma }_{z}{{\rm{e}}}^{-{\bar{\theta }}_{z:t}}{\rm{d}}{\mathcal{W}}(z),$$

(2)

where ${\bar{\theta }}_{t}=\mathop{\int}\nolimits_{0}^{t}{\theta }_{z}{\rm{d}}z$ and ${\bar{\theta }}_{z:t}=\mathop{\int}\nolimits_{z}^{t}{\theta }_{s}{\rm{d}}s$ are known, and the transition kernel $p({{\mathcal{X}}}_{t}| {{\mathcal{X}}}_{0})={\mathcal{N}}({{\mathcal{X}}}_{t}| {m}_{t}({\mathcal{X}}),{v}_{t})$ is a Gaussian distribution with mean ${m}_{t}({\mathcal{X}})$ and variance v_t given by:

$$\begin{array}{rcl}{m}_{t}({\mathcal{X}})&:= &\mu +({{\mathcal{X}}}_{0}-\mu ){{\rm{e}}}^{-{\bar{\theta }}_{t}},\\ {v}_{t}&:= &{\lambda }^{2}(1-{{\rm{e}}}^{-2{\bar{\theta }}_{t}}).\end{array}$$

(3)

As t → ∞, the mean m_t converges to the low-quality image μ, while the variance v_t approaches the stationary variance λ². This convergence behavior indicates that the forward SDE, described in Eq. (1), progressively diffuses a high-quality image into a low-quality image with additive Gaussian noise of fixed variance. Consequently, the terminal state ${{\mathcal{X}}}_{T}$ is characterized by a Gaussian distribution with predetermined mean and variance, representing the fully degraded image.
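
The closed-form marginal in Eq. (3) means that ${{\mathcal{X}}}_{t}$ can be sampled directly without simulating the SDE. The following is a minimal NumPy sketch of that sampling step, not the authors' implementation; the constant schedule θ_t = 0.05, the noise level λ = 0.1, and the random stand-in images are assumptions chosen purely for illustration.

```python
import numpy as np

def forward_marginal(x0, mu, theta_bar_t, lam, rng):
    """Sample X_t from N(m_t, v_t I) using the mean and variance of Eq. (3)."""
    decay = np.exp(-theta_bar_t)
    mean = mu + (x0 - mu) * decay                        # m_t
    var = lam**2 * (1.0 - np.exp(-2.0 * theta_bar_t))    # v_t
    eps = rng.standard_normal(x0.shape)
    return mean + np.sqrt(var) * eps, mean, var

rng = np.random.default_rng(0)
x0 = rng.random((3, 8, 8))   # stand-in for a high-quality image
mu = rng.random((3, 8, 8))   # stand-in for its degraded counterpart
lam = 0.1                    # hypothetical stationary noise level

# Hypothetical constant schedule theta_t = 0.05, so theta_bar_t = 0.05 * t.
_, mean_early, var_early = forward_marginal(x0, mu, 0.05 * 1, lam, rng)
_, mean_late, var_late = forward_marginal(x0, mu, 0.05 * 400, lam, rng)
```

For a large accumulated θ̄_t the mean is numerically indistinguishable from μ and the variance from λ², which is the convergence behavior described above.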

During the forward diffusion process, the stochastic nature of the added noise serves to decorrelate pixels that were initially related, potentially transforming linearly dependent rows or columns into linearly independent ones. Theoretically, the introduction of random values to each element of a matrix significantly increases the probability of the resulting data tensor achieving full rank. This phenomenon can be expressed as:

$$\begin{array}{lll}rank({{\mathcal{X}}}_{0})\;\le\; rank(\mu +{\mathcal{N}}(0,{\lambda }^{2}{\mathcal{I}}))\\\qquad\qquad\;\; \le\;rank(\mu )+rank({\mathcal{N}}(0,{\lambda }^{2}{\mathcal{I}})).\end{array}$$

(4)

Thus, the diffusion process causes the rank of the mural image to increase.
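
The rank increase can be checked numerically on a toy matrix. The sketch below is an illustration under assumed values (a 64 × 64 matrix of rank 4 and noise level 0.1, none of which come from the paper): it adds Gaussian noise to a low-rank matrix, standing in for one unfolding of a clean image, and compares numerical ranks.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, lam = 64, 4, 0.1   # hypothetical size, rank, and noise level

# Rank-r matrix as a stand-in for one unfolding of a clean mural image.
low_rank = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))
noisy = low_rank + lam * rng.standard_normal((n, n))

rank_clean = np.linalg.matrix_rank(low_rank)
rank_noisy = np.linalg.matrix_rank(noisy)
print(rank_clean, rank_noisy)
```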

Reverse low-rank denoising process

Following prior work²⁷, we recover the high-quality image from the terminal state ${{\mathcal{X}}}_{T}$ according to:

$${\rm{d}}{\mathcal{X}}=[f({\mathcal{X}},t)-g{(t)}^{2}{\nabla }_{{\mathcal{X}}}\log {p}_{t}({\mathcal{X}})]{\rm{d}}t+g(t){\rm{d}}\hat{{\mathcal{W}}},$$

(5)

where ${\rm{d}}\hat{{\mathcal{W}}}$ denotes a reverse-time Wiener process and ${{\mathcal{X}}}_{T} \sim {p}_{T}({\mathcal{X}})$. For the forward SDE in Eq. (1), the drift is $f({\mathcal{X}},t)={\theta }_{t}(\mu -{\mathcal{X}})$ and the diffusion coefficient is $g(t)={\sigma }_{t}$. Here ${p}_{t}({\mathcal{X}})$ is the marginal probability density function of ${{\mathcal{X}}}_{t}$ at time t. The score function ${\nabla }_{{\mathcal{X}}}\log {p}_{t}({\mathcal{X}})$ is the sole unknown component and is generally intractable. Consequently, SDE-based diffusion models approximate this function by training a time-dependent neural network under a score matching objective.

During the training phase, the high-quality image ${{\mathcal{X}}}_{0}$ is available, enabling the neural network to estimate the conditional score ${\nabla }_{{\mathcal{X}}}\log {p}_{t}({\mathcal{X}}| {{\mathcal{X}}}_{0})$. Specifically, we reparameterize ${{\mathcal{X}}}_{t}$ as ${{\mathcal{X}}}_{t}={m}_{t}({\mathcal{X}})+\sqrt{{v}_{t}}{\epsilon }_{t}$. Subsequently, utilizing Eq. (3), we compute the ground truth score as:

$${\nabla }_{{\mathcal{X}}}\log {p}_{t}({\mathcal{X}}| {{\mathcal{X}}}_{0})=-\frac{{\epsilon }_{t}}{\sqrt{{v}_{t}}}.$$

(6)

In the above formulation, ϵ_t represents standard Gaussian noise, i.e., ${\epsilon }_{t} \sim {\mathcal{N}}(0,I)$. Following common practice⁹, we approximate the noise using a conditional time-dependent neural network ${\epsilon }_{\phi }({{\mathcal{X}}}_{t},\mu ,t)$. This noise network takes as input the current state ${{\mathcal{X}}}_{t}$, the condition μ, and the time t, and outputs the predicted noise.
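
Eq. (6) follows in one step from the Gaussian transition kernel. As a brief worked derivation, using only the reparameterization ${{\mathcal{X}}}_{t}={m}_{t}+\sqrt{{v}_{t}}{\epsilon }_{t}$ and evaluating the gradient at ${\mathcal{X}}={{\mathcal{X}}}_{t}$:

$$\log {p}_{t}({\mathcal{X}}| {{\mathcal{X}}}_{0})=-\frac{\parallel {\mathcal{X}}-{m}_{t}{\parallel }_{F}^{2}}{2{v}_{t}}+{\rm{const}}\;\Rightarrow \;{\nabla }_{{\mathcal{X}}}\log {p}_{t}({\mathcal{X}}| {{\mathcal{X}}}_{0})=-\frac{{{\mathcal{X}}}_{t}-{m}_{t}}{{v}_{t}}=-\frac{\sqrt{{v}_{t}}\,{\epsilon }_{t}}{{v}_{t}}=-\frac{{\epsilon }_{t}}{\sqrt{{v}_{t}}}.$$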

An alternative approach involves determining the optimal reverse state ${\hat{{\mathcal{X}}}}_{t-1}$ from ${{\mathcal{X}}}_{t}$ at the (t − 1)-th timestep via maximum likelihood learning. This optimization is achieved by minimizing the negative log-likelihood, expressed as:

$${\hat{{\mathcal{X}}}}_{t-1}=\mathop{{\rm{argmin}}}\limits_{{{\mathcal{X}}}_{t-1}}[-\log p({{\mathcal{X}}}_{t-1}| {{\mathcal{X}}}_{t},{{\mathcal{X}}}_{0})].$$

(7)

The closed-form solution of the above objective is:

$$\begin{array}{lll}{\hat{{\mathcal{X}}}}_{t-1}\;=\;\displaystyle\frac{1-{{\rm{e}}}^{-2{\bar{\theta }}_{t-1}}}{1-{{\rm{e}}}^{-2{\bar{\theta }}_{t}}}{{\rm{e}}}^{-{\theta }_{t}^{{\prime} }}({{\mathcal{X}}}_{t}-\mu )\\\qquad\quad +\;\displaystyle\frac{1-{{\rm{e}}}^{-2{\theta }_{t}^{{\prime} }}}{1-{{\rm{e}}}^{-2{\bar{\theta }}_{t}}}{{\rm{e}}}^{-{\bar{\theta }}_{t-1}}({{\mathcal{X}}}_{0}-\mu )+\mu ,\end{array}$$

(8)

where ${\theta }_{t}^{{\prime} }=\mathop{\int}\nolimits_{t-1}^{t}{\theta }_{z}{\rm{d}}z$ and ${\bar{\theta }}_{t}=\mathop{\int}\nolimits_{0}^{t}{\theta }_{z}{\rm{d}}z$.
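
Eq. (8) can be written as a short function. The sketch below is an illustrative NumPy transcription under an assumed constant schedule θ_t = 0.05 with random stand-in images; the function name `reverse_mean` is hypothetical. As a sanity check, feeding in the noise-free state m_t should return m_{t−1}, and at t = 1 the result should reduce to ${{\mathcal{X}}}_{0}$.

```python
import numpy as np

def reverse_mean(x_t, x0, mu, theta_bar_prev, theta_bar_t):
    """Closed-form optimal reverse state of Eq. (8).

    theta_bar_prev and theta_bar_t are the integrals of theta up to
    t-1 and t; their difference is theta'_t.
    """
    theta_step = theta_bar_t - theta_bar_prev            # theta'_t
    denom = 1.0 - np.exp(-2.0 * theta_bar_t)
    w_t = (1.0 - np.exp(-2.0 * theta_bar_prev)) / denom * np.exp(-theta_step)
    w_0 = (1.0 - np.exp(-2.0 * theta_step)) / denom * np.exp(-theta_bar_prev)
    return w_t * (x_t - mu) + w_0 * (x0 - mu) + mu

rng = np.random.default_rng(0)
x0, mu = rng.random((3, 4, 4)), rng.random((3, 4, 4))
theta, t = 0.05, 10   # hypothetical constant theta_t and timestep

m_t = mu + (x0 - mu) * np.exp(-theta * t)        # noise-free state at time t
m_prev = mu + (x0 - mu) * np.exp(-theta * (t - 1))
x_prev = reverse_mean(m_t, x0, mu, theta * (t - 1), theta * t)
```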

Time-dependent low-rank structure guidance

As shown in Eq. (8), diffusion-based completion inherently operates pixel-wise, using only local pixel information. A mural, however, also satisfies global structural priors such as coherent composition, symmetry and repeated patterns, semantic context, perspective and depth, and color harmony. Because the diffusion process increases the rank of the image data, as shown in Eq. (4), our motivation is to recover the global low-rank structure of the image. To incorporate global structural guidance, so that the diffusion process respects the mural’s overall artistic integrity and meaning, we propose the following time-dependent reduced-rank objective at timestep t:

$${{\mathcal{X}}}_{t-1}^{* }=\mathop{{\rm{argmin}}}\limits_{{\mathcal{X}}}{\gamma }_{t}\cdot rank({\mathcal{X}})+\parallel {\mathcal{X}}-{\hat{{\mathcal{X}}}}_{t-1}{\parallel }_{F},$$

(9)

where ${\hat{{\mathcal{X}}}}_{t-1}$ is computed by Eq. (8). The first term in Eq. (9) promotes a globally low-rank solution, and the second term a pixel-wise similar solution. The weight γ_t adaptively balances the two optimization directions and is set as:

$${\gamma }_{t}=\frac{1-{{\rm{e}}}^{-2{\bar{\theta }}_{t-1}}}{1-{{\rm{e}}}^{-2{\bar{\theta }}_{t}}}{{\rm{e}}}^{-{\theta }_{t}^{{\prime} }}.$$

(10)

We take the coefficient of the first term in Eq. (8) as the time-dependent weight because it is exactly the proportion of high-rank noise contained in the pixel-wise optimal solution in the damaged region. The higher this proportion, the stronger the low-rank constraint in Eq. (9).
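
To see how the weight in Eq. (10) behaves over the reverse process, it can be tabulated for a given schedule. The sketch below assumes a hypothetical constant schedule with θ′_t = 0.05 over T = 100 steps (the paper's actual schedule is not specified here); `gamma_schedule` is an illustrative helper name.

```python
import numpy as np

def gamma_schedule(theta_steps):
    """gamma_t of Eq. (10) for t = 1..T, where theta_steps[t-1] = theta'_t."""
    theta_steps = np.asarray(theta_steps, dtype=float)
    theta_bar = np.concatenate(([0.0], np.cumsum(theta_steps)))  # theta_bar_0..T
    prev, cur = theta_bar[:-1], theta_bar[1:]
    ratio = (1.0 - np.exp(-2.0 * prev)) / (1.0 - np.exp(-2.0 * cur))
    return ratio * np.exp(-theta_steps)

gammas = gamma_schedule(np.full(100, 0.05))   # hypothetical constant schedule
```

Under this assumed schedule the weight is zero at t = 1 and increases with t while staying below 1, so the low-rank term is strongest at the noisy start of the reverse process and vanishes at the final step.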

Through Eq. (9), we adaptively steer the target solution of the reverse SDE toward the Pareto front between low-rankness and similarity. As the rank minimization problem is NP-hard, it is usually relaxed into a nuclear norm or L1 norm minimization problem^28,29. However, whereas the mean-reverting SDE is independent of the image size, the image size strongly affects rank minimization: the larger the image, the higher the time complexity of minimizing the rank function. We therefore introduce Tucker decomposition³⁰ to reduce this complexity.

Tucker decomposition factorizes an N-th order tensor into a relatively small core tensor and a set of factor matrices. A third-order tensor ${\mathcal{X}}\in {{\mathbb{R}}}^{{I}_{1}\times {I}_{2}\times {I}_{3}}$ can be expressed as:

$$\begin{array}{lll}{\mathcal{X}}\;\cong \;\mathop{\sum }\limits_{{r}_{1}=1}^{{R}_{1}}\cdots \mathop{\sum }\limits_{{r}_{3}=1}^{{R}_{3}}{g}_{{r}_{1},{r}_{2},{r}_{3}}({{\bf{b}}}_{{r}_{1}}^{(1)}\circ {{\bf{b}}}_{{r}_{2}}^{(2)}\circ {{\bf{b}}}_{{r}_{3}}^{(3)})\\\quad \;=\;{\mathcal{G}}{\times }_{1}{{\bf{B}}}^{(1)}{\times }_{2}{{\bf{B}}}^{(2)}{\times }_{3}{{\bf{B}}}^{(3)}\\\quad\; =\;[\![{\mathcal{G}};{{\bf{B}}}^{(1)},{{\bf{B}}}^{(2)},{{\bf{B}}}^{(3)}]\!],\end{array}$$

(11)

where ${\mathcal{G}}\in {{\mathbb{R}}}^{{R}_{1}\times {R}_{2}\times {R}_{3}}$ is the core tensor and ${{\bf{B}}}^{(n)}=[{{\bf{b}}}_{1}^{(n)},{{\bf{b}}}_{2}^{(n)},\cdots \,,{{\bf{b}}}_{{R}_{n}}^{(n)}]\in {{\mathbb{R}}}^{{I}_{n}\times {R}_{n}}$, n = 1, 2, 3, are the mode-n factor matrices; typically, R_n ≪ I_n. The core tensor models a potentially complex pattern of mutual interaction between the vectors in different modes.
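
The two forms in Eq. (11), the weighted sum of outer products and the chain of mode-n products, can be compared directly. The following NumPy sketch uses hypothetical sizes (I_n) = (3, 16, 16) and ranks (R_n) = (2, 3, 4) with random core and factors; the helper names are illustrative.

```python
import numpy as np

def mode_n_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode` (the x_n product)."""
    moved = np.moveaxis(tensor, mode, 0)                  # bring mode to front
    out = np.tensordot(matrix, moved, axes=([1], [0]))    # contract over mode
    return np.moveaxis(out, 0, mode)

def tucker_to_tensor(core, factors):
    """Rebuild X = G x_1 B1 x_2 B2 x_3 B3 as in Eq. (11)."""
    x = core
    for mode, b in enumerate(factors):
        x = mode_n_product(x, b, mode)
    return x

rng = np.random.default_rng(0)
ranks, dims = (2, 3, 4), (3, 16, 16)   # hypothetical R_n and I_n
core = rng.standard_normal(ranks)
factors = [rng.standard_normal((i, r)) for i, r in zip(dims, ranks)]
x = tucker_to_tensor(core, factors)

# The same tensor written as the sum of weighted outer products in Eq. (11).
x_sum = np.einsum("abc,ia,jb,kc->ijk", core, *factors)
```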

The multilinear singular value decomposition, also called the higher-order SVD (HOSVD), can be considered a special form of constrained Tucker decomposition^31,32 in which all factor matrices, ${{\bf{B}}}^{(n)}={{\bf{U}}}^{(n)}\in {{\mathbb{R}}}^{{I}_{n}\times {I}_{n}}$, are orthogonal, i.e., ${{\bf{U}}}^{(n){\rm{T}}}{{\bf{U}}}^{(n)}={\bf{I}}$, where ${\bf{I}}$ is the identity matrix. After obtaining the orthogonal matrices U⁽ⁿ⁾ of left singular vectors of X_(n), the core tensor ${\mathcal{G}}$ can be computed as:

$${\hat{{\mathcal{G}}}}_{t-1}={\hat{{\mathcal{X}}}}_{t-1}{\times }_{1}{{\bf{U}}}^{(1){\rm{T}}}{\times }_{2}{{\bf{U}}}^{(2){\rm{T}}}{\times }_{3}{{\bf{U}}}^{(3){\rm{T}}}.$$

(12)

The two states $\{{{\mathcal{X}}}_{t-1}^{* },{\hat{{\mathcal{X}}}}_{t-1}\}$ of the image at the same timestep during the diffusion process can be represented using different linear combinations of the same set of factor matrices.
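
A minimal NumPy sketch of the HOSVD step in Eq. (12) is given below, assuming full (untruncated) square orthogonal factors and a random stand-in tensor; it is an illustration rather than the authors' code, and the helper names are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: rows index `mode`, columns index the other axes."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_n_product(tensor, matrix, mode):
    """Multiply `tensor` by `matrix` along axis `mode`."""
    moved = np.moveaxis(tensor, mode, 0)
    return np.moveaxis(np.tensordot(matrix, moved, axes=([1], [0])), 0, mode)

def hosvd(x):
    """Return the core tensor and the orthogonal factors U^(n) of Eq. (12)."""
    # Left singular vectors of each unfolding; full_matrices=True gives I_n x I_n.
    factors = [np.linalg.svd(unfold(x, n), full_matrices=True)[0]
               for n in range(x.ndim)]
    core = x
    for n, u in enumerate(factors):
        core = mode_n_product(core, u.T, n)
    return core, factors

rng = np.random.default_rng(0)
x_hat = rng.random((3, 16, 16))   # stand-in for the pixel-wise solution
core, factors = hosvd(x_hat)

# Mapping the core back through the same factors recovers the input.
recon = core
for n, u in enumerate(factors):
    recon = mode_n_product(recon, u, n)
```

Because the factors are held fixed, any other core tensor of the same shape maps back to an image through the same reconstruction loop, which is how the two states share one set of factor matrices.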

Based on Tucker decomposition mentioned above, we propose Tucker rank minimization to adjust the optimal solution so that the supervised generation process is optimized towards both pixel-wise similarity solution and rank reduction of the global structure.

Tucker rank minimization for mural

The Tucker rank of the third-order image ${\mathcal{X}}\in {{\mathbb{R}}}^{3\times H\times W}$ is the 3-tuple (R₁, R₂, R₃) of the dimensions of the mode-n subspaces. If the Tucker decomposition in Eq. (11) holds exactly, it is defined as:

$$ran{k}_{t}({\mathcal{X}})=\{rank({{\bf{X}}}_{(1)}),rank({{\bf{X}}}_{(2)}),rank({{\bf{X}}}_{(3)})\},$$

(13)

where X_(n) is the mode-n unfolding of tensor ${\mathcal{X}}$. For the Tucker format, the sum of the nuclear norms of all mode-n unfolding matrices has been developed as a convex surrogate of the Tucker rank²⁸, so we can rewrite Eq. (9) as:

$$\mathop{\min }\limits_{{\mathcal{X}}}\frac{{\gamma }_{t}}{3}\mathop{\sum }\limits_{n=1}^{3}\parallel {{\bf{X}}}_{(n)}{\parallel }_{* }+\parallel {\mathcal{X}}-{\hat{{\mathcal{X}}}}_{t-1}{\parallel }_{F}^{2}.$$

(14)

For the orthonormal Tucker format of a given image, that is, ${\mathcal{X}}=[[{\mathcal{G}};{{\bf{U}}}^{(1)},{{\bf{U}}}^{(2)},{{\bf{U}}}^{(3)}]]$, the Frobenius norms and the Schatten p-norms of ${\mathcal{X}}$ and ${\mathcal{G}}$ are equal:

$$\begin{array}{lll}\parallel {\mathcal{X}}-{\hat{{\mathcal{X}}}}_{t-1}{\parallel }_{F}\;=\;\parallel [\![{\mathcal{G}}-{\hat{{\mathcal{G}}}}_{t-1};{{\bf{U}}}^{(1)},{{\bf{U}}}^{(2)},{{\bf{U}}}^{(3)}]\!]{\parallel }_{F}\\\qquad\qquad\qquad\; =\;\parallel {\mathcal{G}}-{\hat{{\mathcal{G}}}}_{t-1}{\parallel }_{F},\end{array}$$

(15)

$$\parallel {\mathcal{X}}{\parallel }_{Sp}=\parallel {\mathcal{G}}{\parallel }_{Sp},1\,\leqslant\, p < \infty .$$

(16)

Thus, the Frobenius norm can be computed with ${\mathcal{O}}({R}^{3})$ complexity, where R = max{R₁, R₂, R₃}, instead of the usual ${\mathcal{O}}({I}^{3})$ complexity; typically R ≪ I. The Schatten p-norm of a third-order tensor ${\mathcal{X}}$ is defined as the average of the Schatten p-norms of its mode-n unfolding matrices, i.e.,

$$\parallel {\mathcal{X}}{\parallel }_{Sp}=\frac{1}{3}\mathop{\sum }\limits_{n=1}^{3}\parallel {{\bf{X}}}_{(n)}{\parallel }_{Sp},$$

(17)

where $\parallel {{\bf{X}}}_{(n)}{\parallel }_{Sp}={(\sum _{r}{\sigma }_{r}^{p})}^{\frac{1}{p}}$ and σ_r is the r-th singular value of the unfolding matrix X_(n). For p = 1, the Schatten norm reduces to the nuclear norm. Thus, the nuclear norms of the original tensor and the core tensor are equal, i.e.,

$$\mathop{\sum }\limits_{n=1}^{3}\parallel {{\bf{X}}}_{(n)}{\parallel }_{* }=\mathop{\sum }\limits_{n=1}^{3}\parallel {{\bf{G}}}_{(n)}{\parallel }_{* }.$$

(18)
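
The norm identities in Eqs. (15) and (18) can be verified numerically. The sketch below assumes full square orthogonal factors and random stand-in tensors of a hypothetical size 3 × 12 × 12; with truncated factors (R_n < I_n) the equalities would hold only approximately.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding of a third-order tensor."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

rng = np.random.default_rng(0)
x = rng.standard_normal((3, 12, 12))       # stand-in for X
x_hat = rng.standard_normal((3, 12, 12))   # stand-in for the pixel-wise solution

# Square orthogonal factors U^(n) from the unfoldings of x.
us = [np.linalg.svd(unfold(x, n))[0] for n in range(3)]

def to_core(t):
    """Project a tensor onto the fixed factors: t x_1 U1^T x_2 U2^T x_3 U3^T."""
    return np.einsum("ijk,ia,jb,kc->abc", t, *us)

g, g_hat = to_core(x), to_core(x_hat)

fro_x, fro_g = np.linalg.norm(x - x_hat), np.linalg.norm(g - g_hat)   # Eq. (15)
nuc_x = sum(np.linalg.norm(unfold(x, n), "nuc") for n in range(3))    # Eq. (18), left
nuc_g = sum(np.linalg.norm(unfold(g, n), "nuc") for n in range(3))    # Eq. (18), right
```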

We propose to employ block coordinate descent for the optimization. The basic idea of block coordinate descent is to optimize one block of variables while fixing the others. We divide the variables into three blocks: G₍₁₎, G₍₂₎, G₍₃₎. Using the above properties of the core tensor and the original tensor, we can rewrite Eq. (14) as:

$$\mathop{\min }\limits_{{{\bf{G}}}_{(n)}}\frac{{\gamma }_{t}}{2}\left|\left| {{\bf{G}}}_{(n)}\right|\right| _{* }+\frac{1}{2}\left|\left| {{\bf{G}}}_{(n)}-{\hat{{\bf{G}}}}_{(n)}\right|\right| _{F}^{2},$$

(19)

where ${\hat{{\bf{G}}}}_{(n)}$ is the mode-n unfolding of ${\hat{{\mathcal{G}}}}_{t-1}$. This problem has been shown to admit a closed-form solution^33,34. Let ${\hat{{\bf{G}}}}_{(n)}={\bf{U}}\cdot diag(\{{\sigma }_{i}\})\cdot {{\bf{V}}}^{T}$ be the singular value decomposition; the optimal G_(n) is then given by:

$${{\mathfrak{D}}}_{\displaystyle\frac{{\gamma }_{t}}{2}}({\hat{{\bf{G}}}}_{(n)})={\bf{U}}\cdot diag\left(\left\{{\left({\sigma }_{i}-\displaystyle\frac{{\gamma }_{t}}{2}\right)}_{+}\right\}\right)\cdot {{\bf{V}}}^{T},$$

(20)

where ${(x)}_{+}$ denotes the positive part of x, namely, ${(x)}_{+}=\max (0,x)$. Finally, we average the per-mode optimal solutions of Eq. (19) to obtain the core tensor:

$${{\mathcal{G}}}_{t-1}^{* }=\frac{\mathop{\sum }\nolimits_{n = 1}^{3}{{\rm{fold}}}_{n}({{\mathfrak{D}}}_{\displaystyle\frac{{\gamma }_{t}}{2}}({\hat{{\bf{G}}}}_{(n)}))}{3}.$$

(21)
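
The shrinkage operator in Eq. (20) is the standard singular value thresholding step. The sketch below applies it to a random matrix with a hypothetical weight γ_t = 0.8 and defines the per-mode objective of Eq. (19) so that the minimizer can be checked against nearby points; names and values are illustrative.

```python
import numpy as np

def svt(matrix, tau):
    """Singular value thresholding: shrink each singular value by tau (Eq. (20))."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt

def objective(g, g_hat, gamma):
    """Value of the per-mode problem in Eq. (19)."""
    return (0.5 * gamma * np.linalg.norm(g, "nuc")
            + 0.5 * np.linalg.norm(g - g_hat, "fro") ** 2)

rng = np.random.default_rng(0)
g_hat = rng.standard_normal((8, 20))   # stand-in for one unfolding of the core
gamma = 0.8                            # hypothetical gamma_t
g_star = svt(g_hat, gamma / 2)
```

The threshold is γ_t/2 because the nuclear norm in Eq. (19) carries the weight γ_t/2 against a data term weighted by 1/2.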

Using the fixed factor matrices, we obtain the target solution with both pixel-wise similarity and global low-rankness, formulated as:

$${{\mathcal{X}}}_{t-1}^{* }={{\mathcal{G}}}_{t-1}^{* }{\times }_{1}{{\bf{U}}}^{(1)}{\times }_{2}{{\bf{U}}}^{(2)}{\times }_{3}{{\bf{U}}}^{(3)}.$$

(22)
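
Eqs. (12) and (20)–(22) chain into a single guidance step that maps the pixel-wise solution to the low-rank-guided target. The following NumPy sketch shows one possible arrangement; it assumes full (untruncated) orthogonal factors, a random stand-in input, and a hypothetical γ_t = 0.5, and the function name `low_rank_guidance` is illustrative rather than taken from the authors' code.

```python
import numpy as np

def unfold(t, n):
    """Mode-n unfolding of a tensor."""
    return np.moveaxis(t, n, 0).reshape(t.shape[n], -1)

def fold(m, n, shape):
    """Inverse of unfold for a tensor of the given shape."""
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(m.reshape([shape[n]] + rest), 0, n)

def svt(m, tau):
    """Singular value thresholding with threshold tau."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return (u * np.maximum(s - tau, 0.0)) @ vt

def low_rank_guidance(x_hat, gamma):
    """Map the pixel-wise solution x_hat to the guided target X*_{t-1}."""
    us = [np.linalg.svd(unfold(x_hat, n))[0] for n in range(3)]
    g_hat = np.einsum("ijk,ia,jb,kc->abc", x_hat, *us)          # Eq. (12)
    g_star = sum(fold(svt(unfold(g_hat, n), gamma / 2), n, g_hat.shape)
                 for n in range(3)) / 3                          # Eqs. (20)-(21)
    return np.einsum("abc,ia,jb,kc->ijk", g_star, *us)          # Eq. (22)

def nuclear_sum(t):
    """Sum of nuclear norms of the three unfoldings."""
    return sum(np.linalg.norm(unfold(t, n), "nuc") for n in range(3))

rng = np.random.default_rng(0)
x_hat = rng.random((3, 16, 16))             # stand-in for Eq. (8) output
x_star = low_rank_guidance(x_hat, gamma=0.5)
```

With γ_t = 0 the step returns its input unchanged, and for γ_t > 0 it lowers the summed nuclear norm of the unfoldings while staying close to the input.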

Thus, we can optimize ϵ_ϕ via the following training objective:

$${{\mathcal{L}}}_{\beta }^{t}(\phi )=\mathop{\sum }\limits_{t=1}^{T}{\beta }_{t}{\mathbb{E}}[\parallel {{\mathcal{X}}}_{t}-{({\rm{d}}{{\mathcal{X}}}_{t})}_{{\epsilon }_{\phi }}-{{\mathcal{X}}}_{t-1}^{* }{\parallel }_{{\ell }_{1}}],$$

(23)

where β_t is a positive weight and ${({\rm{d}}{{\mathcal{X}}}_{t})}_{{\epsilon }_{\phi }}$ denotes the reverse SDE in Eq. (5), whose score is predicted via the noise network ${\epsilon }_{\phi }({{\mathcal{X}}}_{t},\mu ,t)$.

Results

Dataset

To verify the effectiveness of the proposed method, we conducted restoration experiments on a self-collected Shaanxi mural dataset, which includes both murals with artificially added random damage and murals with real damage. The dataset comprises high-definition mural images from Shaanxi, featuring various categories such as landscapes and Buddha statues. A total of 351 images of 768 × 768 resolution were collected, with 320 images used for training and the remaining 31 images reserved for testing. The training set was augmented to 2560 images through techniques including horizontal flipping, vertical flipping, and rotation. For evaluation, we employed several metrics to assess the quality of the restored images. Peak Signal-to-Noise Ratio (PSNR) and Structural Similarity (SSIM) are pixel-based metrics that measure the fidelity of restored images relative to the originals. Additionally, the Learned Perceptual Image Patch Similarity (LPIPS) measures perceptual similarity by leveraging deep learning models trained on human judgments, offering a more nuanced assessment of image quality. Fréchet Inception Distance (FID) evaluates the similarity between generated and real images in feature space, using a pre-trained Inception network to extract features and compute distances, thereby providing a metric that reflects higher-level image characteristics beyond pixel-wise differences. This comprehensive evaluation framework ensures a thorough assessment of the proposed method’s performance in restoring damaged murals.
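
Of the four metrics, PSNR is simple enough to state in a few lines. The sketch below uses the standard definition, 10·log₁₀(MAX²/MSE), and assumes images scaled to [0, 1]; the dynamic range used in the experiments is not stated in the text, so `max_value` is exposed as a parameter.

```python
import numpy as np

def psnr(reference, restored, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, max_value]."""
    ref = np.asarray(reference, dtype=float)
    out = np.asarray(restored, dtype=float)
    mse = np.mean((ref - out) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(max_value**2 / mse)

reference = np.zeros((3, 8, 8))
restored = np.full((3, 8, 8), 0.1)   # uniform error of 0.1, so MSE = 0.01
value = psnr(reference, restored)
```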

Detail

Patches of 256 × 256 pixels are randomly cropped from each image as input, and the batch size is set to 8. During training, the AdamW optimizer is employed for 600 epochs with a momentum of 0.9, a weight decay of 5e-4, and an initial learning rate of 1e-4. A cosine annealing scheduler gradually decreases the learning rate to 0. The noise-prediction network is constructed by removing the group normalization and self-attention layers from the U-Net in DDPM⁹ for inference efficiency. We employ a vanilla conditional network as the noise network ϵ_ϕ. Data augmentation includes random horizontal and vertical flipping and 90-degree rotation. We set the number of timesteps to T = 100 for the diffusion model. All experiments are implemented in PyTorch and run on four NVIDIA 2080 Ti GPUs.
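
The learning-rate schedule described above can be written in closed form. The sketch below uses the standard cosine annealing formula with the stated initial rate of 1e-4, a floor of 0, and 600 epochs; it assumes the rate is updated once per epoch without warm restarts, which the text does not specify.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=600, lr_init=1e-4, lr_min=0.0):
    """Learning rate at `epoch` under cosine annealing from lr_init to lr_min."""
    cos_term = 1.0 + math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_init - lr_min) * cos_term

# One value per epoch boundary, from epoch 0 to epoch 600 inclusive.
schedule = [cosine_annealing_lr(e) for e in range(601)]
```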

Comparison

To validate the effectiveness of our proposed method and demonstrate the capacity of low-rank guidance to enhance mural restoration performance, we conducted comprehensive experiments on a test set of 31 murals. We applied random masks to these images and compared our approach with state-of-the-art GAN-based and diffusion-based restoration models, as well as the open-source API of the Stable Diffusion model fine-tuned with LoRA on our dataset. Tables 2 and 3 present the performance metrics across the various methods, illustrating the superior performance of our approach.

Table 2 Comparison of quantitative results on PSNR and SSIM under varied mask ratios

Table 3 Comparison of quantitative results on LPIPS and FID under varied mask ratios

Our proposed method achieves state-of-the-art performance across three key metrics on the mural dataset. Specifically, we achieved 31.21 dB and 0.952 on the PSNR and SSIM metrics, respectively, both of which quantify detail preservation and structural fidelity. Furthermore, we achieve 11.52 on the FID, indicating excellent performance in preserving the overall image distribution. These results underscore the efficacy of our method in maintaining both fine-grained details and global image characteristics in mural restoration tasks.

Figure 2 presents a comparative analysis of our proposed method against other diffusion-based models, showcasing eight randomly selected images from our mural restoration dataset. The results demonstrate that our approach achieves superior performance in reconstructing detailed textures, yielding more realistic restorations compared to existing methods. Our method exhibits robust performance across various degradation scenarios, including large-area damage, which presents an ill-posed problem, and small-area degradation, which represents a more deterministic restoration problem. In both cases, our approach produces more natural and visually coherent results, effectively addressing the challenges inherent in mural restoration tasks. The visual comparison clearly illustrates the enhanced ability of our method to preserve fine details and maintain overall image integrity, surpassing the capabilities of current state-of-the-art diffusion-based models in the context of mural restoration.

**Fig. 2: Visual comparison of mural restoration results.**

Ablation

Effectiveness of low-rank guidance on performance

To validate the efficacy of low-rank guidance in mural restoration, we conducted experiments using varying numbers of training timesteps as shown in Table 4. Our results demonstrate that tensor rank optimization significantly accelerates the reverse diffusion process. The low-rank guided approach achieves optimal performance at T = 100, outperforming the unguided method. Notably, even at a reduced timestep of T = 80, the low-rank guided reverse diffusion process maintains superior performance compared to its unguided counterpart. These findings underscore the potential of low-rank constraints in enhancing both the efficiency and quality of mural restoration techniques.

Table 4 Quantitative analysis of low-rank guidance

Effectiveness of adaptive mechanism on performance

To validate the efficacy of our adaptive mechanism in Eq. (9), we conducted experiments comparing fixed and dynamic weighting strategies, i.e., using Eq. (10). As shown in Table 5, fixed-weight approaches introduce additional hyperparameters that complicate optimization and reduce interpretability, leading to suboptimal performance. Our adaptive mechanism dynamically balances global low-rank constraints for structural coherence and local Frobenius norm minimization for fine details across diffusion stages. Early restoration prioritizes global consistency, while later stages emphasize local fidelity, naturally reflecting mural restoration workflows.

Table 5 Quantitative analysis of adaptive mechanism

Low-rankness of unfolding tensor

To validate Eq. (4) and justify the application of Eq. (19) for low-rank optimization, we conducted an empirical analysis. We computed the nuclear norm of the unfolding matrix derived from the core tensor following orthogonal Tucker decomposition. Figure 3 illustrates the evolution of the nuclear norm of this mode-n unfolding matrix as a function of the diffusion timestep. This visualization supports the theoretical foundations of our proposed approach and elucidates the temporal dynamics of the low-rank structure during the diffusion process.

**Fig. 3: Low-rankness of core tensor along the n-th mode.**

Effectiveness of Tucker rank {R₁, R₂, R₃} on performance

Our LRDiff model accommodates various rank configurations. To assess the impact of these configurations on restoration performance and computational efficiency, we conducted the quantitative analysis shown in Table 6, which illustrates the effects of different Tucker ranks on both restoration quality and inference time. As the Tucker rank increases, we observe an improvement in restoration quality, albeit at the cost of increased inference time. Notably, when the Tucker rank is configured as {8, 128, 128}, the model achieves optimal performance across multiple metrics: PSNR reaches 31.98 dB, SSIM attains 0.956, and FID decreases to 11.02. This analysis provides valuable insights into the trade-offs between model complexity and performance, enabling optimal parameter selection for specific mural restoration tasks.

Table 6 Quantitative analysis of Tucker rank
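The rank trade-off can be illustrated on a toy tensor. The sketch below is an assumption-laden example rather than the LRDiff pipeline: it uses a truncated HOSVD as the Tucker approximation, a synthetic smooth tensor of hypothetical shape `(8, 64, 64)`, and reconstruction error plus core size as proxies for restoration quality and compute. It does not reproduce the PSNR, SSIM, FID, or timing figures of Table 6.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Mode-n product of `tensor` with `matrix` along axis `mode`."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

def tucker_approx(tensor, ranks):
    """Truncated HOSVD: return the core and the rank-limited reconstruction."""
    factors = [
        np.linalg.svd(unfold(tensor, n), full_matrices=False)[0][:, :r]
        for n, r in enumerate(ranks)
    ]
    core = tensor
    for n, u in enumerate(factors):
        core = mode_dot(core, u.T, n)   # project onto each factor subspace
    approx = core
    for n, u in enumerate(factors):
        approx = mode_dot(approx, u, n)  # map the core back to full size
    return core, approx

def relative_error(x, x_hat):
    """Frobenius-norm error relative to the norm of x."""
    return float(np.linalg.norm(x - x_hat) / np.linalg.norm(x))

rng = np.random.default_rng(0)
# Hypothetical smooth test tensor (cumulative sums give spatial correlation).
x = np.cumsum(np.cumsum(rng.standard_normal((8, 64, 64)), axis=1), axis=2)

errors = []
for r in (8, 16, 32, 64):
    ranks = (8, r, r)
    core, approx = tucker_approx(x, ranks)
    errors.append(relative_error(x, approx))
    print(f"ranks={ranks} core_entries={core.size} rel_error={errors[-1]:.4f}")
```

Larger ranks give a smaller approximation error but a larger core to process at every step, which is the same direction of trade-off the table reports between quality and inference time.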

Discussion

In this work, we proposed a low-rank structure guided diffusion model to virtually restore the deteriorated regions of ancient murals. Our approach leverages a diffusion model to capture pixel-wise similarities between low-quality and high-quality image pairs. Exploiting the inherent low-rank prior of murals, we propose a novel low-rank guidance method based on orthogonal Tucker decomposition. This method is seamlessly integrated into the time-dependent function of the diffusion model, enhancing restoration performance while reducing computational complexity from ${\mathcal{O}}({I}^{3})$ to ${\mathcal{O}}({R}^{3})$. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance in mural restoration tasks. The efficacy of the low-rank guidance is quantified through comprehensive ablation studies. These results not only validate the superiority of our approach but also underscore the potential of incorporating structural priors into diffusion models for mural restoration tasks.
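One way to see why working on the core tensor can be cheaper is the following standard property of Tucker decompositions with orthonormal factors: the nonzero singular values of a mode-n unfolding of the full tensor coincide with those of the same unfolding of the core. The sketch below checks this numerically. Whether this is exactly the argument behind the stated reduction is an assumption on our part, and the sizes `dims` and `ranks` are hypothetical.

```python
import numpy as np

def unfold(tensor, mode):
    """Mode-n unfolding: bring axis `mode` to the front, flatten the rest."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)

def mode_dot(tensor, matrix, mode):
    """Mode-n product of `tensor` with `matrix` along axis `mode`."""
    out = np.tensordot(matrix, tensor, axes=(1, mode))
    return np.moveaxis(out, 0, mode)

rng = np.random.default_rng(0)
dims, ranks = (8, 20, 20), (4, 6, 6)  # hypothetical sizes I_n and ranks R_n

# Random core and factors with orthonormal columns (reduced QR).
core = rng.standard_normal(ranks)
factors = [np.linalg.qr(rng.standard_normal((i, r)))[0] for i, r in zip(dims, ranks)]

# Assemble the full tensor X = G x_1 U_1 x_2 U_2 x_3 U_3.
x = core
for mode, u in enumerate(factors):
    x = mode_dot(x, u, mode)

matches = []
for mode in range(3):
    s_full = np.linalg.svd(unfold(x, mode), compute_uv=False)     # SVD at size I
    s_core = np.linalg.svd(unfold(core, mode), compute_uv=False)  # SVD at size R
    matches.append(bool(np.allclose(s_full[: len(s_core)], s_core)))
    print(f"mode {mode}: leading singular values match = {matches[-1]}")
```

Because the spectra agree, a nuclear-norm or thresholding step can be carried out on the small core unfoldings, whose side lengths scale with R rather than I.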

The ability of our method to reconstruct complex structures while maintaining global coherence represents a substantial advancement in the field of mural restoration. By integrating low-rank priors and diffusion models, we provide a robust framework capable of addressing the unique challenges posed by severely damaged murals, thus contributing to the preservation of invaluable cultural heritage.

Our method has two main limitations. First, despite the use of orthogonal Tucker decomposition and the solution of Eq. (19), the reliance on the SVD algorithm can lead to the loss of fine-grained texture details, particularly in regions with smaller singular values. Second, the experimental dataset used in this study is relatively limited in scope, as it does not encompass the diverse mural data from different cultures and historical backgrounds worldwide. Therefore, constructing a larger and more comprehensive dataset remains a critical task for future research in this field.
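The first limitation can be illustrated with singular value soft-thresholding, the operator from the singular value thresholding algorithm listed in the references. This is a minimal sketch under the assumption that the low-rank step shrinks singular values in a similar way; the exact solver for Eq. (19) is not reproduced here, and the synthetic "structure" and "texture" spectra and the threshold `tau` are hypothetical.

```python
import numpy as np

def svt(matrix, tau):
    """Singular value thresholding: soft-shrink every singular value by tau."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (u * s_shrunk) @ vt, s, s_shrunk

rng = np.random.default_rng(0)
n = 64

# Orthonormal bases for a synthetic image with a prescribed spectrum.
u, _ = np.linalg.qr(rng.standard_normal((n, n)))
v, _ = np.linalg.qr(rng.standard_normal((n, n)))

sv = np.zeros(n)
sv[:2] = [50.0, 30.0]                 # strong components: global structure
sv[2:12] = np.linspace(1.0, 0.1, 10)  # weak components: fine texture
image = (u * sv) @ v.T

restored, s_before, s_after = svt(image, tau=2.0)
kept = int(np.count_nonzero(s_after))
lost = float(np.linalg.norm(image - restored) / np.linalg.norm(image))
print(f"singular values kept: {kept} of {int(np.count_nonzero(sv))}")
print(f"relative change introduced by thresholding: {lost:.4f}")
```

Every component whose singular value falls below the threshold is removed entirely, so detail carried by small singular values does not survive the low-rank step.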

Data availability

The dataset in this study was manually collected by our team, primarily at mural heritage sites in Yulin City, Hengshan County, Jia County, Shenmu City, and Fugu County in Shaanxi Province, China. The image datasets used to train and test the machine learning model are available at https://github.com/CZY-Code/LRDiff. All datasets are used under a Creative Commons licence.

Code availability

The code is available at https://github.com/CZY-Code/LRDiff under the Apache-2.0 license.

References

Wang, Y. & Wu, X. Current progress on murals: distribution, conservation and utilization. Heritage Science 11, 61 (2023).
Kumar, P. & Gupta, V. Preserving artistic heritage: a comprehensive review of virtual restoration methods for damaged artworks. Arch. Comput. Methods Eng. 32, 1199–1227 (2025).
Liu, Y., Li, Q., Yuan, Y., Du, Q. & Wang, Q. Abnet: Adaptive balanced network for multiscale object detection in remote sensing imagery. IEEE Transactions On Geoscience And Remote Sensing 60, 1–14 (2021).
Liu, Y. et al. Transcending pixels: boosting saliency detection via scene understanding from aerial imagery. IEEE Transac. Geosci. Remote Sens. 61, 1–16 (2023).
Cao, J., Li, Y., Zhang, Q. & Cui, H. Restoration of an ancient temple mural by a local search algorithm of an adaptive sample block. Heritage Science 7, 39 (2019).
Lv, C., Li, Z., Shen, Y., Li, J. & Zheng, J. Separafill: Two generators connected mural image restoration based on generative adversarial network with skip connect. Heritage Science 10, 135 (2022).
Arora, S., Risteski, A. & Zhang, Y. Do GANs learn the distribution? some theory and empirics. In International Conference on Learning Representations (2018).
Ganjdanesh, A., Gao, S., Alipanah, H. & Huang, H. Compressing image-to-image translation GANs using local density structures on their learned manifold. In Proceedings of the AAAI Conference on Artificial Intelligence, 38, 12118–12126 (2024).
Ho, J., Jain, A. & Abbeel, P. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems 33, 6840–6851 (2020).
Li, S. et al. Single image deraining: A comprehensive benchmark analysis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 3838–3847 (2019).
Zhang, K., Zuo, W., Chen, Y., Meng, D. & Zhang, L. Beyond a gaussian denoiser: Residual learning of deep cnn for image denoising. IEEE Transactions On Image Processing 26, 3142–3155 (2017).
Zhang, K., Zuo, W., Gu, S. & Zhang, L. Learning deep cnn denoiser prior for image restoration. In Proceedings of the IEEE conference on computer vision and pattern recognition, 3929–3938 (2017).
Gulrajani, I., Ahmed, F., Arjovsky, M., Dumoulin, V. & Courville, A. C. Improved training of wasserstein GANs. Adv. Neural Inform. Process. Syst. 30 (2017).
Becker, E., Pandit, P., Rangan, S. & Fletcher, A. K. Instability and local minima in gan training with kernel discriminators. Advances in Neural Information Processing Systems 35, 20300–20312 (2022).
Rombach, R., Blattmann, A., Lorenz, D., Esser, P. & Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 10684–10695 (2022).
Tong, R. et al. Spectral-domain optical coherence tomography for the non-invasive investigation of the pigment layers of tang dynasty tomb murals exhibited in museums. Optik 199, 163311 (2019).
Xie, Q., Zhao, Q., Xu, Z. & Meng, D. Color and direction-invariant nonlocal self-similarity prior and its application to color image denoising. Science China Information Sciences 63, 1–17 (2020).
Luo, Y., Zhao, X. & Meng, D. Revisiting nonlocal self-similarity from continuous representation. In IEEE Transactions on Pattern Analysis and Machine Intelligence (2024).
Zha, Z. et al. Learning nonlocal sparse and low-rank models for image compressive sensing: Nonlocal sparse and low-rank modeling. IEEE Signal Processing Magazine 40, 32–44 (2023).
Xu, Z., Zhang, C. & Wu, Y. Digital inpainting of mural images based on dc-cyclegan. Heritage Science 11, 169 (2023).
Luo, Y., Zhao, X., Li, Z., Ng, M. K. & Meng, D. Low-rank tensor function representation for multi-dimensional data recovery. IEEE Transactions on Pattern Analysis and Machine Intelligence, 3351−3369 (2023).
Wu, Z.-C. et al. Lrtcfpan: Low-rank tensor completion based framework for pansharpening. IEEE Transactions on Image Processing 32, 1640–1655 (2023).
Song, J., Meng, C. & Ermon, S. Denoising diffusion implicit models. arXiv preprint arXiv:2010.02502 (2020).
Song, Y., Durkan, C., Murray, I. & Ermon, S. Maximum likelihood training of score-based diffusion models. Advances in Neural Information Processing Systems 34, 1415–1428 (2021).
Goodfellow, I. et al. Generative adversarial networks. Communications of the ACM 63, 139–144 (2020).
Luo, Z., Gustafsson, F. K., Zhao, Z., Sjölund, J. & Schön, T. B. Image restoration with mean-reverting stochastic differential equations. In Proceedings of the 40th International Conference on Machine Learning, 23045–23066 (2023).
Song, Y. et al. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456 (2020).
Liu, J., Musialski, P., Wonka, P. & Ye, J. Tensor completion for estimating missing values in visual data. IEEE Transactions On Pattern Analysis And Machine Intelligence 35, 208–220 (2012).
Peng, J., Wang, Y., Zhang, H., Wang, J. & Meng, D. Exact decomposition of joint low rankness and local smoothness plus sparse matrices. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 5766–5781 (2022).
Kolda, T. G. & Bader, B. W. Tensor decompositions and applications. SIAM Review 51, 455–500 (2009).
De Lathauwer, L., De Moor, B. & Vandewalle, J. A multilinear singular value decomposition. SIAM Journal on Matrix Analysis and Applications 21, 1253–1278 (2000).
De Lathauwer, L., De Moor, B. & Vandewalle, J. On the best rank-1 and rank-(R1, R2, …, RN) approximation of higher-order tensors. SIAM Journal on Matrix Analysis and Applications 21, 1324–1342 (2000).
Ma, S., Goldfarb, D. & Chen, L. Fixed point and bregman iterative methods for matrix rank minimization. Mathematical Programming 128, 321–353 (2011).
Cai, J.-F., Candès, E. J. & Shen, Z. A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization 20, 1956–1982 (2010).
Isola, P., Zhu, J.-Y., Zhou, T. & Efros, A. A. Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, 1125–1134 (2017).
Zhu, J.-Y., Park, T., Isola, P. & Efros, A. A. Unpaired image-to-image translation using cycle-consistent adversarial networks. In Computer Vision (ICCV), 2017 IEEE International Conference on, 2223–2232 (2017).
Kupyn, O., Budzan, V., Mykhailych, M., Mishkin, D. & Matas, J. Deblurgan: Blind motion deblurring using conditional adversarial networks. ArXiv e-prints (2017).
Daras, G., Dean, J., Jalal, A. & Dimakis, A. G. Intermediate layer optimization for inverse problems using deep generative models. arXiv preprint arXiv:2102.07364 (2021).
Akegarasu. Sd-trainer. https://github.com/Akegarasu/lora-scripts (2024).
Lugmayr, A. et al. Repaint: Inpainting using denoising diffusion probabilistic models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11461–11471 (2022).
Liu, H., Wang, Y., Qian, B., Wang, M. & Rui, Y. Structure matters: Tackling the semantic discrepancy in diffusion models for image inpainting. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 8038–8047 (2024).

Acknowledgements

This research was supported by the Social Science Foundation of Shaanxi Province, China (2022J026).

Author information

Authors and Affiliations

School of New Media Art, Xi’an Polytechnic University, No.19 Jinhua South Road, Beilin District, Xi’an, 710048, China
Tong Lei & Yixuan Li
School of Faculty of Humanities and Arts, Macau University of Technology and Science, Avenida Wai Long, Taipa, Macau, 999078, China
Tong Lei & Wen Zhou
School of Electronics and Information, Northwestern Polytechnical University, NO.1 Dongxiang Road, Chang’an District, Xi’an, 710129, China
Zhengyun Cheng
School of Arts and Media, Beijing Normal University, Furen Road, Beitaipingzhuang, Haidian District, Beijing, 100088, China
Wen Zhou

Contributions

Tong Lei was responsible for conception, methodology, experimental design, data augmentation, manuscript writing, and review and editing. Yixuan Li was responsible for data collection, dataset construction strategy, data augmentation, and experimental design. Zhengyun Cheng was responsible for software implementation, mathematical derivation, and verification. Wen Zhou was responsible for supervision, project conception, and overall framework guidance. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Wen Zhou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Lei, T., Li, Y., Cheng, Z. et al. Low-rank structure guided diffusion for Shaanxi temple mural restoration. npj Herit. Sci. 13, 295 (2025). https://doi.org/10.1038/s40494-025-01827-3

Received: 16 October 2024
Accepted: 26 May 2025
Published: 25 June 2025
Version of record: 25 June 2025
DOI: https://doi.org/10.1038/s40494-025-01827-3
