Introduction

The purpose of automatic mural sketch extraction is to convert a mural image directly into a sketch, which is of great significance for the conservation of murals. The main challenges are twofold: (1) filtering out a wide variety of deteriorations while extracting meaningful lines; (2) keeping the result consistent with experts’ drawing style.

Few efforts have been made in the academic field to overcome these two challenges, owing to the difficulty of extracting sketches directly from murals. Early mural sketch extraction algorithms are usually based on the Canny1 edge detector combined with smoothing or other traditional algorithms. FSG2 proposes a mural line drawing extraction system based on Canny edge detection, but it ignores the double-line problem. To solve the double-line problem, Mural2sketch (M2S)3 adopts a heuristic routing algorithm, using the Canny edge detector to detect stroke contours and a high-frequency enhancement filter to extract the internal information of the strokes. Traditional algorithms usually involve many adjustable parameters, so experts have to tune hyper-parameters frequently to obtain accurate sketches. In many cases, these methods fail to extract meaningful sketches even after specific or complex hyper-parameter settings. Moreover, because they respond wherever the image gradient changes drastically, detectors such as Sobel4 and Canny1 cannot distinguish lines from scratches and therefore cannot separate deteriorated from non-deteriorated areas. L0 smoothing5 and other smoothing techniques5,6 can only filter out minor mural deteriorations such as stains and narrow scratches, but they cannot effectively filter out obvious ones. In summary, these methods neither solve the double-line problem nor filter out deteriorations, and thus fail to convey the aesthetic expression of the artist.

The development of neural networks, especially edge-related convolutional neural networks7,8,9,10,11, provides new possible solutions to the problems of mural sketch extraction. Learning-based line extraction methods12,13,14 are more or less inspired by edge detection methods7,9. Based on HED9, the GSC14 algorithm shows stronger deterioration-filtering ability than traditional methods by enhancing the feature fusion process of HED9. However, the results are still unacceptable compared with experts’ strokes; specifically, the extracted sketch lines are noisy and twined (lacking separation). The subsequent algorithms12,13 are mostly variants of edge detection algorithms, such as BDN12, which is based on DexiNed7. BDN takes DexiNed, a more advanced edge detection algorithm, as its backbone and optimizes the fusion part of the model to enhance the mural extraction output. All the algorithms mentioned above are ineffective at keeping deteriorations out of the extracted sketches.

A question then arises: what is the main cause of the ineffectiveness of the aforementioned methods12,13,14? The reason is that these methods model mural sketch extraction as an application of edge detection, whereas this task is not an edge detection problem. We find that in the design of these methods, input features are encoded into high-level features by operations such as sequential convolutions but are never fully reconstructed, which leads to inadequate deterioration filtering and noisy lines. Specifically, the modifications introduced by these mural sketch extraction methods are adaptive fusion or adding convolutional blocks before fusion. Their results are better than those of the original edge detection algorithms they build on. However, compared with the encoding stage, where features are extracted by strong feature extractors such as VGG15 or Inception16, the reconstruction stage of these methods is too weak, since it is done through simple or modified fusion. It is therefore more appropriate to refer to these methods as mural sketch detectors that depict edge-like pixels without filtering deteriorations out.

If we treat mural sketch extraction as a segmentation task, we find that most segmentation algorithms cannot be used directly. Most of them, such as DeepLab V117, DeepLab V218, DeepLab V3+19, HRNet20, and Panoptic FPN21, set the maximum reconstruction resolution at 1/4 or 1/8 of the original image size, and their convolutional outputs are then upsampled by bilinear interpolation. This makes it impossible for these algorithms to achieve high-accuracy mural sketch extraction. Due to its insufficient receptive field and lack of skip connections, a typical AutoEncoder is also not a preferred alternative. UNet22, a medical segmentation model design, retains enough convolutional layers in its reconstruction stage, making it strong enough to build finer mural lines.

Motivated by the analysis above, we abandon the idea of treating mural sketch extraction as edge detection. We propose to exploit U2Net23 as our backbone and to model mural sketch extraction as a generation problem. Firstly, as a variant of UNet, U2Net23 has a two-level nested U structure, which greatly increases the receptive field and the network depth. The larger receptive field and deeper network enable the model to distinguish between a noisy mural background and lines that are hard to tell apart by untrained individuals, and the retention of the skip connections commonly used in UNet makes the transmission of information in U2Net smoother. Secondly, the sketch line corresponding to a bold line in a mural image can be drawn either as a single line or as the contours around it. An untrained individual finds it difficult to decide which, whereas experts are able to analyze the whole picture and decide how to draw the sketch. Therefore, mural sketch extraction is not an edge detection task that faithfully depicts contours; the network needs to learn the habits and experience of experts. Meanwhile, mural deteriorations such as scratches may be difficult to distinguish from lines, which requires a deeper understanding of the semantics of the mural image. Generative adversarial networks (GANs)24 are proficient at modeling mappings between different domains and are widely used in image synthesis25,26,27,28,29, image captioning30, 3D modeling31, video applications32,33, and image inpainting8,34,35,36. The first intersection of GANs and edge detection was the image inpainting method Edge-Connect8, in which a generator network completes incomplete edges; there, GANs were demonstrated to hallucinate the missing content of incomplete images or the missing lines of incomplete sketches. GANs with additional priors are conditional GANs (CGANs)37. Pix2Pix38, Pix2PixHD39 and SPADE40 generate images using segmentation maps, edge maps, or other priors as guidance. In the same spirit, the generation of a mural sketch is simplified under the guidance of edges: direct sketch generation is transformed into the refinement of edges.

Inspired by Edge-Connect and CGAN, we propose to exploit GANs to gain a deeper insight into murals and to use edges as guidance to improve recall. We additionally propose a novel scaled cross-entropy loss to make the extracted lines thinner.

Experimental results on Dunhuang murals41 demonstrate that our proposed method produces accurate mural sketches. Example results are shown in Fig. 1. Our contributions are summarized below:

  • We propose an edge-guided generative model to generate mural sketches, which not only improves sketch generation recall but also keeps the extracted lines free from mural deteriorations.

  • We propose a novel scaled cross-entropy loss to make the extracted lines thinner so as to fit expert cognition.

  • Our proposed mural sketch generation system achieves higher accuracy than previous state-of-the-art methods on the Dunhuang mural dataset.

Fig. 1: Examples of sketches generated with our method.

The outputs (b, d, f) generated by our method correspond to the Dunhuang mural dataset samples (a, c, e), respectively.

Methods

In this section, we describe our approach from the bottom up. We first introduce our proposed mural sketch generation framework and then present our supervision strategies. As shown in Fig. 2, the framework consists of an edge-guided generator and a multi-scale PatchGAN discriminator. For training supervision, we propose a new loss function called scaled cross-entropy, which reweights and rescales the reconstruction loss commonly used in methods such as U2Net23, HED9 and DSS42 so that the generated mural sketch lines become slimmer.

Fig. 2: Overview of our framework.

Our framework combines an edge-guided mural sketch generator with a multi-scale PatchGAN discriminator for accurate mural sketch generation.

Edge guided generator

The mural sketch generator G follows the architecture of U2Net, which is designed as a nested U-Net. The mural image and the mural edges extracted by Canny are concatenated as the input of generator G; the low and high thresholds of the Canny operator are 100 and 255 in our experiments. The generator G is a two-level nested structure in which every resolution stage is a simplified U-Net followed by max pooling. Hence, G can effectively extract multi-scale intra-stage features within each resolution stage and inter-stage features across resolution stages.
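As a concrete illustration, the following minimal sketch shows how this edge-guided input could be assembled, assuming OpenCV for the Canny operator (with the low/high thresholds of 100 and 255 reported above) and PyTorch tensors for the generator input; the function and variable names are illustrative rather than taken from our implementation.

```python
import cv2
import numpy as np
import torch

def build_generator_input(mural_bgr: np.ndarray) -> torch.Tensor:
    """Concatenate a mural image with its Canny edge map along the channel axis."""
    gray = cv2.cvtColor(mural_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 255)                    # thresholds as reported in the text
    mural = mural_bgr.astype(np.float32) / 255.0         # H x W x 3, in [0, 1]
    edge = edges.astype(np.float32)[..., None] / 255.0   # H x W x 1, binary edge map
    x = np.concatenate([mural, edge], axis=-1)           # H x W x 4
    # NCHW tensor with four channels; generator G consumes this edge-guided input.
    return torch.from_numpy(x).permute(2, 0, 1).unsqueeze(0)
```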

As illustrated in Fig. 2, the generator consists of two main parts: an encoder and a decoder. The encoder consists of six U-Blocks named E1 to E6. Stages E1, E2, E3 and E4 use standard U-Blocks. After multiple down-sampling steps, the resolution in E5 and E6 is so low that max pooling inside the nested U-Block is no longer meaningful, so the max pooling and upsampling in the nested U-Blocks of E5 and E6 are replaced with dilated convolutions. The decoder stages are symmetrical to the encoder stages; like E5 and E6, the module in D1 is a dilated version of the U-Block because it operates at a relatively low resolution. Six side outputs are generated from stages E6, D1, D2, D3, D4 and D5 by a 3 × 3 convolution layer, up-sampling by factors of 32, 16, 8, 4, 2 and 1, respectively, and a sigmoid function. The pre-sigmoid features of the six side outputs are concatenated and convolved with a 1 × 1 kernel to obtain the final output. All side outputs and the final output provide supervision for the sketch generator.
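A schematic sketch of this side-output and fusion scheme is given below; it is an assumed re-implementation for illustration, not our released code. Each stage emits a 1-channel map through a 3 × 3 convolution, the maps are bilinearly up-sampled to the input resolution, and the pre-sigmoid maps are concatenated and fused by a 1 × 1 convolution. The hypothetical argument `stage_channels` would hold the output channel counts of E6 and D1–D5.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SideOutputFusion(nn.Module):
    """Side outputs from six stages plus a 1x1-conv fusion, as described above."""
    def __init__(self, stage_channels):
        super().__init__()
        # One 3x3 convolution per stage (E6, D1, ..., D5), each producing a 1-channel map.
        self.side_convs = nn.ModuleList(
            [nn.Conv2d(c, 1, kernel_size=3, padding=1) for c in stage_channels]
        )
        self.fuse = nn.Conv2d(len(stage_channels), 1, kernel_size=1)

    def forward(self, stage_feats, out_size):
        # Up-sample every side map to full resolution (factors 32, 16, 8, 4, 2, 1).
        sides = [
            F.interpolate(conv(f), size=out_size, mode="bilinear", align_corners=False)
            for conv, f in zip(self.side_convs, stage_feats)
        ]
        fused = self.fuse(torch.cat(sides, dim=1))
        # Sigmoid on each side map and on the fused map; all of them are supervised.
        return [torch.sigmoid(s) for s in sides], torch.sigmoid(fused)
```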

In this paper, we provide two instances of our generator: a standard version and a lite version. The channel number in each U-Block of the lite version is lower than in the standard version, and the U-Blocks used in E1 and D1 of the standard version are replaced by normal convolutional layers in the lite version.

Multi-scale PatchGAN discriminator

Previous mural sketch extraction methods, which try to extract every sketch line directly with a single network, apply supervision such as a cross-entropy loss to the side outputs. However, we consider the task of sketch generation, in which many apparent lines are actually scratches. Motivated by pix2pix38, SPADE40 and SE-GAN36, we propose to exploit a multi-scale PatchGAN discriminator to provide auxiliary supervision.

As shown in Fig. 2, three parallel fully convolutional networks with different numbers of layers are used as the discriminator. Their input is the sketch generated by G, and their outputs are three 2-D feature maps of shapes \({\mathbb{R}}^{{h}_{1}\times {w}_{1}}\), \({\mathbb{R}}^{{h}_{2}\times {w}_{2}}\) and \({\mathbb{R}}^{{h}_{3}\times {w}_{3}}\) (where \({h}_{k}\) and \({w}_{k}\) denote the height and width of the k-th PatchGAN discriminator output). Specifically, six convolution layers with kernel size 3 and stride 2 are stacked in Dis1 to capture the feature statistics of input patches. Dis2 is one layer shorter than Dis1, with only five layers; fewer layers yield a smaller receptive field, so the feature statistics captured by Dis2 focus on smaller regions (a quarter of those of Dis1). Dis3 has two layers fewer than Dis1 and thus focuses on regions a quarter of the size of those of Dis2. Note that the receptive fields of the outputs of each PatchGAN discriminator overlap and densely supervise the mural sketch generation process, pushing the generator to generate finer details and to filter out deteriorations in the mural image. At the same time, the combination of large and relatively small receptive fields offered by the three discriminators allows training at higher resolution. To discriminate whether the input is real or fake, we use the hinge loss as the objective function for the generator and discriminators, which can be expressed as

$${L}_{{\rm{G}}}=-{E}_{z\sim {P}_{z}(z)}\left[{D}^{1}\left(G\left(z\oplus e\right)\right)\right]-{E}_{z\sim {P}_{z}(z)}\left[{D}^{2}\left(G\left(z\oplus e\right)\right)\right]-{E}_{z\sim {P}_{z}(z)}\left[{D}^{3}\left(G\left(z\oplus e\right)\right)\right]$$
(1)
$$\begin{array}{l}{L}_{{\rm{D}}}={E}_{y\sim {P}_{\mathrm{data}}(y)}\left[ReLU\left(1-{D}^{1}(y)\right)\right]+{E}_{z\sim {P}_{z}(z)}\left[ReLU\left(1+{D}^{1}\left(G\left(z\oplus e\right)\right)\right)\right]\\ +\,{E}_{y\sim {P}_{\mathrm{data}}(y)}\left[ReLU\left(1-{D}^{2}(y)\right)\right]+{E}_{z\sim {P}_{z}(z)}\left[ReLU\left(1+{D}^{2}\left(G\left(z\oplus e\right)\right)\right)\right]\\ +\,{E}_{y\sim {P}_{\mathrm{data}}(y)}\left[ReLU\left(1-{D}^{3}(y)\right)\right]+{E}_{z\sim {P}_{z}(z)}\left[ReLU\left(1+{D}^{3}\left(G\left(z\oplus e\right)\right)\right)\right]\end{array}$$
(2)

where D1, D2 and D3 denote the PatchGAN discriminators, G is the mural sketch generation network that takes the mural image z and edges e as input, \(\oplus\) denotes concatenation along the channel dimension, and y is the ground-truth sketch image.
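The sketch below illustrates one plausible implementation of the multi-scale PatchGAN supervision in Eqs. (1) and (2), assuming three fully convolutional branches with six, five and four stride-2 convolution layers for Dis1–Dis3; the channel widths and helper names are our own assumptions, not the exact configuration used in our experiments.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def patch_discriminator(in_ch: int = 1, base: int = 64, n_layers: int = 6) -> nn.Sequential:
    """Fully convolutional PatchGAN branch: n_layers stride-2 convs + a 1-channel score map."""
    layers, c = [], in_ch
    for i in range(n_layers):
        nc = min(base * (2 ** i), 512)
        layers += [nn.Conv2d(c, nc, kernel_size=3, stride=2, padding=1),
                   nn.LeakyReLU(0.2, inplace=True)]
        c = nc
    layers.append(nn.Conv2d(c, 1, kernel_size=3, padding=1))  # patch-wise real/fake scores
    return nn.Sequential(*layers)

# Dis1, Dis2, Dis3: fewer layers -> smaller receptive field, as described above.
discriminators = nn.ModuleList([patch_discriminator(n_layers=n) for n in (6, 5, 4)])

def hinge_d_loss(ds, real, fake):
    # Eq. (2): hinge terms for real and generated sketches, summed over D1-D3.
    return sum(F.relu(1.0 - d(real)).mean() + F.relu(1.0 + d(fake.detach())).mean()
               for d in ds)

def hinge_g_loss(ds, fake):
    # Eq. (1): the generator is pushed to raise every discriminator's score.
    return sum(-d(fake).mean() for d in ds)
```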

Scaled cross entropy

The sketch/non-sketch imbalance in the mural sketch generation task is essentially the same as the edge/non-edge imbalance in edge detection, and it causes the extracted lines to be blurred (verified in the ablation study). We use a weighted cross-entropy loss to address this imbalance:

$${L}_{\mathrm{ce}}=-\mathop{\sum }\limits_{i,j}\left[\alpha \,Y(i,j)\left(1-P(i,j)\right)\log \left(P(i,j)\right)+(1-\alpha )\left(1-Y(i,j)\right)P(i,j)\log \left(1-P(i,j)\right)\right]$$
(3)

where Y(i,j) is the ground-truth label of pixel (i,j) and P(i,j) is the predicted probability of mural sketch at pixel (i,j), with \(P=G(z\oplus e)\). \(\alpha =\frac{\left|{s}^{-}\right|}{\left|{s}^{+}\right|+\left|{s}^{-}\right|}\) and \(1-\alpha =\frac{\left|{s}^{+}\right|}{\left|{s}^{+}\right|+\left|{s}^{-}\right|}\), where \(\left|{s}^{+}\right|\) and \(\left|{s}^{-}\right|\) are the numbers of sketch and non-sketch pixels, respectively. The weighted cross-entropy loss significantly improves the line drawing extraction results, but obvious line sticking remains in areas with dense details. To alleviate this issue, we propose a scaled version of the weighted cross-entropy:

$${L}_{\mathrm{ce}}=-\mathop{\sum }\limits_{i,j}\left[\alpha \,S\left(Y(i,j)\right)\left(1-S\left(P(i,j)\right)\right)\log \left(S\left(P(i,j)\right)\right)+(1-\alpha )\left(1-S\left(Y(i,j)\right)\right)S\left(P(i,j)\right)\log \left(1-S\left(P(i,j)\right)\right)\right]$$
(4)

where S denotes bilinear up-sampling by a factor of 8. The total loss of the mural sketch generator is then

$${L}_{{G}_{\mathrm{total}}}=\,{\beta L}_{\mathrm{ce}}+\gamma {L}_{{\rm{G}}}$$
(5)

where the loss weights \(\beta\) and \(\gamma\) are hyperparameters balancing the loss terms \({L}_{\mathrm{ce}}\) and \({L}_{{\rm{G}}}\). In our experiments, β and γ are set to 1 and 0.1, respectively.
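For clarity, a minimal PyTorch-style sketch of the scaled weighted cross-entropy in Eq. (4) and the total generator loss in Eq. (5) is given below, assuming predictions P and binary ground truth Y of shape (N, 1, H, W); the per-batch estimate of α and the small `eps` constant are our own assumptions rather than details from our implementation.

```python
import torch
import torch.nn.functional as F

def scaled_weighted_ce(pred, target, scale_factor=8, eps=1e-6):
    """Eq. (4): weighted cross-entropy applied to bilinearly up-sampled P and Y."""
    p = F.interpolate(pred, scale_factor=scale_factor, mode="bilinear", align_corners=False)
    y = F.interpolate(target, scale_factor=scale_factor, mode="bilinear", align_corners=False)
    # Class-balancing weight alpha = |s-| / (|s+| + |s-|), estimated from the batch.
    n_pos = y.sum()
    n_neg = y.numel() - n_pos
    alpha = n_neg / (n_pos + n_neg)
    loss = -(alpha * y * (1.0 - p) * torch.log(p + eps)
             + (1.0 - alpha) * (1.0 - y) * p * torch.log(1.0 - p + eps))
    return loss.sum()

def generator_total_loss(l_ce, l_g, beta=1.0, gamma=0.1):
    """Eq. (5): weighted sum of the reconstruction and adversarial terms."""
    return beta * l_ce + gamma * l_g
```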

Results

Datasets and experiment setting

We train and evaluate our proposed mural sketch generation model on the Dunhuang mural-sketch dataset, which was collected and labeled by our colleagues with the help of mural experts. Specifically, we labeled 100 Dunhuang murals, of which 80 mural-sketch pairs form the training set and 20 mural-sketch pairs form the test set. Our models are implemented with PyTorch v1.8.1 and CUDA v11.1 and run on hardware with an Intel Core i7-10700 CPU (2.90 GHz) and an RTX 3090 GPU. We instantiate two models: a standard model and a lite model. The standard version has 178 M parameters, while the lite version has only 34 M. The generator configurations of the standard and lite versions are shown in Table 1.

Table 1 Network config of our standard and lite model

Comparisons

We compare our model with the previous state-of-the-art methods M2S3, GSC14, and BDN12. Figure 3 shows the extraction results for several representative murals. As shown in Fig. 3, both the standard (Ours(S)) and the lite (Ours(L)) versions of our model achieve much better line drawing extraction results than the other methods. With generative modeling and a stronger backbone, our models achieve the cleanest filtering of background deteriorations, and the scaled cross-entropy further improves the separation between sketch strokes. Through hyper-parameter tuning, the traditional algorithm M2S can obtain relatively cleaner line drawings than the deep learning algorithms GSC and BDN, but it performs poorly in terms of line continuity. GSC and BDN suffer from thick lines and poor noise filtering, which lead to severe adhesion of local lines as well as black blocky artifacts. With the stronger backbone and the proposed losses, we obtain thinner and cleaner lines.

Fig. 3: Qualitative comparison of mural sketch extraction results.

On the Dunhuang mural dataset, our method successfully generates accurate mural sketches. Best viewed with zoom-in. From left to right: a input images, b results of M2S, c results of GSC, d results of BDN, e and f results of our lite model and standard model, respectively.

Mural sketch extraction lacks well-established quantitative evaluation metrics. Nevertheless, for reference we report in Table 2 the mean cross-entropy error, peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and root mean square error (RMSE) on the validation set of the Dunhuang mural dataset. For PSNR and SSIM, higher is better; for the cross-entropy error and RMSE, lower is better. As shown in Table 2, our proposed method achieves the best results on all four metrics.

Table 2 Quantitative evaluation on Dunhuang mural dataset
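For reproducibility, the following sketch shows one way these reference metrics could be computed, assuming scikit-image for PSNR/SSIM and grayscale sketches scaled to [0, 1]; it is an assumed evaluation routine for illustration, not the exact script used to produce Table 2.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_pair(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-6) -> dict:
    """Compute CE error, PSNR, SSIM and RMSE between a generated and a ground-truth sketch."""
    psnr = peak_signal_noise_ratio(gt, pred, data_range=1.0)
    ssim = structural_similarity(gt, pred, data_range=1.0)
    rmse = float(np.sqrt(np.mean((pred - gt) ** 2)))
    ce = float(np.mean(-(gt * np.log(pred + eps) + (1.0 - gt) * np.log(1.0 - pred + eps))))
    return {"CE": ce, "PSNR": psnr, "SSIM": ssim, "RMSE": rmse}
```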

Ablation study

We investigate the effectiveness of our proposed scaled cross-entropy loss compared with other supervision commonly used in previous methods. We trained three otherwise identical models that differ only in the cross-entropy loss they use. As shown in Fig. 4, for every test image the vanilla cross-entropy fails to provide enough supervision for the mural extractor; as a result, the extracted lines are blurred and faint. As the name implies, the weighted cross-entropy reweights the vanilla cross-entropy according to the black/white pixel distribution of the resulting sketch. The results improve considerably when the weighted cross-entropy is used, but the bold-line problem remains obvious. Figure 4d shows the direct output under the supervision of our proposed scaled cross-entropy: the bold-line problem is significantly alleviated. Fewer bold lines mean that more of the extracted lines are separable (not stuck together) and can be used directly by users. This is why our method can generate sketches of thin hair strands, as shown in Fig. 3.

Fig. 4: Qualitative comparison of mural sketch extraction results.

From left to right: a input images, b results of the model with vanilla cross-entropy loss (VCE), c results of the model with weighted cross-entropy loss (WCE), and d results of our model with scaled cross-entropy loss (SCE).

As shown in Fig. 5, our mural sketch generation framework benefits greatly from edge guidance, as validated by the higher recall the guidance brings: compared with the model trained without guidance, the guided scheme reconstructs more details.

Fig. 5: Qualitative comparison of mural sketch extraction results with and without guide.

From left to right: a input images, b results of the model without edge guidance and c results of the model with edge guidance.

As shown in Fig. 6, we also performed tests to verify the effectiveness of the adversarial loss (adv loss) by training a mural sketch extraction model without it. We conclude that the adversarial loss reduces recall slightly but significantly improves the cleanliness of the results. This is a sensible trade-off, because false-positive lines are laborious to clean up, while a slightly reduced recall can be compensated for with a few simple manual strokes.

Fig. 6: Qualitative comparison of mural sketch extraction results with and without adv loss.

From left to right: a input images, b results of the model without adv loss and c results of the model with adv loss.

Generalization study

We verified the generalization ability of our proposed algorithm on murals at other heritage sites, such as tombs, grottoes, and temples. The weather at Dunhuang is dry and it seldom rains, which is very conducive to the preservation of murals; thus most murals remain relatively clean if they are not disturbed by human activity. Compared with Dunhuang, the murals in Tang Dynasty tombs and other grottoes or tombs are difficult to preserve because of the environment: deteriorations such as salt efflorescence, plaster detachment and mildewing are very common on grotto and tomb murals. At the same time, the brush strokes of Dunhuang murals are usually thin, and color is often used to outline objects, whereas most outlines in Tang Dynasty tomb murals are drawn with black strokes. All of this suggests that a model trained on Dunhuang murals should not generalize well to tomb or grotto murals with severe deteriorations. Surprisingly, as shown in Fig. 7, our proposed model exhibits better generalization ability than we expected: it filters out most of the deteriorations on these murals and extracts most of the available lines. Of course, its filtering of some deteriorations is not perfect. This is due to differences in style, i.e., the domain gap between Dunhuang murals and murals from other sites. To narrow the domain gap, we need to enrich the target-style dataset.

Fig. 7: Generalization results of our method on murals with different style.

From top to bottom: mural sketch extraction results for murals at other heritage sites, such as tombs, grottoes and temples. Best viewed with zoom-in.

Discussion

In this paper, we propose a novel high-precision mural sketch generation system based on the strong generative backbone U2Net with edge guidance, trained under the supervision of a scaled cross-entropy loss and multi-scale PatchGAN discriminators. We demonstrate that a stronger backbone and stronger supervision significantly improve mural sketch extraction, and that edges extracted by Canny, used as prior guidance for the sketch generator, effectively improve the recall of sketches. Quantitative and qualitative comparisons demonstrate the superiority of the proposed method, and the generalization study demonstrates its potential for extracting sketches from murals with different styles.

Although significant results have been obtained, further improvement is possible in the following aspects:

The ability to control line width should be improved. First, line width control would allow experts to draw murals more finely, but current solutions struggle to achieve single-pixel line drawing extraction results like the Canny edge extractor. As a result, fine areas of murals often lose aesthetic quality when converted to sketch lines by current methods. Second, thin lines imply lower computational cost at inference time. However, current methods can only process murals with elaborate patterns by enlarging the input image, which is not only time-consuming but also requires a huge computational cost.

A controllable model of brushstroke rules should be built. For different types of cultural heritage, experts have different understandings of the line drawing style. However, current methods cannot flexibly adjust line density and stroke style according to regional semantics or task requirements. There is currently no mapping mechanism between structural semantics and stroke rules, which limits the extensibility of current methods for professional aesthetic control and task customization.

A human-machine collaboration mechanism based on expert feedback should be established. The current model is a one-way generation process, and a closed feedback loop with cultural heritage experts has not been established. In real application scenarios, experts often have stronger image structure perception and historical style judgment, and may put forward specific modification opinions or aesthetic preferences for certain areas of a mural sketch. The lack of an interactive mechanism makes it difficult to dynamically adjust the generation strategy to meet experts’ requirements, which limits the usability and interpretability of the method in practical cultural heritage conservation work.

A multi-modal auxiliary information fusion method needs to be explored. Current mural sketch extraction methods rely entirely on RGB images and edge information for sketch generation, which makes it difficult to extract sketches from faded or blurred murals. In fact, multi-modal cues such as color residue, texture, material and infrared images, which are often used in the restoration of murals, are promising aids for determining the sketch line structure. Current methods have not yet introduced such auxiliary modal information, resulting in a limited ability to model lines in low-contrast or heavily weathered regions.

To address the above challenges, we will carry out in-depth research in the following directions: (1) introducing a multi-level structure guidance mechanism and a line width adjustment module to improve the adaptability of line generation in regions of different density; (2) constructing a region-controllable brushstroke rendering module by combining structural semantic labels and regional texture features; (3) designing a human-machine collaborative feedback mechanism with expert participation, such as reinforcement learning or few-shot tuning, to realize personalized rendering guidance; (4) expanding the input modalities of the system by introducing infrared images, material reflection images or historical restoration images as auxiliary information to enhance line drawing restoration in blurred areas.