Abstract
Ancient Chinese silk paintings represent a remarkable fusion of art and culture, functioning as key artifacts that connect the past, present, and future. However, inappropriate preservation has resulted in various types of deterioration, including mold infestation, which causes pigment fading, discoloration, structural fragility, and breakdown. This paper proposes the Mold Spectral-Guided Restoration Asymmetric Autoencoder (MoldSGR-AsyAutoencoder) for the hyperspectral virtual restoration of mold-affected silk paintings. Analysis of the mold spectral response revealed that mold spots on silk paintings exhibit spectrally invariant characteristics in the near-infrared (NIR) range. A similarity discrimination strategy based on spectral-spatial features was developed, and an asymmetric autoencoder model with multistage feature extraction was then designed to achieve pixel-level hyperspectral virtual recovery of mold-affected regions. Experimental results demonstrate that the method achieves excellent virtual restoration in both simulated and real mold-affected regions, providing a solid theoretical foundation and technical support for the hyperspectral virtual restoration of mold-affected silk paintings.
Introduction
Ancient Chinese silk paintings represent a unique fusion of art and culture, reflecting immense historical and esthetic value and connecting the past, present, and future. However, owing to improper preservation and other factors, ancient silk paintings are often susceptible to the damaging effects of diseases such as mold. Mold disease usually leads to fading or discoloration and causes structural fragility and decomposition, which not only reduces the esthetic and historical value of cultural relics but also hinders our understanding of past civilizations, technologies, and cultural customs. Traditional restoration of mold-damaged ancient paintings typically involves physical and chemical methods, which demand a high level of professional skill and carry risks that are difficult to avoid during the restoration process, potentially causing irreversible damage to the paintings.
With the development of digital imaging technology and computer science, electronic restoration provides a more precise and non-destructive approach to the restoration of ancient paintings and cultural relics. By using image analysis technology, researchers can accurately identify the location of diseases and distinguish the original material and pigment of the damaged region. Virtual restoration does not require physical or chemical operations on cultural relics, avoiding damage and reducing the use of chemical agents. It continues the artistic and cultural life of cultural relics and has become a research hotspot in the field of cultural relic restoration.
In the virtual restoration of cultural relics, most research has concentrated on RGB images. Restoration algorithms based on these RGB images recalculate the color components by analyzing texture features and other related information to eliminate defects caused by improper preservation or aging, such as stains, cracks, and discoloration1,2,3,4,5,6,7,8. For example, Amiri et al.1 employed an encoder-decoder approach to virtually remove varnish. Kumar et al.2 proposed a Generative Adversarial Network (GAN) based on a pre-trained residual network for restoring artworks. Ge et al.3 introduced Fast Fourier Convolution to enhance color recovery in missing mural regions. Wu et al.4 proposed a GAN with a dual attention module to restore fragmented tomb murals while preserving texture and structural similarity. Kumar et al.5 utilized a dynamic U-Net generator and a GAN with a patch discriminator to restore damaged artwork. Hu et al.9 proposed a super-resolution convolutional GAN to restore Chinese landscape paintings. Zeng et al.6 restored RGB images using both a convolutional neural network and a patch-matching approach, chosen according to the size of the repair region. Despite the numerous attempts and successes in RGB-based virtual restoration, this approach is primarily effective for visual enhancement and lacks substantial data support for the analysis of hidden information in cultural relics, such as pigment analysis10 and text extraction11.
Hyperspectral technology enables the non-destructive analysis of cultural relics, not only identifying fundamental attributes such as color and texture12,13,14,15,16,17 but also providing valuable hidden information, which contributes significantly to virtual restoration18,19,20,21. Current algorithms for restoring mold or stains in calligraphy and painting typically employ methods such as Minimum Noise Fraction (MNF) and Principal Component Analysis (PCA) to analyze hyperspectral images: the components corresponding to mold or stains are removed, and an inverse transformation then reconstructs the image. For example, Hou et al.22 first performed an MNF transformation on the hyperspectral images, then deleted the bands mainly containing stains, and finally applied the inverse MNF transformation to the remaining bands. Another approach involves dimensionality reduction based on MNF or PCA, followed by synthesizing the principal components into an RGB image to which virtual restoration algorithms are applied. For example, Hou et al.19 conducted a PCA on hyperspectral images within the 450–600 nm range, composed the first three components into RGB images, used the Criminisi algorithm to repair mold-affected regions, and achieved virtual restoration of the hyperspectral images through the inverse transformation. Qiao et al.23 proposed using PCA and high-pass filter transformations to reduce the dimensionality of hyperspectral images, with denoising and inpainting subsequently performed using morphological transformations and connected-component analysis. Sun et al.24 synthesized RGB images through PCA dimensionality reduction and high-pass filtering of hyperspectral images, converted these images into the HSV color space, and then utilized a triplet domain translation network for restoration. However, these techniques generally do not consider the spectral response characteristics of mold or stains, and other irrelevant information may also be present in the principal components associated with mold or stains. Such limitations often lead to information loss and color distortion, making it difficult to satisfy the requirements for subsequent analysis and extraction of hidden cultural information. Meanwhile, Wang et al.25 considered the spectral mechanism of mold in silk paintings and constructed a 3D CNN virtual repair network. However, in terms of sample selection, that model takes the entire dataset as input, with 50% of the data used as training samples; the absence of an adaptive selection process introduces a large number of samples of other colors, causing sample imbalance. Additionally, that study only virtually restored information from the RGB bands rather than the complete spectrum, which makes it difficult to meet the needs of subsequent hidden-information extraction in cultural heritage artifacts.
To restore the full spectrum of silk paintings affected by mold stains, this study exploited the strength of asymmetric autoencoder models in deeply mining multi-level sample features26, together with the spectral response mechanism of mold27. We propose a novel approach, the Mold Spectral-Guided Restoration Asymmetric Autoencoder (MoldSGR-AsyAutoencoder), which offers technical support for the scientific protection and study of cultural relics. The main contributions of this study are as follows.
1. Based on the spectral invariance of mold spots in the near-infrared bands of silk paintings, a new approach for the hyperspectral image restoration of mold-damaged silk paintings is proposed. The method combines a sample adaptive selection strategy with a restoration model based on an asymmetric autoencoder network, aiming to deeply explore the multi-level features of the samples and to maintain spatial and spectral consistency between the repaired area and its surrounding regions.
2. A discrimination strategy based on spatial-spectral feature similarity is proposed to support adaptive sample selection. Based on the spectral characteristics of mold spots within the near-infrared bands of silk paintings, this strategy effectively identifies undamaged-region samples whose spatial-spectral characteristics are comparable to those of mold-damaged regions, providing reliable, high-quality samples for training the autoencoder network.
Methods
A silk painting titled “Shen Qinglan Hua Tie Luo” was selected as the experimental data in this study. The painting was created from 1760 to 1770 AD, depicting a traditional story of “Three Blessings of Hua Feng”28. The painting uses homophones of bamboo, orchid, and longevity to express blessings of wealth, longevity, many descendants, and good fortune.
Due to improper preservation, the painting has developed multiple mold-damaged regions. In this study, six representative regions were selected, focusing on characters, clothing, and background. The distribution of mildew spots varies from extensive coverage to localized occurrences; the pigments in the affected areas span dark to light hues; and the complexity of the damaged patterns ranges from intricate to simple designs (Fig. 1). Region 1, "Pattern of Garments", shows the clothing of a character, with mold spots near the hands, clothes, and branches (see Fig. 1a). Region 2 depicts leaves and branches, with a large number of mold spots along the lines of the branches (see Fig. 1b). Region 3 shows a portrait of a lady with mold on her cheeks, eyebrows, and elbows (see Fig. 1c). Region 4 shows a portrait of a lady whose forehead, cheeks, and lips are speckled with mold (see Fig. 1d). Region 5 shows a portrait of a lady with branches, where the clothes, branches, and other areas are speckled with mold (see Fig. 1e). In Region 6, "Clothing", a large amount of mold is present in the folds of the clothing (see Fig. 1f). The presence of mold in these regions significantly impairs the analysis of paint composition, color accuracy, patterns, and other relevant information.
The HS-VN/SW2500CR heritage spectral imaging system (Fig. 2) was used for data acquisition, with a visible-near-infrared-shortwave-infrared (VNIR-SWIR) imaging camera from Headwall as the detector15. The system incorporates a shared optical path for the visible and near-infrared as well as the short-wave infrared spectral ranges. The platform's light source spans 400–2500 nm and is synchronized with the instrument's movement to ensure uniform brightness across the imaging region. The camera covers the VNIR range of 400–1000 nm with a spectral resolution of 1.6 nm, yielding a total of 370 bands. The parameters of the system are outlined in Table 1.
Overall framework
An asymmetric autoencoder network, MoldSGR-AsyAutoencoder, is designed to address hyperspectral virtual restoration of mold damage on silk paintings. (1) Hyperspectral data analysis was conducted to distinguish the spectral characteristics of mold-affected and non-mold-affected regions, forming the theoretical foundation of our spectral virtual restoration method. The analysis demonstrated that mold-affected regions on silk paintings exhibit spectral invariance in the near-infrared spectrum, i.e., their spectral characteristics remain relatively stable in this range, revealing the information shielded by the mold. Concurrently, there is a significant correlation between the features of mold-affected regions and those of the surrounding non-mold-affected regions. (2) According to the first law of geography, geographical phenomena exhibit a strong neighbor effect in space, i.e., the closer two geographical entities are, the stronger their similarity and interaction. This principle underpins the subsequent work, enabling spectral virtual restoration of mold-affected regions based on spatial proximity and the near-infrared spectral characteristics of mold. To address the challenge of finding similar samples between mold-affected and non-mold-affected regions in hyperspectral images, we propose a similarity discrimination strategy based on spectral and spatial features. By jointly considering spectral similarity and spatial texture similarity, this strategy effectively selects samples from non-mold-affected regions with spectral characteristics similar to those of the mold-affected regions, thus constructing a high-quality sample dataset. This approach not only significantly improves the efficiency of sample-dataset construction but also enhances the representativeness and practicality of the dataset, providing rich and reliable data support for the asymmetric autoencoder model. (3) Based on the constructed sample dataset, an asymmetric autoencoder model was designed. The model introduces a multi-level feature extraction mechanism that captures the complex spectral features of hyperspectral images, enabling pixel-by-pixel hyperspectral virtual restoration of mold-affected regions. The overall framework is shown in Fig. 3.
Mechanism of virtual restoration of mold-damaged paintings
In each region, 300 mold-affected pixels and 300 unaffected pixels were chosen for the average-spectrum and envelope-removal analyses of spectral characteristics, as illustrated in Fig. 4, which demonstrates a significant difference in the absorption spectra of mold-affected and non-mold-affected regions within 450–650 nm. In the non-mold-affected regions, spectral valleys typically occur between 468 and 485 nm, while in the mold-affected regions they occur between 560 and 620 nm. Additionally, the spectrum tends to peak after 720 nm and remains relatively stable in the 720–942 nm range in both the mold-affected and non-mold-affected regions. This phenomenon is attributed to the different absorption wavelengths of Aspergillus niger and Aspergillus flavus, which result from their distinct spatial molecular structures (Fig. 4a). Previous studies have demonstrated that these fungi exhibit almost zero absorption near 720 nm, indicating a very low absorption spectrum in this range29,30. Consequently, the spectral information underlying the mold cover can be effectively revealed in the near-infrared band. These spectral features provide an important physical basis for the detection and restoration of mold-affected pixels25. Considering the spectral curves of Fig. 4a–f comprehensively, 810 nm is taken as the dividing line; that is, the spectrum from 810 to 974 nm serves as the basis for restoring the spectrum from 425 to 809 nm, as shown in Fig. 3. In addition, the mold-affected regions were extracted with the Random Forest (RF) algorithm; further details and theoretical foundations regarding RF are provided in ref. 31.
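As a rough illustration of this setup, the sketch below splits a hyperspectral cube at the 810 nm dividing line and extracts a mold mask with a Random Forest; all array names, shapes, and hyperparameters are illustrative assumptions rather than the authors' implementation:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def split_at_810nm(cube: np.ndarray, wavelengths: np.ndarray):
    """Split a (H, W, B) cube at the 810 nm dividing line: NIR bands
    (810-974 nm) are the restoration input, visible bands (425-809 nm)
    are the restoration target."""
    nir_idx = np.where((wavelengths >= 810) & (wavelengths <= 974))[0]
    vis_idx = np.where((wavelengths >= 425) & (wavelengths < 810))[0]
    return cube[:, :, nir_idx], cube[:, :, vis_idx]

def extract_mold_mask(cube: np.ndarray, train_spectra: np.ndarray,
                      train_labels: np.ndarray) -> np.ndarray:
    """Train a Random Forest on labeled spectra (N, B) with binary labels
    (1 = mold, 0 = clean) and predict a per-pixel boolean mold mask."""
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(train_spectra, train_labels)
    h, w, b = cube.shape
    return rf.predict(cube.reshape(-1, b)).reshape(h, w).astype(bool)
```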
Comprehensive similarity discrimination utilizing spectral and spatial metrics
The comprehensive selection strategy framework is illustrated in Fig. 5. First, an initial 5 × 5 window centered on the mold pixel is selected. Within this window, the spectral-spatial similarity between the NIR spectrum of each non-mold-affected pixel and that of the mold-affected pixel is calculated, and the most similar pixels are selected and counted. If the number of similar pixels exceeds the set threshold (default 30, i.e., a minimum of 30 samples), the near-infrared spectra of these pixels are used as the input, and their visible spectra as the output, of the autoencoder for further processing. If the number of similar pixels does not reach the threshold, the window is gradually expanded (e.g., 7 × 7, 9 × 9, etc.), and the number of similar pixels is recalculated in each new window until the threshold is met. Once enough similar pixels are found, their spectra are extracted and input into the autoencoder. If the window reaches the predetermined maximum size (e.g., an image boundary or a preset maximum window) without meeting the threshold, the search is halted and the current result is recorded, or an exception action is taken.
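A minimal sketch of this adaptive window search, assuming a boolean mold mask and a caller-supplied similarity predicate built from the measures defined below (all names and defaults are illustrative):

```python
import numpy as np

def select_similar_samples(cube_nir, mold_mask, row, col,
                           is_similar, threshold=30, max_window=31):
    """Grow a square window around the mold pixel at (row, col) until at
    least `threshold` spectrally similar, non-mold pixels are found.
    Returns the (row, col) coordinates of the selected pixels."""
    h, w, _ = cube_nir.shape
    target = cube_nir[row, col]
    half = 2                                     # start with a 5 x 5 window
    selected = []
    while 2 * half + 1 <= max_window:
        r0, r1 = max(0, row - half), min(h, row + half + 1)
        c0, c1 = max(0, col - half), min(w, col + half + 1)
        selected = [
            (r, c)
            for r in range(r0, r1) for c in range(c0, c1)
            if not mold_mask[r, c] and is_similar(target, cube_nir[r, c])
        ]
        if len(selected) >= threshold:
            return selected                      # enough samples: stop here
        half += 1                                # expand to 7 x 7, 9 x 9, ...
    return selected                              # max window reached: best effort
```

The spectral-spatial similarity used in this search is calculated as follows: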
Spectral Angle Mapper (SAM) assesses the similarity between two spectral vectors by computing the angle between them32. The fundamental principle involves projecting both the target and reference spectra into a multidimensional space and calculating the angle formed between the vectors; a smaller angle indicates a higher degree of spectral similarity. The calculation formula is as follows:

$$SAM\left({x}_{1},{x}_{2}\right)=\arccos \left(\frac{{x}_{1}\cdot {x}_{2}}{\left\Vert {x}_{1}\right\Vert \left\Vert {x}_{2}\right\Vert }\right)$$

where \({x}_{1}\) and \({x}_{2}\) represent the two (near-infrared) spectral vectors.
Spectral Information Divergence (SID) is an information-theoretic approach that quantifies the dissimilarity between two spectral distributions by evaluating their information entropy33. This method treats spectral vectors as random variables and uses probabilistic statistical theory to assess their similarity; a lower SID value indicates a higher degree of similarity between the two spectra. Assuming two spectral vectors \(R=\left[{r}_{1},{r}_{2},\ldots ,{r}_{n}\right]\) and \(T=\left[{t}_{1},{t}_{2},\ldots ,{t}_{n}\right]\), representing the reference and target spectra respectively, with normalized distributions \({p}_{i}={r}_{i}/\sum _{j=1}^{n}{r}_{j}\) and \({q}_{i}={t}_{i}/\sum _{j=1}^{n}{t}_{j}\), the SID is calculated as:

$$SID\left(R,T\right)=\sum _{i=1}^{n}{p}_{i}\log \frac{{p}_{i}}{{q}_{i}}+\sum _{i=1}^{n}{q}_{i}\log \frac{{q}_{i}}{{p}_{i}}$$
Integrating SAM and SID through the SID_SAM synthesis method, which accounts for differences in both spectral shape and reflection energy, allows the similarity between spectra to be calculated more effectively; a smaller value indicates a higher degree of similarity. The calculation formula is as follows:

$$SID\_SAM\left(R,T\right)=SID\left(R,T\right)\times \sin \left(SAM\left(R,T\right)\right)$$

where \({SID}(R,T)\) and \({SAM}(R,T)\) represent the SID and SAM values of the spectral vectors \(R\) and \(T\), respectively, and \(\sin\) denotes the sine function.
The variance indicates the degree of data fluctuation around the mean. In this study, the variance of the absolute difference between the target spectrum and the reference spectrum was calculated; the smaller the variance, the closer the pixel spectrum is to the reference spectrum. Assuming the reference spectrum vector \(R=\left[{r}_{1},{r}_{2},\ldots ,{r}_{n}\right]\) and the target spectrum vector \(T=\left[{t}_{1},{t}_{2},\ldots ,{t}_{n}\right]\), with band-wise differences \({d}_{i}=\left|{r}_{i}-{t}_{i}\right|\) and mean difference \(\bar{d}=\frac{1}{n}\sum _{i=1}^{n}{d}_{i}\), the variance is calculated as:

$$Var\left(R,T\right)=\frac{1}{n}\sum _{i=1}^{n}{\left({d}_{i}-\bar{d}\right)}^{2}$$

where \(\left|{r}_{i}-{t}_{i}\right|\) represents the absolute value of the difference between \({r}_{i}\) and \({t}_{i}\).
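The four similarity measures above can be written compactly as in the sketch below (the epsilon guards against division by zero and log of zero are implementation assumptions):

```python
import numpy as np

def sam(x1, x2):
    """Spectral Angle Mapper: angle between two spectra, in radians."""
    cos = np.dot(x1, x2) / (np.linalg.norm(x1) * np.linalg.norm(x2) + 1e-12)
    return float(np.arccos(np.clip(cos, -1.0, 1.0)))

def sid(r, t, eps=1e-12):
    """Spectral Information Divergence: symmetric KL divergence of the
    spectra treated as probability distributions."""
    p = r / (r.sum() + eps) + eps
    q = t / (t.sum() + eps) + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def sid_sam(r, t):
    """Combined measure: SID scaled by the sine of the spectral angle."""
    return sid(r, t) * np.sin(sam(r, t))

def diff_variance(r, t):
    """Variance of the absolute band-wise difference between spectra."""
    return float(np.var(np.abs(r - t)))
```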
Simultaneously, samples were selected from each category to evaluate the separability of each near-infrared band. The Jeffries-Matusita (JM) distance measures the separability between two probability distributions and is used here to evaluate the separability of the individual NIR bands. Assuming the samples of class 1 and class 2 are \(s1\) and \(s2\), the JM distance in band \(i\) is calculated as:

$$JM{\left(s1,s2\right)}_{band\,i}=2\left(1-{e}^{-B}\right)$$

where \({JM}{(s1,s2)}_{band\,i}\) denotes the JM distance of \(s1\) and \(s2\) in band \(i\), and \(B\) denotes the Bhattacharyya distance, calculated as:

$$B=\frac{1}{8}{\left({m}_{1}-{m}_{2}\right)}^{2}\frac{2}{{\delta }_{1}+{\delta }_{2}}+\frac{1}{2}\mathrm{ln}\left[\frac{{\delta }_{1}+{\delta }_{2}}{2\sqrt{{\delta }_{1}{\delta }_{2}}}\right]$$

where \(B\) denotes the Bhattacharyya distance of \(s1\) and \(s2\) in band \(i\), \({m}_{1}\) and \({m}_{2}\) represent the mean values of samples \(s1\) and \(s2\) in band \(i\), respectively, and \({\delta }_{1}\) and \({\delta }_{2}\) represent the variances of samples \(s1\) and \(s2\) in band \(i\), respectively.
The band exhibiting the highest separability is identified as the reference band. Concurrently, the absolute value of the difference between the reference pixel and the target pixel at the reference band is calculated; the lower this value, the closer the reference and target pixels. Through this strategy, high-quality datasets of similar samples were selected.
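A sketch of the per-band JM separability computation under this Gaussian formulation (the sample arrays and the band-selection one-liner are illustrative):

```python
import numpy as np

def jm_distance(s1: np.ndarray, s2: np.ndarray) -> float:
    """Jeffries-Matusita distance between two sample sets in one band,
    assuming Gaussian class distributions (m: mean, v: variance)."""
    m1, m2 = np.mean(s1), np.mean(s2)
    v1, v2 = np.var(s1), np.var(s2)
    b = (0.125 * (m1 - m2) ** 2 * 2.0 / (v1 + v2)
         + 0.5 * np.log((v1 + v2) / (2.0 * np.sqrt(v1 * v2))))
    return 2.0 * (1.0 - np.exp(-b))   # JM lies in [0, 2]; larger = more separable

# The NIR band with the largest JM distance becomes the reference band, e.g.:
# best_band = max(range(n_bands),
#                 key=lambda i: jm_distance(class1[:, i], class2[:, i]))
```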
Virtual restoration based on the asymmetric autoencoder
A deep autoencoder is an unsupervised learning framework consisting of an encoder and a decoder. The primary objective is to learn a low-dimensional representation (encoding) from high-dimensional input data, which can then be used to reconstruct the original input or transform it into another form26. In this study, an asymmetric autoencoder is designed to address the inherent inconsistency between the input NIR spectral bands and the output visible spectra (as shown in Fig. 6).
The asymmetric autoencoder comprises two main components: an encoder and a decoder. The encoder compresses the high-dimensional NIR data into a low-dimensional latent representation through a four-layer fully connected network, with ReLU activation functions at each layer to facilitate the extraction of nonlinear features. The decoder reconstructs the low-dimensional representation into the visible spectrum through another four-layer fully connected network, whose final output layer uses a linear activation function to generate the reconstructed visible spectral data. The model is optimized with the Adam optimizer, and training is guided by a root-mean-squared-error (RMSE) loss function, defined below. This design enables effective learning of the complex mapping between NIR and visible spectra, thereby enhancing the model's ability to perform accurate spectral reconstruction.
Let the NIR data be \(x\in {R}^{{d}_{nir}}\), where \({d}_{nir}\) is the dimension of the NIR spectrum, and the latent space be \(z\in {R}^{{d}_{z}}\), where \({d}_{z}\) is the dimension of the latent space. The encoder is then expressed as:

$$z={f}_{encoder}\left(x;{\theta }_{e}\right)$$

where \({f}_{encoder}\) is a neural network parameterized by \({\theta }_{e}\). Three hidden layers and one encoding layer were designed in this study. The first, second, and third hidden layers are expressed as:

$${h}_{1}={\rm{ReLU}}\left({W}_{1}x+{b}_{1}\right),\quad {h}_{2}={\rm{ReLU}}\left({W}_{2}{h}_{1}+{b}_{2}\right),\quad {h}_{3}={\rm{ReLU}}\left({W}_{3}{h}_{2}+{b}_{3}\right)$$

where \({W}_{1}\in {R}^{256\times {d}_{nir}}\), \({b}_{1}\in {R}^{256}\), \({W}_{2}\in {R}^{256\times 256}\), \({b}_{2}\in {R}^{256}\), \({W}_{3}\in {R}^{128\times 256}\), and \({b}_{3}\in {R}^{128}\).
The latent representation layer is:

$$z={\rm{ReLU}}\left({W}_{4}{h}_{3}+{b}_{4}\right)$$

where \({W}_{4}\in {R}^{64\times 128}\) and \({b}_{4}\in {R}^{64}\).
Through the above steps, the near-infrared spectral data are compressed into a 64-dimensional latent space. The decoder's task is to decode the latent representation \(z\) into the visible spectrum \(\hat{y}\in {R}^{{d}_{vis}}\), where \({d}_{vis}\) is the dimension of the visible spectrum. The decoder is expressed as:

$$\hat{y}={f}_{decoder}\left(z;{\theta }_{d}\right)$$

where \({f}_{decoder}\) is a neural network parameterized by \({\theta }_{d}\). Three hidden layers and one output layer were designed in this study. The first, second, and third hidden layers are expressed as:

$${h}_{5}={\rm{ReLU}}\left({W}_{5}z+{b}_{5}\right),\quad {h}_{6}={\rm{ReLU}}\left({W}_{6}{h}_{5}+{b}_{6}\right),\quad {h}_{7}={\rm{ReLU}}\left({W}_{7}{h}_{6}+{b}_{7}\right)$$

where \({W}_{5}\in {R}^{64\times 64}\), \({b}_{5}\in {R}^{64}\), \({W}_{6}\in {R}^{128\times 64}\), \({b}_{6}\in {R}^{128}\), \({W}_{7}\in {R}^{256\times 128}\), and \({b}_{7}\in {R}^{256}\). The decoding (output) layer is defined as:

$$\hat{y}={W}_{8}{h}_{7}+{b}_{8}$$

where \({W}_{8}\in {R}^{{d}_{vis}\times 256}\) and \({b}_{8}\in {R}^{{d}_{vis}}\).
In the training process, the root mean square error (RMSE) is used as the loss function:

$$L\left({\theta }_{e},{\theta }_{d}\right)=\sqrt{\frac{1}{{d}_{vis}}\sum _{i=1}^{{d}_{vis}}{\left({\hat{y}}_{i}-{y}_{i}\right)}^{2}}$$

where \(\hat{y}\) denotes the reconstructed visible spectrum and \(y\) denotes the true spectrum.
During the training process, the gradients of the loss function with respect to each parameter are calculated using the backpropagation algorithm, and the Adam optimizer is used to update the parameters to minimize the loss. Through the aforementioned process, the autoencoder designed in this study learns to compress high-dimensional NIR spectral data into a low-dimensional latent space and reconstruct visible spectral data from it.
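As a minimal Keras sketch of the network described above, with the layer widths, Adam optimizer, and RMSE loss from the text (the band counts d_nir and d_vis are illustrative assumptions, and this is not the authors' released code):

```python
import tensorflow as tf

def build_moldsgr_autoencoder(d_nir: int, d_vis: int) -> tf.keras.Model:
    """Asymmetric autoencoder: four encoder layers (ReLU) compress the NIR
    spectrum into a 64-d latent code; four decoder layers map the code to
    the visible spectrum, with a linear output layer."""
    inputs = tf.keras.Input(shape=(d_nir,))
    # Encoder: d_nir -> 256 -> 256 -> 128 -> 64
    x = tf.keras.layers.Dense(256, activation="relu")(inputs)
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dense(128, activation="relu")(x)
    z = tf.keras.layers.Dense(64, activation="relu")(x)
    # Decoder: 64 -> 64 -> 128 -> 256 -> d_vis (linear output)
    y = tf.keras.layers.Dense(64, activation="relu")(z)
    y = tf.keras.layers.Dense(128, activation="relu")(y)
    y = tf.keras.layers.Dense(256, activation="relu")(y)
    outputs = tf.keras.layers.Dense(d_vis, activation=None)(y)
    return tf.keras.Model(inputs, outputs)

def rmse_loss(y_true, y_pred):
    """Root mean squared error, matching the loss defined above."""
    return tf.sqrt(tf.reduce_mean(tf.square(y_true - y_pred)))

model = build_moldsgr_autoencoder(d_nir=102, d_vis=240)  # band counts illustrative
model.compile(optimizer=tf.keras.optimizers.Adam(), loss=rmse_loss)
# model.fit(nir_samples, vis_samples, epochs=..., batch_size=...)
```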
The computations were performed on a Legion Y7000P 2019 PG0 computer equipped with an NVIDIA GeForce GTX 1650 graphics card and an Intel(R) Core(TM) i7-9750H CPU, using the TensorFlow 2.10.0 package in Anaconda 2.6.3 (Python 3.9.0) and ENVI 5.3.
Results
Results of virtual restoration in simulated mold-affected regions
To assess the effectiveness of the MoldSGR-AsyAutoencoder model in virtual restoration of hyperspectral images, two representative mold-free regions were chosen for simulation experiments. Different-sized simulated mold stains were inserted into real hyperspectral images, and the images were restored using the MoldSGR-AsyAutoencoder model. The performance of the algorithm was evaluated based on the restoration accuracy. Specifically, two regions were selected for this simulation: Simulated data 1, which represents character clothing (Fig. 7), and Simulated data 2, which captures background plant communities (Fig. 8). These regions, characterized by their relatively complex textures and colors, provided a robust test for evaluating the virtual restoration performance of the algorithm.
In accordance with the experimental design, ten specific locations within the character-clothing region were selected for mold-information insertion, distributed over the hands, clothing decorations, patterns, and lines, as shown in Fig. 7. Similarly, six specific locations were chosen within the background flora region, encompassing leaf veins, petals, stems, and roots, as illustrated in Fig. 8. In Figs. 7 and 8, panel (a) displays the simulated mold-affected regions, panel (b) presents the original data, and panel (c) shows the virtual restoration results.
The comparison between the original and the virtual restoration spectra is shown in Fig. 9. Specifically, Fig. 9a displays the spectral comparisons for regions #1–#5 of the character’s clothing in simulated data 1, while Fig. 9b shows regions #6–#10 of the same data. Figure 9c, d compare the background floral regions #1–#3 and #4–#6 of simulated data 2, respectively. In these figures, the solid line represents the original spectrum, and the dashed line represents the restored spectrum using the MoldSGR-AsyAutoencoder algorithm described in this paper. The results demonstrate a high degree of overlap between the original and restored spectra, indicating a high accuracy of spectral reconstruction, which essentially restores the original spectrum.
To conduct a more comprehensive evaluation of the restoration effects, four quantitative indices—Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), RMSE, and Spectral Angle Mapper (SAM)—were employed to analyze the reconstruction performance. The results are presented in Table 2. Among these, a downward arrow in brackets adjacent to the evaluation index signifies that a lower value is preferred, whereas an upward arrow denotes that a higher value is preferred. The numbers indicate the optimal results for each index.
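For reference, these four indices can be computed per region as in the sketch below, using scikit-image and assuming reflectance cubes scaled to [0, 1] (an assumption on our part, not a detail stated in the paper):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate_region(original: np.ndarray, restored: np.ndarray) -> dict:
    """Compute PSNR, SSIM, RMSE, and mean SAM for a restored (H, W, B) region."""
    psnr = peak_signal_noise_ratio(original, restored, data_range=1.0)
    ssim = structural_similarity(original, restored,
                                 channel_axis=-1, data_range=1.0)
    rmse = float(np.sqrt(np.mean((original - restored) ** 2)))
    # Mean spectral angle over all pixels
    o = original.reshape(-1, original.shape[-1])
    r = restored.reshape(-1, restored.shape[-1])
    cos = np.sum(o * r, axis=1) / (
        np.linalg.norm(o, axis=1) * np.linalg.norm(r, axis=1) + 1e-12)
    sam = float(np.mean(np.arccos(np.clip(cos, -1.0, 1.0))))
    return {"PSNR": psnr, "SSIM": ssim, "RMSE": rmse, "SAM": sam}
```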
During the validation of simulated-data restoration, the average RMSE over the 16 regions of simulated data 1 and simulated data 2 was 0.01, with a SAM value of 0.02; both are very close to 0, indicating an exceedingly high similarity to the original spectra. The average PSNR was 41.80, reflecting a high signal-to-noise ratio and thus high signal quality. The average SSIM was 0.95, demonstrating a high degree of structural similarity to the original image and satisfactory restoration of the mold stains.
However, in simulated data 2, although #4 and #6 were both simulated leaf-mold regions, the vein texture in #4 was more complex than that in #6, which led to a difference in restoration accuracy. Region #4, with its complex vein texture, requires more detail to be handled during inpainting and is therefore prone to detail loss and reduced accuracy, whereas region #6, with its simpler vein texture, achieves relatively higher inpainting accuracy.
In summary, the MoldSGR-AsyAutoencoder model demonstrates effective recovery of original spectra, with excellent RMSE, SAM, PSNR, and SSIM in most regions, highlighting its high-precision spectral reconstruction capability. However, in the complex vein sections of the background flower regions, the restoration effect slightly deteriorated, indicating the challenges the model still encounters in handling extremely intricate textures. Overall, the model demonstrates excellent image virtual restoration performance in most scenarios, providing strong evidence of its potential in hyperspectral image restoration.
Results of virtual restoration in real mold-affected regions
The results of hyperspectral restoration for mold stains are shown in Figs. 10 and 11, taking the six typical mold-affected regions in the painting (Regions 1–6) as examples. The virtual restoration results of the MoldSGR-AsyAutoencoder model are highly consistent with the color of the original hyperspectral image and effectively retain the original texture and color details. In contrast, the PCA inverse algorithm22 shows notable color discrepancies from the original image in multiple regions, such as Regions 1 and 6, where the restored colors are significantly altered, and its results in Regions 2–5 are far from satisfactory. Similarly, the MNF inverse22 restoration results do not align well with the original images: the clothing of the character in Region 1 changes from red to yellow, the clothing in Region 2 changes from cyan to black, and the screen in Region 3 changes from cyan to brown. Only in Region 6, where the colors are simpler, does the MNF inverse algorithm achieve better results. Overall, the results of the PCA and MNF inverse algorithms were not ideal, likely because the complex color components of the images prevent the mold information from being concentrated in specific components, leading to distortion and suboptimal restoration. In Region 1, both the 3D CNN and U-Net restoration algorithms exhibit a visible boundary artifact between the reconstructed and surrounding regions. Furthermore, in Region 3, the 3D CNN algorithm introduces chromatic distortion (excessive red saturation) in the hand regions, significantly compromising restoration quality.
Consequently, from a visual perspective, the MoldSGR-AsyAutoencoder method achieves virtual restoration of mold spots, maintaining a high degree of color consistency with the original image, significantly outperforming PCA and MNF inverse restoration methods.
Compared with the PCA and MNF inverse algorithms, the MoldSGR-AsyAutoencoder algorithm effectively repairs mold marks. A detailed comparative analysis was conducted on four key regions: the brown background in Region 2 (Fig. 12a), the forehead (Fig. 12b) and lips (Fig. 12c) of the characters in Region 4, and the clothing of the characters in Region 5 (Fig. 12d). As shown in Fig. 12, the mold-spot repair results of the PCA inverse algorithm were unsatisfactory, with little to no effective repair on the forehead, lips, or clothing. Although the MNF inverse algorithm showed some effectiveness in the mold-spot repair task, it still left traces on the brown background and on the character's forehead, lips, and clothing. In the results of the 3D CNN and U-Net algorithms, the brown background in Region 2 and the foreheads and lips of the characters in Region 4 exhibit distinct boundaries with their neighboring areas. In contrast, the MoldSGR-AsyAutoencoder algorithm achieved more thorough mold-spot repair, leaving no traces after the repair process.
The algorithm also demonstrates excellent detail retention compared with the PCA and MNF inverse algorithms. A comprehensive comparative analysis of three key areas was conducted: the fold lines and the green and blue hues of the characters' clothing in Region 1 (Fig. 13a), the green of the characters' clothing in Region 2 (Fig. 13b), and the fan on the forehead of the character in Region 3 (Fig. 13c). As shown in Fig. 13a, the PCA and MNF inverse algorithms fall short in detail restoration, erasing the pleated lines on the dress and altering its green and blue hues. In Fig. 13b, the PCA inverse algorithm turns the green of the character's clothing to cyan, whereas the MNF inverse algorithm changes it to black, transforms the purplish brown to azure, and blurs the texture; the proposed algorithm, by contrast, preserves the original texture while maintaining color accuracy. In Fig. 13c, the MNF inverse result shows significant changes to the fan's cyan texture, producing a blurry swastika pattern. Among the 3D CNN and U-Net results, Region 2 (Fig. 13b) shows obvious contours within the neighborhood. In contrast, the MoldSGR-AsyAutoencoder virtual restoration well preserves the original texture details of the cultural relic.
Discussion
Because the true visible-spectrum values of mold-affected pixels cannot be obtained, the means of the surrounding pixels are used as approximate true values, enabling a quantitative evaluation of the virtual restoration results. Theoretically, the same object in mold-affected and adjacent unaffected areas should have similar gray values, so the better the repair, the smaller the difference between the two. To evaluate the inpainting quality, RMSE, Mean Absolute Percentage Error (MAPE), and Mean Absolute Error (MAE) are used to quantitatively analyze the virtual restoration results of mold-affected regions in the hyperspectral images.
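A sketch of this band-wise evaluation against the neighborhood-mean pseudo ground truth (array shapes and names are illustrative):

```python
import numpy as np

def per_band_errors(restored: np.ndarray, reference: np.ndarray):
    """Per-band RMSE, MAPE (%), and MAE between restored mold pixels (N, B)
    and the pseudo ground truth: the mean spectrum (B,) of the surrounding
    clean pixels, broadcast across the N restored pixels."""
    diff = restored - reference
    rmse = np.sqrt(np.mean(diff ** 2, axis=0))
    mape = 100.0 * np.mean(np.abs(diff) / (np.abs(reference) + 1e-12), axis=0)
    mae = np.mean(np.abs(diff), axis=0)
    return rmse, mape, mae
```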
Figure 14 presents the quantitative evaluation results of visible virtual restoration in the mold-affected regions of Region 1: Fig. 14a–c show the results for the white pigment on the person's hand, Fig. 14d–f for the brown background, Fig. 14g–i for the dark-colored garments, and Fig. 14j–l for the light-red clothing. The proposed MoldSGR-AsyAutoencoder algorithm maintains low index values across almost all bands. The comparative analysis in Table 3 shows that the proposed algorithm achieved optimal performance (mean values across all spectral bands) relative to the inverse PCA, MNF, 3D CNN, and U-Net approaches for all indices measured in Region 1. Within these regions, the RMSE values range from 0 to 0.12, the MAPE values from 0 to 24%, and the MAE values from 0 to 0.12, further validating the effectiveness of the proposed method.
Figure 15 presents the quantitative evaluation results of visible virtual restoration in the mold-affected regions of Regions 2 and 3: Fig. 15a–c show the brown background of Region 2, Fig. 15d–f the white pigment on the hands of the people in Region 3, and Fig. 15g–i the white pigment on their faces. The mean RMSE values in each region range from 0 to 0.08, MAPE values from 4 to 18%, and MAE values from 0 to 0.07. The proposed algorithm again demonstrates low index values across almost all bands. Quantitative analysis shows that the 3D CNN algorithm attained peak performance in all evaluation metrics for this region (Table 3). Although the MNF inverse transformation and our proposed method exhibited comparable metric curves (second-best in mean values), visual assessment of Figs. 10 and 11 indicates that MNF introduced chromatic aberrations in the brown background and green clothing, while the 3D CNN suffered from excessive detail smoothing. These observations confirm the comprehensive advantages of the proposed approach.
For Region 3, the RMSE values ranged from 0 to 0.09, with MAPE values from 4 to 15% and MAE values from 0.02 to 0.08. In the white pigment regions of the hand, the inverse MNF closely matches the proposed algorithm on each evaluation-index curve; in the white pigment regions of the face, however, the proposed algorithm's curves were optimal. Considering the mean values of the various indicators (Table 4), the performance of the proposed algorithm is optimal.
The evaluation revealed significant inter-algorithm performance differences across Regions 4–6, indicating area-dependent variation in reconstruction quality. Figure 16 illustrates the quantitative evaluation results of visible virtual restoration in the mold-affected regions of Region 4: Fig. 16a–c present the evaluation results for the person's forehead, while Fig. 16d–f show those for the white area on the person's face. Quantitatively, the RMSE values in Region 4 range from 0 to 0.12, the MAPE values from 2 to 24%, and the MAE values from 0 to 0.1. In the forehead region, the 3D CNN algorithm achieved the best quantitative metrics (RMSE = 0.056, MAPE = 6.502%, MAE = 0.031), as shown in Table 4. Our algorithm ranked second with RMSE = 0.059, MAPE = 7.555%, and MAE = 0.034, only marginally behind the top-performing method. For the facial core region, the U-Net algorithm demonstrated superior performance (RMSE = 0.047, MAPE = 7.049%, MAE = 0.036), followed by the 3D CNN (RMSE = 0.046, MAPE = 7.571%, MAE = 0.037), while our method ranked third (RMSE = 0.053, MAPE = 8.033%, MAE = 0.041). Notably, although the competing algorithms showed marginal quantitative advantages, they produced images with significant over-smoothing (Fig. 10), manifested as lost skin-texture details and reduced edge sharpness, resulting in unnatural visual quality. In contrast, our algorithm generated more natural-looking results, demonstrating superior performance in both color restoration and detail preservation.
Similarly, Fig. 17 presents the quantitative evaluation results of visible virtual restoration for the mold-affected regions in Region 5: Fig. 17a–c show the results for the white pigment on the person's clothing, and Fig. 17d–f for the brown pigment on the tree trunk. Quantitative analysis indicates that the RMSE values for Region 5 vary between 0 and 0.16, with MAPE values from 2 to 35% and MAE values from 0 to 0.16. In the garment regions, our method achieved the best performance (RMSE = 0.093, MAPE = 14.068%, MAE = 0.065), outperforming U-Net (RMSE = 0.098, MAPE = 15.464%, MAE = 0.07) and the other comparative algorithms. For the brown material of the tree regions, although the 3D CNN algorithm achieved the best metrics owing to its strong local feature extraction, the proposed method ranked a close second (ΔRMSE ≤ 0.002). Notably, in the visual naturalness assessment our algorithm exhibited the highest fidelity (Fig. 10), providing further evidence of its efficacy in complex color- and texture-restoration tasks.
Finally, Region 6 further confirmed the reliability of the proposed algorithm. Figure 18 presents the quantitative evaluation results of visible virtual restoration in the mold-affected regions of Region 6: Fig. 18a–c depict the quantitative results for the light-red regions of the people's clothing, while Fig. 18d–f show those for the darker red regions. The RMSE values range from 0 to 0.14, the MAPE values from 5 to 25%, and the MAE values from 0 to 0.12. In the light-red regions, our algorithm performs less well than the PCA and MNF inverse algorithms before 600 nm but better after 600 nm; overall, the U-Net algorithm achieved the best performance there, with our method ranking third in the mean evaluation (Table 4) by only a small margin. In the dark-red regions, however, the proposed algorithm performed best among the inverse PCA, MNF, 3D CNN, and U-Net methods.
In summary, the proposed algorithm exhibits excellent performance in mold-affected repair tasks. This indicates that the proposed algorithm can effectively restore the original color and texture details in mold-affected regions, providing reliable technical support for the virtual restoration of calligraphy and painting artifacts.
To further evaluate its performance, we conducted comparative experiments with the MoldSGR-AsyAutoencoder model on a hyperspectral dataset of the Portrait of Pañcika Arhat. The painting, created in 1756 by Ding Guanpeng, a famous court painter under Emperor Qianlong's reign, served as the test subject.
Figure 19 visually demonstrates the results, effectively illustrating how our method generates more esthetically pleasing outcomes compared to PCA, MNF inversion, 3D CNN, and U-Net algorithms. The results of the 3D CNN and U-Net methods indicate that the restoration regions exhibit noticeable boundaries with neighboring areas, and image details appear overly smoothed, leading to a lack of authenticity in the results. In contrast, our approach delivers more natural processing with accurately recovered texture details, as shown in Fig. 19. Overall, our method proves highly effective for restoring traditional Chinese paintings.
Since the MoldSGR-AsyAutoencoder model architecture is fundamentally grounded in the spectral characteristics of mold regions, we perform an ablation study to systematically evaluate the contributions of spectral similarity, spatial similarity, and asymmetric encoder design to the restoration performance.
This study conducted systematic comparative analyses through spectral-similarity ablation experiments, evaluating the MoldSGR-AsyAutoencoder model with successive removal of the SAM, SID, SID_SAM, and variance modules. The experimental results are detailed in Table 5. The quantitative data demonstrate that the model incorporating the spectral-similarity constraints achieved significant performance improvements, with a 10.09% increase in MPSNR and a 7.95% increase in MSSIM. These results validate the technical superiority of our model for the virtual restoration of paintings and calligraphic works.
We performed an ablation study comparing two variants of the MoldSGR-AsyAutoencoder: one incorporating spatial similarity constraints and another without this mechanism. As demonstrated in Table 5, quantitative evaluation reveals that the spatial similarity-enhanced model achieves a significant improvement of 8.19% in MPSNR compared to its counterpart. These results conclusively validate the critical role of spatial similarity in enhancing virtual restoration performance, establishing our approach as superior for this task.
This study conducts systematic ablation research on the MoldSGR-AsyAutoencoder model, focusing on loss function configuration, network depth parameters, and encoder-decoder symmetry, with detailed experimental results presented in Table 6.
In the loss-function comparison, quantitative analysis was performed by constructing three baseline models using MAE, Huber (smooth L1), and RMSE losses. The results demonstrate significant performance gains for the proposed configuration, with an 18.23% improvement in MPSNR and a 17.28% improvement in MSSIM compared with the baseline models.
For the network-depth ablation study, three control experiments were conducted by sequentially removing (1) the final encoder layer, (2) the final decoder layer, and (3) both structures. Quantitative analysis revealed a 13.26% MPSNR gain and an 11.76% MSSIM improvement for our architecture.
The encoder-decoder symmetry verification compared two designs: a standard symmetric architecture (encoder: Input → 256 → 128 → 64; decoder: 64 → 128 → 256 → Output) and a deep mirror architecture (encoder: Input → 512 → 256 → 128 → 64; decoder: 64 → 128 → 256 → 512 → Output). Experimental data confirmed our model's superiority, with an 8.10% MPSNR increase and a 5.56% MSSIM improvement. These multidimensional ablation results comprehensively validate the technical advantages of our model in the virtual restoration of paintings and calligraphic works; a sketch of how such variants might be instantiated follows.
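Under the stated layer widths, the two ablation variants could be built with a generic helper such as the following (the builder and the input/output dimensions are illustrative assumptions, not the authors' code):

```python
import tensorflow as tf

def build_variant(d_in: int, d_out: int, enc_widths, dec_widths) -> tf.keras.Model:
    """Stack Dense/ReLU layers with the given encoder and decoder widths,
    ending in a linear output layer."""
    x = inputs = tf.keras.Input(shape=(d_in,))
    for width in (*enc_widths, *dec_widths):
        x = tf.keras.layers.Dense(width, activation="relu")(x)
    outputs = tf.keras.layers.Dense(d_out, activation=None)(x)
    return tf.keras.Model(inputs, outputs)

# Standard symmetric: Input -> 256 -> 128 -> 64 -> 128 -> 256 -> Output
symmetric = build_variant(102, 240, (256, 128, 64), (128, 256))
# Deep mirror: Input -> 512 -> 256 -> 128 -> 64 -> 128 -> 256 -> 512 -> Output
deep_mirror = build_variant(102, 240, (512, 256, 128, 64), (128, 256, 512))
```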
In this study, the MoldSGR-AsyAutoencoder model was proposed for the hyperspectral virtual restoration of mold-affected silk paintings. Detailed hyperspectral analysis of the mold spectral response showed that mold-affected regions on silk paintings exhibit spectral invariance in the near-infrared range. Exploiting this feature, a similarity discrimination strategy was developed to search effectively for highly similar samples by integrating spectral similarity and spatial information, and an asymmetric autoencoder model was then designed to achieve precise spectral virtual restoration of mold-damaged areas. Experimental results demonstrate that the proposed method achieved strong performance in the two simulated regions, with mean RMSE, SAM, PSNR, and SSIM values of 0.01, 0.02, 41.78, and 0.94, respectively. Moreover, the algorithm demonstrated superior restoration capabilities in six real-world mold-affected regions. Additionally, we evaluated the model's applicability on another hyperspectral image, further validating its robustness and versatility.
The proposed algorithm operates by analyzing mold’s spectral characteristics to virtually reconstruct the spectrum of mold-damaged center pixels. This reconstruction is achieved through optimal utilization of spectral information from surrounding spatial neighborhoods. For example, in the restoration of mold-affected regions, even when the central pixel suffers severe spectral distortion, its surrounding undamaged pixels can provide reliable spectral references. This neighborhood-based restoration framework capitalizes on spatial-spectral correlations to accurately reconstruct original spectral signatures in contaminated regions. Furthermore, the algorithm incorporates near-infrared spectral similarity assessment within the neighborhood analysis. This process enables the systematic identification of non-contaminated pixels exhibiting spectral signatures analogous to mold-affected regions. The resulting high-quality sample selection significantly improves the quality of the training dataset for spectral reconstruction. This comprehensive integration of spatial-spectral information significantly improves both the precision of virtual reconstruction and the algorithm’s robustness, enabling consistent performance even in complex scenarios. Notably, in smooth regions of natural scenes, where neighboring pixels exhibit strong spectral homogeneity, the algorithm reliably captures the central pixel’s spectral profile, further enhancing its practical applicability and reconstruction fidelity.
While the proposed reconstruction algorithm demonstrates significant advantages in hyperspectral image processing, several limitations warrant discussion. The current methodology's reliance on near-infrared (NIR) spectral similarity for local neighborhood sample selection is a notable constraint: it may introduce errors when different materials within adjacent regions exhibit comparable NIR signatures but divergent visible spectral characteristics, a common occurrence in complex scenes involving heterogeneous targets. This spectral ambiguity represents a critical challenge for future algorithmic enhancement. In future research, we will prioritize the integration of multi-scale fusion techniques to advance hyperspectral virtual reconstruction34. By synergistically combining global and local spectral information, this approach can hierarchically optimize reconstruction performance, preserving large-scale structural integrity while enhancing fine-grained details in intricate scenes. The development of this multi-scale optimization framework will be a central focus of our future work.
Data availability
The datasets generated or used during the study are available from the corresponding author if they are required for scientific research.
Code availability
Some or all code generated or used during the study are available from the corresponding author if they are required for scientific research.
References
Maali Amiri, M. & Messinger, D. W. Virtual cleaning of works of art using deep convolutional neural networks. Herit. Sci. 9, 94 (2021).
Kumar, P. & Gupta, V. Restoration of damaged artworks based on a generative adversarial network. Multimed. Tools Appl. 82, 40967–40985 (2023).
Ge, H., Yu, Y. & Zhang, L. A virtual restoration network of ancient murals via global–local feature extraction and structural information guidance. Herit. Sci. 11, 264 (2023).
Wu, M., Chang, X. & Wang, J. Fragments inpainting for tomb murals using a dual-attention mechanism GAN with improved generators. Appl. Sci. 13, 3972 (2023).
Kumar, P. et al. Artwork restoration using paired image translation-based generative adversarial networks. ITM Web Conf. 54, 01013 (2023).
Zeng, Z. et al. Virtual restoration of ancient tomb murals based on hyperspectral imaging. Herit. Sci. 12, 410 (2024).
Sun, X. et al. Structure-guided virtual restoration for defective silk cultural relics. J. Cult. Herit. 62, 78–89 (2023).
Mol, V. R. & Maheswari, P. U. The digital reconstruction of degraded ancient temple murals using dynamic mask generation and an extended exemplar-based region-filling algorithm. Herit. Sci. 9, 137 (2021).
Hu, Q. et al. ConvSRGAN: super-resolution inpainting of traditional Chinese paintings. Herit. Sci. 12, 176 (2024).
Liu, S., Liu, Z., Liu, L., Fan, S. & Zhao, A. Color characteristics analysis and restoration for chinese painting and calligraphy. J. Image Signal Process. 12, 9–20 (2023).
Zhang, L. et al. Progress of hyperspectral remote sensing applications on cultural relics protection. Acta Geod. Cartogr. Sin. 52, 1126–1138 (2023).
Gong, M. & Feng, P. Preliminary study on the application of hyperspectral imaging in the classification and identification of Chinese traditional pigments—a case study of spectral angle mapper. Sci. Conserv. Archaeol. 26, 76–83 (2014).
Daniel, F. et al. Hyperspectral imaging applied to the analysis of Goya paintings in the Museum of Zaragoza (Spain). Microchem. J. 126, 113–120 (2016).
Wen, R. & Fan, F. Quantifying pigment features of Thangka Five Buddhas using hyperspectral imaging. J. Cult. Herit. 70, 120–133 (2024).
Li, G. H. et al. An automatic hyperspectral scanning system for the technical investigations of Chinese scroll paintings. Microchem. J. 155, 104699 (2020).
Grabowski, B., Masarczyk, W., Głomb, P. & Mendys, A. Automatic pigment identification from hyperspectral data. J. Cult. Herit. 31, 1–12 (2018).
Pan, N. et al. Extracting faded mural patterns based on the combination of spatial-spectral feature of hyperspectral image. J. Cult. Herit. 27, 80–87 (2017).
Zhou, P. et al. Virtual restoration of ancient painting stains based on classified linear regression of hyper-spectral image. Geomat. World 24, 113–118 (2017).
Hou, M. et al. Virtual restoration of mildew stains on calligraphy and paintings based on abundance inversion and spectral transformation. Sci. Conserv. Archaeol. 35, 8–18 (2023).
Xu, W. et al. Research on mural painting appreciation based on spectral imaging and spectral analysis. Spectrosc. Spectr. Anal. 37, 3235–3241 (2017).
Wu, T., Cheng, Q., Wang, J., Cui, S. & Wang, S. The discovery and extraction of Chinese ink characters from the wood surfaces of the Huangchangticou tomb of Western Han Dynasty. Archaeol. Anthropol. Sci. 11, 4147–4155 (2019).
Hou, M. et al. Virtual restoration of stains on ancient paintings with maximum noise fraction transformation based on the hyperspectral imaging. J. Cult. Herit. 34, 136–144 (2018).
Qiao, K., Hou, M., Lyu, S. & Li, L. Extraction and restoration of scratched murals based on hyperspectral imaging—a case study of murals in the East Wall of the sixth grotto of Yungang Grottoes, Datong, China. Herit. Sci. 12, 123 (2024).
Sun, P. et al. Enhancement and restoration of scratched murals based on hyperspectral imaging—a case study of murals in the Baoguang Hall of Qutan Temple, Qinghai, China. Sensors 22, 9780 (2022).
Wang, S. et al. Virtual restoration of ancient mold-damaged painting based on 3D convolutional neural network for hyperspectral image. Remote Sens. 16, 2882 (2024).
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y. & Manzagol, P.-A. Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11, 3371–3408 (2010).
Wang, S. et al. A mildew spectral index of ancient Chinese silk paintings based on hyperspectral imaging data. J. Cult. Herit. 75, 50–63 (2025).
Li, G. et al. Study on the application of an automatic hyperspectral scanning system to investigate Chinese paintings. In Proc. Transcending Boundaries: Integrated Approaches to Conservation. ICOM-CC 19th Triennial Conference Preprints (ed. Bridgland, J.) International Council of Museums. https://www.icom-cc-publications-online.org/4485/Study-on-the-application-of-an-automatic-hyperspectral-scanning-system-to-investigate-Chinese-paintings (2021).
Qu, L.-L. et al. Thin layer chromatography combined with surface-enhanced Raman spectroscopy for rapid sensing aflatoxins. J. Chromatogr. A. 1579, 115–120 (2018).
Yu, Q. Research on optical fiber sensor for online detection of mold and disease process parameters on paper cultural relics. Chongqing University of Technology, Chongqing, China, 33–35. https://doi.org/10.27753/d.cnki.gcqgx.2023.000269 (2023).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Kruse, F. A. et al. The spectral image processing system (SIPS)—interactive visualization and analysis of imaging spectrometer data. Remote Sens. Environ. 44, 145–163 (1993).
Chang, C.-I. Spectral information divergence for hyperspectral image analysis. In Proc. IEEE 1999 International Geoscience and Remote Sensing Symposium, IGARSS'99 (Cat. No.99CH36293), Hamburg, Germany, 28 June 1999–02 July 1999, 509–511. https://doi.org/10.1109/IGARSS.1999.773549 (1999).
Fu, J. et al. Dual attention network for scene segmentation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15-20 June 2019, 3141–3149. https://doi.org/10.1109/cvpr.2019.00326 (2019).
Acknowledgements
This work was supported by the National Key R&D Program of China (2023YFF0906701, 2022YFF0904400), the Peach and Plum Program of the Palace Museum, and the Vanke Foundation.
Author information
Contributions
S.W.: Conceptualization, Methodology, Data curation, Formal analysis, Investigation, Figures, Writing—original draft preparation, Writing—review and editing. Yi C.: Conceptualization, Methodology, Formal analysis, Supervision, Resources, Project administration and Funding acquisition. L.Q.: Conceptualization, Supervision, Resources, Project administration and Funding acquisition. Y.D.: Data curation, Formal analysis, Validation, Investigation and Writing—original draft preparation. G.L.: Writing—original draft preparation and Writing—review and editing. Yao C.: Writing—original draft preparation and Writing—review and editing. All authors reviewed the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, S., Cen, Y., Qu, L. et al. Virtual restoration of ancient mold-damaged paintings based on spectral-guided asymmetric autoencoder for hyperspectral images. npj Herit. Sci. 13, 533 (2025). https://doi.org/10.1038/s40494-025-02103-0