Introduction

Historical and cultural heritage sites are irreplaceable repositories of human civilisation, whose authenticity and uniqueness largely determine the quality of visitors’ encounters1. The tourist experience at such sites is inherently multisensory; visual appreciation of monuments intertwines with tactile, olfactory and, crucially, acoustic cues to form an integrated human–environment interaction system.

The International Organization for Standardization (ISO) 12913-1 defines a soundscape as ‘an acoustic environment as perceived, experienced and understood by individuals or groups in context’2, highlighting the primacy of subjective appraisal and the reciprocal ties among sound, place and culture. In heritage contexts, soundscapes possess dual value. Soundscapes with social value are regarded as having cultural heritage significance, reflecting the intrinsic properties of the heritage itself3. Sounds enriched with historical depth and regional cultural meanings constitute the core of heritage soundscape values while simultaneously functioning as experiential carriers that convey cultural symbols and social memory beyond their physical stimuli. For example, the acoustic environment of Portuguese churches has been recognised as cultural heritage because it preserves memories that form part of the local cultural identity4. Similarly, the sounds of theatrical performances, temple bells and traditional musical expressions are recognised as forms of intangible cultural heritage. They function as essential components of place-making. These sounds also act as acoustic markers of identity.

This study examines how the overall acoustic environment shapes visitors’ experiences. Because heritage soundscapes represent symbols of place, store historical memory and act as media of emotional connection, the acoustic environments of cultural heritage sites, each containing diverse sound sources, tend to be more complex. Visitor activities, commercial sounds and surrounding traffic noise can exert strong influences on how people perceive the site5. Empirical studies have shown that pleasant natural sounds, such as flowing water or birdsong, alleviate stress and heighten enjoyment, whereas discordant traffic or construction noise degrades perceived quality6,7,8,9. These findings reflect how sound, as an environmental stimulus, is perceived and interpreted from an environmental psychology perspective. Beyond simple loudness, the clarity and layering of auditory elements can deepen spatial legibility in heritage settings10. Given that functional zones at heritage sites typically unfold in sequence, the accompanying acoustic environment forms a temporal narrative (often moving from high-noise gateways through calm core areas to revitalised rest spaces) that moulds expectations and memory11. Core soundscapes with cultural heritage values also play a vital role in shaping place attachment and heritage identity and, therefore, require careful preservation and enhancement12.

Tourism experience is commonly defined as the connections, interactions and subjective meanings that visitors construct through activities related to a destination. Research in marketing, psychology, geography and heritage studies has examined this concept from multiple disciplinary perspectives13,14,15,16. Earlier studies linked tourism experience to pre-trip, on-site and post-trip satisfaction, as well as destination evaluation and loyalty17. Framework-oriented research ranges from customer-experience models18 and cultural-consumption theory19 to analyses of the synergy between experience quality and commercialisation in heritage settings20. Purpose-driven studies have explored, for example, how food experiences enhance subjective wellbeing21 or how urban entertainment influences visitor satisfaction22, while authenticity remains a recurrent theme in cultural tourism research23. From a perceptual perspective, experience has been conceptualised as both a psychological state grounded in emotional interactions with place23,24 and a cognitive process of cross-cultural communication25. Overall, existing studies have highlighted the need to incorporate acoustic factors into holistic models of the heritage site experience. The focus should extend beyond idealised or ‘pure’ heritage sounds to the entire acoustic environment, including the harmony or conflict among different sound sources and how these conditions shape visitor experience, satisfaction and attachment. A deeper examination of heritage soundscape values is therefore required. This involves understanding how soundscapes influence emotional connection, cognitive appraisal, heritage identity and behavioural intentions and how these processes contribute to overall experience quality and meaning-making. The integrative framework developed in this study aims to address this theoretical gap.

Tourism’s rapid expansion has made soundscape stewardship at heritage sites both urgent and complex. Although a growing body of work recognises that visitor influx and commercial activity reshape acoustic environments, three persistent shortcomings can be identified. First, soundscape evaluations still privilege physical-acoustic indicators (e.g., loudness and signal-to-noise ratio) over the cultural–semantic meanings embodied in sound. Second, most studies isolate individual sound sources or single moments, overlooking the dynamic spatiotemporal sequences through which visitors actually experience heritage spaces. Third, experience-oriented research continues to concentrate on visual aesthetics while treating sound, alongside its interplay with lighting and other sensory inputs, as peripheral. Collectively, these gaps obscure how background noise, activity sounds and culturally symbolic cues merge to influence atmosphere and visitor cognition.

To address these deficiencies, the present study adopts an integrated three-dimensional framework that combines (i) physical sound-field measurement, (ii) perceptual appraisal and (iii) cultural decoding. A mixed-methods design triangulates quantitative and qualitative evidence: principal component analysis (PCA) clarifies latent acoustic attributes, structural equation modelling (SEM) traces causal pathways from soundscape perception to visitor outcomes and grounded-theory coding uncovers the semantic layers through which sounds acquire meaning. By modelling the full chain (‘acoustic stimulus → perceptual evaluation → emotional and behavioural response’), the research illuminates how soundscape optimisation can reinforce living heritage values. The findings (i) embed the acoustic dimension within a holistic experience model, (ii) provide actionable guidance for spatiotemporal soundscape configuration in conservation practice and (iii) extend sustainable tourism discourse by demonstrating that enhanced cultural immersion can emerge from purposeful acoustic design.

Methods

Experimental design

This study focuses on two architecturally and spiritually emblematic heritage sites in China’s Shanxi Province: Datong’s Yungang Grottoes and Yuncheng’s Yongle Palace (Fig. 1). As a United Nations Educational, Scientific and Cultural Organization (UNESCO) World Cultural Heritage site, the Yungang Grottoes have seen their cultural influence and reach grow in recent years, receiving 4.42 million visitors in 2024, while Yongle Palace attracts approximately 400,000 to 500,000 visitors annually.

Fig. 1

The geographical locations of the study’s two heritage sites.

The UNESCO World Heritage Site of the Yungang Grottoes consists of 252 caves stretching over one kilometre, 114 of which preserve masterworks of Northern Wei Buddhist sculpture. The integration of realistic and symbolic carvings in these caves reflects the cultural exchanges and the spread of Buddhism between the 5th and 6th centuries. Yongle Palace, currently included in China’s Tentative List for World Heritage, is the central complex of Yuan-dynasty Daoist architecture. Its murals, covering more than 1000 square metres, form the largest surviving mural group from the Yuan period. The palace features the most intact central axis layout among Daoist architectural complexes and is renowned for its glazed-tile decorations, offering an exceptional site for studying the development of Daoist art. Despite their clear differences in form and surrounding environments, both sites contain multilayered soundscape components and complete spatial sequences for visitor movement. They also share essential features, such as religious and symbolic sounds and stage-based changes in visitor flow. These similarities make them an ideal pair for testing whether soundscape mechanisms remain robust across different spatial types and religious–cultural contexts. By examining these two representative heritage forms in parallel, this study explores the general mechanisms through which soundscapes affect visitor experiences rather than limiting the analysis to a single site type. Figure 2 displays the on-site tourist scenes of the two heritage sites.

Fig. 2

Onsite tourist scenes of the two heritage sites.

Sound environment surveys at the Yungang Grottoes and Yongle Palace began with on-site measurements and acoustic sequence mapping. Sound pressure level heatmaps and spatial sequence diagrams were then generated to visualise the acoustic characteristics of each zone and to quantify the intensity gradients of the dominant soundscape components at both heritage sites. Based on this quantitative foundation, structured questionnaires were used to collect visitors’ perceptual feedback. Data reliability was ensured through a multistage quality control process. During the acoustic measurement stage, data completeness was first verified, followed by outlier screening of the triplicate measurements. After removing three abnormal data points caused by sudden and irregular sound events, the averaged equivalent continuous A-weighted sound pressure levels (LAeq) from valid measurements were used to generate sound pressure–level heatmaps. The same procedure was applied to the L10 and L90 indicators to ensure the accuracy of the acoustic analysis. The questionnaire items were further refined through pilot testing to reduce completion time while preserving content validity. In the subsequent analysis, outlier detection and bias correction procedures were applied, followed by a double-blind review as the final screening step. The operational definitions and diagnostic criteria used for data validation and cleaning of both the instrumental and survey datasets are summarised in Table 1.

Table 1 Data screening and exclusion criteria

Respondents

A total of 449 on-site questionnaires were collected at the Yungang Grottoes and Yongle Palace using random intercept sampling. After the quality-control procedures described above, 417 valid questionnaires were retained. This sample size meets conventional standards for multivariate survey research and provides sufficient statistical power for subsequent analyses26.

Survey instrument

The questionnaire (see Supplementary Information for the full form) consisted of three parts:

(1) Background information – Age, gender, visitor type, length of stay, education and occupation.

(2) Soundscape perception – Following sound-sequence theory, the site was divided into four functional soundscape zones: (i) entrance/exit circulation, (ii) core exhibition, (iii) performance space and (iv) experience and rest areas. The soundscape perception scale was developed using the ISO soundscape framework and semantic anchors. The respondents rated each zone on sound comfort, cultural–traditional attributes and overall atmosphere using a five-point Likert scale.

(3) Tourist experience – Thirteen indicators covering emotional, cognitive, behavioural and sensory dimensions were adapted from established experience scales27,28 and tailored to heritage soundscapes. All items were rated on a five-point Likert scale.

Research design

Recognising the multidimensional complexity of heritage soundscapes, this study adopted a mixed-methods framework that triangulates quantitative and qualitative evidence29. PCA extracted three latent dimensions of soundscape perception and tourist experience: folk–cultural resonance, sensory–emotional response and behavioural appraisal. SEM was then used to trace the multivariable pathways linking these dimensions. To complement these analyses, a grounded theory approach was applied within the triangulation framework. The semi-structured interview data were coded using grounded theory methods to systematically reveal the psychological mechanisms and narrative logic underlying visitors’ acoustic judgements30. SEM and the Sc–Sq dual-axis framework constituted the core analytical approach, supported by acoustic measurements and sound-sequence analyses as empirical evidence, with grounded theory coding used to complement and interpret experiential mechanisms beyond the SEM.

Field procedures

Sound field data were collected on clear nonworking days in September 2024 by a trained crew of 20 undergraduate and graduate assistants. A calibrated HS5671B sound-level meter was deployed at fixed stations covering all four functional soundscape zones. Microphones were mounted 1.20 m above ground and placed at least 1 m from reflective surfaces or discrete sources to minimise boundary effects. Following ISO 12913 guidelines, each station was sampled three times for 60 s between 10:00 and 17:00, and the equivalent continuous A-weighted sound level (LAeq) was logged for subsequent analysis. LAeq was selected as the core physical acoustic indicator because of its high relevance to the research objectives, ease of measurement and broad comparability within the field; as a comprehensive energy-averaged metric, it is well suited to revealing the spatial distribution of sound pressure levels.
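The text does not state whether the three 60 s readings at each station were combined arithmetically or energetically. As a minimal sketch, assuming an energetic (logarithmic) mean, which is the usual way to combine levels expressed in decibels, the station value could be computed as follows. The function name and the example readings are hypothetical.

```python
import math


def energy_average_db(levels_db):
    """Energetic (logarithmic) mean of sound levels given in dB."""
    if not levels_db:
        raise ValueError("at least one level is required")
    # Convert each level to relative energy, average, then convert back to dB.
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


# Hypothetical triplicate 60 s LAeq readings at one station, in dB(A).
readings = [62.1, 63.4, 61.8]
station_laeq = energy_average_db(readings)
arithmetic_mean = sum(readings) / len(readings)  # shown only for comparison
print(f"energetic mean: {station_laeq:.1f} dB(A)")
print(f"arithmetic mean: {arithmetic_mean:.1f} dB(A)")
```

The energetic mean is never lower than the arithmetic mean and the two diverge as the spread between readings grows, which is one reason abrupt sound events are screened out before averaging.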

Visitor perceptions were recorded in parallel with the acoustic measurements through onsite intercept sampling conducted from September to November 2024 under comparable weather conditions. To verify the representativeness of the September acoustic data for the entire survey period, supplementary measurements from different months were examined using independent-samples t-tests, which revealed no significant differences in sound pressure levels. These results indicate that the acoustic environment of the heritage sites remained stable during the survey period, effectively controlling for cross-month variation. Although intercept sampling may introduce selection bias due to variations in visitor dwell time and could overrepresent deeply engaged visitors, its high ecological validity allows soundscape perception data to be captured immediately on site, which suits the study’s aim of exploring causal mechanisms between soundscape and experience. Passers-by with normal hearing were briefed on the study and completed a structured questionnaire outdoors in approximately 10 min. Of the 449 questionnaires collected, 417 were retained as valid, yielding a valid response rate of 92.9%. The final sample was gender-balanced (53.0% female, 47.0% male) and met conventional guidelines for multivariate survey research. The full demographic characteristics are presented in Table 2.

Table 2 Demographic characteristics of participants

In addition, 32 visitors (17 males and 15 females, aged 19–60 years; M = 30.3) from the two heritage sites participated in 30–45 min outdoor semi-structured interviews designed around the ISO 12913-2:2018 soundscape framework to explore experience facets not covered by the questionnaire. Interviewees included multidisciplinary scholars and highly articulate tourists, purposefully sampled to maximise heterogeneity in age and education while prioritising individuals with strong sensory acuity. Participants with limited comprehension or expressive ability were excluded to ensure data richness. All sessions were audio-recorded with consent and transcribed verbatim, producing approximately 86,000 Chinese characters.

Two researchers specialising in architecture and soundscape conducted open, axial and selective coding following the Strauss-and-Corbin procedure. An expert panel subsequently cross-checked the coding for reliability (Fig. 3; Supplementary Information, Table S5)31. Theoretical saturation was achieved through a systematic iterative process. First, the initial set of transcripts (8 interviews) was coded at three levels to form a preliminary theoretical framework. Subsequent interviews were conducted using purposive sampling, with continuous comparison of new data against the existing framework to identify emerging categories or relationships. Data collection ceased once new interviews no longer generated important new categories or theoretical insights, indicating that theoretical saturation had been reached32. All transcripts were anonymised prior to semantic coding.

Fig. 3

Grounded theory experimental process.

Data analysis

By combining close-ended questionnaires with semi-structured interviews, this study employed three complementary analytical lenses:

(i) Descriptive and correlation statistics were calculated using SPSS 26 to assess intra- and inter-group variation in soundscape perception, appraisal and tourist experience.

(ii) PCA and SEM were applied to visualise zone-specific perception patterns and model the multifactor relationships among perception, evaluation and experience. Eight standardised sound perception attributes were subjected to PCA. Based on the scree plot and eigenvalue > 1 criterion, two orthogonal components were retained, accounting for the majority of cross-site variance. Each case was projected onto a two-dimensional component space as follows:

$$PC_{1}=\sum_{i=1}^{n}\omega_{1i}\cdot x_{i},\qquad PC_{2}=\sum_{i=1}^{n}\omega_{2i}\cdot x_{i},$$

where \(x_{i}\) denotes the \(i\)-th standardised attribute and \(\omega_{1i}\), \(\omega_{2i}\) are the loadings (elements of the first two eigenvectors of the covariance matrix).
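The projection can be illustrated with a small dependency-free sketch. It standardises the attributes, forms their covariance matrix (which, for standardised data, is the correlation matrix), extracts the two leading eigenvectors by power iteration with deflation and computes the component scores as the weighted sums defined above. The study itself used SPSS; the function names, the power-iteration approach and the toy data here are assumptions made purely for illustration, and the sketch assumes no attribute has zero variance.

```python
import math


def standardise(rows):
    """Z-score each column using the sample standard deviation."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]


def covariance(z):
    """Covariance matrix of already-centred data."""
    n, k = len(z), len(z[0])
    return [[sum(r[a] * r[b] for r in z) / (n - 1) for b in range(k)]
            for a in range(k)]


def top_eigenpairs(matrix, count=2, iterations=500):
    """Leading eigenpairs of a symmetric matrix (power iteration + deflation)."""
    k = len(matrix)
    m = [row[:] for row in matrix]
    pairs = []
    for c in range(count):
        v = [1.0 + 0.1 * (i + c) for i in range(k)]  # arbitrary non-zero start
        for _ in range(iterations):
            w = [sum(m[a][b] * v[b] for b in range(k)) for a in range(k)]
            norm = math.sqrt(sum(x * x for x in w))
            v = [x / norm for x in w]
        eigenvalue = sum(v[a] * sum(m[a][b] * v[b] for b in range(k))
                         for a in range(k))
        pairs.append((eigenvalue, v))
        # Remove the component just found so the next iteration finds the next one.
        m = [[m[a][b] - eigenvalue * v[a] * v[b] for b in range(k)]
             for a in range(k)]
    return pairs


def project(z, pairs):
    """Component scores: PC_c = sum_i w_ci * x_i for each case."""
    return [[sum(w * x for w, x in zip(vec, row)) for _, vec in pairs]
            for row in z]


# Toy data: four hypothetical respondents rated on two attributes.
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
z = standardise(rows)
pairs = top_eigenpairs(covariance(z), count=2)
scores = project(z, pairs)
for (eigenvalue, _), label in zip(pairs, ("PC1", "PC2")):
    print(label, "eigenvalue:", round(eigenvalue, 3))
```

The sign of each eigenvector is arbitrary, so component scores may be mirrored relative to another software package without changing the interpretation.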

To visualise regional distribution patterns, the resulting scores were plotted in Python (Seaborn, Matplotlib). Colour and marker coding distinguished the two heritage sites, while overlaid kernel-density contours quantified distributional overlap and highlighted spatial clustering. These component scores were subsequently used as latent inputs to a structural equation model (AMOS 24), which simultaneously estimated direct pathways from soundscape attributes → subjective appraisal → tourist experience.

(iii) Grounded theory coding was used to trace the qualitative pathways through which acoustic cues shape visitor responses. Verbatim transcripts of the 32 semi-structured interviews were imported into NVivo 11 and analysed. Open coding identified salient phrases, axial coding linked categories and selective coding integrated themes into a soundscape-perception → tourist-experience impact model.

Results

Spatial distribution of the physical sound environment

Figure 4 presents LAeq surfaces interpolated using the inverse distance weighting method. At the Yungang Grottoes, the highest sound levels occurred at high-traffic nodes: Cave 20 reached 69.0 dB(A), Yungang Gate reached 64.3 dB(A) and Tanyao Square reached 65.9 dB(A). In contrast, vegetated rest areas and performance zones consistently remained below 55 dB(A), providing a low-noise refuge, even during peak visitor periods.
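Inverse distance weighting estimates the level at an unmeasured point as a weighted mean of nearby station values, with weights that fall off with distance. The sketch below is a minimal illustration under stated assumptions: the power parameter of 2 and the direct interpolation of dB values are common defaults rather than settings reported in the text, and the station coordinates are hypothetical.

```python
import math


def idw_estimate(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations: iterable of (x, y, value) tuples, e.g. value = LAeq in dB(A).
    power: distance exponent (assumed; 2 is a common default).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            return value  # exactly on a station: use its measured level
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical station layout (metres) with LAeq values in dB(A).
stations = [(0.0, 0.0, 69.0), (120.0, 0.0, 64.3), (60.0, 90.0, 65.9)]
print(round(idw_estimate(stations, 60.0, 30.0), 1))
```

Evaluating `idw_estimate` over a regular grid of points yields the kind of surface shown in a heat map; a larger `power` makes the surface follow the nearest station more closely.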

Fig. 4

Heat map of sound pressure level at Yongle Palace.

At Yongle Palace, the overall acoustic environment was quieter, with LAeq values ranging from 47.4 to 67.87 dB(A). Peaks near the Longhu Hall courtyard and at the entrances of Sanqing and Chunyang Halls reached 68 dB(A), primarily due to bottlenecked queues and amplified tour commentary. Peripheral green belts, the mural-copying room and ecological buffer zones remained below 50 dB(A), while the most secluded natural areas dropped to 35 dB(A). This ‘loud front, quiet rear’ gradient reflects the ceremonial layout of the palace and highlights the acoustic attenuation provided by surrounding pine and cypress vegetation.

Sound-sequence statistics for LAeq, L90 and L10 revealed a distinct ‘high–low’ rhythm along visitor paths at both heritage sites (Figs. 4–6). These routes pass through the four functional contexts (Fig. 5).

Fig. 5

Various soundscape contexts at the heritage site.

At the Yungang Grottoes (Fig. 6a), the path followed the sequence entrance/exit → main tour → performance → rest. LAeq peaked at 71 dB(A) in the most crowded caves, dropped to ≈50 dB(A) in the performance square, stabilised near 60 dB(A) in the vegetated rest zone and rose again towards the exit. The wide L90–LAeq gap in performance and rest areas indicates a quiet background punctuated by discrete acoustic events, such as shows or clustered visitors.

Fig. 6: Spatial Sound Level Sequences of the Yungang Grottoes and Yongle Palace.

Spatial sound-sequence profile (LAeq, L10, L90) along the main visitor route of the Yungang Grottoes (a) and Yongle Palace (b). Blue line indicates LAeq, red line indicates L90, and grey line indicates L10.

At Yongle Palace (Fig. 6b), the looped route followed entrance/exit → rest → main tour → rest → performance. Initial levels were moderate (46–48 dB(A)), surged to 70.1 dB(A) along the central axis (Longhu–Sanqing–Chunyang Halls), dropped below 55 dB(A) in the secondary garden rest area and ended in the performance court. The L10–LAeq gaps were narrower than those at Yungang, indicating a more stable acoustic environment with fewer abrupt noise spikes. Overall, the recurring ‘loud front, quiet rear’ alternation observed across the heritage sites aligns well with their spatial logic and highlights the importance of targeted soundscape management.
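The percentile indicators used in these comparisons follow the usual convention: L10 is the level exceeded for 10% of the measurement time (a proxy for intrusive peaks) and L90 is the level exceeded for 90% of the time (a proxy for the background). A minimal sketch of how they can be derived from a series of short-interval levels is given below; the linear interpolation rule and the sample values are assumptions, since the text does not describe how the meter computes them.

```python
def exceedance_level(levels_db, percent):
    """Level exceeded for `percent` % of the samples (10 -> L10, 90 -> L90)."""
    if not levels_db:
        raise ValueError("at least one sample is required")
    if not 0 <= percent <= 100:
        raise ValueError("percent must lie between 0 and 100")
    ordered = sorted(levels_db)
    # L_N is the (100 - N)th percentile of the ordered samples.
    position = (100 - percent) / 100 * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# Hypothetical 1 s levels in dB(A): a quiet background with a few loud events.
samples = [48.0, 47.5, 49.0, 48.2, 66.0, 48.1, 47.9, 70.5, 48.4, 48.0]
l10 = exceedance_level(samples, 10)
l90 = exceedance_level(samples, 90)
print(f"L10 = {l10:.1f} dB(A), L90 = {l90:.1f} dB(A), gap = {l10 - l90:.1f} dB")
```

A large gap between L10 and L90 points to an event-driven soundscape, whereas a small gap indicates a steadier acoustic environment.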

Visitor survey: soundscape gradients, perceptual appraisal and dimensional associations

Pre-analysis checks confirmed instrument adequacy, with Cronbach’s α = 0.816, KMO = 0.740 and Bartlett’s test P < 0.001 all exceeding standard thresholds for reliability and factorability (Supplementary Information).
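For readers who wish to reproduce the reliability check, Cronbach’s α can be computed directly from the item-level responses as α = k/(k − 1) · (1 − Σ item variances / variance of the summed score). The sketch below is illustrative only; the study’s value was obtained in SPSS, and the function name and toy responses are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondent rows (one score per item)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[j] for row in responses]) for j in range(k)]
    # Sample variance of each respondent's summed score.
    total_variance = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical five-point Likert responses: three respondents, two items.
toy = [[1, 2], [2, 2], [3, 4]]
print(round(cronbach_alpha(toy), 3))
```

Values of about 0.7 or higher are conventionally read as acceptable internal consistency, so the reported α = 0.816 sits comfortably above that threshold.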

Semantic-differential ratings (−2 to +2) across eight adjective pairs (derived from both the survey and relevant literature) were analysed for the four functional zones. Dimensionality reduction identified two coherent thematic clusters (Fig. 7).

Fig. 7: Semantic Differences in Soundscape Perception Across Four Functional Zones.

Semantic-differential profiles of soundscape perception across four functional zones: (a) calmness, gentleness and pleasantness (S1, S2, S4) and (b) vibrancy, eventfulness, uniqueness, religiousness and traditionality (S3, S5–S8). Yongle Palace (solid line), Yungang Grottoes (dashed line); unpleasant–pleasant (blue), chaotic–calm (light blue), shrill–gentle (grey), monotonous–vibrant (red), uneventful–eventful (yellow), common–unique (green), non-religious–religious (brown), modern–traditional (purple).

In Fig. 7a, scores on chaotic–calm, shrill–gentle and unpleasant–pleasant (S1, S2, S4) increased steadily along the visitor route, peaking in the experience-and-rest areas. Zones where crowd noise and announcements dominated scored lowest in terms of visitor satisfaction and perceived cultural distinctiveness, highlighting the need for acoustic mitigation. Experience-and-rest areas, characterised by natural ambient sounds and periodic bell chimes, received the highest ratings for comfort and cultural distinctiveness, representing the most harmonious and tranquil acoustic environments. The Yungang Grottoes’ entrance and main tour areas were perceived as notably calmer than those of Yongle Palace.

Figure 7b shows the second cluster: monotonous–vibrant, uneventful–eventful, common–unique, nonreligious–religious and modern–traditional (S3, S5–S8). These attributes peaked in the main tour zones of both sites before tapering off, indicating that these cores were perceived as the most vibrant, meaningful, distinctive, religious and traditional. Main tour and performance zones benefited from guide commentary, traditional music and live shows, elevating mean scores to nearly +1, indicating gentler and more culturally resonant soundscapes. On the vibrancy scale, Yongle Palace outperformed Yungang Grottoes.

Figure 8 shows that cultural understanding received the lowest visitor rating (R5 = 2.89), followed by sound-environment evaluation (R4 = 3.01) and place experience (R3 = 3.16). These relatively modest scores highlight opportunities for deeper integration between cultural heritage and tourism. In contrast, higher scores were observed for distinctiveness (R6 = 3.21) and historical–cultural atmosphere (R7 = 3.54), which aligned with a favourable rating for experience intention (R1 = 3.25). The highest score was attributed to environmental evaluation (R2 = 3.84), reflecting the success of ongoing ecological conservation measures at the sites.

Fig. 8

Descriptive statistics of tourists’ subjective evaluations of the historical–cultural heritage site.

Figure 9 presents consistently positive visitor feedback. On the five-point scale, the highest scores were awarded to perceptions of cultural-heritage preservation (H4 = 4.36) and religious ambiance (H3 = 4.07). Psychological responses were similarly strong: psychological restoration, overall satisfaction and pride and reverence all scored above 4.0. The main sources of visitor discomfort were identified as noise-buffer lapses (N3 = 3.36), PA systems and horns (N2 = 3.50) and visitor chatter/children’s play (N1 = 3.61). The scores for experience–expectation fit (L1 = 4.04), revisit intention (L2 = 3.87) and recommendation likelihood (L3 = 4.30) were all above the midpoint.

Fig. 9

Descriptive statistics of tourists’ experiential evaluations of the cultural heritage site.

Figure 10 presents a Spearman correlation heatmap demonstrating strong within-domain coherence. Among the eight soundscape variables, items S4 to S8 form a tight cluster (r = 0.52–0.78, p < 0.01), whereas the ‘chaotic/shrill’ pair (S1–S2) shows only weak correlations with S3, S7 and S8. Subjective evaluation items also intercorrelate, although experience intention (R1) is relatively peripheral. Within the tourist experience block, the cultural-religious (H1–H4), emotional (E1–E3) and loyalty (L1–L3) dimensions are mutually reinforcing. In contrast, noise-annoyance scores (N1–N3) show a slightly negative but statistically nonsignificant trend against loyalty indicators.

Fig. 10

Spearman correlation heatmap of all variables.

Cross-domain correlations underscore the critical role of acoustic comfort and cultural cues. Noise perception (S1, S2) correlates positively with heritage and emotional response scores (H1–H4, E1–E3; r = 0.34–0.49). All loyalty measures (L1–L3) exhibit significant correlations with every soundscape dimension (r = 0.38–0.55), suggesting that a favourable acoustic environment fosters repeat visitation. Positive evaluation variables (R1–R7) align strongly with heritage perception (H1, H4; r = 0.58–0.61) and historical–cultural atmosphere (R6, R7). Notably, the religious ambience item (H3) is especially linked to the historical–cultural atmosphere (r = 0.67). Finally, stronger psychological restoration and satisfaction (E1–E2) are associated with higher cultural and environmental ratings. Loyalty increases most sharply when interaction expectations (R3) and perceived distinctiveness (R6) are met.
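Spearman’s coefficient suits ordinal Likert and semantic-differential items because it correlates ranks rather than raw scores. A minimal, dependency-free sketch is shown below: tied responses receive their average rank, and the coefficient is the Pearson correlation of the two rank vectors. The function names and the example ratings are hypothetical and do not reproduce the study’s data.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for position in range(i, j + 1):
            ranks[order[position]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation (tie-aware)."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical ratings from six visitors on two five-point items.
calmness = [2, 3, 3, 4, 5, 5]
revisit_intention = [1, 3, 2, 4, 4, 5]
print(round(spearman(calmness, revisit_intention), 2))
```

Because five-point items produce many ties, the tie-aware ranking step matters; the shortcut formula based on squared rank differences is exact only when there are no ties.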

Modelling the soundscape–experience mechanism

PCA (KMO = 0.740; Bartlett p < 0.001) consistently reduced the eight semantic-differential adjectives into two latent dimensions across all four functional zones. The first dimension, Sc, is defined by the attributes vibrant, eventful, unique, religious and traditional, with item loadings between 0.71 and 0.84. Eigenvalues for this component vary from 3.34 (main tour zone) to 2.75 (entrance/exit), explaining 34.4%, 38.0%, 41.4% and 41.7% of the variance in the entrance/exit, main tour, performance and rest zones, respectively. The second dimension, quietness & comfort (Sq), is characterised by the items calm, gentle and pleasant, with loadings between 0.68 and 0.79 and eigenvalues from 2.41 to 2.01. This dimension accounts for an additional 30.0%, 25.1%, 20.5% and 27.3% of variance across the same sequence of zones. Internal reliability for both factors exceeds α = 0.81 in all areas. The factor structure remains highly stable, with one exception: in the performance area, the item ‘pleasantness’ (S4) loads onto the Sc factor rather than Sq, suggesting that visitors in this zone may assess comfort based more on cultural stimuli than on reduced noise levels.
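The link between an eigenvalue and its share of explained variance can be checked with simple arithmetic. Assuming that each zone’s PCA was run on the eight standardised adjective items, so that total variance equals eight (an assumption; the text does not spell this out), the share is the eigenvalue divided by eight. The function name below is hypothetical.

```python
def explained_variance_percent(eigenvalue, n_items=8):
    """Share of total variance for a component of standardised items."""
    return 100.0 * eigenvalue / n_items


# Eigenvalues quoted in the text for the two dimensions.
for eigenvalue in (2.75, 3.34, 2.01, 2.41):
    share = explained_variance_percent(eigenvalue)
    print(f"eigenvalue {eigenvalue:.2f} -> {share:.2f}% of variance")
```

Under that assumption, an eigenvalue of 2.75 corresponds to about 34.4% of the variance, 3.34 to about 41.8%, 2.01 to about 25.1% and 2.41 to about 30.1%.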

When all 417 measurement points are plotted onto the Sc–Sq plane, a distinct diagonal gradient emerges. Points from the entrance/exit nodes cluster in the low-culture/low-comfort quadrant, progressing upwards through the culturally rich main tour and performance areas and culminating in a high-culture/high-comfort cluster within the vegetated rest areas. This trajectory mirrors the qualitative visitor experience: noisy threshold → cultural peak → quiet sanctuary. The findings offer a quantitative foundation for targeted soundscape optimisation. Zones where Sc peaks but Sq lags (e.g., the performance square) or where both dimensions are low (entrance/exit) may benefit most from acoustic and environmental enhancement interventions (Table 3).

Table 3 Summary of factor structure for soundscape perception, subjective evaluation and tourist experience scales

When projected onto the latent axes of Sc and Sq, visitors’ perceptual scores reveal a clear spatial gradient in acoustic character (Fig. 11). The entrance/exit sector occupies the lower-left quadrant: Sc values scatter widely but average slightly below zero, while Sq remains narrowly negative. This indicates that the threshold space is both culturally diluted and only modestly comfortable—a pattern largely attributable to crowd chatter and public address systems rather than heritage-related cues. Progressing along the visitor route, the main tour corridor and performance square shift decisively into positive Sc territory, signifying a stronger perception of cultural events. However, they diverge on the Sq axis. The main tour area hovers near the midline of comfort, whereas the performance clusters register lower Sq values, reflecting amplified shows and congregating audiences trading tranquillity for spectacle. At the route’s terminus, the experience/rest area anchors the upper-right quadrant with the highest Sc peak and consistently positive Sq values. This area, marked by natural ambience and bell chimes, merges cultural richness with acoustic serenity. This diagonal progression provides the empirical foundation for the subsequent SEM, which traces how Sc and Sq influence the four visitor experience dimensions identified earlier.

Fig. 11: Principal component analysis visualisation and kernel density plots of different areas.

(entrance and exit service area (green), main tour area (red), performance area (yellow), experience and rest area (brown) and overall site (blue)).
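The quadrant reading of the Sc–Sq plane can be sketched in a few lines. The example below is a minimal illustration only: the component scores are hypothetical placeholders rather than the study's data, and the function names `quadrant` and `zone_centroids` are introduced here for illustration. It assumes that each measurement point carries a zone label and standardised Sc and Sq scores centred on zero.

```python
from statistics import mean


def quadrant(sc: float, sq: float) -> str:
    """Label a point on the Sc-Sq plane by the sign of each score."""
    culture = "high-culture" if sc > 0 else "low-culture"
    comfort = "high-comfort" if sq > 0 else "low-comfort"
    return f"{culture}/{comfort}"


def zone_centroids(points):
    """Average Sc and Sq per zone; points is an iterable of (zone, sc, sq)."""
    grouped = {}
    for zone, sc, sq in points:
        grouped.setdefault(zone, []).append((sc, sq))
    return {
        zone: (mean(p[0] for p in pts), mean(p[1] for p in pts))
        for zone, pts in grouped.items()
    }


# Hypothetical scores (NOT the study's data), two points per zone.
hypothetical_points = [
    ("entrance/exit", -0.4, -0.2),
    ("entrance/exit", 0.1, -0.1),
    ("performance", 0.6, -0.5),
    ("performance", 0.8, -0.3),
    ("experience/rest", 1.1, 0.7),
    ("experience/rest", 0.9, 0.5),
]

centroids = zone_centroids(hypothetical_points)
for zone, (sc, sq) in centroids.items():
    print(zone, quadrant(sc, sq))
```

Classifying zone centroids rather than individual points mirrors the way the text describes each area by its average position; a zone whose centroid lies close to zero on either axis, as reported for the main tour area on Sq, would need a tolerance band rather than a simple sign test.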

Constructing the structural equation model that links perception, evaluation and experience first requires verifying model fit through confirmatory factor analysis (CFA). Based on the dimensions identified in the exploratory factor analysis, a CFA model was developed to examine the average variance extracted (AVE) and composite reliability (CR) for each dimension. Standardised factor loadings were calculated through the CFA model, and the criteria of AVE ≥ 0.5 and CR ≥ 0.7 were used to assess convergent validity and composite reliability, respectively33.
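For reference, both criteria can be computed directly from the standardised loadings of a construct. The sketch below uses the standard formulas, with AVE as the mean squared loading and CR as the squared sum of loadings divided by that quantity plus the summed error variances. The loadings shown are hypothetical and are not taken from the study; the helper names are likewise illustrative.

```python
from typing import Sequence


def average_variance_extracted(loadings: Sequence[float]) -> float:
    """AVE: mean of the squared standardised loadings of one construct."""
    return sum(l ** 2 for l in loadings) / len(loadings)


def composite_reliability(loadings: Sequence[float]) -> float:
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances)."""
    squared_sum = sum(loadings) ** 2
    # For standardised loadings, each indicator's error variance is 1 - loading^2.
    error_variance = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_variance)


def meets_thresholds(loadings: Sequence[float],
                     ave_min: float = 0.5,
                     cr_min: float = 0.7) -> bool:
    """Apply the AVE >= 0.5 and CR >= 0.7 criteria stated in the text."""
    return (average_variance_extracted(loadings) >= ave_min
            and composite_reliability(loadings) >= cr_min)


# Hypothetical standardised loadings for one latent construct (NOT study data).
example_loadings = [0.72, 0.78, 0.81, 0.69]
ave = average_variance_extracted(example_loadings)
cr = composite_reliability(example_loadings)
print(f"AVE = {ave:.3f}, CR = {cr:.3f}, passes = {meets_thresholds(example_loadings)}")
```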

Detailed factor loadings, modification indices and intermediate fit statistics are provided in Supplementary Information for reference. The CFA results support the structural validity of the relationships between the observed and latent variables. Overall, all observed variables demonstrated good convergent validity.

Based on these findings, the following hypotheses were proposed for the cultural heritage scenic area model:

MA: Soundscape perception (Sc, Sq) significantly influences the tourist experience.

MB: Subjective evaluation (Rc, Re) significantly influences the tourist experience.

Detailed hypotheses are presented in the Supplementary Information. All hypotheses collectively form the initial theoretical model of the conceptual path diagram (Fig. 12).

Fig. 12: Initial conceptual path diagram.

The refined structural equation model, estimated after minor residual correlations were added under the guidance of modification indices (MI), achieved an acceptable fit (χ²/df = 1.50, CFI = 0.91, TLI = 0.90, RMSEA = 0.034; see Table 4 for core indices). All latent constructs met the reliability/validity thresholds (CR ≥ 0.75, AVE ≥ 0.50); thus, the model was retained for hypothesis testing.

Table 4 Key goodness-of-fit indices for the final SEM
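As a reading aid, the reported indices can be compared against cutoffs that are commonly cited in the SEM literature. The cutoffs in the sketch below are an assumption made for illustration, since the text does not state which values the authors applied; only the four index values are taken from the paragraph above.

```python
# Commonly cited conventional cutoffs (ASSUMED here; the source does not
# state which cutoffs the authors used). "max" means the index should not
# exceed the bound, "min" means it should not fall below it.
CUTOFFS = {
    "chi2_df": ("max", 3.0),
    "CFI": ("min", 0.90),
    "TLI": ("min", 0.90),
    "RMSEA": ("max", 0.08),
}


def check_fit(indices: dict) -> dict:
    """Return {index name: True/False} for each index against its cutoff."""
    results = {}
    for name, value in indices.items():
        kind, bound = CUTOFFS[name]
        results[name] = value <= bound if kind == "max" else value >= bound
    return results


# Values as reported in the text for the final model.
reported = {"chi2_df": 1.50, "CFI": 0.91, "TLI": 0.90, "RMSEA": 0.034}
fit = check_fit(reported)
print(fit)
```

Under these assumed cutoffs all four indices pass, although CFI and TLI sit at or just above the lower bound, so the result is sensitive to the choice of a stricter convention such as 0.95.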

Seven paths were found to be statistically significant (Fig. 13). Among the soundscape perception paths, Sq → Te was the strongest positive relationship (β = 0.54, p < 0.001), and Sq also reduced noise-related annoyance (Tn) (β = –0.19, p = 0.006). These findings highlight the particular importance of creating and maintaining a tranquil acoustic environment in cultural heritage management, as it enhances visitors’ deep experiential engagement while mitigating negative disturbance. Across the whole model, the effect of Re → Te was the most pronounced (β = 0.85, p < 0.001), underscoring the importance of the physical environment. Cultural value (Rc) exhibited a moderate effect on Tc (β = 0.32, p = 0.002). In addition, both Sc and Sq contributed to tourist loyalty (Tl), with standardised coefficients of β = 0.24 and 0.18, respectively (both p < 0.05). This indicates that the soundscape constitutes a core attraction of cultural heritage sites, and that enhancing acoustic quality can directly strengthen tourist loyalty. Collectively, the model explains 63% of the variance in Te, 51% in Tc, 42% in Tl and 36% in Tn.

Fig. 13: Structural equation modelling diagram of cultural heritage soundscape perception, subjective evaluation and tourist experience. Bold lines indicate significant paths.

Qualitative mechanisms of how soundscapes influence visitor experience

To examine the perceptual gap identified between Sc and Sq, we analysed the interview transcripts using grounded theory procedures, providing a qualitative dimension that complements and verifies the quantitative findings. Open coding generated 78 independent concepts, axial coding organised these into 12 subcategories and selective coding distilled them into six saturated core categories (A1–A6). Selected excerpts from the interview data are provided in Supplementary Information.

The resulting framework spans the full experience chain. Environmental soundscape perception (A1) captures how visitors identify and appraise ambient sounds, whereas the historical–cultural atmosphere (A2) reflects the desire to preserve heritage-specific acoustic cues. Emotional and psychological reactions (A3) traces moods and restorative outcomes triggered by those cues. Spatial perceptions of sound are expressed through regional characteristics and sequential order (A4), linking acoustic change to wayfinding and memory. Management concerns are grouped under sound management and experience optimisation (A5), including noise buffering, activity control and curated playback, while touring guidance and conveyance (A6) highlights the role of sonic signage and media in shaping visitor flow. These six dimensions collectively explain how a heritage soundscape moves from sensory input to emotional resonance and, ultimately, to actionable management strategies.

The grounded theory analysis (Fig. 16a) shows that visitors interpret the environmental soundscape through three interwoven perspectives. Background sounds, particularly birdsong, wind and water, form the core acoustic atmosphere. The participants described these elements as ‘calm’, ‘comfortable’ and even ‘restorative’.

Alongside the ambient background, iconic sounds and the contrast between quiet and active zones shape cultural immersion. Iconic sounds, such as bells, ritual drums and chanting, act as auditory symbols. They mark historical moments, anchor local identity and carry collective memory. Visitors move between whispering zones and active zones, which turns spectators into cultural participants.

Heritage soundscapes, together with the broader historical, cultural and religious atmosphere, create an auditory field that deepens visitor engagement (Fig. 16b). Carefully designed sounds transform intangible cultural elements into contemporary multimedia experiences (b53). Figure 14 illustrates the timbral variations across different scenes. Simple timbres and melodic cues suit secular historical areas, while iconic effects, such as bell strikes and chanting loops, enhance the solemnity of sacred spaces (a264). Gradually increasing volume within the grottoes, combined with cave resonance, further strengthens immersion. Sounds also reinforce cognitive recognition (b154, b72). Importantly, the gradual change in volume corresponds to spatial layering (a158), ensuring that acoustic cues are closely integrated with architectural structures and maintaining a coherent, sacred atmosphere.

Fig. 14: Composition of cultural heritage soundscapes and atmosphere.

Figure 16c highlights two visitor-centred functions of the soundscape: novelty and attraction, and emotional regulation. Recreated historical acoustics guide visitor movement and reshape temporal perceptions. Audiovisual integration (a171) greatly enhances novelty, stimulates exploration motivation and increases satisfaction. At a deeper level, much of the perceived value of heritage sites comes from the emotions they elicit (a303). Cultural archetypal cues rooted in collective memory support this positive effect. In contrast, discordant modern noises cause cognitive dissonance and reduce emotional engagement.

Soundscape perception is closely linked to the heritage site’s physical environment (Fig. 16d). Enclosed grotto spaces enhance the solemnity of rituals through sound reflections, whereas open lawns allow sound to disperse, creating a relaxed and tranquil atmosphere. Functional zoning further reinforces this contrast: buildings and wayfinding signage are located at entrances; core meditation areas require strict noise control; and performance zones achieve optimal effect only when music resonates with architectural forms. Visitors thus experience a carefully designed acoustic gradient (a159). This gradient creates the sense of a ‘dialogue with history’ described by many visitors (Fig. 15).

Fig. 15: Constructing immersive touring paths through soundscape spatial sequences.

Figure 16e highlights the management dimension of heritage soundscapes, integrating noise control with broader site management and communication practices. Grounded theory coding identified a ‘perceptual threshold’; when environmental noise exceeds this limit, visitors suddenly lose the sense of historical context (a227). Subtle cues (a181) can guide behaviour without disrupting immersion.

Fig. 16: Frequency heat map of grounded theory conceptual codes.

(a) Environmental soundscape perception; (b) Historical–cultural atmosphere; (c) Emotional and psychological reactions; (d) Spatial perceptions of sound; (e) Sound management and experience optimisation; and (f) Touring guidance and conveyance. ‘PA’ refers to the Yungang Grottoes, and ‘PB’ refers to Yongle Palace. The vertical axis represents the conceptualised results derived from grounded theory.

Auditory cues, such as bells, drums, music in specific zones and guided commentary, function as navigational signals that complement signage systems. They help direct visitor flow smoothly while minimising disruptive alerts (Fig. 16f). Pre-visit media often set expectations for auditory experiences (a259), and a high consistency between these expectations and the on-site soundscape can increase revisitation intention and encourage word-of-mouth and social media sharing. Thus, purposeful sound design simultaneously guides, interprets and promotes the heritage experience at an integrated level.

Using grounded theory analysis, the interview data were integrated into a process model showing how soundscapes shape the visitor journey (Fig. 17). The process begins with A1 – visitors’ initial perception of ambient and iconic sounds, ranging from temple bells to marketplace noise. When these cues carry clear cultural meaning, they directly influence A2, enhancing immersion and awe. Emotional responses are modulated through A3, either evoking curiosity and pleasure or triggering irritation. The spatial framework is captured by A4. Enclosed grottoes and open lawns create deliberate acoustic gradients, intensifying or softening the historical atmosphere. Noise control, timing strategies and carefully selected soundtracks (A5) maintain acoustic thresholds that support immersion, while A6 uses auditory cues to guide movement and set behavioural norms. Together, these mechanisms provide a coherent framework for designing soundscapes that enhance visitor satisfaction, loyalty and heritage value.

Fig. 17: Path model of soundscape impact on tourist experience.

This study employs methodological triangulation, examining the same phenomenon through different research paradigms in order to integrate quantitative and qualitative findings. In cross-method studies, the limitations of one approach are often offset by the strengths of another.

The triangulation of survey indicators, SEM paths and grounded theory coding (Figs. 18, 19) indicates that acoustic quietness (Sq) is the primary emotional trigger. In the experience and rest areas where LAeq falls below 47 dB, Sq exhibits a significant positive path to Te in the SEM, which is further supported by qualitative evidence from visitors’ emotional evaluations (A3). Interviews further revealed that lower auditory interference allows visitors to focus on cultural cues, transforming calmness into feelings of pride and awe. A quiet soundscape functions as an ‘emotional entry point’, amplifying emotional rewards whenever external noise is blocked.
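The 47 dB threshold refers to LAeq, the A-weighted equivalent continuous sound level, which is an energy average rather than an arithmetic mean of decibel readings. A minimal sketch of that calculation follows, assuming equally spaced A-weighted level samples; the sample values are hypothetical and the helper name `laeq` is introduced here for illustration only.

```python
import math
from typing import Sequence


def laeq(levels_db: Sequence[float]) -> float:
    """Equivalent continuous level of equally spaced A-weighted samples (dB).

    Each level is converted to relative energy, the energies are averaged,
    and the mean is converted back to decibels.
    """
    if not levels_db:
        raise ValueError("at least one sample is required")
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)


# Hypothetical short-term readings in a rest area (NOT study data).
rest_area_samples = [44.0, 46.0, 45.0, 48.0]
level = laeq(rest_area_samples)
print(f"LAeq = {level:.1f} dB, below 47 dB: {level < 47.0}")
```

Because louder samples dominate the energy average, LAeq is never lower than the arithmetic mean of the readings, which is why brief loud events can push a zone above a threshold even when most readings are quiet.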

Fig. 18: A triangulation design for conceptualising and empirically examining the dimensions of soundscape influences on tourist experience.

Fig. 19: Triangulated analysis of soundscape spatial sequence influencing tourist experience.

In contrast, Sc drives cognitive transformation. Specific sounds, such as Buddhist chanting or traditional music, enhance cultural value (Rc) and thereby strengthen historical–cultural experience (Tc). Qualitative analysis further confirms that landmark sounds enhance visitors’ cultural immersion, and the effect of environmental appraisal on cognitive experience in the SEM is supported by observations of spatial atmosphere (A4). Qualitative evidence shows how ritualised acoustic effects evoke historical associations and how sound-based media sustain immersion, outlining a trajectory from auditory stimuli to heritage engagement. Both sites structure this process through a progressive sound-pressure narrative, from loud thresholds to soft core areas, and then to gradually amplified resonance, effectively reflecting an ‘introduction–climax–reverberation’ narrative arc.

Qualitative analysis further reveals complexities not captured by the SEM: functional zones modulate these mechanisms in distinct ways. Entrance areas are primarily influenced by crowd noise and broadcast prompts, reducing Sq and increasing Tn. Core tour corridors employ commentary and ritual music to synchronise audiovisual cues, increasing Sc and enhancing Tl. Performance squares combine high-decibel sounds with cultural content, where the latter offsets the former, maintaining high Te. Rest areas integrate natural ambience with deep temple bells, simultaneously strengthening the effects of Sq → Te, Sc → Tl and Rc → Tc, demonstrating how religious soundscapes can evoke profound historical associations while providing cognitive relief (Table 5).

Table 5 Triangulated evidence linking soundscape factors to visitor experience

Discussion

This discussion examines how soundscapes at the Yungang Grottoes and Yongle Palace heritage sites influence visitor experiences, highlighting the need for a comprehensive exploration of acoustic environments. The study identifies two soundscape influence factors (Sc and Sq), which align with the two-dimensional descriptors proposed by Axelsson and Cain34. The pathways through which soundscapes affect visitor experience also fit the dual-axis ‘Sc–Sq’ framework and correspond closely with the grounded theory–derived paths; visitor experience is shaped by physiological and psychological comfort as well as cultural-symbolic narratives, and these effects are manifested emotionally.

The physiological effects of Sq on visitors are primarily related to noise exposure. At the studied heritage sites, major noise sources in the main entrances and core tour areas include public announcements, vehicle traffic and visitor activity. Prolonged exposure along visitor routes can pose health risks and elevate stress levels35, thereby affecting the overall tour experience.

Visitors’ psychological responses are also influenced by Sq. The psychological impact of noise is immediate and cumulative, and it can temporarily affect mood and behaviour36. Quiet environments not only reduce annoyance but also enhance psychological restoration, attention and emotional openness, enabling visitors to engage more fully with cultural meanings. Studies in European religious architecture, such as Gothic cathedrals and monasteries, indicate that quietness is a core medium for sacredness and emotional immersion37. The psychological mechanism of Sq is particularly evident in visitor perceptions across the four soundscape zones. Low sound-pressure environments, such as rest areas combined with natural ecology, significantly enhance visitor experience, consistent with attention restoration theory38. Conversely, noise-related experiences (Tn) stem from spatial stress caused by high visitor density and acoustic interference in tour areas39. These results are consistent with international research, indicating that soundscape effects on visitor experiences may have cross-cultural commonality. Notably, due to China’s high population density, such challenges persist year-round, highlighting the urgent need for improved sound environment management40.

The emotional activation mechanism of Sq is supported by methodological triangulation. Quiet soundscapes create cognitive space and reflective mindsets, enabling emotional resonance41. The study also found that although visitors perceived higher noise levels in performance areas, immersion in cultural performances often allowed them to tolerate the noise. This indicates that cultural experience can mitigate the negative effects of high sound levels, consistent with the moderating role proposed in van den Bosch et al.’s42 soundscape cognition framework. Visitors’ expectations of cultural scenes activate cognitive regulation, reducing the perceived negativity of physical sound pressure43. Triangulation with qualitative analysis further suggests that improving soundscape quality is more effective than merely reducing noise44. On the one hand, rich and varied soundscapes elicit short-term pleasure through emotional contagion; on the other hand, background soundscapes establish deeper emotional resonance through associative connections45.

Compared with the emotional regulation mechanism of Sq, Sc functions more as a driver of cognitive transformation and cultural narrative. Interview data indicate that symbolic sounds—such as bells, chanting, ritual drums, guided commentary or heritage sounds with cultural significance—not only convey information but also form an audible cultural text. These sounds prompt visitors to transform sensory experience into cultural understanding, historical association and identity recognition. Although Yungang Grottoes and Yongle Palace differ greatly in scale, layout and cultural atmosphere, both sites exhibit highly consistent soundscape mechanisms. At key nodes, symbolic sound sources reinforce cultural meaning. Consequently, cultural perceptions of the soundscape can be converted into historical and cultural experiences, supporting Jeon and Jo’s theory on sensory experience transformation. The role of heritage soundscapes in shaping cultural identity, eliciting emotions and influencing behaviour aligns with theories of place attachment12. Visitors’ evaluations of Rc and Re and their experiences of Tl and Tc indicate that subjective evaluation, emotion and experience together form an essential component of service satisfaction and affect visitor behaviour46.

The transformational pathway of Sc reveals a narrative-driven mechanism. Background sounds often evoke anticipated emotions47, while iconic soundscapes accurately convey the characteristics of cultural heritage48, directly stimulating visitors’ associative evaluations of the cultural atmosphere. Heritage soundscapes themselves constitute intangible cultural assets, acting as nonverbal, nonvisual carriers of culture. By reconstructing vanished cultural scenes, these soundscapes enable a ‘time-travel’ experience, significantly enhancing the depth of visitor immersion49. This finding aligns with a soundscape study in a Turkish museum, where areas designed with sound as an element made visitors feel as if they were present in specific historical periods, resulting in a more engaging experience50. Such soundscape-driven experiences form a core attraction of cultural tourism51, providing visitors with irreplaceable novelty52. This cross-cultural consistency further supports the broad applicability of the ‘Sc–Sq’ dual-axis framework for understanding how heritage soundscapes influence visitor experience, although its generalisability to other types of heritage sites remains to be verified.

Acoustic and cultural evaluations also positively influence visitors’ loyalty27,28. Visitor loyalty reflects the behavioural dimension of experience and is mediated by satisfaction and emotional experience (Te), both of which are closely linked to actual behaviours of guidance, engagement and sharing53. Participatory experiences at heritage sites have a significant positive impact on loyalty54.

Additional interesting findings emerge. Experiences of historical culture and folk religion are closely associated with religious heritage sites. In China, population size and the number of religious sites have a significant positive effect on the distribution of cultural heritage55. The cultural soundscape of heritage sites, combined with folk experiences, has become part of visitors’ pursuit of novel cognitive engagement, confirming Zhang’s findings on the influence of religious architectural soundscapes on visitors37.

The study further reveals a spatial correspondence between soundscape sequences and tourist routes. Changes in sound pressure levels and shifts in cultural symbols guide visitor perception, creating focal points of sensory experience at key nodes. This finding aligns with Kou et al.’s observations regarding the contingency, temporality and heterogeneity of sound, visitors and context56. In both heritage sites examined, the soundscape sequence (‘high noise at the entrance → sharp drop in the core cultural area → gradual recovery in the experience zone’) demonstrates that an orderly spatial arrangement of sound enhances the sense of transition between different areas57. This sequence-based design, guided by volume gradients and key spatial nodes, produces a soundscape narrative structure similar to ‘prelude → climax → resonance’.

Different spatial sequences yield differentiated experiences. At the entrance and exit service areas, noise reduction is the primary requirement. Soundscape evaluation is significantly influenced by noise sensitivity58, suggesting the use of directional broadcasts and architectural treatments to reduce hard reflective surfaces, suppress background noise and establish initial tranquillity for subsequent cultural engagement. In main tour areas, prolonged stays may lead to chronic noise fatigue, disrupting cultural immersion, while soundscape complexity can amplify negative emotional accumulation59. Adjusting visitor flow and minimising unnecessary artificial sound sources can enhance Sq and prevent acoustic complexity from diminishing cultural immersion. Performance areas in most heritage sites face the challenge of ‘high noise during performances – monotony during intervals’. Planned sound design, including control of sound source directionality, performance rhythm and volume limits, can create continuous cultural narratives while balancing cultural continuity and acoustic load. In experience and rest areas, homogeneous soundscapes tend to cause emotional blunting. Differentiated sound designs in these spaces can reinforce emotional activation mechanisms.

In summary, this study maps the spatial logic of heritage soundscapes, distils core experience dimensions and constructs a causal model linking these dimensions. The main findings are as follows:

(1) A distinctive four-zone narrative arc. At the two heritage sites under study—the ruin-type Yungang Grottoes and the architectural ensemble of Yongle Palace—the soundscapes align with four functional zones along visitor routes: entrance/exit areas, main tour areas, performance areas and experience/rest areas. This spatial sequence forms a distinctive auditory storyline. Gradients of sound pressure levels and cultural cues create a ‘prelude → climax → resonance’ narrative, providing new evidence for incorporating a narrative framework into soundscape studies.

(2) Two acoustic levers driving four visitor experience outcomes. At the studied heritage sites, visitor responses can be categorised into four domains: loyalty (Tl; behavioural level), sensory experience (Tn), historical-cultural cognition (Tc) and emotional experience (Te). These experiences are driven by Sc and Sq, with Sc eliciting cultural associations and Sq facilitating emotional regulation.

(3) Path strength and cognitive amplification. Sc significantly enhances loyalty, whereas Sq reduces sensory discomfort (Tn) and strengthens emotional response. Perceived cultural value (Rc) amplifies both loyalty (Tl) and cognition (Tc), while environmental evaluation (Re) enhances loyalty (Tl) and emotional experience (Te).

(4) Dynamic influence chain. Six interacting themes – environmental perception, cultural atmosphere, emotions, spatial sequence, management optimisation and behavioural guidance – form a ‘soundscape → cognition → behaviour’ loop. The dual-mechanism model proposed here shows how tranquillity activates emotions and how cultural eventfulness drives experience transformation. This framework offers a new theoretical lens for understanding the coupling of soundscapes and heritage narratives, explaining both the consistent and complementary experience processes observed in the two cultural landscapes.

This study clarifies how soundscapes shape visitor experiences in heritage sites but has several limitations that restrict its generalisability. First, all participants were Chinese, and the interpretation of sound descriptors may differ across cultural backgrounds; therefore, the findings may not extend to cultures that attach different associations or meanings to sound. Second, the field survey was conducted only during the autumn off-season, leaving peak-season dynamics unexamined. Third, the study focused on religious and historical-cultural heritage, so the results may not fully apply to industrial or archaeological heritage, which may lack comparable symbolic sounds. The findings and framework therefore need to be validated in other heritage contexts. Future research should incorporate a broader range of environmental cues. Expanding to cross-cultural samples, monitoring seasonal variations and exploring soundscape attributes in diverse heritage types would help develop a more universally applicable framework for heritage soundscape management.