Introduction

This study investigates whether Artificial Intelligence Generated Content (AIGC) can produce credible visualizations of an existing cultural-heritage interior and identifies the kinds of distortions that emerge when such images are benchmarked against a verifiable reference. Specifically, we generate AIGC images for Juanqinzhai (Studio of Exhaustion from Diligent Service) in the Forbidden City, Beijing, and compare them with a SketchUp (SU) model derived from terrestrial laser scanning (TLS) and with a historical reference corpus of manuals, drawings and repair records, in order to quantify geometric deviations and evaluate cultural-functional coherence.

In recent years, digital technologies have increasingly transformed the field of architectural heritage conservation. Techniques such as Historic Building Information Modeling (HBIM), Light Detection and Ranging (LiDAR), and photogrammetry are now well established for generating high-precision geometric measurements and structural reconstructions1. For example, Chen et al.2 proposed an HBIM method integrating multiple digital techniques for ancient timber buildings, achieving accurate form and structural modeling that underpins scientific conservation and digital archiving of heritage. Similarly, the study How Architectural Heritage Is Moving to Smart uses bibliometric analysis of HBIM literature to show that semantic annotation, digital twin integration, and visualization are becoming dominant trends3; yet deployment in resource-constrained or regionally localized heritage projects remains limited4. Photogrammetry, TLS/LiDAR and HBIM techniques have reached high precision in geometric documentation5,6 and have successfully captured ornamental details through dense point-cloud modeling (Remondino and Campana7; López et al.8; Brumana et al.9). Given this established capacity for precision, the present study explores a complementary and efficient pathway for the initial phases of heritage visualization. We investigate whether AIGC can serve as an auxiliary tool to rapidly generate diverse and atmospheric visual interpretations with minimal input, relying solely on textual or visual prompts. This approach is not intended to replace high-fidelity surveying but to bridge a different gap: between the measured “as-built geometry” and the experiential “cultural-semantic visualization” of interiors (encompassing aspects like lighting, ornament logic, and spatial symbolism) during conceptual exploration.

To empirically investigate this complementary pathway, we focus on Juanqinzhai, a quintessential Qing imperial interior within the Forbidden City, Beijing. This study generates AIGC images for the site and benchmarks them against a high-fidelity SU model derived from TLS and a historical reference corpus. Our goal is to quantitatively assess geometric deviations and trace their roots to cultural biases in training data, ultimately proposing a human-AI collaborative workflow for culturally sensitive digital heritage reconstruction.

Archival sources. Primary materials consulted for this research include the Qing Shi Yingzao Zeli (清式营造则例), Yingzao Fayuan (营造法原), architectural drawings by Yangshi Lei10, and Qing-period repair records from the Palace Museum11. These sources provide normative design detail and historical construction/maintenance data against which AIGC outputs can be benchmarked.

Juanqinzhai, located within the Palace of Tranquil Longevity (Ningshou Gong) of the Imperial Palace, is one of the core palatial retreats commissioned during the Qianlong era (Fig. 1). Its interior embellishments are widely acknowledged as pinnacles of Qing palace design and craftsmanship. The decor of Juanqinzhai extensively features bamboo-thread marquetry, rosewood carving, inlaid jade elements, double-sided embroidery, and panoramic trompe-l’œil murals. Spaces such as the theater hall and viewing pavilion are particularly sumptuous: the screens (géshàn), suspended canopies (zhuàguà), and caisson ceilings (zǎojǐng) bear intricate patterning, vivid coloration, and material richness that are rare among palace interiors. Because its original decorative details are relatively well preserved and archival documentation is available, Juanqinzhai serves as an ideal empirical case for studying ornamental richness and spatial atmosphere in palace interior reconstruction.

Fig. 1: Location and analytical plan of Juanqinzhai.

a The location of the Ningshou Palace Garden within the Forbidden City, Beijing. b The precise location of Juanqinzhai. c Analytical plan of Juanqinzhai, highlighting the functional division between the Eastern Five Bays (residential/ritual zone) and the Western Four Bays (entertainment/theatrical space).

On this foundation, the present study aims to establish and validate a “Critical Generation” framework to overcome the limitations of general AIGC models in reconstructing cultural heritage interiors at a holistic spatial level. Specifically, this research uses a high-fidelity SU model of Juanqinzhai as ground truth to quantitatively assess deviations in proportion and spatial layout produced by AIGC outputs12; trace the root causes of these deviations, especially cultural cognition biases embedded in training data (including Orientalist aesthetics, style vs. identity conflation, neglect of temporal/regional distinctions); and propose a human-AI collaborative workflow that integrates AIGC’s creative capabilities with historical documentation, structural accuracy, and semantic fidelity13. The contributions of this study include: the first incorporation of a Critical Heritage Studies perspective into AIGC evaluation; the first quantification of holistic spatial proportion and cultural semantic deviations; and using the highly ornate interior of Juanqinzhai as a case study that achieves both theoretical depth and practical applicability.

Methods

Study design overview

This study is designed to evaluate the capability of AIGC in reconstructing Qing imperial interiors through a structured, benchmarked approach. The workflow consists of three core components: (1) Data Foundation: establishing geometric and semantic benchmarks using a high-fidelity SU model and a historical reference corpus; (2) AIGC Experiment: generating over 200 images for the eastern (residential) and western (theatrical) zones of Juanqinzhai using function-oriented prompts on two mainstream AIGC platforms; (3) Bias Analysis: conducting a multi-dimensional evaluation of the AIGC outputs against the benchmarks, focusing on geometric accuracy, cultural semantics, and functional consistency. This design ensures a systematic and verifiable assessment of AIGC’s potential and limitations.

Data foundation: reference model of Juanqinzhai and historical semantic framework

The data foundation of this study is constructed through the integration of a high-fidelity SU reference model of Juanqinzhai and a multi-dimensional historical semantic framework14,15. The SU model provides the geometric “ground truth” for quantifiable measurements of spatial structure and scale16, while the semantic framework establishes interpretive references in terms of decorative logic, cultural context, and functional zoning. Together, these dual benchmarks allow for a systematic analysis that encompasses both quantitative assessment and qualitative interpretation.

The SU model was developed by synthesizing architectural survey data17, Yangshi Lei design archives, Qing court repair records, and on-site photographic documentation. Its construction followed a three-tier calibration principle of “spatial—component—detail.” The SU reference model was constructed from TLS point-cloud data and archival drawings, with a point spacing resolution of ≤2 cm, thus ensuring its validity as the geometric ground truth for comparative analysis. At the spatial level, the nine-bay façade and three-bay depth of the layout were determined, with axial lines, column grids, and modular step dimensions precisely aligned to ensure reliable proportions (Fig. 2). At the component level, the reconstruction focused on the core architectural elements of the eastern five bays and western four bays: the eastern residential zone emphasizes luodi zhao (落地罩, floor-standing decorative partitions), kang zhao (炕罩, heated-bed enclosures), and bisha chu (碧纱橱, gauze cabinets), which reinforced privacy and ritual solemnity; while the western entertainment zone highlights the indoor stage, yuanguang zhao (圆光罩, circular screen partitions), and panoramic tongjing hua (通景画, perspective wall paintings), shaping the performative atmosphere (Fig. 3). At the detail level, micro-scale features such as the diameter of the caisson ceiling (zaojing 藻井), the proportional height of hanging brackets (gualuo 挂落), and the lattice division of window panels were carefully calibrated to ensure proportional accuracy and consistency of structural logic (Fig. 4). Through this hierarchical calibration, the SU model functions not merely as a static 3D visualization, but as a verifiable research benchmark supporting geometric comparison, functional validation, and semantic reference in subsequent experiments18.

Fig. 2: Three-dimensional spatial model of Juanqinzhai.

Axonometric cutaway view of the high-fidelity SketchUp model showing the overall spatial layout, structural framework, and interior enclosure of the western four-bay theatrical space and the eastern five-bay residential retreat. The model serves as the geometric reference for prompt construction, proportional analysis, and quantitative comparison with AIGC-generated results.

Fig. 3: Interior models of the western four bays and eastern five bays of Juanqinzhai.

a Axonometric view indicating the spatial relationship and functional zoning of the western theatrical space and the eastern residential space. b Sectional perspectives of the two interior units showing enclosure depth, viewing direction, and canopy position used for boundary extraction and prompt parameterization. c In-situ photograph of the eastern five-bay interior providing the real-scene reference for stylistic features, material mapping, and spatial verification.

Fig. 4: Detailed component models of the interior of Juanqinzhai.

Exploded and sectional representations of key timber components identifying mortise–tenon logic, joint locations, and profile curvature for parametric extraction. These component datasets establish geometric constraints, assembly rules, and semantic labels for the AIGC-assisted reconstruction process.

Beyond the geometric benchmark, a historical reference corpus was employed to establish cultural references. Normative manuals such as Qing Shi Yingzao Zeli19 (《清式营造则例》, Qing Construction Standards) and Yingzao Fayuan20 (《营造法原》, Building Standards) provided institutional foundations for component hierarchy, color regulation, and material usage, ensuring that generative results can be evaluated within their historical normative framework. In addition, Yangshi Lei drawings and detail sketches from the Qianlong period supplied direct visual references for validating decorative arrangements and component proportions. Modern restoration records further supplemented details of material texture, pattern density, and chromatic layering, offering crucial evidence for semantic interpretation.

Based on these sources, this study constructed a semantic correspondence system linking “spatial function—component type—decorative logic.” For example, in the residential zone, the throne bed (baozuochuang 宝座床) and kang zhao ensemble fulfilled both domestic and ceremonial functions, while luodi zhao created transitional and enclosing spatial layers. In the entertainment zone, the indoor stage, yuanguang zhao, and panoramic tongjing hua reinforced the performative axis and visual centrality, exemplifying the synergy between function and ornament. These semantic correspondences not only reveal the interaction between spatial layout and functional usage but also provide the benchmark for assessing the cultural authenticity of AIGC-generated results.

The combination of geometric and semantic foundations significantly enhances explanatory power: quantitative comparisons with the SU model guarantee scientific measurability of deviation, while semantic interpretation exposes potential deficiencies of AIGC in decorative logic and cultural reproduction. Unlike studies relying solely on visual similarity, this dual-benchmark framework enables a systematic evaluation of the applicability and limitations of AIGC in heritage reconstruction, offering verifiable methodological support for digital restoration of Qing imperial interiors.

Experimental design of AIGC-based spatial generation

The experimental design of this study adopts a function-oriented prompt framework as its core principle, with the aim of evaluating the capability of AIGC in reproducing the overall atmosphere, decorative logic, and functional attributes of Qing imperial interiors21. The AIGC platforms primarily employed are Midjourney (v6) and Stable Diffusion XL. Midjourney was selected for its strong capability in generating diverse and atmospheric overall spatial images, while Stable Diffusion XL, complemented by ControlNet and LoRA modules, was chosen for its enhanced controllability over specific architectural elements. It is important to note that other generative models (e.g., DALL·E 3) may exhibit different tendencies and are beyond the scope of this comparative study. The workflow consists of three stages: prompt construction, model configuration, and generation implementation22,23.

In terms of prompt design, the prompts were not merely regarded as text triggers for image generation, but as explicit semantic constraints24. To capture the functional differentiation of Juanqinzhai, prompts emphasized the two major spatial zones: the eastern five bays (yanqin 燕寝, residential retreat) and the western four bays (entertainment and theatrical space). Keywords such as “residential,” “imperial retiring room,” “viewing opera,” and “indoor stage” were systematically embedded to test whether the model could respond to functional cues (Table 1). Moreover, representative components were explicitly encoded: luodi zhao, kang zhao, and bisha chu for the eastern residential zone; and yuanguang zhao, jituizhao (几腿罩, legged partition), tongjing hua, and stage (xitai 戏台) for the western entertainment zone. Historical context tags such as “Qianlong period, Qing dynasty” and “based on Yangshi Lei archives” were integrated to reduce generic or ahistorical outputs. Decorative style was further constrained by detailed descriptors including “rosewood carving,” “bamboo-thread marquetry,” “lacquer finish,” “jade inlay,” and “perspective panoramic painting,” ensuring that generated results approximated authentic palace interiors rather than generalized “Chinese-style” imagery. Collectively, these elements formed a four-layer prompt structure—spatial function, representative components25, historical context, and rendering mode—providing standardized and comparable outputs.
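The layered prompt structure can be pictured as a small data structure that assembles one prompt string per functional zone. The following is only a minimal sketch: the class name `PromptLayers`, its field names, and the comma-joined output format are illustrative assumptions, not the prompt templates actually used in the experiments (examples of those are given in Table 1). The fourth layer is filled here with the decorative descriptors quoted above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PromptLayers:
    """Hypothetical container for the four prompt layers described in the text."""
    function: str                 # layer 1: spatial function
    components: Tuple[str, ...]   # layer 2: representative components
    context: Tuple[str, ...]      # layer 3: historical context tags
    style: Tuple[str, ...]        # layer 4: decorative / rendering descriptors

    def render(self) -> str:
        # Join the layers in a fixed order so prompts stay comparable across zones.
        parts = [self.function, *self.components, *self.context, *self.style]
        return ", ".join(parts)


# Example for the western (theatrical) zone, using keywords quoted in the text.
western = PromptLayers(
    function="indoor stage for viewing opera",
    components=("yuanguang zhao", "tongjing hua"),
    context=("Qianlong period, Qing dynasty", "based on Yangshi Lei archives"),
    style=("rosewood carving", "bamboo-thread marquetry"),
)
print(western.render())
```

Keeping the layers separate makes it straightforward to build the "function-confused" variants mentioned later, for example by swapping the `components` tuple of one zone into the other.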

Table 1 Example prompts for AIGC-based spatial generation of Juanqinzhai interiors

To evaluate the influence of visual conditioning, two AIGC configurations were designed in parallel—a text-only baseline (Fig. 5a) and an image-augmented workflow (Fig. 5b)—both benchmarked against a high-fidelity SU reference model (Fig. 5c). In the text-only branch, images were generated exclusively from descriptive prompts, which frequently resulted in perspective exaggeration, horizontal compression, and inconsistent proportional logic. By contrast, the image-augmented branch incorporated historical photographs and SU-derived perspective renders as visual inputs, providing explicit geometric cues for spatial boundaries, decorative layering, and lighting context. Consistent generation parameters (sampling steps, guidance scale, and controlled random seeds) were maintained across all runs to ensure reproducibility, and negative prompts were intentionally omitted to avoid suppressing intrinsic model tendencies. As shown in Fig. 5, the stage width-to-height ratio of the SU reference model (1.64) compared with 0.98 in the text-only output and 1.38 in the image-augmented version demonstrates that visual prompting improves geometric fidelity and contextual coherence, thereby enhancing the robustness of the proposed workflow.

Fig. 5: Comparative results of text-only and image-augmented AIGC generation for the theatrical interior of Juanqinzhai.

a text-only AIGC generation; b image-augmented AIGC generation; c SketchUp (SU) reference model. Red bounding boxes indicate the measurement region used to compute width–height ratios. Image-augmented prompting significantly reduces horizontal compression compared to text-only prompting, bringing geometric proportions closer to the SU model benchmark.
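For the single example in Fig. 5, the gap to the benchmark can be expressed as a signed relative deviation. The sketch below assumes the conventional definition (measured minus reference, divided by reference). The function name is hypothetical, and the resulting percentages describe only this one illustrated case, not dataset-level averages.

```python
def relative_deviation(measured: float, reference: float) -> float:
    """Signed deviation of a measured ratio from the reference ratio, in percent."""
    return (measured - reference) / reference * 100.0


SU_RATIO = 1.64  # stage width-to-height ratio of the SU reference model (Fig. 5c)
fig5_ratios = {"text-only": 0.98, "image-augmented": 1.38}

for label, ratio in fig5_ratios.items():
    print(f"{label}: {relative_deviation(ratio, SU_RATIO):+.1f}%")
# text-only: -40.2%
# image-augmented: -15.9%
```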

During implementation, the eastern and western zones were generated as independent experimental units, with a minimum of 100 images produced for each zone, yielding a dataset of over 200 images in total. For the eastern residential zone, prompts emphasized privacy and enclosure, with focus on baozuochuang, kang zhao, and bisha chu to test whether AIGC could capture solemnity and ritual atmosphere. For the western entertainment zone, prompts highlighted openness and performative centrality, stressing stage layout, yuanguang zhao, and panoramic tongjing hua to examine whether the outputs aligned with theatrical spatial logic. To test the robustness of function–component mapping, “function-confused” prompts were also designed by mixing elements of residential and theatrical spaces. This allowed for the detection of semantic inconsistencies or structurally illogical images, offering critical insight into the model’s limitations in functional understanding.

Bias analysis strategy

To comprehensively evaluate the performance of AIGC in reconstructing Qing palace interiors, this study establishes a multi-dimensional bias analysis framework that integrates geometric quantification, cultural semantics, and functional consistency26. The objective of this framework is to reveal systematic deviations in scale, structural logic, and cultural representation, moving beyond subjective visual impressions toward measurable and reproducible evidence. The complete multi-dimensional evaluation framework, detailing the specific indicators, reference basis, and assessment criteria for each dimension, is summarized in Table 3. The following sections elaborate on the application of this framework.

Table 2 Summary of AIGC-generated images of Juanqinzhai functional units and multi-stage filtering funnel

On the geometric level, the high-fidelity SU reference model of Juanqinzhai, together with historical manuals such as Qing Structural Regulations (Qing shi yingzao ze li), serves as the ground truth27. Key proportional parameters (such as the ratio of stage depth to total room depth, the ratio of caisson dome diameter to bay width, and the ratio of canopy height to column height) were extracted from the reference model28. Measurements were then taken from rectified AIGC images through a four-step workflow: (1) correcting perspective distortion; (2) aligning the rectified image with the TLS-based SU model via architectural anchor points; (3) extracting key architectural ratios using normalized pixel distances; (4) comparing them with SU-derived ratios to identify geometric deviation. Each measurement was repeated three times by independent evaluators, with the repeated measurements agreeing within ±2%, ensuring reliability. As evidenced by the comparative measurements in Fig. 5, the introduction of image-based prompts significantly reduces geometric distortion. Whereas text-only prompting tends to induce horizontal compression, visual anchoring aligns the generative process more closely with structural logic, resulting in a more proportionally stable output.
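The four-step measurement workflow can be sketched with a planar homography. This is an illustrative reconstruction under stated assumptions, not the authors' tooling: it assumes the measured elements lie roughly in one elevation plane, merges steps (1) and (2) into a single homography estimated from four anchor points, and uses hypothetical pixel and model coordinates throughout. All function names are invented for the example.

```python
import numpy as np


def homography_from_points(src, dst):
    """Estimate the 3x3 homography mapping src -> dst (direct linear transform)."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of the stacked constraint matrix.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def apply_homography(h, points):
    """Map 2D points through h using homogeneous coordinates."""
    pts = np.hstack([np.asarray(points, dtype=float), np.ones((len(points), 1))])
    mapped = pts @ h.T
    return mapped[:, :2] / mapped[:, 2:3]


def width_height_ratio(corners):
    """Corners ordered top-left, top-right, bottom-right, bottom-left."""
    tl, tr, br, bl = np.asarray(corners, dtype=float)
    width = (np.linalg.norm(tr - tl) + np.linalg.norm(br - bl)) / 2
    height = (np.linalg.norm(bl - tl) + np.linalg.norm(br - tr)) / 2
    return width / height


# Steps 1-2: hypothetical anchor points (e.g. column bases and heads) in image
# pixels, paired with the same anchors in SU elevation coordinates (metres).
image_anchors = [(120, 80), (880, 140), (860, 620), (150, 700)]
model_anchors = [(0.0, 0.0), (8.0, 0.0), (8.0, 5.0), (0.0, 5.0)]
H = homography_from_points(image_anchors, model_anchors)

# Step 3: rectify the (hypothetical) stage bounding box and take its ratio.
stage_px = [(300, 250), (700, 270), (690, 560), (310, 590)]
stage_ratio = width_height_ratio(apply_homography(H, stage_px))

# Step 4: compare with the SU-derived ratio (1.64 is the value quoted for Fig. 5).
deviation_pct = (stage_ratio - 1.64) / 1.64 * 100.0
print(f"rectified ratio = {stage_ratio:.2f}, deviation = {deviation_pct:+.1f}%")
```

Because ratios are scale-free, the units of the model coordinates do not matter as long as both axes share them. For real images, more than four anchors and a normalized or robust estimator would likely be preferable.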

On the cultural semantic level, a three-dimensional evaluation framework was established: (1) ornamentation, which examines the density of decorative patterns, color saturation, and material texture, comparing them with historical archives and surviving artifacts; (2) structural logic, which assesses whether proportional relations and spatial organization align with functional requirements, for instance whether the retiring quarters (eastern bays) emphasize enclosure and privacy while the theater space (western bays) maintains openness and framing effects; (3) contextual coherence, which analyzes potential anachronisms or stylistic confusion, such as inappropriate mixing of dynastic styles or misplaced decorative elements. Each dimension is operationalized through explicit criteria summarized in Table 3. To reduce subjectivity, inter-rater reliability was tested with three independent evaluators, yielding Cohen’s Kappa values ranging from 0.74 to 0.83, confirming substantial agreement across the four dimensions.
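Cohen's kappa is defined for two raters, so a reported range for three evaluators is consistent with computing it for each pair. That reading is an assumption. The sketch below shows the pairwise calculation on hypothetical pass/fail judgements; the ratings are placeholders for illustration, not study data.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(a, b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    counts_a, counts_b = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in set(a) | set(b)) / (n * n)
    # Note: undefined (division by zero) if both raters use one identical label only.
    return (observed - expected) / (1 - expected)


# Hypothetical judgements of ten images on one criterion (placeholder data).
ratings = {
    "rater_1": ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "fail"],
    "rater_2": ["pass", "pass", "fail", "fail", "pass", "fail", "pass", "fail", "fail", "fail"],
    "rater_3": ["pass", "fail", "fail", "fail", "pass", "fail", "pass", "pass", "fail", "fail"],
}

for (name_a, a), (name_b, b) in combinations(ratings.items(), 2):
    print(f"{name_a} vs {name_b}: kappa = {cohen_kappa(a, b):.2f}")
```

If a single coefficient for all three evaluators were wanted instead, Fleiss' kappa would be the usual alternative.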

Table 3 Multi-dimensional evaluation framework for AIGC-generated reconstructions of Juanqinzhai interiors

Given the clear functional division within Juanqinzhai (eastern bays serving as imperial retiring quarters and western bays as entertainment space), this study also introduces functional consistency as an independent dimension. AIGC outputs were directly compared with the intended prompt specifications, verifying whether the eastern bays correctly reproduced the enclosure formed by luodi zhao and kang zhao, and whether the western bays generated theater stages, tongjing hua, and spectator zones. The accuracy and confusion rates of function–component mapping were systematically recorded and statistically analyzed to assess AIGC’s ability to comprehend functional–decorative relationships.
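One way to tabulate such rates is sketched below. The definitions are assumptions made for illustration: an image counts as accurate when every expected component of its prompted zone is present, and as confused when any component of the opposite zone appears. The component sets follow the text, while the annotation records are hypothetical.

```python
ZONE_COMPONENTS = {
    "east": {"luodi zhao", "kang zhao"},
    "west": {"stage", "tongjing hua", "spectator zone"},
}


def mapping_rates(records):
    """records: iterable of (prompted_zone, set of components seen in the image)."""
    totals = {zone: 0 for zone in ZONE_COMPONENTS}
    correct = dict(totals)
    confused = dict(totals)
    for zone, seen in records:
        other = "west" if zone == "east" else "east"
        totals[zone] += 1
        if ZONE_COMPONENTS[zone] <= seen:   # all expected components present
            correct[zone] += 1
        if ZONE_COMPONENTS[other] & seen:   # any component of the other zone
            confused[zone] += 1
    return {
        zone: {
            "accuracy": correct[zone] / totals[zone],
            "confusion": confused[zone] / totals[zone],
        }
        for zone in ZONE_COMPONENTS
        if totals[zone]
    }


# Hypothetical manual annotations of five generated images (placeholder data).
records = [
    ("east", {"luodi zhao", "kang zhao"}),
    ("east", {"luodi zhao", "kang zhao", "stage"}),
    ("east", {"luodi zhao"}),
    ("west", {"stage", "tongjing hua", "spectator zone"}),
    ("west", {"stage", "kang zhao"}),
]
rates = mapping_rates(records)
print(rates)
```

Under these definitions an image can be both accurate and confused, which is exactly the case the "function-confused" prompts are meant to probe.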

Through this integrated framework that combines geometric, semantic, and functional dimensions (Table 3), the study not only identifies the strengths and weaknesses of AIGC in visual expression and decorative rendering but also uncovers the underlying cultural cognition biases embedded in its outputs. This bias analysis framework provides the methodological foundation for the subsequent proposal of a “critical generation” workflow and contributes to the development of culturally sensitive generative models for digital heritage reconstruction.

Results

Quantitative deviations: systematic errors in AIGC outputs

To assess the geometric fidelity of AIGC-generated imagery, the high-fidelity SU model of Juanqinzhai was adopted as the ground truth benchmark. Key proportional parameters were extracted and compared29, revealing that AIGC outputs exhibit significant and systematic deviations—showing consistent directional biases—rather than random noise30 (i.e., unsystematic and unpredictable fluctuations). As summarized in Table 4, the deviation magnitude ranges from approximately 6% to 41%, demonstrating a patterned bias inherent to the generative process. Specifically, AIGC tends to exaggerate depths and decorative proportions while compressing horizontal and upper-storey scales. For instance, the stage depth was overestimated by +39.3%, whereas the stage width-to-height ratio was underestimated by −18.3%. Similarly, the height ratio of upper-to-lower partitions was enlarged by +22.5%, and decorative components such as corridor screens were inflated by +24.4%.
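The distinction between systematic bias and random noise can be checked with a simple sign test: if deviations were unsystematic, positive and negative errors would be about equally likely. The sketch below is one possible check, not necessarily the analysis performed in the study, and the per-image deviations are hypothetical placeholders.

```python
from math import comb


def sign_test(deviations):
    """Two-sided exact sign test against 'no directional bias' (zeros are dropped)."""
    pos = sum(d > 0 for d in deviations)
    neg = sum(d < 0 for d in deviations)
    n = pos + neg
    if n == 0:
        raise ValueError("no non-zero deviations to test")
    # Probability of a split at least this lopsided under a fair 50/50 model.
    tail = sum(comb(n, k) for k in range(max(pos, neg), n + 1))
    p_value = min(1.0, 2 * tail / 2 ** n)
    return pos, neg, p_value


# Hypothetical per-image stage-depth deviations in percent (placeholder data).
stage_depth_dev = [41.0, 35.2, 44.8, 38.1, -2.5, 40.3, 36.9, 42.7, 39.0, 37.4]

pos, neg, p = sign_test(stage_depth_dev)
print(f"{pos} positive, {neg} negative, p = {p:.4f}")
# 9 positive, 1 negative, p = 0.0215
```

A small p-value here indicates a consistent direction of error; the magnitude of the bias is still described by the mean signed deviation reported per parameter.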

Table 4 Comparison of key proportional parameters between SU model (ground truth) and the AIGC-generated outputs

To further visualize these deviations, direct comparisons were made between the SU model and AIGC-generated outputs across three representative bias categories (Fig. 6).

(a) Plan proportion bias: horizontal compression caused by an underestimation of the stage width-to-height ratio (−18.3%), resulting in a visually narrower and more frontal composition.

(b) Vertical proportion bias: reduced second-storey elevation (−6.5%) that compresses the upper architectural tier, flattening the façade hierarchy and diminishing the perception of vertical layering essential to Qing timber structures.

(c) Decorative component bias: excessive enlargement of ornamental areas (+24.4%), particularly within corridor screens, reflecting a model tendency to prioritize visual richness and decorative density over structural authenticity.

Fig. 6: Geometric comparison between SU benchmark and AIGC-generated results across three representative bias categories.

a Plan proportion bias, showing horizontal compression caused by the underestimation of the stage width-to-height ratio; b Vertical proportion bias, showing reduced second-storey elevation and compressed façade hierarchy; c Decorative component bias, showing the inflation of ornamental areas, particularly corridor screens. Red bounding boxes indicate the measurement regions used for ratio extraction and error quantification (see Table 4). All comparisons are based on rectified front-elevation views aligned with the SU coordinate system.

Overall, these results indicate that AIGC systematically prioritizes atmospheric richness and visual impact at the expense of geometric discipline. The patterned overemphasis on depth and decoration, together with the compression of structural proportions, reveals a spectacle-oriented generative bias that challenges the accurate restoration of Qing interior spatial logic. This finding establishes the empirical basis for the subsequent Historical Calibration workflow aimed at mitigating such distortions.

Tracing bias: the roots of cross-cultural cognitive distortions

The systematic patterns of exaggeration and semantic confusion observed in the AIGC outputs—such as the prioritization of visual spectacle over structural fidelity (e.g., the disproportionate enlargement of the stage roof structure as shown in Fig. 7), and the conflation of distinct historical styles (e.g., the inflated motif density observable in Fig. 8)—invite a critical interpretation that moves beyond technical error. These observed patterns resonate with scholarly critiques of Orientalism31,32, which argue that Western‐centric representations often construct the “East” as exotic and spectacle-oriented33.

Fig. 7: AIGC outputs showing exaggerated enlargement of stage roof structures.

Generated results illustrating the over-scaling of the stage canopy and roof system compared with the column grid and enclosing interior. This distortion reveals the model’s bias toward visually salient features and the loss of architectural proportional hierarchy, providing a basis for subsequent geometric correction and component-level control.

Fig. 8: Exemplary AIGC output showing ornamental amplification and motif densification in the Juanqinzhai interior.

AIGC-generated interiors displaying excessive decorative enrichment relative to structural elements. The left panels show reconstructions with amplified carving complexity and multi-layered ornamental brackets, whereas the right panels present typical Chinese traditional motifs serving as semantic sources. The imbalance between structural geometry and surface ornament demonstrates semantic overfitting in generative models and motivates the introduction of hierarchical weighting between geometric accuracy and stylistic expression.

We therefore frame as a working hypothesis that the AIGC reconstruction of Juanqinzhai, to some extent, reproduces a globalised “Oriental palace” imaginary that may be prevalent in its training data. This corpus is potentially skewed towards spectacularised representations drawn from export art, film, and digital media rather than balanced architectural surveys or archival sources.

It is crucial to present this as a preliminary interpretation, grounded in our empirical findings and existing literature, but explicitly constrained by our focus on a single, exceptionally ornate imperial interior and the particular generative models employed (Midjourney v6; Stable Diffusion XL). Therefore, the generalisability of these cross-cultural cognitive distortions to other heritage contexts or across different AIGC platforms (e.g., DALL·E) remains an open question requiring controlled comparative studies. For example, replication of the generation process under uniform camera views, simplified décor prompts, or using a Baroque interior as a cross-cultural control could test whether the “spectacle bias” persists irrespective of cultural context.

Pathway exploration: toward “critical generation”

The systematic geometric and semantic biases identified in the previous section necessitate a more reflexive and human-centered approach to AIGC in heritage reconstruction. Rather than treating AIGC as an autonomous generator, this study establishes a critical generation workflow that situates it within an expert-guided, evidence-based framework (Fig. 9). The framework transforms AIGC from an imaginative visual engine into a controlled instrument for historically grounded reconstruction, visually articulated through three sequential stages and their corresponding outputs.

Fig. 9: The proposed three-stage “critical generation” workflow.

Proposed “critical generation” workflow integrating AIGC and expert-guided validation. a Stage 1: Concept Exploration—diverse prompt-driven schemes; b Stage 2: Historical Calibration—bias correction using SU model and archival evidence; c Stage 3: HBIM Integration and Semantic Enrichment—embedding cultural semantics and material metadata.

Stage 1: AIGC-driven concept exploration

This stage leverages AIGC’s generative capacity to explore diverse spatial and decorative alternatives. Guided by optimized prompts combining function + components + context + style, it produces a spectrum of conceptual schemes represented in Fig. 9a. The accompanying color palette (grey–ochre–wood tones) reflects the process of stylistic calibration, linking digital exploration with the chromatic logic of Qing-period interiors. At this stage, the workflow focuses on discovering compositional tendencies such as perspective exaggeration, decorative density, or atmosphere variation, forming a creative yet analyzable design base for later correction.

Stage 2: Historical calibration

At the methodological core, this phase anchors the generated results to historical evidence and geometric benchmarks. Using the SU reference model as ground truth and archival drawings as semantic anchors, experts selectively retain valid elements while adjusting distortions in proportion, structure, and symbolic content. Figure 9b illustrates this alignment between AIGC imagery and architectural reference—the digital stage (xitai) reconstructed to conform with historical dimensions and iconography. Through this interpretive calibration, the workflow shifts from speculative imagery to historically consistent and metrically coherent reconstruction.

Stage 3: HBIM integration and semantic enrichment (future work)

In the final envisioned phase, the calibrated model transitions into an HBIM environment, integrating geometric accuracy with cultural semantics. Each labeled element in Fig. 9c—eave ridge, structural beam, partition screen, wooden column, and platform base—can be enriched with metadata on material, craft, and symbolic meaning (e.g., bamboo-thread marquetry, tongjing hua).

This semantic embedding will enable the creation of richly annotated, interoperable digital assets for conservation, research, and immersive dissemination.

In conclusion, the empirical contribution of this study is twofold: it identifies the systemic biases intrinsic to AIGC outputs and validates a reproducible, expert-guided pathway for correcting them. The verified two-stage implementation demonstrates that AIGC can evolve from a generator of stylistic impressions into a critical co-creator—a tool that, when constrained by historical references and scholarly review, meaningfully contributes to evidence-based digital heritage reconstruction.

Discussion

This study evaluated the capability and limitations of AIGC in reconstructing Qing imperial interiors through a benchmarked analysis of Juanqinzhai. While AIGC excels at producing atmospheric diversity and visually coherent schemes, it also introduces systematic geometric distortions—including depth exaggeration, horizontal compression, vertical instability, and ornamental inflation—that compromise structural fidelity. Coupled with semantic inconsistencies and stylistic conflation, these tendencies suggest that current generative models privilege visual spectacle over architectural rationality, yielding culturally refracted rather than historically grounded reconstructions.

Building on these findings, the proposed Critical Generation workflow reframes AIGC from an autonomous creator into a critically supervised collaborator. By coupling creative exploration with SU-based geometric benchmarks and archival semantic anchors, the workflow establishes a reproducible path in which imaginative outputs are interpreted, aligned, and corrected through expert review. In this arrangement, AIGC becomes an instrument for hypothesis testing and rapid optioning, while calibration restores proportional discipline and cultural coherence.

Methodologically, the study advances evaluation beyond ornament-level resemblance to holistic spatial assessment, showing how algorithmic bias operates across scales—from geometry to cultural semantics. It also positions digital generation as critical co-production, where human expertise mediates algorithmic creativity to meet standards of evidence and accountability in heritage practice. This perspective can inform future frameworks for digital authenticity, participatory reconstruction, and the ethical use of AI in cultural visualization.

Limitations remain. Prompt design, even when systematically structured, involves interpretive choices; the experiments focused on Midjourney (v6) and Stable Diffusion XL; and Juanqinzhai is an exceptionally ornate case. Cross-cultural and cross-typology replications—including vernacular timber architecture and non-Chinese interiors—are needed to test transferability. Future research should pursue culturally weighted training sets, open reference benchmarks for comparative validation, and semi-automated calibration modules integrated with HBIM to support scalable, transparent, and ethically grounded AIGC-based reconstruction.