Abstract
Constructing a static exterior model of a railway passenger station is a preliminary and crucial step toward achieving a digital twin station. Station modeling often relies on manual techniques, requiring significant labor for stations spanning tens of thousands of square meters. Moreover, while the interior decoration is symmetrical, the asymmetrical placement of equipment often leads to confusion in manual modeling. This paper proposes a novel station static model reconstruction method, MSCRAGS (Mobile vehicle-Sparse Sampling-Colmap-Resolution adjustment-Gaussian Splatting). MSCRAGS addresses the requirements for flexible control of static exterior models by utilizing a mobile vehicle for data collection together with a sparse multi-view spatial sampling approach. It collects multi-height, multi-angle appearance color data of passenger operation elements to construct preliminary point clouds. Subsequently, 3D Gaussian splatting is employed for rendering, achieving high-fidelity reconstruction of production elements. Moreover, to meet the requirements for precision control of static exterior models, the rendering results are reshaped through resolution adjustment to obtain static exterior models of station production elements at various resolutions. Experiments conducted at Qinghe Station demonstrate that, compared with other state-of-the-art modeling methods, our approach significantly reduces modeling time and improves modeling accuracy, showing superior performance on high-fidelity indices.
Introduction
In China, a blueprint for the Intelligent Railway Passenger Station (IRPS)1 has been proposed, which explicitly states that station operation management should be implemented based on digital twin technology. A digital twin station replicates the physical space of the station in cyberspace. To achieve a digital twin station, the primary step is constructing a static exterior model of the railway station and its internal entities. For dynamic targets, the process begins with establishing a static model of the target in a particular state. This static exterior model serves as the cornerstone of the station's digital twin, transforming the physical entities of the station's various production elements2 into three-dimensional models recognizable by computers, based on their visual appearance and static characteristics. These models vividly express the geometric dimensions, textures, and initial spatial positions of the production elements. This process requires the creation of a three-dimensional model in cyberspace that is consistent with the architectural structure and equipment layout of the real world.
Three-dimensional models of passenger stations are primarily constructed using manual methods3. However, for large-scale stations, such as super-class stations, the area of the passenger terminal building is substantial, with some exceeding 100,000 square meters. The design of these stations often follows aesthetic principles, typically adopting symmetrical forms both laterally and longitudinally. However, key facilities and equipment such as gate indicator lights and static directional signs are not completely symmetrical. During data organization, the symmetry of the structural decorations and the asymmetry of the equipment can easily lead to confusion about the positions of data from different sides, resulting in inaccurate modeling. Moreover, on-site manual data collection requires a significant amount of labor and time, leading to an immense workload.
To address these issues, this paper, inspired by paper4, introduces a novel modeling method for static models of passenger station elements, termed the Mobile vehicle-Sparse Sampling-Colmap-Resolution adjustment-Gaussian Splatting (MSCRAGS) method. This method utilizes a mobile vehicle with sparse sampling to extract a frame stream and employs a multi-view stereo pipeline, Gaussian splatting, and resolution adjustment. It automates the collection of passenger station production element data and generates high-fidelity static models of these elements. It also facilitates the synchronized collection and alignment of station exterior color and geometric dimensions, reducing the labor intensity of manual data collection and minimizing the workload of subsequent data alignment, thereby enhancing the efficiency of initial data collection for station static exterior modeling. The contributions of this paper are as follows:
- It analyzes the production elements that need to be modeled at passenger stations and proposes a flexible control concept for static models.
- It designs autonomous mobile vehicle hardware that replaces manual camera-based data collection.
- It designs sparse sampling with multi-angle data capture for the initial colors and appearances of production elements from multiple heights and angles.
- It proposes a multi-resolution control and rendering method for static models, enabling the three-dimensional reconstruction of production elements at varying levels of detail.
The rest of the paper is structured as follows: Section “Related work” describes related work on railway reconstruction methods. Section “Flexible control of digital twin models for station” introduces the components of station production elements and the requirements for their flexible control. The proposed MSCRAGS method for reconstructing static exterior models is described in detail in Section “Method”. The experimental procedures and results are described in Section “Experiment”. Finally, conclusions are drawn in Section “Conclusion”.
Related work
Railway station reconstruction technology
Railway reconstruction provides convenience for information management: Building Information Modeling (BIM) technology is widely adopted for new station construction, and photogrammetry is commonly used for large-scale outdoor railway scenes.
BIM technology in railway station
Liu3 created an integrated three-dimensional information management platform for entire railway bureaus, combining foundational Geographic Information System (GIS) geospatial data with BIM-derived models of fixed railway operational equipment and facilities. Wang et al.5 developed a BIM-based virtual reality immersive training scenario for underground high-speed railway station evacuations during construction, which included processing BIM materials, scene decomposition and transmission, longitudinal and lateral expansion, normal direction unification, and scene baking. Lou et al.6 developed a visualization and control platform for the Nanning North Station building project, integrating BIM data, drone aerial photogrammetry, and topographical data; the platform aligns and coordinates these datasets based on their relative positions, thereby creating an integrated BIM platform. Hao et al.7 employed the Revit architectural modeling software and the Unity3D engine to construct a large-scale railway route map for exploration, enabling a railway train driving training simulator. Yang et al.8 employed BIM and optimized graph convolution algorithms to construct intelligent transportation node models, enhancing transportation system performance and information processing efficiency. However, modeling techniques based on BIM and GIS predominantly employ manual modeling methods, which involve a substantial workload.
Reconstruction of real-world railway station outdoor
Zhu et al.9,10 proposed a methodology for the 3D reconstruction of real-world railway scenes that partitions the spatial scale into different resolution elements for 3D modeling across all life-cycle stages, from planning and design to construction and maintenance, facilitating the definition of building syntax, parameters, models, and textures. Liu et al.11 developed a real-world railway communication construction system equipped with functional modules for real-time display, maintenance management, emergency inquiry, and intelligent analysis; the system integrates visualization of real scenes, fusion of multi-source data collection, and specialized communication maintenance functions. Wang et al.12 utilized location search and drone attitude analysis to project and dynamically fuse original drone imagery, facilitating rapid image matching and forming a high-precision real-world 3D technology. Fan et al.4 employed drone oblique photography to capture the current status of land use around high-speed railway stations and analyzed the land development potential surrounding the stations based on a real-world 3D platform. Su et al.13 conducted aerial orthophotography of the Su-Hu Intercity Railway using drones, resulting in a 190 \(\text{km}^2\) real-world 3D model. The use of real-scene modeling techniques for large outdoor railway scenes, such as those involving unmanned aerial vehicles, offers new perspectives for indoor station modeling technology.
High-fidelity reconstruction
Beyond real-world scenes, three-dimensional scene representation techniques include point clouds, voxels, and triangular meshes. DeepPano14 utilizes deep neural networks to extract features from panoramic images and reconstruct them into 3D point cloud representations. Tan et al.15 proposed a novel video-based deep differentiation segmentation neural network for foreign object detection in real-world urban rail stations, effectively capturing the subtle shape features of car door and platform seams. AtlasNet16, a neural network-based method, generates 3D meshes from point cloud data using an encoder-decoder architecture that incorporates both local and global feature maps to produce detailed, high-quality 3D meshes. 3DShapeNets17 transforms 3D voxels into a binary probability distribution using a convolutional deep belief network. Mildenhall et al. introduced the Neural Radiance Field (NeRF)18, which employs implicit neural scene representations and volumetric rendering to achieve high-quality view synthesis. Subsequent developments such as NeRF++19, Mip-NeRF (Mip)20, Mip-NeRF 360 (Mip360)21, 3D Gaussian Splatting (3DGS)22, and tools23,24,25 have further advanced this line of work. These methods provide new technical means for digital twin passenger stations, enabling 1:1 mirroring of physical-world stations.
Flexible control of digital twin models for station
A railway passenger station in the physical world manages passenger transport operations and organizes the rapid and safe boarding and alighting of passengers2. The station's management personnel need to oversee the entire process of serving passengers within the station, requiring not only broad overall control of various areas but also detailed monitoring of key operational areas. This leads to different levels of granularity in the monitoring requirements for different objects at the station, and digital twin stations need to adapt to station staff's requirements for flexible control. These objects are referred to as the production elements of the station, which comprise six categories: personnel, trains, equipment, environment, station buildings, and business, each containing a diverse range of subcategories. Except for business, which has no tangible entity, all production elements are represented by tangible entities.
Composition of the static exterior model
The six production elements can be categorized based on their intended use into daily operations and emergency management. Production elements specific to emergency management are not required for daily operations; however, in emergencies, these elements are utilized in addition to those needed for regular daily operations.
Daily operational elements provide passengers with standard transportation services, detailed as follows:
- Personnel. This category includes passengers and their luggage, as well as staff members.
- Equipment. This is categorized into ticketing equipment, security inspection devices, electromechanical equipment, passenger service devices, and facilities.
  - Ticketing equipment. This category encompasses manual real-name verification ticket machines, real-name verification gates, columnar ticket machines, gate-style automatic ticket machines, seat-number dispensers, receipt printers, ticket replenishment machines, integrated ticket sale/return machines, and police certification devices.
  - Security inspection devices. This category encompasses security gates and scanners.
  - Electromechanical equipment. This category includes escalators, elevators, air conditioning, heating radiators, ventilation, water supply systems, and sewage extraction devices.
  - Passenger service devices. This category comprises entrance screens, advertising screens, ticket checking screens, platform displays, departure and arrival announcement screens, intelligent inquiry machines, triangular screens, exit screens, check-in screens, remaining ticket screens, ticket window screens, cameras, lighting, platform intrusion detection devices, local area broadcasting devices, loudspeakers, Bluetooth tags, wheelchairs, and handheld terminals.
  - Facilities. This category encompasses blind walkway signs, static signs, and directional signage.
- Trains. This category includes originating trains, terminating trains, passing trains, and shuttle trains.
- Environment. This refers to various sensors that monitor environmental conditions, including smoke detectors, air quality sensors, temperature sensors, humidity sensors, noise sensors, and brightness sensors.
- Station buildings. These are the architectural components of the station, including entrance gates, waiting rooms, ticket checking areas, corridors, platforms, tracks, exit gates, subway connection areas, comprehensive service centers, nursing rooms, business and leisure areas, children's play areas, cultural reading zones, comprehensive service areas (military waiting areas, medical aid rooms), safety instruction signs, entrance guidance signs, floor schematic guidance signs, comprehensive service desks, restrooms, washrooms, and drinking fountain rooms.
Emergency management operational elements provide the necessary resources for staff during emergency responses, detailed as follows:
- Personnel. This includes railway police officers, doctors, firefighters, and other external rescue forces.
- Equipment. This comprises fire sprinkler systems, fire hoses, fire hydrants, rolling shutter doors, water pumps, portable fire extinguishers, box-type fire extinguishers, fire alarms, disinfectant sprayers, barrier tapes, protective gear, and emergency megaphones.
- Trains. Rescue trains.
- Station buildings. Fire engine access.
Static exterior model flexible control
Flexible Control26 is a key concept in industrial manufacturing, referring to the ability to adapt control strategies based on changing production demands. This concept offers innovative approaches to managing passenger stations by addressing various production elements across different levels of management. Flexible control of passenger station static models involves simplifying (lightweighting) and refining (detailing) the three-dimensional models according to specific management needs. In passenger station management, areas with high business density require detailed control, whereas less busy areas do not. This flexible control approach is designed to meet varying application scenarios and performance requirements.
Model fidelity adjustment, involving both the lightweighting of high-precision models and the refinement of low-precision models, is achieved by altering the number of points, lines, and surfaces as well as the resolution of textures. This creates a hierarchy of details with varying geometric face counts and texture resolutions, enabling variations in the model's level of detail. In accordance with surveying and mapping standards27,28, the flexible control of the static model for passenger stations is divided into three levels: lightweight, standard, and fine. Flexible control of station buildings should be implemented from four aspects: Architectural Complexity (AC), Ground Complexity (GC), Elevation Accuracy (EA), and Texture Detail (TD). The specifics are presented in Table 1; the categories I, II, and III in the table follow the guidelines outlined in standards27,28.
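As a purely illustrative sketch (the names and category values below are hypothetical and are not taken from Table 1), such a Table-1-style specification could be encoded as a configuration object mapping each control level to its four aspects:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FlexibleControlLevel:
    """One row of a Table-1-style specification (values hypothetical)."""
    architectural_complexity: str  # AC category, e.g. "I".."III"
    ground_complexity: str         # GC category
    elevation_accuracy: str        # EA category
    texture_detail: str            # TD category

# Hypothetical mapping of the three control levels to their categories
LEVELS = {
    "lightweight": FlexibleControlLevel("III", "III", "III", "III"),
    "standard":    FlexibleControlLevel("II",  "II",  "II",  "II"),
    "fine":        FlexibleControlLevel("I",   "I",   "I",   "I"),
}
```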
Method
We propose the Mobile vehicle-Sparse Sampling-Colmap-Resolution adjustment-Gaussian Splatting (MSCRAGS) method for station reconstruction. Modeling based on a mobile vehicle integrates sparse sampling theory to collect appearance color data of passenger transportation production elements from multiple heights and angles. This foundational data is then fed into a Structure-from-Motion multi-view stereo pipeline to obtain sparse point cloud information on the geometric dimensions, color appearance, and other aspects of the production elements. The foundational data is adjusted in resolution to accommodate the requirements of flexible control, and Gaussian splatting is employed to render the sparse point cloud information, resulting in high-fidelity reconstruction of production elements. The overall structure is illustrated in Fig. 1.
Overall mobile vehicle hardware design
The overall hardware architecture of the mobile vehicle integrates various sensors and power systems, consisting of a LiDAR, motion cameras, an industrial computer, external mobile power supplies, and the vehicle base. The overall structure is illustrated in Fig. 2a, and the final assembled mobile vehicle is shown in Fig. 2b,c.
The components and their functions are as follows:
- LiDAR. By emitting laser pulses and measuring the time required for the signal to return, the LiDAR precisely measures the geometric dimensions of production elements, obtaining high-precision 3D point cloud data for map construction.
- Motion cameras. The motion cameras capture continuous sequences of images, collecting texture and color information of production elements for the virtual static model. Supporting high-frame-rate recording with image stabilization, the motion camera employed in this paper is a commercially available action camera capable of recording video streams at resolutions up to 5312×2988, facilitating the capture of fine detail.
- Mobile platform. The mobile platform is equipped with an omnidirectional intelligent wire-controlled chassis, featuring a four-wheel drive and steering system that allows lateral movement and on-the-spot rotation, enhancing maneuverability and flexibility. It runs the Robot Operating System29, which is particularly suited to performing stable rotations around the production elements of the virtual static model.
- Power. For long-duration tasks or remote operations, a reliable power supply system is essential for the stable functioning of the vehicle, powering devices such as the industrial computer and monitors through portable power sources.
Mobile vehicle-based sparse sampling for point cloud collection
Utilizing the mobile vehicle hardware, initial data collection combines the continuous movement of the vehicle with a sparse sampling approach to gather exterior color data of production elements from multiple heights and angles. This foundational data is then input into a Structure-from-Motion multi-view stereo pipeline, yielding sparse point cloud information on the geometric dimensions and color appearance of the passenger service elements.
Mobile vehicle raw data collection
Data collection for constructing a static exterior model of a tangible production element in a passenger station begins by pinpointing the intended location of the model on the mobile vehicle's map and marking it. Routes circumnavigating this point are then established, along with corresponding circumnavigation speeds, followed by initiating a data collection run with the mobile vehicle. The motion camera's height is then adjusted for multiple collections. Upon completion, a series of appearance video streams containing production element data for the passenger station is obtained.
Sparse sampling extraction frame stream
Geometric and texture information of production elements is collected via mobile vehicles. When collecting geometric information, the entire length, width, and height dimensions of the production element must be fully covered. For appearance information, the color data of the production element must be obtained from all angles: front, back, left, right, top, and bottom. Thus, the collection of geometric dimensions and appearance color information for station production elements is divided into horizontal and vertical perspectives.
In the horizontal perspective, the action camera is kept at a fixed height, and the mobile vehicle circumnavigates the station production element along a predetermined route. For the vertical perspective, the action camera adjusts to a fixed height, and the mobile vehicle again circumnavigates to collect data. For geometric dimension collection, one complete circumnavigation at a horizontal dimension is sufficient. For vertical dimension data collection, to ensure full coverage, the action camera must adjust its height multiple times.
The number of vertical dimension collections (HC) required for station production elements is given by Eq. (1), where F is the field-of-view angle of the action camera, H is the height of the station production element, and d is the distance from the action camera to the production element. HC is obtained by rounding up.
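Equation (1) is not reproduced above, but it follows from simple field-of-view geometry given the stated definitions. The sketch below is a non-authoritative reconstruction under the assumption that a single pass at distance d covers a vertical band of \(2d\tan(F/2)\):

```python
import math

def vertical_collection_count(F_deg: float, H: float, d: float) -> int:
    """Reconstructed Eq. (1): number of vertical collection passes HC.

    Assumes each pass at distance d covers a vertical band of
    2 * d * tan(F / 2); HC is H over that band, rounded up.
    """
    band = 2.0 * d * math.tan(math.radians(F_deg) / 2.0)
    return math.ceil(H / band)

# Hypothetical example: a 3 m tall element filmed from 1.5 m away
# with a 90-degree vertical field of view -> one pass suffices.
print(vertical_collection_count(F_deg=90.0, H=3.0, d=1.5))  # 1
```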
For texture information collection, the video stream captured by the action camera is used. If all image information from the video stream were utilized, sequential frames would have significant overlapping areas, leading to information redundancy and computational waste during subsequent data processing. Therefore, image sequences are extracted from the video stream through frame sampling. In the task of reconstructing production elements, a single collection of the minimum spatial unit of a production element cannot fully restore accurate texture and geometric information; at least two collections from different angles are required to achieve precise reconstruction. The constraints for the horizontal and vertical calculations are given by Eq. (2).
Let n denote the camera frame rate per minute, v the speed of the mobile vehicle, s the distance traveled by the mobile vehicle, h the vertical height from the motion camera to the ground, D the distance moved per frame, W the ground width covered per frame, \(c_s\) the overlap ratio between adjacent horizontal frames, and \(c_h\) the overlap ratio between vertical frames. The corresponding number of video frames can then be extracted according to different overlap ratios.
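Equation (2) is likewise not reproduced; under the plausible reading that the distance per captured frame is D = v/n and that adjacent retained frames must overlap by at least \(c_s\), a frame-sampling stride can be derived as in the hedged sketch below (the numeric example is hypothetical):

```python
def frame_stride(n: float, v: float, W: float, c_s: float) -> int:
    """Largest frame-sampling stride preserving a horizontal overlap
    ratio of at least c_s between adjacent retained frames.

    n   : camera frame rate (frames per minute, as defined above)
    v   : vehicle speed (metres per minute)
    W   : ground width covered by a single frame (metres)
    c_s : required horizontal overlap ratio, 0 <= c_s < 1
    """
    D = v / n                          # distance travelled per captured frame
    stride = int((1.0 - c_s) * W / D)  # retain every `stride`-th frame
    return max(stride, 1)

# Hypothetical example: 1800 frames/min, 12 m/min, 2 m frame width,
# 60% required overlap -> keep every 120th frame.
print(frame_stride(n=1800, v=12, W=2.0, c_s=0.6))  # 120
```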
Colmap for structure-from-motion
Colmap processes the decimated video stream into an image sequence with overlapping regions, utilizing a multi-view stereo pipeline (Colmap)30,31 to construct an initial point cloud of the station's production elements. Colmap is a Structure-from-Motion (SfM) pipeline designed for reconstructing ordered and unordered image collections. Data preprocessing via Colmap begins with feature extraction from the input images, followed by feature matching and geometric verification; this initial phase yields validated geometric image pairs, internal correspondence points, and geometric relationships. This information then undergoes incremental reconstruction, including selection of initial image pairs, image registration, triangulation, and bundle adjustment, ultimately producing estimates of camera poses, registered image data, and scene point cloud information. Finally, LLFF32 integrates the image information, camera poses, and parameters. These procedures provide the initial data for subsequent flexible control at different resolutions.
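As an illustration only, the three stages above map directly onto Colmap's command-line interface; the sketch below drives them from Python, with hypothetical paths (the paper does not specify its exact invocation):

```python
import subprocess

# Hypothetical paths; the sampled frames live in frames/.
DB, IMAGES, SPARSE = "station.db", "frames", "sparse"

# 1. Feature extraction from the input images
subprocess.run(["colmap", "feature_extractor",
                "--database_path", DB, "--image_path", IMAGES], check=True)

# 2. Feature matching and geometric verification
subprocess.run(["colmap", "exhaustive_matcher", "--database_path", DB],
               check=True)

# 3. Incremental reconstruction: image registration, triangulation and
#    bundle adjustment, yielding camera poses and a sparse point cloud
subprocess.run(["colmap", "mapper", "--database_path", DB,
                "--image_path", IMAGES, "--output_path", SPARSE], check=True)
```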
Resolution adjustment for flexible control
To address the need for precision control in static exterior models of production elements, we enhance general-precision models through upsampling to achieve high-resolution models with increased detail; conversely, downsampling is used to create lightweight models. The upsampling process is completed before Colmap is run, whereas downsampling occurs afterward: upsampling enriches the original image with additional feature points, which benefits the initial point cloud modeling in SfM, whereas downsampling causes a significant loss of feature points, which would adversely affect it.
Resolution upsampling method
Common approaches to increasing image resolution include bilinear interpolation, bicubic interpolation, and super-resolution reconstruction. This paper employs Enhanced Super-Resolution Generative Adversarial Networks (ESRGAN)33 to upscale from a basic model to a refined model; ESRGAN excels in sharpness and edge detail while eliminating artifacts.
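For orientation, the sketch below shows the interpolation baseline mentioned above (bicubic 4× upscaling with OpenCV); in the actual pipeline this step is performed by the trained ESRGAN generator, whose loading code depends on the specific implementation and is therefore omitted:

```python
import cv2

# Bicubic 4x upscaling, the interpolation baseline named above. In the
# actual pipeline this step is replaced by the trained ESRGAN generator.
# "frame.png" is a hypothetical sampled frame.
lr = cv2.imread("frame.png")
hr = cv2.resize(lr, None, fx=4, fy=4, interpolation=cv2.INTER_CUBIC)
cv2.imwrite("frame_x4.png", hr)
```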
Resolution downsampling method
Resolution downsampling is based on Gaussian filtering. The input image is first subjected to Gaussian filtering, which removes high-frequency components (detailed parts of the image) while preserving low-frequency components (smooth areas). The Gaussian filter replaces the pixel value at a point with the weighted average of the pixels in its neighborhood, where the weights decrease monotonically with distance from the center point; its purpose is to remove high-frequency noise and prepare the image for downsampling. After filtering, the image is downsampled by discarding even rows and columns, reducing each dimension (width and height) by half. This process is repeated using the cv2.pyrDown function from the OpenCV library to construct progressively smaller image layers, each half the size of the preceding one.
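A minimal sketch of this pyramid construction, using the cv2.pyrDown call named above (file names are hypothetical):

```python
import cv2

def pyramid_downsample(image, levels: int):
    """Apply `levels` rounds of Gaussian-filter-then-halve.

    cv2.pyrDown performs the Gaussian filtering described above and
    then discards even rows and columns, halving width and height.
    """
    out = image
    for _ in range(levels):
        out = cv2.pyrDown(out)
    return out

img = cv2.imread("frame.png")        # hypothetical input frame
img_8d = pyramid_downsample(img, 3)  # 8x downsampling (2**3)
```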
High-fidelity rendering
The method described above generates sparse point clouds from image sequences with overlapping regions using the SfM process. However, these point clouds are too sparse to directly represent the appearance and color of station production elements accurately, leading to significant discrepancies with reality. To address this, high-fidelity rendering techniques such as Neural Radiance Fields (NeRF) and its derivatives are commonly employed. In this study, we adopt the real-time neural rendering approach 3D Gaussian Splatting to reconstruct high-fidelity representations of station production elements.
3D Gaussian Splatting22 has had a profound impact due to its efficient generation speed and high-fidelity output. The process begins by initializing the SfM point cloud into multiple 3D Gaussians. Using the camera's extrinsic parameters, the Gaussians are then projected onto the image plane through the splatting algorithm, followed by differentiable rasterization to render the images. The rendered results are compared to real images to compute loss values, which drive backpropagation.
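For reference, the core quantities of this procedure as formulated in the original 3D Gaussian Splatting work22 are summarized below (notation follows that paper and is not specific to our pipeline):

```latex
% Each SfM point initializes an anisotropic 3D Gaussian with mean mu
% and covariance Sigma, factored into rotation R and scaling S:
G(\mathbf{x}) = e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{\top}\Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})},
\qquad \Sigma = R S S^{\top} R^{\top}

% Splatting projects Sigma to the image plane via the viewing
% transformation W and the Jacobian J of the projective mapping:
\Sigma' = J W \Sigma W^{\top} J^{\top}

% Rendered images are compared against real images with the loss
% (lambda = 0.2 in the original work):
\mathcal{L} = (1-\lambda)\,\mathcal{L}_1 + \lambda\,\mathcal{L}_{\text{D-SSIM}}
```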
Experiment
Evaluation metrics
The objective of constructing a static model for passenger stations is to simultaneously reduce manual labor, shorten the modeling time, and meet the requirements for establishing high-fidelity models. To alleviate manual labor and decrease the duration of model construction, this study considers automation as the primary method, positing that less time consumed signifies greater efficiency.
- Time consumption calculations. The time is divided into several parts: the duration of mobile vehicle data collection, the duration of precision control generation, the duration of initial point cloud construction through SfM, and the duration of rendering. Different duration metrics are selected for different experimental conditions.
  - The duration of initial point cloud construction through SfM comprises Extract Time (1), Feature Extraction Time (2), Feature Matching Time (3), and Initial Point Cloud Reconstruction Time (4).
  - The duration of rendering is composed of Training Time (5) and Generating Time (6).
- High-fidelity indices. This paper utilizes three widely recognized benchmarks for model fidelity: Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index Measure (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS). Higher values of PSNR and SSIM indicate better quality, while a lower LPIPS score signifies closer perceptual likeness. A minimal computation sketch follows this list.
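The sketch below shows one way PSNR and SSIM can be computed with scikit-image; this is an assumed implementation choice, not one stated in the paper, and LPIPS additionally requires a learned network (e.g. the lpips package):

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def fidelity_metrics(gt: np.ndarray, render: np.ndarray):
    """PSNR and SSIM between a ground-truth view and a rendered view.

    Both inputs are H x W x 3 uint8 arrays. LPIPS additionally needs a
    learned network (e.g. the `lpips` package) and is omitted here.
    """
    psnr = peak_signal_noise_ratio(gt, render, data_range=255)
    ssim = structural_similarity(gt, render, channel_axis=-1, data_range=255)
    return psnr, ssim
```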
Experimental setup
The proposed model was developed using the PyTorch framework (CUDA 11.8) on a Linux Mint 21.2 system, accelerated by four NVIDIA GeForce RTX 3080Ti GPUs, each with 12 GB of memory. The physical site for the experiment is Qinghe Station on the Beijing-Zhangjiakou High-Speed Railway. The modeled passenger service production elements included daily operational elements (safety instruction signs, entrance guidance signs, floor schematic guidance signs, and real-name verification gates) and emergency management elements (fire hydrants, portable fire extinguishers, fire alarms, and box-type fire extinguishers).
Ablation studies
Sparse sampling analysis
The initial image sequences collected by the mobile vehicle form the basis for constructing the initial point cloud in SfM and are fundamental to rendering. Following the sparse sampling approach, multiple video streams were subsampled, producing multiple frame sequences in which the horizontal and vertical dimensions were sampled at more than twice the original rate. This was used to verify the effects of different subsampling results on time efficiency and fidelity. This section takes a box-type fire extinguisher, an emergency management production element, as the object of study and analyzes the sampling rate through multiple experiments. These experiments involved seven collection passes at varying heights, with an initial video stream totaling 32 seconds and comprising 1007 frames. The results were averaged over repeated trials. Because mobile vehicle data collection is similar across trials, this section focuses on the duration of initial point cloud construction through SfM and the rendering duration.
The durations of the various SfM stages are listed in Table 2, in minutes. The results indicate that as the sampling frame interval increases from 2 to 30 frames, the time required for frame extraction decreases, the number of frames obtained is reduced, and the SfM processing time correspondingly decreases.
The rendering durations for different initial image sequence sampling counts are shown in Table 3, also in minutes. The sampling frame interval ranged from 2 to 30 frames. As a comprehensive indicator of processing efficiency, the total processing time decreased from 24.533 minutes to 16.000 minutes, showing a general trend of decreasing processing duration with increasing frame interval.
The high-fidelity results for different initial image sequence sampling counts are presented in Table 4. These findings suggest that reducing the sampling interval increases the number of preliminary point clouds in SfM, potentially leading to higher-quality 3D reconstruction results. While smaller sampling intervals have an advantage in preliminary point cloud quantity, there exists a turning point beyond which PSNR and SSIM decrease significantly.
As shown in Fig. 3, the results compare the ground truth (GT) with the final reconstruction outcomes at various sampling frame intervals. At a 2-frame sampling interval, the text information is clearly discernible; both the Chinese and English sections (“FIRE EXTINGUISHER BOX”) are easily recognizable. However, as the sampling frame interval increases to 30, the rendered results show a significant loss of text information, posing a substantial challenge to text recognition. Details become less distinct, reflections of light are entirely absent, and “dirty floating objects” are noticeably present, adversely affecting the final outcome and indicating lower fidelity at large frame intervals.
Flexible control analysis
To assess the usability of flexible control, this experiment adjusted the resolution of safety instruction signage, a daily operational production element. The initial video stream had a total duration of 107 seconds, comprising 3208 frames; the initial sampling frames are designated as the baseline. The motion camera used for initial capture had a resolution of 5312×2988, which is already considered fine-grained. For the experiment, the initial images were downsampled by factors of 4, 8, and 16, and the result of the 8d operation was then upsampled to produce 8u for validation. As illustrated in Fig. 1, downsampling occurs after SfM, while upsampling occurs before SfM, thus maintaining consistency with the baseline values. The experimental results were averaged over multiple trials. Processing durations for the various SfM stages are shown in Table 5, in minutes; processing durations for rendering at different resolutions are presented in Table 6, also in minutes; and Table 7 compares the high-fidelity metrics at different resolutions.
Since downsampling occurs after Structure-from-Motion (SfM), the SfM duration and the initial number of point clouds remain consistent with the baseline for 4-times (4d), 8-times (8d), and 16-times (16d) downsampling, while rendering time decreases with increasing downsampling factor. The SfM processing time after 8-times upsampling (8u) is less than the baseline, with a greater initial point cloud count, and its rendering time is slightly below the baseline. In terms of fidelity, 8-times downsampling (8d) achieves the optimal values for PSNR, SSIM, and LPIPS. However, at 16-times downsampling (16d), the image size is reduced to 16*7, a resolution at which the LPIPS value can no longer be calculated.
Comparisons
To evaluate the performance of the proposed model more comprehensively, it was compared with state-of-the-art rendering methods under identical environmental configurations. Specifically, the model was benchmarked against NeRF18, NeRF++19, Mip20, Mip36021, and 3DGS22. The comparison metrics included rendering duration, PSNR, SSIM, and LPIPS. Experiments were conducted on passenger station production elements in both daily operation elements (safety instruction signs [dth], entrance guidance signs [ftq], floor schematic guide signs [jzk], and real-name verification gates [zj]) and emergency management operational elements (fire hydrants [xfs], box-type fire extinguishers [mhq], fire alarms [hz], and portable fire extinguishers [stm]).
Daily operation elements
The model focused on the daily operational production elements of passenger stations: safety instruction signs (dth), entrance guide signs (ftq), floor schematic guide signs (jzk), and real-name verification self-service gates (zj). Specific results are presented in Table 8, with the best result in each category highlighted in bold. Additionally, a qualitative analysis of the static daily operation model was conducted, with rendering results from the different methods shown in Fig. 4.
The bold entries in Table 8 represent the best results. For daily operational production element modeling, the time required by our method is significantly reduced compared to NeRF18, Mip20, Mip36021, and NeRF++19, although the total duration is slightly higher than that of 3DGS22. The LPIPS of dth decreased by 72.00% compared to NeRF, the PSNR of jzk increased by 35.29% compared to Mip, and the SSIM of ftq improved by 55.31% compared to NeRF. Although the PSNR of ftq is slightly lower than that of Mip360, its LPIPS decreased by 51.05% and its SSIM increased by 26.62% compared to Mip. Overall, our method demonstrates significant improvements in both total processing time and high-fidelity metrics compared to the other approaches, indicating an advantage in overall static model reconstruction quality.
Emergency management operational elements
Similarly, this paper compares the modeling of emergency management production elements, including fire hydrants (xfs), box-type fire extinguishers (mhq), fire alarms (hz), and portable fire extinguishers (stm). The proposed model was benchmarked against NeRF18, NeRF++19, Mip20, Mip36021, and 3DGS22 using total training time (Total Time), PSNR, SSIM, and LPIPS. The specific results are shown in Table 9.
The bold entries in Table 9 represent the best results. For emergency management operational elements, the processing time of our method shows a significant reduction compared to NeRF18, Mip20, Mip36021, and NeRF++19, and a relative improvement in total duration compared to 3DGS22. Specifically, xfs exhibits a 52.49% decrease in LPIPS and increases of 42.38% in PSNR and 16.16% in SSIM relative to Mip360. Similarly, mhq demonstrates a 37.85% reduction in LPIPS and improvements of 29.08% in PSNR and 4.85% in SSIM compared to NeRF++. For stm, our method achieved an LPIPS value of 0.3130356, 20.66% lower than Mip; although its PSNR is lower than that of Mip360 and NeRF++, its image similarity is higher than both, demonstrating the excellent performance of our method in maintaining image structural fidelity.
As illustrated in Fig. 5, the distant scenes in the hz results are not rendered as effectively as in Mip360, which accounts for the lower metrics. However, the intrinsic modeling quality of the production element is visually indistinguishable from that of Mip360 and does not affect the reconstruction of this production element. The other results in the figure indicate that our method significantly outperforms the others in perceptual quality, execution efficiency, visual similarity, and structural fidelity.
Conclusion
To address the challenges of labor-intensive and time-consuming digital twin modeling, this paper proposes a novel static modeling method for stations based on a mobile vehicle: MSCRAGS (Mobile vehicle-Sparse Sampling-Colmap-Resolution adjustment-Gaussian Splatting). The method meets the need for flexible control of static models by using sparse sampling to collect multi-level, multi-angle appearance color data of passenger service elements. These data are input into a Structure-from-Motion multi-view stereo pipeline to generate sparse point cloud information on the geometric dimensions and color appearance of the service elements. 3D Gaussian splatting is then used to render the sparse point cloud information, yielding high-fidelity reconstructions of the service elements. To meet fidelity requirements, the rendered results undergo resolution adjustment to create static models of station service elements at different resolutions. Experiments conducted at Qinghe Station demonstrate that, compared to other state-of-the-art methods, this approach significantly reduces modeling time and improves accuracy, as verified by comparisons of modeling time and high-fidelity metrics. The proposed method targets indoor modeling of railway passenger stations and is not applicable to modeling extensive railway lines spanning thousands of kilometers or dynamic complex scenes with large moving crowds. Based on these findings and limitations, future research will focus on exploring new technologies to facilitate static modeling of complex railway scenes with low computational resource requirements.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
References
Shi, T. & Peng, K. Overall architecture and key technologies of enhanced intelligent railway station. Railw. Transp. Econ. 43, 72–79. https://doi.org/10.16668/j.cnki.issn.1003-1421.2021.04.12 (2021).
Shi, T. & Zhang, C. Overall design and evaluation of intelligent railway passenger station system. Railw. Comput. Appl. 27, 9–16 (2018).
Liu, C. Research on railway 3d information management platform based on gis+bim. Geomat. Spatial Inf. Technol. 45, 19–22 (2022) (1672-5867(2022)12-0019-04).
Fan, X. & Zhan, G. The application of UAV oblique photography technology in land comprehensive development of the high-speed railway station yard. Eng. Technol. Res. 8, 22–24. https://doi.org/10.19537/j.cnki.2096-2789.2023.11.008 (2023).
Wang, Z., Ma, W., Yu, J. & Guan, G. Key technologies for evacuation drill scene construction of immersive underground high speed railway station based on bim+vr. Railw. Eng. 62, 153–157 (2022) (10.03-1995-2022-12-0153-05).
Lou, H. et al. Design and application of visual management and control platform for railway station buildings based on bim+gis. Railw. Tech. Innov. 04, 63–68. https://doi.org/10.19550/j.issn.1672-061x.2023.06.06.001 (2023).
Hao, Z. & Zhang, W. Development of railway training virtual simulation system based on unity3d. Comput. Simul. 37, 99–103 (2020).
Yang, Y., Zhang, J., Chen, W. & Zhai, C. Key technologies and applications of intelligent design for complex traffic node driven by mathematics-modeling. Int. J. Mod. Phys. C 2450240 https://doi.org/10.1142/S0129183124502401 (2024).
Zhu, Q., Zhu, J., Huang, H., Wang, W. & Zhang, L. Real 3d spatial information platform and digital twin Sichuan-Tibet railway. High Speed Railw. Technol. 45, 19–22. https://doi.org/10.12098/j.issn.1674-8247.2020.02.0084 (2022).
Zhu, Q. et al. Classification and coding of entity features for digital twin Sichuan-Tibet railway. Geomat. Inf. Sci. Wuhan Univ. 45, 4–12. https://doi.org/10.13203/j.whugis20200010 (2020).
Liu, C. Research on live maintenance system for railway communication based on digital twin. Autom. Instrum. 10, 84–88 (2021).
Wang, K. Application of real 3d image fusion in completion acceptance of high speed railway. J. Railw. Eng. Soc. 39, 104–109 (2022) (10.06-2106(2022)12-0104-06).
Su, Z., Ming, J. & Xuan, C. Research and application of railway route optimization technology based on 3d large-scale scene. Railw. Stand. Des. 67, 15–19. https://doi.org/10.13238/j.issn.1004-2954.202306050003 (2023).
Shi, B., Bai, S., Zhou, Z. & Bai, X. Deeppano: Deep panoramic representation for 3-d shape recognition. IEEE Signal Process. Lett. 22, 2339–2343. https://doi.org/10.1109/LSP.2015.2480802 (2015).
Tan, F., Zhai, M. & Zhai, C. Foreign object detection in urban rail transit based on deep differentiation segmentation neural network. Heliyon 10, e37072. https://doi.org/10.1016/j.heliyon.2024.e37072 (2024).
Groueix, T., Fisher, M., Kim, V. G., Russell, B. C. & Aubry, M. Atlasnet: A papier-mâché approach to learning 3d surface generation. arXiv:1802.05384 (2018).
Wu, Z. et al. 3d shapenets: A deep representation for volumetric shapes. In 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 1912–1920, https://doi.org/10.1109/CVPR.2015.7298801 (2015).
Mildenhall, B. et al. Nerf: Representing scenes as neural radiance fields for view synthesis. In Computer Vision – ECCV 2020, (eds Vedaldi, A. et al.) 405–421 https://doi.org/10.1007/978-3-030-58452-8_24 (Springer International Publishing, 2020).
Zhang, K., Riegler, G., Snavely, N. & Koltun, V. Nerf++: Analyzing and improving neural radiance fields. arXiv:2010.07492 (2020).
Barron, J. T. et al. Mip-nerf: A multiscale representation for anti-aliasing neural radiance fields. arXiv:2103.13415 (2021).
Barron, J. T., Mildenhall, B., Verbin, D., Srinivasan, P. P. & Hedman, P. Mip-nerf 360: Unbounded anti-aliased neural radiance fields. arXiv:2111.12077 (2021).
Kerbl, B., Kopanas, G., Leimkühler, T. & Drettakis, G. 3d gaussian splatting for real-time radiance field rendering. ACM Trans. Graph. 42 (2023).
Tancik, M. et al. Block-nerf: Scalable large scene neural view synthesis. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8238–8248, https://doi.org/10.1109/CVPR52688.2022.00807 (2022).
Niemeyer, M. & Geiger, A. Giraffe: Representing scenes as compositional generative neural feature fields. In 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 11448–11459, https://doi.org/10.1109/CVPR46437.2021.01129 (2021).
Tancik, M. et al. Nerfstudio: A modular framework for neural radiance field development. In Special Interest Group on Computer Graphics and Interactive Techniques Conference Proceedings https://doi.org/10.1145/3588432.3591516 (ACM, 2023).
Yang, Y. et al. A novel digital twin-assisted prediction approach for optimum rescheduling in high-efficient flexible production workshops. Comput. Ind. Eng. 182, 109398. https://doi.org/10.1016/j.cie.2023.109398 (2023).
China Xiong’an group. Q/xag /1001-2021 the first volume of the bim technical standards for construction projects by China Xiong’an group. http://gxzxht.com/public/uploads/20220217/afb15484b127a0b527a5f405c1f92403.pdf (2021).
Ministry of Natural Resources, People’s Republic of China. Ch/t 9032-2022 specifications for global geographic information products. https://hbba.sacinfo.org.cn/attachment/onlineRead/7712cb65ef8c55c3a84d52e52b2320c7060a47a367a7b4fb379d787ae4ebc0c0 (2022).
Shan, T. et al. Lio-sam: Tightly-coupled lidar inertial odometry via smoothing and mapping. In IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 5135–5142 (IEEE, 2020).
Schonberger, J. L. & Frahm, J.-M. Structure-from-motion revisited. In Conference on Computer Vision and Pattern Recognition (CVPR) (2016).
Schönberger, J. L., Zheng, E., Pollefeys, M. & Frahm, J.-M. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision (ECCV) (2016).
Mildenhall, B. et al. Local light field fusion: Practical view synthesis with prescriptive sampling guidelines. arXiv:1905.00889 (2019).
Wang, Z., Bovik, A., Sheikh, H. & Simoncelli, E. Image quality assessment: from error visibility to structural similarity. IEEE Trans. Image Process. 13, 600–612. https://doi.org/10.1109/TIP.2003.819861 (2004).
Acknowledgements
The authors would like to express their gratitude and appreciation to China State Railway Group Co., Ltd. and China Academy of Railway Sciences Corporation Limited for all the support and facilities throughout the research. This project was partially funded by the National Natural Science Foundation of China-China State Railway Group Co., Ltd. Railway Basic Research Joint Fund (Grant No. U2268217) and Scientific Funding from China Academy of Railway Sciences Corporation Limited (No. 2023YJ125).
Author information
Contributions
Conceptualisation, X.W.; methodology, X.W., X.L., and T.S.; software, L.B., W.B. and K.P.; validation, X.W. and X.L.; formal analysis, X.W. and L.B.; investigation, W.B.; writing-original draft preparation, X.W. and K.P. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, X., Lv, X., Shi, T. et al. The reconstruction method for static exterior model of digital twin railway station based on mobile vehicle. Sci Rep 15, 14222 (2025). https://doi.org/10.1038/s41598-025-96535-0