Design and research of bridge collision avoidance system based on camera calibration technology and motion detection

Wang, Xiaolei; Wang, Shichao; Wei, Zhihao; Ren, Ruijiang; Huang, Jingshan

doi:10.1038/s41598-025-19096-2

Download PDF

Article
Open access
Published: 08 October 2025

Design and research of bridge collision avoidance system based on camera calibration technology and motion detection

Xiaolei Wang¹,
Shichao Wang¹,
Zhihao Wei¹,
Ruijiang Ren¹ &
…
Jingshan Huang¹

Scientific Reports volume 15, Article number: 35146 (2025) Cite this article

258 Accesses
Metrics details

Subjects

Abstract

Bridge collisions, particularly those involving over-height vehicles, pose significant threats to public infrastructure, economic stability, and human safety. This study presents an intelligent, vision-based Bridge Collision Avoidance System (BCAS) that leverages advanced camera calibration techniques, motion detection algorithms, and real-time risk assessment frameworks to proactively detect and mitigate potential collisions. The system architecture integrates high-resolution video feeds with precise intrinsic and extrinsic camera calibration to accurately transform 2D motion into real-world coordinates. Motion detection and object segmentation are performed using a hybrid approach combining traditional background subtraction with deep learning-based models such as YOLOv11 and Vision Transformers (ViT), ensuring robustness in dynamic lighting and occlusion-prone environments. Object trajectory estimation is achieved through frame-wise velocity computation and spatial projection, enabling predictive collision path analysis. A risk evaluation model classifies threat levels using spatial thresholds, velocity vectors, and entropy-calibrated confidence scores. Real-time alerts are dispatched through low-latency edge-cloud frameworks with visual and auditory feedback to connected operators. Experimental validation across diverse scenarios—including occlusion, night conditions, and dense traffic—demonstrates superior performance in terms of accuracy (95.7%), false alarm rate (3.2%), and average system response latency (162 ms), when benchmarked against traditional rule-based and motion detection systems. This research contributes a modular, scalable, and fault-tolerant solution suitable for real-world deployment to enhance bridge safety in smart urban infrastructures.

Enhanced YOLO v3 for precise detection of apparent damage on bridges amidst complex backgrounds

Article Open access 15 April 2024

Cable vibration and girder displacement measurement via morphology recognition and feature matching methods in bridge structure

Article Open access 01 April 2025

Drone-based displacement measurement of infrastructures utilizing phase information

Article Open access 09 January 2024

Introduction

Bridges serve as critical components of transportation infrastructure, enabling the smooth flow of vehicles, goods, and people across geographic barriers such as rivers, valleys, and urban obstacles¹. As traffic volumes continue to grow, particularly in urban and industrial areas, the risks associated with bridge collisions have become increasingly pronounced². These collisions, often caused by over-height vehicles or waterborne vessels failing to clear the bridge deck, can result in severe structural damage, loss of human life, and prolonged disruption of essential transportation routes³. Despite the deployment of conventional preventive measures such as static signage, overhead barriers, and manual surveillance, the current methods remain insufficient in addressing the dynamic nature of modern traffic environments⁴. A more intelligent and proactive solution is urgently required to detect and prevent potential collisions before they occur⁵.

The integration of advanced computer vision techniques, particularly camera calibration and motion detection, presents a promising opportunity for enhancing bridge safety monitoring systems⁶. Camera-based surveillance offers a non-intrusive and cost-effective solution; however, its effectiveness is significantly compromised when the captured images lack geometric accuracy due to improper calibration⁷. Without precise calibration, the transformation of pixel-based coordinates to real-world measurements becomes unreliable, thereby impeding accurate estimation of an approaching object’s size, speed, and trajectory⁸. Furthermore, motion detection algorithms used in current systems often suffer from high false positive rates, especially under complex environmental conditions such as rain, fog, or fluctuating lighting⁹. These challenges underscore the need for a robust, real-time bridge collision avoidance system that is capable of not only detecting moving objects but also accurately interpreting their trajectories in a spatially meaningful context¹⁰.

The motivation for this study arises from the recurring incidents of bridge collisions reported across the globe¹¹. Many of these incidents could have been avoided through timely intervention based on reliable real-time monitoring and predictive analysis¹². The increasing complexity of traffic systems, coupled with the limitations of traditional monitoring solutions, necessitates the development of an integrated, intelligent system that can identify threats early and initiate preventive measures¹³. Technological advancements in camera calibration methods—such as intrinsic and extrinsic parameter estimation—and the growing sophistication of motion analysis techniques provide a solid foundation for such innovation¹⁴.

However, several unresolved challenges persist. Current systems often lack spatial precision due to uncalibrated imaging devices, leading to errors in object localization and size estimation. Additionally, motion detection in uncontrolled outdoor environments remains a non-trivial problem due to noise, background variations, and occlusions¹⁵. Existing frameworks are also typically reactive rather than predictive, issuing alerts only when an object is dangerously close to a bridge structure, thus limiting the response time for any corrective action¹⁶. Moreover, the lack of seamless integration between calibration, motion detection, object tracking, and risk assessment modules further compromises the reliability of these systems in real-world deployment¹⁷.

To address these gaps, the present research proposes the design and development of an intelligent bridge collision avoidance system based on camera calibration technology and motion detection. The primary objective is to enhance the accuracy and reliability of bridge safety monitoring by leveraging calibrated camera systems that can map visual data into real-world dimensions¹⁸. This enables the precise detection and tracking of objects in a defined monitoring zone. The system also incorporates advanced motion detection algorithms capable of operating under varying environmental conditions, supported by real-time object tracking and collision risk assessment components¹⁹. The integration of these modules within a cohesive framework allows the system to predict potential collisions by analyzing object speed, trajectory, and distance from the bridge structure²⁰. An automated alert generation mechanism is included to issue timely warnings to relevant authorities or vehicle operators, thereby enabling prompt preventive actions²¹.

The key objectives of this research are fourfold: first, to implement an accurate camera calibration model to enhance spatial understanding of captured scenes; second, to develop a motion detection and object tracking algorithm suitable for dynamic environments; third, to integrate a collision risk assessment module that forecasts potential impact scenarios; and finally, to deploy a real-time alert system for rapid response. Through a combination of theoretical modeling, algorithm design, and system-level integration, this research aims to contribute a novel and practical solution to the ongoing challenge of bridge collision prevention. The proposed system is expected to outperform existing methods in terms of spatial accuracy, detection reliability, and response efficiency, thereby offering a viable approach for modern smart infrastructure applications.

Despite the deployment of conventional surveillance and rule-based monitoring systems, current approaches remain limited by high false alarm rates, poor adaptability to changing environmental conditions, and insufficient predictive capability. These gaps underscore the necessity of developing an intelligent, vision-based bridge collision avoidance system capable of proactive risk prediction and low-latency response. The innovation of this study lies in its integration of precise camera calibration with deep learning-based motion detection (YOLOv11, ViT), trajectory forecasting through calibrated spatial mapping, and entropy-calibrated risk classification within an IoT-enabled alert framework. This holistic design not only reduces false alarms and response delays but also provides a scalable and fault-tolerant solution for deployment in modern smart infrastructure. In light of the identified challenges and research objectives, the following represent the major contributions of this study:

This study presents a robust framework that accurately Maps 2D image coordinates to real-world spatial dimensions using intrinsic and extrinsic camera calibration parameters, thereby enabling precise object localization and trajectory estimation near bridge structures.
The proposed system incorporates optimized motion detection techniques, including background modeling and real-time object tracking, to reliably identify moving threats under diverse environmental conditions such as rain, low light, and occlusions.
A novel predictive model is introduced to assess the likelihood of collision by analyzing the motion dynamics—such as speed, direction, and proximity—of approaching vehicles or vessels, thus enabling proactive safety measures.
The system includes a real-time alert generation component capable of issuing early warnings to relevant authorities or vehicle operators, thereby improving response time and reducing the probability of structural damage or human casualties.
The effectiveness and robustness of the proposed collision avoidance system are validated through controlled simulations and practical case studies, demonstrating significant improvement over conventional, uncalibrated surveillance systems in terms of accuracy, reliability, and operational efficiency.

This research article is structured to provide a comprehensive analysis and solution to the problem of bridge collisions using camera calibration and motion detection technologies. It begins with an introduction outlining the background, motivation, problem statement, objectives, and key contributions. A detailed literature review follows, highlighting existing systems and identifying gaps in current methodologies. The subsequent sections present the system design, including the camera calibration model, motion detection algorithm, and collision risk assessment framework. Experimental validation and performance evaluation are then discussed, followed by a presentation of results and in-depth analysis. The article concludes by summarizing the findings and outlining directions for future work to enhance system scalability and intelligence.

Literature review

The increasing interest in bridge safety and collision prevention has led to several advancements in surveillance systems, motion detection techniques, and camera-based monitoring in recent years²². Zhang et al. proposed a vision-based monitoring system utilizing monocular camera calibration and object tracking to detect over-height vehicles approaching bridge underpasses²³. The system employed geometric transformation to estimate object height from image coordinates; however, its performance was Limited by poor Lighting conditions and frequent false positives. The dataset used was custom-collected under controlled scenarios, and the system achieved an accuracy of 88% in height estimation, but lacked generalization to outdoor real-world scenes.

In²⁴, Zaarane et al. (2020) introduced a stereo vision system for real-time vehicle dimension measurement at toll gates using dual cameras and disparity Mapping. Their methodology effectively calculated 3D coordinates, enabling precise distance estimation. The system, tested on a dataset of 600 annotated vehicle images, reported a mean absolute error of less than 5 cm. However, its limitations included sensitivity to camera misalignment and calibration drift over time, making it less suitable for long-term unattended deployments.

Hosain et al. (2024)¹⁰ developed a deep learning-based detection system using YOLOv3 for identifying incoming large vehicles on bridge approaches. Their methodology combined object detection with GPS tagging for geofencing near critical zones. They used the KITTI dataset al.ong with additional overhead footage, achieving over 92% detection accuracy. Nonetheless, the system showed reduced performance in foggy or rainy weather, indicating the need for sensor fusion.

In²⁵, the authors proposed a hybrid LIDAR-camera system for Maritime bridge collision detection, integrating sensor data through Kalman filtering. While the multi-modal system achieved robust detection of ships approaching bridge piers, the primary Limitation was the high cost and complexity of hardware setup. Tests conducted on real-time port surveillance data demonstrated reliable detection within 50 m but suffered latency in object classification.

A study by Halfawy et al. (2014)²⁶ utilized optical flow techniques and background subtraction to detect motion near bridge structures using CCTV footage. The algorithm was evaluated on publicly available traffic monitoring datasets and demonstrated satisfactory tracking of vehicles, but was prone to false positives from shadows and environmental noise. The authors acknowledged that background modeling required frequent recalibration, limiting deployment in dynamic environments.

In²⁷, Aly et al. (2022) applied the MeanShift tracking algorithm combined with a calibrated monocular camera to monitor the movement of over-height vehicles. The camera calibration was conducted using the chessboard method, and the test dataset comprised 500 vehicle entries at a controlled highway site. The system achieved 85% tracking consistency, but failed to handle occlusion and side-view angle distortions, impacting real-world reliability.

Seisa et al. (2024)²⁸ introduced a real-time edge computing solution with embedded cameras and motion sensors for bridge collision warning. The system processed motion data locally using Raspberry Pi-based units, reducing latency and network dependency. Field deployment on a rural bridge showed promising results with 94% successful detection of unauthorized entries. The primary limitation was computational constraints in handling simultaneous multi-object tracking.

In²⁹, Zhang et al. (2022) utilized a deep convolutional neural network (DCNN) to classify vessel types and detect movement patterns for bridge collision prevention in inland waterways. The model was trained on a dataset of 2000 labeled vessel images and incorporated AIS data for speed estimation. While the model achieved 91.3% classification accuracy, it lacked real-time performance due to the need for cloud-based computation, posing challenges for latency-critical applications.

An IoT-based monitoring framework was presented in³⁰, combining calibrated surveillance cameras and ultrasonic sensors for real-time bridge underpass protection. The system integrated sensor readings and image coordinates through a local edge gateway, alerting approaching vehicles through dynamic signage. Although the system performed well with an average detection time of 2.3 s, its reliability decreased significantly in high-traffic scenarios due to sensor saturation and visual occlusion.

The authors explored the use of Structure from Motion (SfM) and multi-view geometry for generating 3D maps of bridge surroundings to monitor approaching threats³¹. The method utilized drone footage and OpenMVG for reconstruction. Although it provided accurate 3D models, with an average deviation of 2%, the system was computationally intensive and unsuitable for continuous real-time monitoring.

In³², Dong et al. (2024) introduced a transformer-based vision system for large object trajectory prediction near bridge structures. Their approach utilized a spatiotemporal attention mechanism over a dataset of 8,000 time-sequenced images of highway vehicles. The model achieved a prediction accuracy of 94.5% in determining collision trajectories within a 4-second future window. However, the model’s inference time was relatively high, making it less suitable for low-latency applications without GPU support.

In³³, Djenouri et al. (2024) developed a federated learning framework that enabled multiple roadside cameras to collaboratively train a vehicle detection model without sharing raw video data, thereby enhancing data privacy. The model was built on the MobileNetV2 backbone and trained using local datasets from multiple smart city intersections. Results showed that detection accuracy reached 91% while preserving data locality. Limitations included synchronization issues and occasional model drift due to non-IID (non-identically distributed) data.

Thombre et al. (2020)³⁴ proposed a multi-sensor fusion framework using radar, depth cameras, and calibrated RGB cameras for vessel-bridge collision avoidance. They used Bayesian filtering and Dempster-Shafer theory to fuse sensor confidence levels. The dataset consisted of annotated Maritime surveillance videos and radar logs from Busan Port. The system achieved 97% precision in threat detection, but required high-bandwidth data transmission and consistent sensor calibration.

In³⁵, a deep reinforcement learning (DRL) approach was proposed by Fahimullah et al. (2024) for proactive decision-making in bridge traffic control. Using a simulation environment built on SUMO and OpenCV-based video analytics, the system learned to activate warnings or reroute traffic based on estimated collision risk. The model achieved a cumulative reward score 38% higher than rule-based baselines but suffered from slow convergence and required extensive training episodes.

In³⁶, Yang et al. (2014) introduced a stereo-camera-based 3D bounding box estimation method for vehicle collision monitoring, enhanced with a Kalman filter for object trajectory smoothing. Tested on a custom dataset of 1,200 annotated stereo pairs near bridge entrances, the system maintained a root-mean-square error (RMSE) below 0.3 m in spatial tracking. However, its performance degraded at night without supplemental infrared imaging.

In³⁷, Fu et al. (2022) applied a real-time instance segmentation model (Mask R-CNN) integrated with camera calibration for object dimension estimation near critical bridge zones. Using the Cityscapes and BDD100K datasets fine-tuned for structural environments, the model achieved a mean average precision (mAP) of 89.4%. Limitations were noted in segmenting overlapping vehicles during peak traffic conditions.

An innovative edge-AI solution was developed by Azfar et al. (2024) in³⁸, where an NVIDIA Jetson Nano-powered module performed onboard detection and risk scoring using YOLOv5 and optical flow tracking. The system was deployed on a smart highway bridge prototype, detecting vehicle intrusion and speed in real-time with 96% accuracy. The main constraint was hardware heat dissipation during prolonged operation in harsh outdoor environments.

In³⁹, a vision transformer model (ViT-B/16) was used by Conde et al. (2021) for fine-grained classification of abnormal object behaviors around bridge zones. The model was pretrained on ImageNet and fine-tuned on a surveillance video dataset with labeled anomalies. It achieved an F1-score of 92.6% and effectively classified behaviors such as illegal U-turns, reverse driving, and potential over-height entries. However, the model was compute-heavy and required TPU support for optimal inference speed.

In⁴⁰, Arroyo et al. (2024) developed a real-time collision prevention system using LiDAR point cloud alignment with RGB video feeds for validating object presence and height near bridge thresholds. Their system achieved a high-resolution 3D Mapping with a point registration error below 1.5 cm. The dataset included Velodyne HDL-64E scans and synchronized camera feeds from urban highways. While highly accurate, the system’s cost and complexity made it suitable only for high-risk zones.

Lastly, in⁴¹, a graph neural network (GNN)-based spatiotemporal reasoning framework was proposed by Li et al. (2023) to model interactions between vehicles and static bridge elements. Nodes represented objects and their features, while edges modeled spatial and temporal dependencies. The model, trained on the nuScenes dataset, achieved superior generalization across weather and traffic conditions, reaching 93% prediction accuracy. However, interpretability of the learned graph relations remained a challenge.

The reviewed literature highlights the rapid advancements in bridge collision avoidance systems, particularly through the integration of computer vision, sensor fusion, and intelligent monitoring techniques. Traditional systems relying solely on uncalibrated camera setups or single-modality sensors have demonstrated limited reliability in real-world deployments due to spatial inaccuracies, high false positive rates, and sensitivity to environmental conditions. Recent research has explored the incorporation of calibrated camera models, stereo vision, motion detection algorithms, and deep learning architectures such as YOLO, Mask R-CNN, and Vision Transformers to improve object detection and trajectory prediction. Notably, several studies have embraced edge computing, federated learning, and graph-based reasoning to enhance system efficiency, privacy, and contextual understanding. Despite these advances, limitations persist in terms of real-time processing capabilities, scalability, hardware constraints, and adaptability to diverse environmental settings. The existing body of work thus underscores the need for an integrated, robust, and real-time bridge collision avoidance system that leverages camera calibration and motion detection while addressing the challenges of dynamic traffic environments and structural diversity. This study aims to build upon these foundations and contribute a unified framework capable of accurate threat detection, risk assessment, and proactive alert generation.

Methodology

To address the challenges associated with accurate and timely bridge collision avoidance, this study proposes a comprehensive, multi-stage methodology that integrates precise camera calibration, intelligent motion detection, object trajectory estimation, and real-time alert generation within a unified system architecture. The methodology is designed to operate in complex, dynamic environments, ensuring robustness against varying lighting conditions, object speeds, and structural layouts. Each component of the system is methodically developed and validated to enhance spatial accuracy, detection reliability, and response efficiency. The following subsections detail the individual modules of the proposed framework, including system design, calibration processes, motion tracking algorithms, risk assessment strategies, and implementation specifics. Together, these components contribute to a reliable, real-time collision avoidance solution suitable for deployment in intelligent transportation and smart infrastructure environments.

System overview and operational workflow

The proposed bridge collision avoidance system is designed as a modular, real-time framework that integrates calibrated camera feeds, intelligent motion analysis, and collision prediction models to proactively identify and mitigate potential collision threats. The architecture of the system is structured into three primary stages: input acquisition, processing pipeline, and alert generation. Each stage contains dedicated modules that ensure data integrity, analytical robustness, and real-time responsiveness suitable for deployment in intelligent transportation and smart infrastructure environments. The input stage is responsible for acquiring video data from strategically mounted surveillance cameras positioned near or on bridge structures. These cameras are subjected to an initial calibration phase to correct lens distortion and establish a reliable mapping between image space and physical world coordinates. Additional input sources may include metadata from embedded sensors (e.g., motion sensors or GPS modules) to augment visual information, especially under low-visibility conditions.

The processing stage comprises multiple sequential modules: (i) real-time motion detection to isolate moving objects, (ii) object segmentation and tracking to identify persistent collision candidates, and (iii) trajectory estimation to compute direction, speed, and predicted impact zones. This phase relies heavily on calibrated geometry to infer real-world positions and motion vectors. In the output stage, a risk assessment module evaluates the likelihood of collision based on the object’s estimated trajectory and proximity to the bridge structure. If the calculated risk exceeds a predefined threshold, the system activates a multi-modal alert mechanism which may include visual indicators (e.g., warning LEDs or signage), auditory alarms, or notifications transmitted via IoT communication protocols to nearby operators or connected vehicles.

The overall operational flow of the system is depicted in Fig. 1, and a summarized view of each core system component is presented in Table 1.

Table 1 Functional overview of the bridge collision avoidance system.

Subjects

Abstract

Similar content being viewed by others

Enhanced YOLO v3 for precise detection of apparent damage on bridges amidst complex backgrounds

Cable vibration and girder displacement measurement via morphology recognition and feature matching methods in bridge structure

Drone-based displacement measurement of infrastructures utilizing phase information

Introduction

Literature review

Methodology

System overview and operational workflow

Camera calibration process

Motion detection and object segmentation

Object trajectory Estimation and Spatial mapping

Collision risk assessment model

Real-Time alert generation mechanism

Hardware and software implementation details

Experimental setup and validation protocol

Experimental results

Object detection and segmentation accuracy

Trajectory prediction performance

Risk classification effectiveness

Real-Time alert generation latency

False alarm rate and system robustness

Comparative evaluation with baseline systems

Expanded comparison with recent state-of-the-art

Efficiency metrics for real-time performance

Additional metrics for comprehensive evaluation

Ablation and sensitivity analysis

Discussion

Conclusion

Data availability

References

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Quick links