Mobile sensing discovery of when where and why vulnerable road users break traffic rules

Li, Yan; Du, Pengcheng; Ren, Hongjin; Xi, Teng; Gao, Ke; Kang, Zhizhong; Sarvi, Majid; Zhang, Yuyang

doi:10.1038/s44333-025-00068-y

Article
Open access
Published: 07 January 2026

npj Sustainable Mobility and Transport volume 3, Article number: 1 (2026)

Abstract

This study performed a large-scale analysis of vulnerable road user (VRU) violation behaviors in Beijing using a novel Rotating Mobile Monitoring method with machine learning. Across four seasons, we processed 367,076 street-view images and identified 20,616 violations. Private e-bike users were the primary violators (52.9%), with not wearing a helmet being the most common infraction (11,714 instances). These behaviors exhibited clear temporal patterns, peaking in spring and during the afternoon. The built environment was a key predictor, with building and commercial activity within a 150-meter buffer correlating with multiple violation types. This research quantifies predictable risk patterns, directly linking violation hotspots to features like commercial density within a 15-minute life circle (150 m radius). This evidence enables targeted interventions for specific user groups, times, and locations, providing a data-driven path towards safer urban transport. The scalable methodology also presents a practical tool for ongoing urban safety diagnostics.

Introduction

Vulnerable road users (VRUs), including pedestrians, pedal cyclists, e-bicyclists, and motorcyclists, are particularly susceptible to traffic hazards due to their lack of external protection^1,2,3. Some definitions also include children and the elderly in this category^4,5. The World Health Organization’s 2022 Global Road Safety Report highlighted that more than half of all road traffic accident fatalities, totaling approximately 650,000, were VRUs⁶. According to the Global Status Report on Road Safety, pedestrians and cyclists together account for 26% of all traffic deaths globally, while powered two- and three-wheelers account for 28%⁶. In the European context, VRUs constitute 46% of fatalities and 53% of serious injuries across the continent^7,8. In the United States, the National Highway Traffic Safety Administration’s 2021 data indicates that VRUs accounted for 34% of traffic fatalities, with 7342 pedestrians and cyclists losing their lives⁹. The situation in China presents even greater concern, where VRU fatalities account for 70% of road traffic fatalities, with 40% attributed to two-wheeled vehicles and 25% to pedestrians¹⁰. These alarming statistics underscore the critical need to examine the factors associated with VRU accidents and implement effective measures to enhance their safety.

Among various contributing factors to VRU accidents, research has highlighted the significant role of violations, with distinct patterns across different user groups. For pedestrians, a comprehensive study found that jaywalking and crossing against signals were major contributors, accounting for 41 and 35% of pedestrian fatalities respectively¹¹. Other risky behaviors include crossing roads hastily in front of large vehicles, wearing dark clothing during low-light conditions, sudden directional changes while crossing, inadequate child supervision, and distractions, such as mobile phone use and conversations with others^12,13,14. Among cyclists, key violations include red-light running (15–25% of accidents), riding in the wrong direction (20–30%), and illegal road occupancy (10–15%), which refers to cyclists improperly using roadway spaces designated for motor vehicles or occupying lanes in ways that violate traffic regulations¹⁵. Additional risk factors encompass failure to yield, operating under the influence of alcohol or drugs¹⁶, poor concentration, inadequate distance maintenance, and lack of proper education about traffic rules¹⁷. For e-bike riders, violations, such as speeding and illegal lane changes, were involved in approximately 45% of collision incidents¹⁸. Other common risk behaviors include riding without helmets, passenger overloading, sidewalk riding, using mobile phones or headphones while riding, and continuing to ride after detecting mechanical faults^19,20. Motorcycle-related accidents often involve risky behaviors, such as speeding²¹, riding in conditions of low visibility²², fatigue-related driving²³, red light violations, pavement riding (operating motorcycles on sidewalks or pedestrian pathways where motor vehicles are prohibited), wrong-way riding²⁴, and mobile phone distractions²⁵.

External factors, particularly infrastructure, weather, and seasonal variations, significantly influence road traffic collisions (RTCs)^26,27,28. Road infrastructure elements contribute to cyclists’ risky behaviors, including surrounding land use, road length, road level, and proximity to bus stops and metro stations²⁹. Approximately 5% of RTCs occur due to adverse weather conditions³⁰, including wind³¹, temperature fluctuations³², rainfall³³, fog²⁸ and sunshine levels³⁴. Seasonal patterns also play a crucial role, with research indicating higher fatality rates in pedestrian-vehicle crashes during autumn and winter³⁵. Elderly pedestrians face increased fatal injury risks in winter compared to other seasons³⁶, while cyclist merging/diverging incidents show lower fatality rates in winter versus summer³⁷. These seasonal effects are often compounded by associated weather conditions, particularly in autumn and winter when heavy winds and dense fog, especially during nighttime and early morning, contribute to increased collision rates for both pedestrians and drivers²⁸. Given these significant impacts of external factors on road safety, understanding how VRUs behave and respond to these conditions is crucial for preventing accidents and improving road safety measures. Furthermore, as VRUs are particularly susceptible to these environmental challenges, their behavioral patterns and potential violation behaviors under various conditions require systematic investigation. This has led researchers to develop and employ various methodological approaches to analyze VRU behavior comprehensively.

Researchers have employed diverse methodological approaches to analyze VRU behavior, ranging from traditional methods to advanced technological solutions. Traditional approaches include questionnaire surveys^38,39 and observational studies⁴⁰. However, these methods have inherent limitations: observational studies are time-consuming and may be subject to observer bias⁴¹, while questionnaire surveys often face challenges, such as sample bias, psychometric distortions, technophobia and lower response rates⁴². Accident data analysis, utilizing official casualty databases, such as the UK STATS19⁴³, records from the Louisiana Department of Transportation and Development¹² and collision data from the National Observatory for Information, Training, Documentation and Studies on Road Safety (NOITDSRS) in Tunisia⁴⁴, provides valuable insights into crash patterns. However, this approach is constrained by statistical uncertainties⁴⁵ and data accessibility issues⁴⁶. Additionally, some researchers utilize traffic simulators to design a series of crash experiments to estimate the damage caused by traffic accidents to different VRUs, but the reliability of the obtained results can be problematic and numerical simulations often suffer from data deficiencies⁴⁷.

Advanced technological approaches have emerged to address these limitations. Sensor-based analysis employs various tools including GPS tracking⁴⁸, cameras⁴⁹, far infrared cameras⁵⁰ and eye-tracker cameras^13,51. While these technologies offer precise measurements, they can be costly and may be affected by environmental conditions, such as weather and light, and by obstacles, such as buildings or parked vehicles⁵². Wireless communication systems, which connect vehicles and VRUs through mobile phone Wi-Fi interfaces⁵³, provide a cost-effective alternative with broad coverage, though position accuracy remains a challenge⁵⁴. Recent advancements in artificial intelligence have introduced more sophisticated analytical approaches, including machine learning techniques⁵⁵ and convolutional neural networks⁵⁶. To overcome the limitations of individual methods, researchers increasingly adopt integrated approaches, combining multiple methodologies, such as sensor data with wireless communications⁵⁷ or sensors with machine learning⁵⁸. These multi-method approaches enhance detection accuracy and compensate for the limitations of single-method strategies.

In summary, the review of current VRU research methodologies reveals several persistent challenges. Traditional methods face fundamental limitations in data credibility and collection efficiency⁵⁹, as evidenced by the observer bias in observational studies and sample bias in questionnaire surveys⁶⁰. Advanced technological solutions, while promising, encounter their own obstacles. Detection remains challenging due to the intrinsic variability in VRU appearance⁶¹ and their high mobility⁶². Data quality and quantity issues persist across all methods, manifesting in small sample sizes, time lags, and inconsistent recording standards⁶³. Furthermore, the increasingly complex urban transportation environment compounds these challenges by introducing additional variables that affect the detection and analysis of VRU risky behaviors⁶⁴. Most critically, existing studies lack comprehensive, multi-seasonal quantitative analysis of VRU violation behaviors at scale, with limited understanding of how built environment factors within specific spatial buffers influence different types of violations across diverse user groups and temporal patterns^65,66.

To address these specific methodological and knowledge gaps, this study presents a precisely focused research framework with three distinct objectives: (1) develop and validate a Rotating Mobile Monitoring (RMM) method that combines 360-degree panoramic sensing with machine learning analytics to achieve large-scale, systematic quantification of VRU violation behaviors across multiple seasons; (2) establish the optimal spatial scale (buffer zone) for analyzing built environment influences on violation behaviors and quantify the specific relationships between environmental factors and different violation types; and (3) identify actionable spatio-temporal patterns in violation behaviors that can inform targeted, context-specific intervention strategies. This focused approach specifically addresses the critical gap in quantitative, multi-dimensional analysis of VRU violations while providing a scalable methodology for urban safety diagnostics. The study’s precision lies in its systematic examination of 16 specific violation types across five VRU categories over four seasons, analyzing environmental influences at three spatial scales (100, 150, 200 m buffer zones), thereby generating evidence-based insights for developing targeted interventions that consider both user group characteristics and environmental contexts.

The remainder of this paper is organized as follows. Section 2 details the comprehensive methodology, including the study area, the RMM data collection protocol, and the machine learning and statistical methods used for analysis. Section 3 presents the detailed results, including the profile of violators and the spatio-temporal patterns of their behaviors. Section 4 discusses the implications of our findings and the advantages of our proposed methodology. Finally, Section 5 concludes the paper by summarizing the key contributions and suggesting directions for future research.

Methods

Our analytical approach is grounded in environmental behavior theory, which posits that human behavior is influenced by environmental characteristics at multiple spatial scales. Specifically, we apply the ecological systems theory to VRU behavior, where violation behaviors result from interactions between individual characteristics, immediate environment (microsystem), and broader urban context (mesosystem). The choice of buffer zones (100, 150, 200 m) aligns with pedestrian and cyclist activity space research indicating that immediate neighborhood environment (2–3 min walking/cycling distance) significantly influences travel behavior.

Mobile panoramic monitoring campaigns

To achieve our research objectives of systematically quantifying VRU violation behaviors and identifying environmental influences, our data collection required: (1) high-resolution visual imagery capable of detecting specific violation behaviors and demographic characteristics of VRUs, (2) precise GPS coordinates for spatial analysis and environmental factor correlation, (3) systematic temporal sampling across four distinct seasons to capture behavioral variations, (4) comprehensive spatial coverage of diverse urban environments, and (5) sufficient sample sizes for robust statistical analysis across multiple violation types and VRU categories.

Study area

The study area is located in the Shijingshan district of Beijing, China (Fig. 1). As both the capital and the second largest city in China, Beijing has a population of 21.8 million people as of 2022. The city has approximately 10.4 million VRUs, consisting of 3.12 million registered bicycles (including 1.85 million e-bikes) and 0.28 million registered motorcycles, while serving over 7 million pedestrian trips daily⁶⁷. Recent statistics show that VRU-related crashes account for approximately 36.8% of all traffic accidents in Beijing during 2020–2021, with pedestrians and cyclists being the most vulnerable groups. The Shijingshan district, located in western Beijing, has gained particular significance following the 2022 Winter Olympics. In alignment with the post-Olympic legacy, the district has embraced sustainable urban development, particularly emphasizing green transportation initiatives⁶⁸. This focus on sustainable transportation makes the district an appropriate setting for studying VRU behavior.

Within Shijingshan district, we selected a study area of 12.35 km² that captures the district’s representative characteristics (Fig. 1). The area is strategically positioned adjacent to Beijing’s West 5th Ring Road and encompasses six street blocks. The road network within the study area spans 54.79 km, comprising primary, secondary, and minor roads as illustrated in the data collection routes shown in Fig. 1. This area features diverse urban landscapes, including core commercial zones, educational institutions (including universities), residential communities, public parks, and government offices. These varied land use patterns and road network characteristics make it an ideal location for studying VRU behavior across different urban contexts.

VRUs and violation behaviors

This study examines five types of VRUs (pedestrians, cyclists, e-bike users, tricyclists, and motorcyclists) and sixteen violation behaviors (summarized in Table 1 and illustrated in Fig. 2). These violations were selected based on three criteria: first, they have been documented in published articles as violation behaviors but have not been examined systematically and quantitatively; second, they are visually identifiable through image-based detection; and third, they represent common violation patterns within the study area.

Table 1 Violation behaviors of VRUs

For pedestrians, the study examines three violation behaviors, including red-light running, jaywalking, and walking in non-motorized vehicle lanes. For cyclists and e-bike users, the study investigates eleven violations total: six primary violations, including riding without helmets, illegal passenger transport, improper lane usage (riding on sidewalks or motorways), wrong-way riding, red-light running, and improper bike parking on sidewalks, as well as five distracted riding behaviors, such as mobile phone use, parallel riding, wearing headphones, making calls, and conversing while riding. For motorcyclists, the research explores four behaviors, including helmet use compliance, wrong-way riding, red-light running violations, and improper parking behaviors. Finally, the study investigates four tricycle-related violations, including illegal road use, wrong-way riding, red-light running, and improper parking on sidewalks.

Mobile data collection campaigns

Characterized by high spatial, temporal, and seasonal heterogeneity, VRU risk behaviors can change dramatically over time even at the same geographical location. This study utilized an RMM method⁶⁹ to monitor VRUs’ behaviors, which combines the advantages of stationary and mobile monitoring methods by rotating stationary sampling sites during mobile monitoring campaigns. Mobile routes covered every street in the study area, and ArcMap’s Find Shortest Path tool was used to calculate minimum routes with full coverage for consistent navigation in each data collection campaign. This study identified 25 intersections as stationary observation points along the 13 selected roads. This stationary approach allows us to observe how VRU risk behaviors change temporally in the same space, as well as across different seasons. To determine the observation duration, at intersections equipped with traffic signals, observers waited at least one full signal cycle from red to green, approximately 120 s. To investigate the impact of season and time periods on VRU behaviors, January, May, August, and November were selected to represent winter, spring, summer, and autumn, respectively, and two to three days in each season were selected for data collection. Two collection periods, the noon period from 12:00 to 14:00 and the afternoon period from 17:00 to 19:00, were selected as the observation periods. It should be noted that our observation duration of one signal cycle per intersection visit, while shorter than in conventional traffic studies, was compensated by the systematic repetition of observations across multiple seasons and the comprehensive spatial coverage of 25 intersections. Each intersection was observed multiple times throughout the study period, and the mobile monitoring component captured VRU behaviors along road segments between intersections, providing a more comprehensive spatial understanding of violation patterns than would be achieved through intensive single-intersection studies alone.

Data collection periods were selected based on systematic criteria: (1) seasonal representation: January, May, August, and November represent distinct climatic conditions in Beijing (winter, spring, summer, and autumn, respectively); (2) temperature alignment: selected dates matched seasonal average temperatures to avoid weather anomalies; (3) traffic pattern consistency: weekdays and consistent time periods (12:00–14:00, 17:00–19:00) ensure comparable traffic conditions; and (4) operational feasibility: dates avoided extreme weather events that would compromise data quality or collector safety. Data collection was conducted over four weekdays in November 2022, and January, May, and August of 2023. During the RMM campaign, an e-bike was deployed to carry professional monitoring devices to monitor a large geographical area in a short time (see Fig. 3). A panoramic camera, ‘Insta360’, was fixed to the rider’s helmet and used to collect street view videos. The Footpath GPS recording application on the smartphone was also activated to navigate and record the GPS tracks of the e-bike. The clocks of all devices were synchronized with the smartphone to ensure that the timeline of the data collected by each device remained consistent.

To obtain representative data on dynamic traffic flow, we chose collection dates that were representative of the entire collection period. A total of 11 days were selected, with two to three days in each of the four seasons. One summer day is missing because of a data recording issue. The weather conditions were predominantly sunny, with two days of light fog. Rainy days were avoided to prevent impacts on traffic volume and camera data collection. The temperature on each selected collection day matched the average temperature of the corresponding season, which avoided extreme cold or heat that could cause significant discrepancies from the seasonal average. Each data collection day involved two distinct periods: a midday period (approximately 12:00–14:00) and an afternoon period (approximately 17:00–19:00), as detailed in Table 2. The “half-hour window” refers to the consistency of start times for these periods across different collection days (e.g., always starting the midday period between 11:45 and 12:15), not the total duration of data collection. As shown in Table 2, the actual total duration of data collection per day ranged from 2 h 18 min to 2 h 59 min, combining both collection periods. This timing consistency ensures that traffic patterns and VRU behaviors are observed during comparable time periods across different days and seasons, while the total duration provides sufficient data for robust analysis.

Table 2 Data collection campaigns

To ensure seasonal representativeness within resource constraints, data collection days were carefully selected to be representative of their respective seasons in terms of weather and temperature conditions. As shown in Table 2, our collection temperatures broadly aligned with Beijing’s seasonal averages for the collection year: spring collection days (14 °C to 32 °C) broadly matched the typical spring range (14 °C to 28 °C), summer collection days (23 °C to 34 °C) aligned well with normal summer temperatures (21 °C to 33 °C), autumn collection days (−2 °C to 13 °C) captured typical autumn conditions (1 °C to 13 °C), and winter collection days (−8 °C to 7 °C) represented characteristic winter temperatures (−4 °C to 8 °C). Additionally, we avoided extreme weather conditions, such as heavy rain, snow, or unusually severe temperatures, that could produce atypical VRU behaviors unrepresentative of normal seasonal patterns. This careful selection strategy helps ensure that our seasonal comparisons reflect genuine seasonal behavioral differences rather than weather-related anomalies.

Data pre-processing

Extracting image frames is an effective way to simplify video data processing while retaining the fundamental activity information in the video⁷⁰. Images were extracted every 10 frames from videos with a frame rate of 30 fps and every 20 frames from videos with a frame rate of 60 fps, resulting in 3 frames per second in both cases. The camera’s storage limitations resulted in a maximum recording time of 29 minutes per video. Over 11 acquisition days, distributed across four seasons, a total of 101 videos were captured. Using this frame extraction approach, a total of 367,076 images were obtained.
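The sampling arithmetic above can be sketched in a few lines. This is an illustrative Python sketch only, not the authors' extraction code; the function names and the `MAX_FRAMES_PER_VIDEO` constant are hypothetical.

```python
def frame_stride(fps: int, target_rate: int = 3) -> int:
    """Number of source frames between two kept frames (hypothetical helper)."""
    if fps % target_rate != 0:
        raise ValueError("fps must be a multiple of the target rate")
    return fps // target_rate


def kept_frame_indices(total_frames: int, fps: int, target_rate: int = 3) -> list[int]:
    """Indices of the frames retained from a clip with `total_frames` frames."""
    return list(range(0, total_frames, frame_stride(fps, target_rate)))


# Upper bound on extracted frames from one maximum-length (29 min) video
# at 3 frames per second.
MAX_FRAMES_PER_VIDEO = 29 * 60 * 3
```

Under these assumptions, 30 fps gives a stride of 10 and 60 fps a stride of 20, and 101 maximum-length videos would yield at most 101 × 5,220 = 527,220 frames, so the reported 367,076 images sits below that ceiling.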

The extracted frames were georeferenced by matching the timestamp of frames with corresponding GPS data, in order to generate a GIS dataset containing time, frame name, and GPS locations. Subsequently, sampling points were extracted along the road segments every 10 meters. Finally, the locations of each sampling point were spatially joined with georeferenced image frames to identify the nearest images.
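The two matching steps in this paragraph (frame timestamp to GPS fix, then sampling point to nearest frame) can be sketched as follows. This is a plain-Python stand-in for the GIS workflow, not the authors' implementation; all function names and the tuple layouts are assumptions made for illustration.

```python
import bisect
import math


def nearest_fix(frame_time: float, fix_times: list[float]) -> int:
    """Index of the GPS fix closest in time to a frame.

    `fix_times` must be sorted ascending (seconds on a shared clock).
    """
    i = bisect.bisect_left(fix_times, frame_time)
    if i == 0:
        return 0
    if i == len(fix_times):
        return len(fix_times) - 1
    before, after = fix_times[i - 1], fix_times[i]
    # Ties go to the earlier fix.
    return i if after - frame_time < frame_time - before else i - 1


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in meters
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def nearest_frame(point: tuple[float, float], frames: list[tuple[str, float, float]]) -> str:
    """Name of the georeferenced frame closest to a sampling point.

    `point` is (lat, lon); each frame is (name, lat, lon).
    """
    return min(frames, key=lambda f: haversine_m(point[0], point[1], f[1], f[2]))[0]
```

A linear scan per sampling point is adequate for a sketch; a spatial index would be the usual choice at the scale of several hundred thousand frames.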

Spatial accuracy was ensured through a systematic validation and correction protocol combining automated assessment and manual verification. The process included: (1) automated GPS accuracy assessment using positioning error statistics (mean error < 3 meters, 95% confidence interval < 5 meters), (2) visual comparison with Baidu Street View imagery at 200 randomly selected points to verify spatial correspondence using assessment criteria of landmark alignment, directional consistency, and temporal matching, and (3) manual correction when discrepancies exceeded 5 meters. Manual correction followed standardized protocols: discrepancies were identified when georeferenced images showed different landmarks or viewing angles; corrections used ArcGIS editing tools to adjust GPS coordinates based on visual feature matching; all corrections were validated by an independent reviewer; and changes were documented with the original coordinates, corrected coordinates, and justification rationale.

This study received ethical approval from Beijing University of Science and Technology Institutional Review Board under protocol number 2024-3-109 prior to data collection. Under this ethical oversight, privacy protection measures were implemented including immediate anonymization during analysis, with personally identifiable features systematically blurred during processing. Access to raw video footage was restricted to authorized research personnel only, with secure data storage and retention protocols stipulating permanent deletion of raw footage within 24 months of study completion. All analysis focuses exclusively on aggregate behavioral patterns rather than individual identification, ensuring privacy protection while contributing to traffic safety research. Furthermore, the individual depicted in Fig. 3 is one of the authors of this manuscript and has provided written informed consent for their image to be published herein.

Identification of VRUs

To quantify the traffic volume of diverse VRUs, a machine learning model was trained to automatically detect seven types of VRUs: pedestrians, cyclists, e-bike riders, delivery e-bike users, tricycle riders, delivery tricycle users, and motorcyclists. Because delivery services are prevalent worldwide and their violations are frequently reported, delivery personnel were specifically highlighted for violation analysis.

Images were preprocessed by resizing to 1024×1024 pixels and normalization. Manual annotation used LabelMe software⁷¹ with bounding box coordinates defining each VRU instance, ensuring balanced representation across all seven VRU types under diverse lighting conditions and viewing angles.

To train the machine learning model, image frames recorded during the lunchtime period on June 4th were selected and annotated using LabelMe object detection software. LabelMe is a web-based, open-source graphical image annotation tool that allowed our researchers to manually draw rectangular bounding boxes around each VRU and assign a class label (e.g., pedestrian, cyclist), creating the necessary ground-truth dataset for supervised machine learning. A total of 623 images were annotated, with at least one subject present in 479 of these images. Model training was conducted on the aforementioned 479 images and corresponding annotation files, which were divided into training and validation datasets in accordance with an 80% to 20% ratio. The YOLOv8n model was configured with input resolution 1024×1024 pixels, batch size 16, learning rate 0.01 with cosine annealing, IoU threshold 0.7, confidence threshold 0.25, and 100 training epochs with early stopping. YOLOv8n (‘You Only Look Once’ version 8-nano) is a state-of-the-art, real-time object detection model based on deep convolutional neural networks⁷². It is designed for fast and accurate detection, making it well-suited for processing the large volume of images in this study. The model was trained on our annotated dataset to learn to automatically identify and classify the seven different types of VRUs from the full set of street-view images. Upon completion of the training phase, a machine learning model for VRU detection and classification was generated, achieving an accuracy of 77.09%.
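The 80% to 20% division of the 479 annotated images can be illustrated with a seeded shuffle-and-cut. This is a minimal sketch, not the authors' procedure: the source does not state how the split was drawn or rounded, and the file names, seed, and function name here are hypothetical.

```python
import random


def split_train_val(items, train_fraction=0.8, seed=0):
    """Shuffle reproducibly, then cut into training and validation lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # rounds down (assumption)
    return shuffled[:cut], shuffled[cut:]


# Placeholder names standing in for the 479 images that contain a subject.
image_names = [f"frame_{i:04d}.jpg" for i in range(479)]
train, val = split_train_val(image_names)
```

With rounding down, this gives 383 training and 96 validation images; a different rounding convention would shift one image between the two sets.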

Identification of violation behaviors

To identify risky behaviors in the recorded images, five volunteers were recruited and trained to follow consistent criteria (Table 1). Detailed annotation protocols were developed with specific operational definitions for each violation type: “not wearing helmet” required clear visual confirmation of uncovered head, while “red-light running” was defined as entering intersections during red signal phase. Before systematic annotation, all volunteers completed reliability testing on 100 overlapping images, achieving substantial inter-rater agreement: overall Cohen’s κ = 0.87 (95% CI: 0.82-0.92), with behavior-specific values ranging from κ = 0.79 (parallel riding) to κ = 0.93 (not wearing helmet). Weekly consistency checks maintained reliability above κ = 0.80 throughout the annotation period.
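For reference, Cohen's κ for a pair of annotators compares observed agreement with the agreement expected by chance. The sketch below is a generic two-rater implementation in plain Python; the source does not state how values were combined across the five volunteers, so this should not be read as the authors' exact computation. The label strings are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1 - expected)


# Toy example: four images labelled "helmet" / "no_helmet" by two raters.
a = ["no_helmet", "no_helmet", "helmet", "helmet"]
b = ["no_helmet", "helmet", "helmet", "helmet"]
kappa = cohens_kappa(a, b)
```

In the toy example, observed agreement is 0.75 and chance agreement is 0.5, so κ = 0.5.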

Several training sessions and discussions were organized initially and throughout the annotation process. Training included theoretical instruction, practice sessions with feedback, and consensus protocols for ambiguous cases. Disagreements were resolved through structured discussion, with unresolvable cases excluded (<2% of observations). The annotation was reviewed and validated on a weekly basis for consistency. The volunteers spent 25 hours per week annotating data for each seasonal day, and the annotation process lasted approximately 8 weeks. Each volunteer used DarkLabel object tracking software to annotate violation behaviors, employing a tracking annotation method to ensure that subjects exhibiting risky behaviors were annotated from the time they appeared in the captured video until they disappeared. DarkLabel is a specialized video annotation tool designed for object tracking⁷³. This functionality was critical for our study as it enabled annotators to assign a unique identity to an individual VRU in one frame and then efficiently track that same individual across subsequent frames. This ensures that a single, continuous violation behavior performed by one person is counted as one instance, preventing overcounting from multiple detections of the same event. Each behavior exhibited by each subject was assigned a unique number during annotation, with identical behaviors being aggregated to calculate the total number of occurrences for each behavior type.
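The counting rule described above (one tracked behavior equals one instance, however many frames it spans) can be sketched as a set-based aggregation. The record layout `(frame_index, track_id, behavior)` and the behavior labels are assumptions for illustration, not the DarkLabel export format.

```python
from collections import defaultdict


def count_instances(detections):
    """Count violation instances per behavior, one per unique track ID.

    `detections` is an iterable of (frame_index, track_id, behavior) records.
    """
    tracks_by_behavior = defaultdict(set)
    for _frame, track_id, behavior in detections:
        tracks_by_behavior[behavior].add(track_id)
    return {behavior: len(ids) for behavior, ids in tracks_by_behavior.items()}


# Track 1 appears in three frames but is counted once.
records = [
    (0, 1, "no_helmet"),
    (1, 1, "no_helmet"),
    (2, 1, "no_helmet"),
    (1, 2, "no_helmet"),
    (5, 3, "wrong_way"),
]
totals = count_instances(records)
```

Here `totals` holds two "no_helmet" instances and one "wrong_way" instance, even though five frame-level records were supplied.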

Identification of population structure

To develop machine learning models capable of recognizing age and gender, we initially selected images captured during lunchtime on May 26. Using the VRU detection model trained above, we identified subjects in the images. The identification process involved running the VRU detection model to obtain bounding box coordinates, cropping individual subjects with padding to capture complete features, filtering poor quality images, and manual quality control for visible demographic characteristics. This process yielded a dataset of 2,020 cropped images, which were then categorized for further analysis.

For age recognition, we established four distinct age categories based on both developmental characteristics and traffic safety literature: children (0-17 years), youth (18-35 years), middle-aged (36-60 years), and elderly (61+ years). These age boundaries were selected to align with cognitive and physical development stages relevant to traffic behavior^4,5, as well as age-related risk patterns identified in existing VRU safety research^35,36. The 18-year threshold distinguishes between minors and adults with full legal driving responsibilities, while the 61+ category captures age-related mobility and sensory changes that affect traffic behavior³⁹. Ground truth labeling of 200 randomly selected images was performed by three trained annotators who classified subjects based on visible physical characteristics, achieving high inter-annotator agreement (κ = 0.89). The age classification relies on computer vision analysis of facial and bodily characteristics to categorize individuals into broad age groups relevant for traffic safety analysis⁷⁴.
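The four age bands can be written down as a simple lookup, which makes the boundary cases explicit. This only encodes the stated category limits; the models in the study classify from visual appearance rather than from a known age, and the function name is hypothetical.

```python
def age_category(age_years: int) -> str:
    """Map an age in whole years to the study's four age categories."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    if age_years <= 17:
        return "children"
    if age_years <= 35:
        return "youth"
    if age_years <= 60:
        return "middle-aged"
    return "elderly"
```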

For gender classification, the images were labeled as male or female through manual annotation based on visual characteristics including clothing, hairstyle, body shape, and movement patterns. The same three-annotator approach was used with high inter-annotator agreement (κ = 0.92). The dataset was then split into training and validation sets using an 80:20 ratio.
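The text reports κ for three annotators without naming the statistic. One common choice for more than two raters is Fleiss' kappa, sketched below under that assumption; the ratings are invented for illustration and are not the study's labels.

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for subjects that are each rated by the same number of raters."""
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    category_totals = Counter()
    agreement_sum = 0.0
    for subject_ratings in ratings:
        counts = Counter(subject_ratings)
        category_totals.update(counts)
        # Share of rater pairs that agree on this subject.
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1)
        )
    observed = agreement_sum / n_subjects
    total = n_subjects * n_raters
    # Chance agreement from the overall category proportions.
    expected = sum((c / total) ** 2 for c in category_totals.values())
    return (observed - expected) / (1 - expected)

# Hypothetical labels from three annotators for four cropped subjects.
ratings = [
    ["male", "male", "male"],
    ["female", "female", "female"],
    ["male", "male", "female"],
    ["female", "female", "female"],
]
kappa = fleiss_kappa(ratings)
```

The function is undefined when every rating falls in a single category, because chance agreement is then 1.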

As with the VRU type identification, the YOLOv8n model was used for its efficiency and high performance in classification tasks. Two separate models were trained: one to classify the identified VRUs into four age categories and another to classify them by gender. Training used YOLOv8n with an input resolution of 640×640 pixels, a batch size of 32, a learning rate of 0.001 with cosine annealing, and 100 training epochs with early stopping. The age recognition model achieved an accuracy of 98.89%, while the gender recognition model achieved an accuracy of 99.79%. These accuracy rates reflect the models’ ability to correctly classify the broad age categories rather than to estimate precise age, which is appropriate for the traffic safety behavioral analysis objectives of this study.
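The reported training settings might map onto the Ultralytics training arguments roughly as follows. This is a sketch, not the authors' script: the use of the `yolov8n-cls.pt` classification weights, the folder-per-class dataset paths, and the early-stopping patience value are all assumptions, since the source does not specify them.

```python
# Settings reported in the text; "patience" is a placeholder because the
# early-stopping patience for these models is not stated in the source.
TRAIN_ARGS = {
    "imgsz": 640,      # input resolution 640x640
    "batch": 32,       # batch size
    "lr0": 0.001,      # initial learning rate
    "cos_lr": True,    # cosine annealing schedule
    "epochs": 100,     # maximum number of training epochs
    "patience": 10,    # hypothetical early-stopping patience
}

def train_classifier(dataset_dir, weights="yolov8n-cls.pt"):
    """Train one classifier (age or gender) on a folder-per-class dataset."""
    from ultralytics import YOLO  # third-party, imported lazily

    model = YOLO(weights)
    return model.train(data=dataset_dir, **TRAIN_ARGS)

# Hypothetical usage, one call per task:
# train_classifier("crops/age")     # four age classes
# train_classifier("crops/gender")  # two gender classes
```

Keeping one shared argument dictionary makes it explicit that the two models differ only in their training data.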

The different dataset sizes reflect our sequential analysis pipeline: VRU detection required 623 full scene images to learn object detection, while demographic classification used 2,020 individual VRU crops to learn classification features. The larger demographic dataset was necessary because individual subjects provide more focused training samples and greater diversity for capturing demographic variations.

Multi-scale statistical methods

This study employed multi-dimensional statistical analysis methods to systematically analyze traffic users’ violations. The research began with descriptive statistics, including the distribution of user types, gender and age, as well as violation patterns across different seasons and time periods. Subsequently, through cross-analysis methods, the study delved into the demographic characteristics of different traffic user groups, using stacked bar charts to display the gender-age composition of various traffic participants.

For spatial-temporal pattern analysis, the study constructed two-dimensional heatmaps based on season-time period combinations, revealing the spatial-temporal distribution patterns of violations from both absolute quantity and relative proportion perspectives, and identifying high-frequency periods for various types of violations. To better understand the factors influencing violations, the study further employed conditional probability analysis, calculating the conditional probabilities of three feature combinations (gender-age, subject-gender, and subject-age) and visualizing the top 10 most significant feature combinations. All statistical analyses and visualizations were implemented in Python, using the matplotlib and seaborn libraries for plotting, with color schemes and chart layouts designed for clear presentation of the results.
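A minimal sketch of the conditional probability step is shown below. The direction of conditioning (for example, age given gender), the ranking by probability, and the sample records are assumptions for illustration; the source does not specify these details.

```python
from collections import Counter

def conditional_probabilities(records, given, target):
    """Estimate P(target | given) from a list of dict records."""
    joint = Counter((r[given], r[target]) for r in records)
    marginal = Counter(r[given] for r in records)
    return {(g, t): count / marginal[g] for (g, t), count in joint.items()}

def top_combinations(probabilities, k=10):
    """Return the k combinations with the highest conditional probability."""
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)[:k]

# Hypothetical violation records.
records = [
    {"subject": "e-bike", "gender": "male", "age": "middle-aged"},
    {"subject": "e-bike", "gender": "male", "age": "youth"},
    {"subject": "e-bike", "gender": "female", "age": "middle-aged"},
    {"subject": "pedestrian", "gender": "male", "age": "middle-aged"},
]
p_age_given_gender = conditional_probabilities(records, given="gender", target="age")
```

The same helper covers the subject-gender and subject-age pairs by changing the `given` and `target` arguments.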

The study employed spatial autocorrelation analysis to investigate the spatial distribution characteristics of violations, utilizing both global and local indicators for multi-scale examination. Global Moran’s I index was used to assess overall spatial clustering, measuring whether violation patterns exhibit spatial dependence across the study area⁷⁵. This index ranges from -1 to 1, with positive values indicating positive spatial autocorrelation (similar values located near one another), negative values indicating negative spatial autocorrelation (dissimilar values located near one another), and values near zero indicating a random spatial distribution.
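To make the sign convention concrete, the following sketch computes Global Moran's I from neighbor lists in plain Python. It is a toy illustration rather than the PySAL-based analysis: the four-location layout and the values are hypothetical.

```python
def row_standardize(neighbors):
    """Turn {i: [j, ...]} neighbor lists into row-standardized weights."""
    return {i: {j: 1.0 / len(js) for j in js} for i, js in neighbors.items() if js}

def morans_i(values, weights):
    """Global Moran's I for values indexed 0..n-1 and weights {i: {j: w_ij}}."""
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(w for row in weights.values() for w in row.values())
    cross = sum(w * z[i] * z[j] for i, row in weights.items() for j, w in row.items())
    return (n / s0) * cross / sum(d * d for d in z)

# Four locations on a line: 0-1-2-3 (hypothetical layout and values).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
weights = row_standardize(neighbors)
clustered = morans_i([10, 9, 1, 0], weights)      # similar values adjacent
alternating = morans_i([10, 0, 10, 0], weights)   # dissimilar values adjacent
```

With similar values side by side the index is positive, and with strictly alternating values it reaches the lower bound for this layout.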

To reveal local spatial pattern heterogeneity, we calculated Local Indicators of Spatial Association (LISA), which decomposes global spatial autocorrelation into location-specific components, enabling identification of statistically significant local clusters and outliers^75,76. Spatial relationships were defined using queen contiguity (locations sharing edges or vertices), with the weight matrix row-standardized to ensure comparability across locations with different numbers of neighbors.

LISA analysis identifies four distinct spatial association patterns: (1) High-High clusters (areas with high violation rates surrounded by areas with similarly high rates), (2) Low-Low clusters (areas with low violation rates surrounded by areas with similarly low rates), (3) High-Low outliers (areas with high violation rates surrounded by areas with low rates), and (4) Low-High outliers (areas with low violation rates surrounded by areas with high rates). Statistical significance was determined using Monte Carlo permutation tests with 999 repetitions, where violation values were randomly reassigned while maintaining spatial structure. Significance was set at p < 0.05⁷⁷.
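The quadrant labels and the permutation test can be sketched in plain Python as follows. This is a simplified illustration, not the PySAL routine: it uses a two-sided comparison of absolute local statistics, treats zero deviations as low, and relies on hypothetical neighbor lists and values.

```python
import random

def local_moran(values, neighbors, permutations=999, alpha=0.05, seed=0):
    """Label each location HH, LL, HL, LH, or 'ns' (not significant)."""
    rng = random.Random(seed)
    n = len(values)
    mean = sum(values) / n
    z = [v - mean for v in values]
    m2 = sum(d * d for d in z) / n
    labels = []
    for i in range(n):
        k = len(neighbors[i])
        lag = sum(z[j] for j in neighbors[i]) / k   # row-standardized spatial lag
        observed = z[i] * lag / m2
        others = z[:i] + z[i + 1:]
        extreme = 0
        for _ in range(permutations):
            # Conditional permutation: keep z[i] fixed, redraw its neighbors.
            sim_lag = sum(rng.sample(others, k)) / k
            if abs(z[i] * sim_lag / m2) >= abs(observed):
                extreme += 1
        p_value = (extreme + 1) / (permutations + 1)
        if p_value >= alpha:
            labels.append("ns")
        else:
            labels.append(("H" if z[i] > 0 else "L") + ("H" if lag > 0 else "L"))
    return labels

# Hypothetical four-location chain 0-1-2-3.
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
labels = local_moran([10, 9, 1, 0], neighbors)
```

With 999 permutations the smallest attainable pseudo p-value is 1/1000, which is why the number of repetitions bounds the precision of the significance test.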

LISA cluster maps were constructed displaying only statistically significant spatial associations, with different colors representing each pattern: red for High-High clusters, blue for Low-Low clusters, pink for High-Low outliers, light blue for Low-High outliers, and gray for non-significant locations. This analysis was conducted for each of the 16 violation types using 1,417 sampling points across the study area. All spatial analyses were implemented using Python’s PySAL library⁷⁸, providing detailed spatial intelligence for targeted intervention strategies.

Machine Learning for Environmental Factor Analysis

This research employed machine learning methods, combining XGBoost regression models and SHAP (SHapley Additive exPlanations) analysis to explore relationships between environmental variables and behavioral patterns. In the data preprocessing stage, violation indicators and traffic variables were averaged across different seasons and time periods by location to create location-specific features, and all environmental variables were standardized, with categorical variables like seasons and time periods being one-hot encoded. The modeling phase used XGBoost regression with key parameters including 300 estimators, 0.01 learning rate, maximum tree depth of 4, minimum child weight of 3, 0.6 sample and column sampling ratios, and gamma value of 0.1, along with a 95% training and 5% testing split ratio, using early stopping (patience=10) to prevent overfitting. Three types of visualizations were generated for each target behavioral variable: SHAP summary plots (showing the top 15 most influential features and their impact direction), SHAP bar plots (quantifying the absolute importance of the top 15 features), and SHAP dependence plots (examining relationships between the most important features and target variables). Environmental variable selection covered multiple urban elements, including road characteristics (density, width, coverage within 150 m and 200 m buffers), building characteristics (density, floor area ratio within 100 m/150 m/200 m buffers), POI distribution (100 m/150 m/200 m buffers), green space coverage (100 m/150 m/200 m buffers), population density (100 m/150 m/200 m buffers), transit accessibility (bus stops within 100 m/150 m/200 m buffers), and commercial facility density (100 m/150 m/200 m buffers), with model performance evaluated through R-squared values on both training and testing sets⁷⁹.
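The reported hyperparameters might translate into XGBoost's scikit-learn interface roughly as sketched below. Several details are assumptions: mapping "column sampling" to `colsample_bytree`, monitoring early stopping on the held-out 5%, applying the split to 1,417 location rows, and the random seed. Passing `early_stopping_rounds` to the constructor requires a reasonably recent XGBoost release.

```python
import random

# Hyperparameters reported in the text, mapped to XGBoost's scikit-learn names.
# Mapping "column sampling" to colsample_bytree is an assumption.
XGB_PARAMS = {
    "n_estimators": 300,
    "learning_rate": 0.01,
    "max_depth": 4,
    "min_child_weight": 3,
    "subsample": 0.6,
    "colsample_bytree": 0.6,
    "gamma": 0.1,
    "early_stopping_rounds": 10,
}

def train_test_indices(n_rows, test_fraction=0.05, seed=42):
    """Shuffle row indices and hold out test_fraction of them for testing."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(n_rows * test_fraction))
    return indices[n_test:], indices[:n_test]

def fit_and_explain(X_train, y_train, X_test, y_test):
    """Fit the regressor and return SHAP values for the training features."""
    import shap                       # third-party, imported lazily
    from xgboost import XGBRegressor  # third-party, imported lazily

    model = XGBRegressor(**XGB_PARAMS)
    model.fit(X_train, y_train, eval_set=[(X_test, y_test)], verbose=False)
    explainer = shap.TreeExplainer(model)
    return model, explainer.shap_values(X_train)

train_idx, test_idx = train_test_indices(1417)
```

A 5% hold-out of 1,417 rows leaves only about 71 test samples, so test-set R-squared values from such a split would be expected to vary noticeably with the random seed.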

Results

Profile of violators and violations

This study recorded 20,616 cases of risky behaviors by VRUs (see Fig. 4). Analysis by user type revealed that e-bike users were the main group of violators, accounting for 52.9% of the total, followed by pedestrians (18.5%) and food delivery riders on e-bikes (17.5%).

**Fig. 4: Basic Statistics of VRU Violations.**

This finding highlights the unique position of e-bikes in urban transport; their speed and utility, particularly for commuting and the gig economy, may lead riders to adopt more aggressive behaviors compared to traditional cyclists, while not being subject to the same licensing and enforcement as motorcyclists. Regarding the demographic characteristics of the cases, male violators accounted for 75.3%, significantly higher than females; ages were mainly concentrated in the middle-aged group, accounting for 78.4%.

This demographic skew towards middle-aged males may reflect their higher participation rates in commuting and delivery services, and potentially a greater propensity for risk-taking behavior in this cohort, as documented in broader traffic safety literature. In terms of temporal distribution patterns, violations showed clear seasonal and temporal differences: spring was the peak period for violations (8,305 cases), and the afternoon peak period (11,499 cases) had significantly more violations than the midday period (9,117 cases). Regarding types of behaviors, not wearing helmets (11,714 cases) was the most common violation, followed by riding in motor vehicle lanes (2,057 cases) and illegally carrying passengers (1,828 cases). These findings reveal distinct group characteristics and spatio-temporal distribution patterns in the violation behaviors of VRUs, providing important empirical evidence for developing targeted traffic safety management measures.

A multi-dimensional analysis of traffic violations by VRUs revealed significant demographic characteristics and spatio-temporal distribution patterns in violation behaviors (see Fig. 5). Regarding the demographic characteristics of violators, middle-aged males were the primary violating demographic among e-bike users, with 5,502 violations, while middle-aged females and young people had relatively fewer violations. Violations by food delivery e-bike riders were almost entirely concentrated among middle-aged males, which is highly consistent with the occupational characteristics of delivery riders. In terms of spatio-temporal distribution, spring afternoons were the peak period for various violations, particularly for behaviors such as not wearing helmets and riding in motor vehicle lanes. The research also indicated that certain violations had distinct temporal characteristics, such as red-light running being more concentrated in autumn mornings, while riding in motor vehicle lanes peaked in spring afternoons.

**Fig. 5: Multi-dimensional Analysis of VRU Violation Patterns.**

In the analysis of violation combinations, “electric bike-male-middle aged-no helmet” was the most typical violation pattern with 3781 cases, followed by the combinations of “electric bike-female-middle aged-no helmet” and “delivery electric bike-male-middle aged-no helmet”. This multidimensional analysis of violation characteristics revealed typical violation patterns among different demographic groups during different time periods, providing important empirical evidence for developing targeted traffic management measures. For delivery riders specifically, besides the common issue of not wearing helmets, riding in motor vehicle lanes was also a prominent safety hazard. These findings not only deepened the understanding of violation behaviors among VRUs but also provided a scientific basis for implementing traffic safety management measures tailored to specific locations and times.

Spatial distribution patterns

In our study of spatial distribution patterns of traffic violations by VRUs, we discovered significant spatial patterns through LISA cluster analysis. By examining the Moran’s I values for different types of violations, we gained a deeper understanding of their spatial autocorrelation characteristics (see Fig. 6). Fig. 6 displays the Local Indicators of Spatial Association (LISA) cluster maps for different types of VRU violations across the study area. Each map shows statistically significant spatial clusters and outliers for a specific violation type, color-coded as follows: red areas indicate High-High clusters (violation hotspots surrounded by similar hotspots), blue areas show Low-Low clusters (areas with consistently low violation rates), pink areas represent High-Low outliers (isolated hotspots), and light blue areas indicate Low-High outliers (isolated low-violation areas). Gray areas show locations with no statistically significant spatial association.

**Fig. 6: LISA clustering analysis results for VRU violation behaviors.**

The results indicate that different types of violations exhibit distinct spatial distribution patterns. Among these, illegal bike parking behavior demonstrated the strongest spatial autocorrelation, with a Moran’s I value of 0.228, indicating clear spatial clustering of this type of violation. This clustering may be related to insufficient parking facilities in specific areas or local differences in infrastructure provision and management practices.

Regarding safety protection behavior, not wearing helmets showed moderate spatial autocorrelation (I = 0.149). This spatial clustering might reflect micro-environmental influences, such as variations in local enforcement intensity, neighborhood-level social norms, proximity to helmet retail outlets, and localized traffic safety education efforts within the study area, rather than broad regional cultural patterns.

The strong clustering of illegal bike parking is not random; it provides a clear behavioral insight that such violations are highly location-dependent, likely occurring in areas with high demand but insufficient or poorly located parking infrastructure, such as near commercial centers or public transit stations. This finding offers a direct and actionable application: the generated LISA cluster maps can serve as a diagnostic tool for urban planners to identify precise locations where the installation of new bicycle parking facilities is most needed.

Traffic violations such as running red lights (I = 0.056), wrong-way riding (I = 0.051), and riding side by side (I = 0.031) showed relatively weak spatial autocorrelation. This suggests these violations are less influenced by specific site characteristics and are more likely driven by individual behavioral factors like impatience, risk perception, or situational decisions. This distinction is critical for policy: while illegal parking can be addressed with targeted infrastructure changes, reducing red-light running may require broader-based interventions, such as enhanced enforcement, public awareness campaigns, or improved traffic signal timing to reduce wait times.

Distracted riding behaviors, including using phones while riding (I = 0.051) and wearing headphones (I = 0.034), also showed weak spatial autocorrelation. This distribution pattern indicates that such behaviors are primarily influenced by personal usage habits and have low correlation with location factors.

Analysis of riding in motor vehicle lanes (I = 0.085) and riding on sidewalks (I = 0.084) revealed that these road occupation behaviors showed slight positive spatial autocorrelation. This spatial clustering pattern may be related to road design, traffic facility layout, and regional traffic flow characteristics.

Overall, the LISA cluster analysis results (Fig. 6) reveal distinct spatial clustering patterns for different violation types. Illegal bike parking demonstrates the strongest spatial clustering, with clearly defined High-High clusters concentrated around commercial areas and transit nodes, indicating systematic parking problems in these locations. Not wearing helmets shows moderate spatial clustering with several High-High clusters distributed across the study area, suggesting localized areas where helmet non-compliance is particularly prevalent. In contrast, violations such as red-light running, wrong-way riding, and riding side by side show weaker spatial autocorrelation with fewer significant clusters, indicating these behaviors are more randomly distributed across space and likely influenced by individual behavioral factors rather than location-specific environmental characteristics.

This spatial analysis reveals significant differences in the spatial distribution patterns of traffic violations by VRUs within the local study area. Some behaviors show clear localized clustering characteristics, while others display more random distribution patterns. These findings provide an important scientific basis for developing differentiated management strategies at the neighborhood and street level and optimizing urban traffic environments. Future management measures should fully consider this micro-spatial heterogeneity and implement corresponding management and improvement measures according to local conditions.

Environmental influencing factors on behaviors

Statistical data on environmental factors surrounding VRUs were collected across different buffer zones (100 meters, 150 meters, and 200 meters), with 1417 sampling points in each zone (Table 3). Analysis of these environmental factors reveals: the study area has a high density of built environment features, especially within smaller buffer zones; public facilities show distinct spatial clustering characteristics with improved accessibility over larger buffer zones; population distribution is relatively uniform, though building density shows some spatial variation; and green spaces are evenly distributed, demonstrating good urban planning characteristics.

Table 3 Statistics of influencing factors

Specifically, regarding roads and building environments, road density remains relatively stable across the three buffer zones, with average values between 0.044-0.048 m/m²; building density decreases as the buffer zone expands, dropping from 0.260 in the 100-meter zone to 0.198 in the 200-meter zone, indicating higher building density in the urban core; floor area ratio remains relatively stable across different buffer zones, with average values between 1.166-1.216. Regarding green spaces, green space density remains relatively stable across the three buffer zones, with average values between 0.276-0.290 m²/m², indicating relatively uniform distribution of green spaces in the study area. In terms of population distribution, population density slightly increases as the buffer zone expands, from 0.630 people/m² to 0.653 people/m², with standard deviation decreasing as the buffer zone expands, suggesting more uniform population distribution over larger areas. Regarding public facility accessibility, the number of Points of Interest (POIs) increases significantly as the buffer zone expands, from an average of 37.111 (100 meters) to 142.074 (200 meters); specifically, retail storefronts show a similar trend, increasing from 25.081 to 97.286; and the number of bus stops increases from an average of 1.033 to 1.290. In terms of data coverage, most indicators have nearly complete samples (1417/1417); only bus stop data shows lower coverage across different buffer zones, particularly in the 100-meter buffer zone with only 567 valid samples.
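The buffer-based indicators summarized above can be sketched as a count of features within each radius around a sampling point. The coordinates below are hypothetical and are assumed to be in a projected coordinate system measured in meters; the density definition (count divided by buffer area) is likewise an assumption.

```python
import math

def buffer_features(point, pois, radii=(100, 150, 200)):
    """Count POIs within each radius (meters) and convert to density per m^2."""
    px, py = point
    distances = [math.hypot(x - px, y - py) for x, y in pois]
    features = {}
    for r in radii:
        count = sum(1 for d in distances if d <= r)
        features[f"poi_{r}m"] = count
        features[f"poi_density_{r}m"] = count / (math.pi * r * r)
    return features

# Hypothetical projected coordinates in meters (not longitude/latitude).
sampling_point = (0.0, 0.0)
pois = [(30.0, 40.0), (80.0, 100.0), (0.0, 180.0), (300.0, 0.0)]
features = buffer_features(sampling_point, pois)
```

Because the buffers are nested, raw counts can only grow with the radius, which matches the increase in POI and storefront counts from the 100-meter to the 200-meter zone reported above.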

The analysis of SHAP values from geographically weighted machine learning models reveals complex relationships between environmental features and the model’s prediction of risky behaviors (Fig. 7). To illustrate the interpretation of SHAP analysis, we provide specific examples from the sub-figures: In the ‘not wearing helmets’ SHAP plot, many points for the ‘season_spring’ feature are distributed to the right of 0 (positive SHAP values), indicating that the spring season factor has a positive influence on model prediction, i.e., it increases the predicted likelihood of riders not wearing helmets. Similarly, for ‘building_density’, the concentration of red points (high building density) on the positive side demonstrates that areas with higher building density consistently increase the predicted likelihood of helmet non-compliance. Conversely, in the ‘car_flow’ feature, while most points show positive influence, the presence of some blue points (low traffic flow) on the negative side indicates that very low traffic areas may actually decrease violation probability.

**Fig. 7: Environmental Factors Influencing VRU Violation Behaviors.**
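The sign convention used when reading these plots can be illustrated with exact Shapley values on a toy additive model. This sketch is not the fitted XGBoost model or the SHAP library's tree algorithm: the coefficients, the `green_space` feature name, and the single all-zero baseline (standing in for the background data that SHAP averages over) are hypothetical.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values: average marginal contribution over feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)
        previous = predict(current)
        for name in order:
            current[name] = instance[name]   # switch feature to its actual value
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_model(x):
    """Hypothetical violation-rate model, not the fitted XGBoost model."""
    return (
        0.10
        + 0.05 * x["season_spring"]
        + 0.02 * x["building_density"]
        - 0.01 * x["green_space"]
    )

baseline = {"season_spring": 0, "building_density": 0.0, "green_space": 0.0}
instance = {"season_spring": 1, "building_density": 2.0, "green_space": 1.0}
phi = shapley_values(toy_model, instance, baseline)
```

A positive value pushes the prediction above the baseline and a negative value pulls it below, and the values sum to the difference between the two predictions. Enumerating all orderings is only feasible for a handful of features, which is why tree-specific algorithms are used in practice.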

For distraction behaviors, such as mobile phone use, the SHAP plots show ‘car_flow’ as the most influential factor, with predominantly red points distributed on the positive side, indicating that high traffic flow areas significantly increase distraction behavior occurrence. The ‘commercial_150m’ feature shows a similar pattern, where red points (high commercial density) cluster on the positive side, demonstrating that commercial areas within 150 m consistently promote distraction behaviors.

Distraction behaviors (phone calls, conversations while riding, mobile phone use, and wearing headphones) are most strongly affected by car_flow and also show strong positive associations with building_density and commercial_150m. Among built environment variables, metrics within a 150 m radius demonstrate the best predictive performance, indicating that distraction behaviors occur more frequently in areas with heavy traffic and commercial activity.

Behaviors involving rule violations (jaywalking, riding side by side, running red lights, and riding against traffic) are predominantly influenced by road_width, road_density, and car_flow. Behaviors involving improper road space usage (riding on sidewalks, riding on motorized vehicle lanes, and walking on non-motorized vehicle lanes) show strong spatial dependence, with road_width and building_density having the most significant impact and demonstrating clear interactive effects among different traffic flow patterns. Illegal bike parking behaviors are primarily influenced by building_density and commercial_150m. Behaviors reflecting insufficient safety protection (not wearing helmets, not stopping at crosswalks) display high sensitivity to car_flow, building_density, and road_density. Finally, both overloading behaviors (carrying passengers) and unauthorized road usage behaviors exhibit significant dependence on car_flow, while also showing strong associations with building_density and commercial_150m.

The SHAP value analysis further reveals several universal patterns. First, regarding spatial influence, environmental variables within a 150 m radius (commercial_150m, poi_150m, etc.) consistently demonstrate the strongest predictive power, with building_density and commercial_density being the most universal influencing factors, while road_width and road_density significantly impact specific behavior types. Second, concerning traffic flow effects, car_flow significantly influences most behaviors, with clear interactive effects observed among different transportation modes. Third, regarding built environment impacts, high commercial_density areas serve as hotspots for multiple violation behaviors, while building_density shows a consistent positive correlation with violation behaviors.

These findings illuminate the spatial distribution and influence mechanisms of VRUs’ risk behaviors, suggesting that traffic management policies need to consider environmental variability, particularly through comprehensive governance in high commercial_density areas, optimization of environmental factors within a 150 m radius, and coordination between traffic flow management and spatial governance. These findings provide important quantitative evidence for developing refined traffic management strategies and optimizing cycling environments, while also revealing complex relationships between risk behaviors and the urban built environment.

Discussions

In terms of data collection, our approach offers several advantages over traditional data sources. Traditional social observation methods^80,81 inherently contain human observational biases, while questionnaire data^82,83 suffers from self-reporting limitations and subjective influences. Collision data^84,85 presents privacy concerns and challenges with data imbalance. Our study addresses these limitations by employing video analysis as the primary data source, which offers significant advantages over these traditional methods⁸⁶.

Our implementation of the RMM method for road data collection represents a substantial improvement over stationary monitoring approaches^87,88,89,90, which typically require a large number of observation points and expensive video recording equipment. While traditional methods incur significant time and labor costs, particularly for large-scale studies, the RMM approach proves more economically efficient by reducing both personnel and equipment requirements while enabling broader coverage of the study area through a combination of mobile and stationary monitoring.

Furthermore, our use of 360-degree panoramic cameras for road video data collection offers comprehensive coverage compared to forward-facing cameras^87,90,91, providing more complete road condition information while minimizing equipment needs. Given that current research rarely addresses seasonal variations and their relationship with VRU risk behaviors^92,93, we conducted data collection across all four seasons, conducting detailed analysis and cross-seasonal comparisons to comprehensively study VRU behavior patterns throughout the year. To optimize collection efficiency, we followed the established practice of collecting data during peak pedestrian flow periods^87,94, focusing on both midday and evening peak traffic periods.

In our comprehensive analysis of VRU risk behaviors, we built upon previous work to develop a more complete analytical framework that includes spatial dimensions, violator characteristic dimensions, risk behavior indicators, and covariate indicators.

Regarding risk behavior indicators, we found that previous research did not provide a comprehensive list of risk behaviors. For example, some studies focused on specific infractions like not wearing safety helmets and using mobile devices while in traffic^19,25. Others examined violations such as running red lights⁹⁰, wrong-way riding, and cycling on sidewalks¹⁹. Some research has taken a broader approach by examining multiple behaviors at once, including wrong-way driving, running red lights, and jaywalking. Building on previous research, we expanded the range of potential VRU risk behaviors to 16 types, aiming to cover VRU risk behavior categories as comprehensively as possible.

In terms of spatial dimensions, we found that many researchers focused on VRU violations at intersections^95,96, while fewer studies examined road segments⁸⁹, which clearly cannot provide a complete picture. In our study, we covered both intersections and road segments, allowing us to draw more comprehensive conclusions.

When analyzing the correlational relationship between risk behaviors and VRUs, we found that not all types of risk behaviors are related solely to VRU groups; some are also related to infrastructure. Current research tends to focus either on VRU behavior^97,98 or infrastructure conditions^99,100, with few studies considering both factors comprehensively. Researchers have reached a consensus that covariate factors are associated with VRU risk behaviors. Building on previous research, we enriched the covariate indicators to include road density, building density, green space density, population density, floor area ratio, storefront density, number of bus stops, and POI density, thereby conducting a more comprehensive analysis of how covariates are associated with VRU risk behaviors.

While this study provides valuable insights, several limitations must be acknowledged. First, the data collection was conducted within a specific 12.35 km² area in Beijing’s Shijingshan district, which may limit the direct generalizability of our findings to other urban contexts. However, our study area selection was strategically designed to maximize representativeness, as it encompasses a mix of land uses (commercial, residential, educational) and road types that are common in many modern cities. To assess the generalizability of our behavioral findings within a limited scope, we conducted a comparative analysis in another typical urban built-up area of Beijing: Xueyuanlu Township in Haidian District⁶⁵. This study area is a representative urban district with diverse land uses, mixing residential areas, commercial zones, and multiple university campuses. The data was collected in the spring of 2022 during daytime (14:52–17:46). That study similarly revealed the dominance of electric bicycles in non-motorized traffic (accounting for 72.1% of all riders) and their associated risky behaviors (such as a helmet-wearing rate of only 32.3%). The fact that two independent studies in different, representative urban districts both point to similar traffic behavior patterns suggests that the key challenges identified in our Shijingshan study—particularly the safety issues associated with e-bikes—are not isolated phenomena but are likely a prevalent pattern in a high-density city like Beijing. This cross-regional consistency enhances, to some extent, the external validity of our conclusions. Future research should build on this by conducting systematic, multi-site studies across different regions to develop more universally applicable intervention strategies.

Second, regarding data collection duration, our stationary observation approach recorded data for one signal cycle (approximately 120 seconds) at each intersection during each visit, which is shorter than standard traffic engineering practices that typically involve 2-3 hours per day over 3-7 days per intersection. This limitation may affect the capture of complete traffic dynamics and behavioral variations at individual intersections. However, our RMM approach was designed to balance this limitation through broader spatial-temporal coverage: we collected data at 25 intersections across 11 days spanning four seasons, with each intersection visited multiple times across different seasons and time periods, resulting in a total dataset of 367,076 images. This approach prioritized systematic multi-seasonal and multi-location analysis over intensive single-location observation, enabling us to capture seasonal and spatial variations in VRU behaviors that would not be possible with traditional intensive single-site approaches.

The trade-off between observation duration per site and spatial-temporal coverage represents a methodological choice aligned with our research objectives of understanding broad patterns across seasons and locations rather than intensive micro-level traffic dynamics at individual intersections. Nevertheless, future studies could benefit from combining both approaches: intensive multi-day observations at selected key intersections to validate the representativeness of shorter observation periods, alongside the broader spatial-temporal coverage achieved through the RMM method. Additionally, the development of automated detection systems could enable continuous long-term monitoring that overcomes the resource constraints that currently necessitate such trade-offs between depth and breadth of data collection.

Regarding seasonal data collection, we acknowledge that our sampling of 2-3 days per season, while carefully selected to represent typical seasonal conditions, may not capture the full spectrum of behavioral variations that occur throughout each season. While our approach of selecting representative days with temperatures matching seasonal averages minimizes weather-related confounding factors, longer data collection periods per season would provide more robust seasonal representation and better capture within-season variations. Future studies should consider extending the temporal sampling within each season, such as collecting data over multiple weeks per season or implementing systematic sampling throughout each seasonal period, to strengthen the reliability of seasonal behavioral patterns and account for potential week-to-week variations within seasons.

It is important to note that our spatial clustering interpretations are limited to micro-scale environmental and social variations within a single district rather than broad regional differences. The observed spatial patterns in violation behaviors likely reflect very localized influences, such as street-level infrastructure variations, neighborhood-specific enforcement practices, local business environments, and micro-community social dynamics. True regional cultural differences would require comparative analysis across multiple districts or cities. Future research should distinguish between these micro-environmental clustering effects and broader regional or cultural patterns through multi-district comparative studies.

Conclusions

This study presents a comprehensive analysis of VRU traffic violation behaviors using innovative mobile sensing techniques and multi-scale analytical methods. Our findings reveal distinct patterns in violation behaviors across different seasons, times of day, and spatial locations, demonstrating the complex interaction between human behavior and urban environments.

The study makes several key contributions. Methodologically, it introduces an efficient RMM approach for collecting comprehensive VRU behavior data, overcoming the limitations of traditional fixed-point observations. The analysis identifies critical spatio-temporal patterns in violation behaviors, with distinctive seasonal variations and clear afternoon peak periods. The research establishes strong correlations between built environment characteristics and violation behaviors, particularly within 150 m buffer zones, suggesting an optimal scale for urban intervention strategies. The findings highlight the disproportionate representation of e-bike users in violation behaviors and the prevalence of helmet non-compliance, pointing to specific areas requiring targeted intervention.
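To make the 150 m buffer analysis concrete, the sketch below counts built-environment features within a 150 m radius of each observation site and correlates those counts with violation counts. It is a simplified illustration under stated assumptions, not the authors' pipeline: the coordinates, feature points, and violation counts are invented, and the helper names (`haversine_m`, `count_within_buffer`, `pearson_r`) are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000.0


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def count_within_buffer(site, features, radius_m=150.0):
    """Count feature points lying within radius_m of a site."""
    lat, lon = site
    return sum(
        1 for f_lat, f_lon in features
        if haversine_m(lat, lon, f_lat, f_lon) <= radius_m
    )


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


# Hypothetical observation sites and commercial feature points (lat, lon).
sites = [(39.9000, 116.4000), (39.9100, 116.4000), (39.9200, 116.4000)]
commercial_points = [
    (39.9005, 116.4000), (39.9000, 116.4010), (39.9010, 116.4000),  # near site 1
    (39.9020, 116.4000),                                            # beyond 150 m of site 1
    (39.9105, 116.4000),                                            # near site 2
    (39.9220, 116.4000),                                            # beyond 150 m of site 3
]
violations = [12, 5, 2]  # illustrative violation counts per site

counts = [count_within_buffer(s, commercial_points) for s in sites]
r = pearson_r(counts, violations)
print(counts, round(r, 3))
```

Repeating the same calculation for several radii would show how the strength of association changes with buffer size, which is the kind of comparison that motivates a preferred scale.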

These results provide valuable insights for urban planners and policymakers, suggesting that effective VRU safety interventions should consider both temporal variations and spatial characteristics of the built environment. The study’s findings can inform the development of more targeted and effective safety measures, particularly in areas with high commercial density and complex road networks.

For risk factor identification in video data, previous researchers typically employed video coding and extraction methods to generate key behavior indicators. However, this approach proved insufficient for our study because of the diverse range of risk behaviors and the difficulty of capturing VRU movements programmatically. To overcome these limitations, we adopted a hybrid approach combining human expertise with machine learning. We manually annotated 16 types of VRU risk behaviors, labeling approximately 623 images and validating 367,076 images. This approach allowed us to identify risk factors at a fine spatial scale and analyze their spatial distribution patterns. Additionally, we employed deep learning methods to classify pedestrians by age group and gender.

The study encompasses most urban land use types, which supports its representativeness. The methodology is readily replicable, so future crowdsourced data collection using the RMM approach could expand the research scope, support analyses of larger datasets, and yield deeper insights into VRU behavior patterns across diverse urban contexts. Future research should expand the geographical scope of this analysis and investigate the effectiveness of specific intervention strategies based on the identified patterns. The development of advanced algorithms for automatic violation detection is another promising direction, potentially enabling more efficient and comprehensive monitoring of VRU behaviors.

This comprehensive approach to understanding VRU behavior patterns and risk factors provides a solid foundation for developing targeted urban safety interventions and informing future research that can improve road safety for VRUs in urban environments.

Data availability

The minimal dataset generated and analyzed during the current study, including anonymized violation counts, locations, and associated environmental variables, is available from the corresponding author on reasonable request.

Code availability

The custom Python scripts used for the analysis are available from the corresponding author on reasonable request.

Acknowledgements

This research was funded by China University of Geosciences (Beijing) (Grant No.: 2-9-2024-075) and the Youth Research Special Project of North China University of Technology (Grant No.: 2025NCUTYRSP027). The authors also gratefully acknowledge the valuable guidance provided by the ISPRS Working Group IV/8, “Digital Twins for Mobility and Navigation”, which is dedicated to developing mobility and navigation solutions for disabled and aging populations and promoting green, resilient transit in sustainable cities.

Author information

Authors and Affiliations

School of Land Science and Technology, China University of Geosciences (Beijing), Beijing, China
Yan Li & Zhizhong Kang
School of Architecture and Art, North China University of Technology, Beijing, China
Pengcheng Du, Hongjin Ren, Teng Xi, Ke Gao & Yuyang Zhang
Department of Infrastructure Engineering, University of Melbourne, Melbourne, VIC, Australia
Majid Sarvi

Contributions

Y.L., Z.K., and M.S. conceived and designed the research project through China-Australia collaboration. Y.L. and Y.Z. developed the analytical framework and methodology. P.D., H.R., T.X., and K.G. conducted data collection and processing. Y.L. and Y.Z. wrote the main manuscript text. P.D. performed machine learning model development and validation. H.R. and T.X. prepared the figures and conducted spatial analysis. K.G. contributed to statistical analysis and data visualization. Y.Z., Z.K., and M.S. provided supervision and critical revision of the manuscript. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Yuyang Zhang.

Ethics declarations

Competing interests

M.S. is an Editorial Board Member of npj Sustainable Mobility and Transport but is not involved in the peer-review process or decision making for this manuscript. All other authors declare no conflict of interest.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary information (download PDF )

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Li, Y., Du, P., Ren, H. et al. Mobile sensing discovery of when where and why vulnerable road users break traffic rules. npj. Sustain. Mobil. Transp. 3, 1 (2026). https://doi.org/10.1038/s44333-025-00068-y

Received: 10 June 2025
Accepted: 14 November 2025
Published: 07 January 2026
Version of record: 07 January 2026
DOI: https://doi.org/10.1038/s44333-025-00068-y
