Computer vision and statistical insights into cycling near miss dynamics

Ibrahim, Mohamed

doi:10.1038/s41598-024-70733-8

Published: 10 September 2024

Mohamed Ibrahim^1,2

Scientific Reports volume 14, Article number: 21151 (2024)

Abstract

Across the globe, many transport bodies are advocating for increased cycling due to its health and environmental benefits. Yet, the real and perceived dangers of urban cycling remain obstacles. While serious injuries and fatalities in cycling are infrequent, “near misses” (events where a person on a bike is forced to avoid a potential crash or is unsettled by a close vehicle) are more prevalent. To understand these occurrences, researchers have turned to naturalistic studies, attaching various sensors like video cameras to bikes or cyclists. This sensor data holds the potential to unravel the risks cyclists face. Still, the sheer amount of video data often demands manual processing, limiting the scope of such studies. In this paper, we present a computer vision framework tailored for automated near-miss video analysis and for detecting various associated risk factors. Additionally, the framework can assess the statistical significance of various risk factors, providing a comprehensive understanding of the issues faced by cyclists. We shed light on the pronounced effects of factors like glare and vehicle and pedestrian presence, examining their roles in near misses through Granger causality with varied time lags. This framework enables the automated detection of multiple factors and an understanding of their relative weight, thus enhancing the efficiency and scope of naturalistic cycling studies. As future work, this research opens the possibility of integrating this AI framework into edge sensors through embedded AI, enabling real-time analysis.

Introduction

Unsafe mobility interactions, either with other road users or infrastructures, are key contributors to unsafe behaviours in cities^1,2. There has been a surge in the popularity of cycling, both in Europe and globally³. The potential health benefits of cycling and its role in reducing environmental pollution have driven planners and policymakers to invest in bike infrastructure, be it for daily commutes or leisure activities^4,5,6,7. To boost cycling, a variety of policies, initiatives, and both tangible and intangible interventions have been introduced across the world^6,8. In the UK, for instance, Transport for London (TfL) has backed numerous cycling projects, such as cycling superhighways, quiet routes, mini-Hollands, and bike rental schemes to ensure a safer environment for cyclists⁹. However, even with the notable health benefits outweighing the risks of cycling⁴, the risk for people on bikes remains high¹⁰.

Near miss experiences can shape the perception of cycling as a dangerous activity^{1,2, 11, 12}. In the UK, cyclists are estimated to have at least one close pass every six miles they commute¹³. Concerns about collisions or falls deter many from embracing cycling as a means of transportation^{1,14, 15}. A UK study involving 244 cyclists and non-cyclists found that a perceived lack of safety was a primary barrier for many against cycling¹⁶. The apprehension around cycling risks and past close calls have been identified as significant obstacles to wider cycling adoption¹⁷. These near misses can be attributed to various traffic-related anxieties, including distracted drivers, vehicles passing too closely, door-related incidents, speeding vehicles, aggressive drivers, or being abruptly cut off by a turning vehicle¹⁷.

Cycling near misses are transport-related, but the factors that contribute to them may or may not be connected to transport^{1,14, 18,19,20}, so it is important to look at the bigger picture. These factors can relate to visibility, the physical condition of built-up areas, interaction among different road users, or behavioural and psychological factors related to the cyclist, and there is a clear knowledge gap in extracting them automatically². When taken as a whole, cycling near misses can be viewed as a type of urban system that evolves in cities as a result of various circumstances and events that might or might not be directly tied to transportation. For interpreting cycling near misses, it is essential to have a solid understanding of the many urban systems that exist in cities as well as their dynamics. Understanding cities and subsequently human mobility through computer vision has shown substantial progress in the last few years²¹.

Artificial Intelligence (AI), specifically, deep learning has significantly advanced our understanding of urban mobility and the dynamics of city life, offering powerful tools to analyse and predict patterns within complex urban environments^2,21. By utilising large datasets, deep learning models can uncover complex relationships and behaviours in domains such as traffic flow, pedestrian movement, and accident and near accident experiences. A key aspect of applying deep learning to urban mobility involves feature engineering from video streams, such as using optical flow. Optical flow techniques analyse the motion of objects between consecutive frames in video streams, enabling the extraction of meaningful data regarding speed, direction, and density of movement. This approach has been utilised in understanding cycling near misses²². In this research, we extend the utility of artificial intelligence in inferring and analysing human risks by introducing a first-hand computer vision and statistical analysis framework that is able to assist city planners and policymakers to detect and analyse cycling near misses and their risk factors. Based on machine intelligence, the tool will automatically analyse cycling near misses from video streams by understanding the interaction between people, the built environment, and the different transport modes. The research will have several benefits in terms of improving road safety. This knowledge of risk factors will enable: (a) individuals (cyclists and other road users) to change their behaviour to minimise risk, (b) transport authorities to plan safer infrastructure and run informed awareness campaigns, and (c) the production of more accurate risk maps, showing which routes are safest for cycling, and what types of incidents to be wary of.

The relevance of this study is substantial in the evolving field of urban cycling safety. By integrating advanced computer vision and statistical analysis into a single framework, this research significantly advances existing methodologies that rely heavily on manual data processing and interpretation. It goes beyond a single AI model. Moreover, traditional approaches to studying cycling near misses are labour-intensive and often suffer from limited scalability due to the sheer volume of video data generated. This new method not only automates the detection and analysis of near misses from video streams but also introduces the ability to systematically identify and quantify risk factors in real-time.

Results

Integrated framework

Advances in computer science in general, and in geo-computational methods in particular, have improved our understanding of geography and urban systems. Seeing cities from the street view adds more dimensions of information and complexity²¹. Capturing these rapid urban changes in day-to-day life through images offers more opportunities to tackle urban dynamics towards a better understanding of cities. We introduced different methods to understand critical events such as cycling near misses and their risk factors. Figure 1 shows the overall scope of our introduced vision framework. This framework can be utilised as a base for generating urban data for multi-purpose urban and transport-related studies. The framework can capture information related to environmental, visual, and built environment conditions coupled with the spatiotemporal context. The framework operates in real-time, achieving 26 frames per second on a single RTX 2080 GPU. The innovation can be seen in detecting critical events and understanding their causes. Also, by applying the same algorithms to active cameras in cities, the model can enable real-time capturing of data. Last, the same methodology can be applied to tackle and classify different urban issues from urban scenes. Coupled with remote sensing image classification methods, the proposed framework can reveal deeper insights into the dynamics of cities. Furthermore, the integration of embedded AI with edge sensors facilitates real-time analytics and data extraction, enabling prompt responses to ongoing events and conditions in urban settings. Beyond cycling near-miss events, this framework can be adapted to detect and analyse various urban phenomena, such as traffic congestion, pedestrian safety, and environmental monitoring.

Framework stringency index (SI)

Even though each model presented in this research is validated individually, we introduced a Framework Stringency Index (SI) to further evaluate the performance of the individual models for the given task of analysing cycling near misses. What makes this index unique is that it does not only include the performance of each model, but it also takes into account its importance in understanding cycling near misses in terms of the weighting of the variables it generates in the regression models.

Table 1 The summary of the results of the introduced framework.

Table 1 shows the combined results of the individual models. These results represent the average precision of each model and their absolute and normalised statistical weights. After computing the SI based on the presented models’ results, the overall pipeline achieved an SI of 0.81. The closer the SI value is to 1, the more accurate the framework is in detecting the different risk factors, given the precision of each deep model and the weight of each factor on the occurrence of near misses. Based on the normalised weights, the SI is highly influenced by the precision of scenes that belong to clear and rainy weather and those that include people and bikes. It is less influenced by the precision of scenes that include trucks, glare, and wet surfaces.
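The SI formula itself is not given in this section, so the following is only a minimal sketch of one plausible reading: a precision average weighted by the normalised statistical weights. The function name `stringency_index`, the weighting scheme, and all numeric values are assumptions for illustration and are not taken from Table 1.

```python
def stringency_index(precisions, weights):
    """Weighted mean of per-factor precision.

    ASSUMPTION: the index is modelled here as sum(p_i * w_i) with the
    absolute weights normalised to sum to 1. The paper's exact formula
    may differ.
    """
    total = sum(abs(w) for w in weights.values())
    return sum(precisions[k] * abs(weights[k]) / total for k in precisions)


# Hypothetical precisions and regression weights (not the paper's values).
example_precisions = {"clear": 0.90, "rain": 0.85, "glare": 0.70}
example_weights = {"clear": 1.2, "rain": 0.9, "glare": 0.1}

si = stringency_index(example_precisions, example_weights)
```

Under this reading, a factor with a small weight (glare in the made-up numbers) barely moves the index even when its precision is low, which matches the qualitative behaviour described above.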

Association between variables

We have used the Product Moment Correlation Coefficient (PMCC) to highlight the linear correlation in the data set. The PMCC measures the correlation between two variables in the range $[-1,1]$, where 1 represents a perfect positive correlation, $-1$ represents a perfect negative correlation and 0 represents no correlation. For further explanation of the PMCC, see²³. Figure 2 shows the PMCC between each pair of variables. It indicates a different positive and negative correlation, in which some of them can be considered as new findings, whereas others can be seen as logical and expected outcomes. For instance, daytime is inversely correlated with night-time, and clear weather is inversely correlated with rainy weather, which is logical and expected. Similarly, the presence of people is positively correlated with daytime, clear weather, and the presence of bicycles. The presence of glare is positively correlated with night-time, rain, and fog. While glare is usually associated with sunny conditions, the detected glare in this dataset is due to headlights in darker conditions such as rain and fog. Wet ground is positively correlated with rain, snow, and night-time. On the other hand, a crucial finding is that near misses are positively correlated with rain but uncorrelated with daytime. Furthermore, while there is a positive correlation between the presence of a cycling lane and the presence of people and bicycles, there is an absence of correlation between the presence of a cycling lane and the occurrence of near misses. It may seem counter-intuitive that dawn/dusk and daytime are not perfectly negatively correlated, so it is worth mentioning that this is because there are three mutually exclusive classes (Dawn/dusk-time, daytime, and night-time).
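The point about three mutually exclusive time-of-day classes can be checked with a small worked example. The sketch below computes the PMCC from its definition on made-up one-hot labels; the helper `pmcc` and the label counts are illustrative assumptions, not data from the study.

```python
from math import sqrt


def pmcc(xs, ys):
    """Product moment (Pearson) correlation coefficient of two sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up frame labels: three mutually exclusive classes.
labels = ["day"] * 5 + ["dawn_dusk"] * 2 + ["night"] * 3
day = [1 if label == "day" else 0 for label in labels]
dawn_dusk = [1 if label == "dawn_dusk" else 0 for label in labels]
not_day = [1 - d for d in day]

r_day_dusk = pmcc(day, dawn_dusk)    # -0.5: negative, but not perfectly so
r_day_not_day = pmcc(day, not_day)   # -1.0: only two classes remain
```

With only two classes the indicator variables are exact complements and the coefficient is $-1$; a third class breaks that complementarity, so each pair is only partially negatively correlated.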

To further investigate the collinearity among the different variables in the data set, we used a t-test to highlight the significant differences between the near miss and safe scenes in terms of the selected variable. The t-test compares the means of the continuous variables for both groups, safe and near misses. When the p-value is less than 0.05, the null hypothesis can be rejected, and the results for the selected variable can be deemed statistically significant. Such variables can be used to differentiate between safe and near miss scenes. For further explanation regarding t-test analysis, see^24,25. Table 2 shows the statistically significant results of the t-test method for five significant independent variables, in which the near miss variable is treated as a dependent variable. The results show that the occurrence of near misses has a statistically significant positive association with the counts of cars, buses, and motorbikes, and a statistically significant negative association with the counts of people and bicycles.
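As a rough illustration of the comparison described above, the sketch below contrasts the mean car count in near miss frames with that in safe frames. It uses Welch's unequal-variance variant of the t-test and a normal approximation for the p-value (reasonable only for large samples). Both choices are assumptions made to keep the example dependency-free, and the counts are synthetic.

```python
import random
from statistics import NormalDist, mean, variance


def welch_t_test(group_a, group_b):
    """Return Welch's t statistic and a two-sided p-value.

    ASSUMPTION: the p-value uses the standard normal distribution in place
    of Student's t, which is acceptable only when both groups are large.
    """
    standard_error = (variance(group_a) / len(group_a)
                      + variance(group_b) / len(group_b)) ** 0.5
    t = (mean(group_a) - mean(group_b)) / standard_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p


# Synthetic per-frame car counts (not the study's data).
rng = random.Random(0)
near_miss_cars = [rng.randint(1, 5) for _ in range(200)]
safe_cars = [rng.randint(0, 4) for _ in range(200)]

t_stat, p_value = welch_t_test(near_miss_cars, safe_cars)
significant = p_value < 0.05  # reject the null hypothesis of equal means
```

A positive `t_stat` here corresponds to a higher mean count in near miss frames, which is the direction reported for cars, buses, and motorbikes.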

Table 2 The significant results of the t-test method.

The impacts of risk factors in cycling near misses

We aim to directly grasp the influence of various independent variables on the occurrence of near misses in non-controlled experiments. This stage serves as a foundational model for subsequent sections where we delve deeper into specific variables that have demonstrated a statistically significant relationship with near misses.

There are 13,145 frames categorized as near misses and 33,422 frames identified as safe cases. To achieve a more balanced representation of the dependent variable, we initiated a new experiment to address the class imbalance. This involved drawing a random sample of 13,145 from the 33,422 safe case frames. Using a logistic regression model, we assessed the collinearity between the near miss variable (dependent) and other independent variables, without considering any confounder assumptions or controlled variables. For a comprehensive understanding of logistic models and utility functions, refer to²⁶ and²⁷. Table 3 summarises the statistics of the logistic regression model. Several independent variables are identified as statistically significant, with various variable coefficients and standard errors. This model shows a statistically insignificant intercept. The table shows the coefficients of all variables and their statistical significance with the occurrence of near misses in the presence of a cycle lane.
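The balancing step amounts to random undersampling of the majority class. A minimal sketch, assuming frames are addressed by integer index and using an arbitrary seed (the paper does not state how the sample was seeded):

```python
import random

N_NEAR_MISS = 13_145  # near miss frames, from the text
N_SAFE = 33_422       # safe frames, from the text

rng = random.Random(42)  # ASSUMPTION: seed chosen only for reproducibility

# Draw as many safe frames as there are near miss frames, without replacement.
safe_frame_ids = range(N_SAFE)
sampled_safe_ids = rng.sample(safe_frame_ids, N_NEAR_MISS)

# Balanced dependent variable: 1 = near miss, 0 = safe.
balanced_labels = [1] * N_NEAR_MISS + [0] * len(sampled_safe_ids)
```

The balanced set has 26,290 frames, half from each class, so the fitted intercept is no longer dominated by the original class ratio.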

Table 3 The results of the logistic regression model (balanced classes of near misses).

Granger causality of risk factors

Understanding the temporal causal structure of a given dataset is essential for interventions and decision-making for real-world applications²⁸. For a given risk factor to cause a near miss, it has to precede its occurrence. If the time lag between the risk factor being observed and the near miss occurring can be modelled, then it has the potential to be used in an early warning system. Figure 3 shows the assumption of temporal causality, highlighting the scope that defines causality. To test for Granger causality, the figure shows that the tested variable must be in a sequential form and that there must be a defined lag between the selected variable and a near miss for causality to be significant. To compute Granger causality for the 16 independent variables, different experiments were conducted to select a lag value. We experimented with values in the range of 1 to 120. This selection is based on (1) trial and error and (2) the nature of the data set used, in which 30 data points represent 1 s. The results show statistically significant outcomes for three independent variables (car, person, and glare), which means that the count of cars, the count of persons, or the occurrence of glare Granger-causes near misses for different lag intervals. In other words, these variables could be useful for forecasting the occurrence of cycling near misses.
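The idea behind the test can be sketched in a few lines: fit one autoregressive model of the target on its own lags, fit a second that also includes lags of the candidate factor, and compare the residual errors with an F statistic. The code below is a dependency-free illustration on synthetic series. The helper names `ols_rss` and `granger_f` are hypothetical, only the F statistic is computed (no p-value), and it is not the implementation used in the study.

```python
import random


def ols_rss(rows, targets):
    """Residual sum of squares of a least-squares fit via normal equations."""
    k = len(rows[0])
    # Augmented system [X'X | X'y].
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, targets))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [v - factor * w for v, w in zip(a[r], a[col])]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))


def granger_f(x, y, lag):
    """F statistic for the hypothesis 'x Granger-causes y' at a lag order."""
    restricted, unrestricted, targets = [], [], []
    for t in range(lag, len(y)):
        own = [y[t - i] for i in range(1, lag + 1)]
        other = [x[t - i] for i in range(1, lag + 1)]
        restricted.append([1.0] + own)            # y explained by its own past
        unrestricted.append([1.0] + own + other)  # plus the past of x
        targets.append(y[t])
    rss_r = ols_rss(restricted, targets)
    rss_u = ols_rss(unrestricted, targets)
    dof = len(targets) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)


# Synthetic series: y follows x with a delay of two steps.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(300)]
y = [0.0, 0.0] + [0.8 * x[t - 2] + rng.gauss(0, 0.3) for t in range(2, 300)]

f_x_to_y = granger_f(x, y, lag=3)  # large: past x improves forecasts of y
f_y_to_x = granger_f(y, x, lag=3)  # small: past y says little about x
```

A large F statistic in one direction and a small one in the other is the asymmetry the test looks for; in the study the same comparison is repeated for each risk factor and each lag from 1 to 120.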

Table 4 shows the result of the Granger causality for these three variables. Firstly, regarding the car variable, the results show significant chi-squared and F-test values at a p-value less than 0.05 for lag values of 17 or lower (roughly 0.6 s at 30 data points per second). Besides the significant causality, this could also indicate the short-term or rapid effect of the presence of a car in Granger-causing the occurrence of near misses. Secondly, regarding the person variable, similar to the car variable, the results show statistically significant chi-squared and F-test values at various lags with p-values below 0.05. However, unlike the car variable, the causal effect of the person variable is longer-term, with lag values ranging from 18 to 42 (approx. 1.5 s). Lastly, regarding the glare variable, similar to the two aforementioned variables, the occurrence of glare shows statistically significant chi-squared and F-test values at different lags. Unlike these two variables, however, the causal effect of glare on the occurrence of near misses remains significant in both the short term (0.5 s) and the long term (2 s). This could indicate how crucial the existence of glare is to the occurrence of cycling near misses.
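Because 30 data points represent 1 s, lag values translate to elapsed time by simple division. The helper below is an illustrative convenience (the name `lag_to_seconds` is an assumption) for reading the lag ranges in this section as durations.

```python
FRAMES_PER_SECOND = 30  # from the text: 30 data points represent 1 s


def lag_to_seconds(lag):
    """Convert a lag expressed in data points to seconds."""
    return lag / FRAMES_PER_SECOND


car_upper = lag_to_seconds(17)      # upper lag for the car variable
person_lower = lag_to_seconds(18)   # lower lag for the person variable
person_upper = lag_to_seconds(42)   # upper lag for the person variable
search_upper = lag_to_seconds(120)  # largest lag tested
```

Read this way, the tested range of 1 to 120 lags spans up to 4 s before a near miss.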

Table 4 The significant results of the Granger causality models.

Discussion

This research introduces an overall methodology of different deep and mathematical models in an integrated pipeline. The general goals of this framework are to detect critical issues in cities, such as cycling near misses, while extracting their risk factors and their effect on these critical events. It introduces a framework stringency index that aims to evaluate the overall methodology, in addition to the evaluation metrics conducted on the individual methods and models. The importance of this index lies in evaluating the weights and the importance of the individual models in their function and utility in the overall methodology. The index also reflects the number of outputs that each sub-method contributes to the overall methodology. Last, the research also highlights the flexibility of the introduced pipeline, which can accommodate future adaptation, either by refining methods or by introducing new ones. After generating a large dataset (N = 46,567), we analysed the impacts of several risk factors on cycling near misses. Here we focus on linking these findings to the literature and highlighting future work.

Linking results to literature

When it comes to the physical state of the built environment, it has been observed that in the presence of a cycle lane, drivers may travel inside their designated lane with less regard for ensuring a safe passing distance for cyclists in the adjacent cycle lane²⁹. A recent investigation on close pass events corroborated similar outcomes³⁰. However, we found that, based on the statistical significance of the regression model, near misses are less likely to occur in the presence of a cycle lane than in its absence. When we examined this, we discovered that both weather and surface conditions had statistical relevance. These findings are in line with the literature. For example, several studies include surface condition factors such as wet, dry, well-maintained, or deteriorating surfaces^{1,31,32,33,34}. It was discovered that some near misses occurred on icy surfaces³⁵. It was also revealed that the majority of the close misses occurred on dry surfaces³³. These frequencies, however, are based on the count of responses rather than the relevance of the finding, which could be related to the self-selection of travel routes or duration. In our study, we looked at how often road users are involved in near misses and tested the statistical significance of their involvement. These findings fill another gap in the literature by directly investigating the impact of the surrounding context on the flow of traffic for bicycles, pedestrians, and vehicles in a specific area in cities, potentially exposing cyclists to risk^2,36. It has been demonstrated that a lack of exposure data makes it difficult to draw meaningful conclusions^2,36. These studies also stated that because this data frequently overlooks minor accidents, near miss events are more likely to be missed as well, making it impossible to assess safety standards between different types of infrastructure.
Last, even though the time of day may be a substantial risk factor in cycling accidents^2,37, many studies completely ignore the problem of time^{1,11, 17, 38,39,40,41,42,43,44}. Other studies categorise visual conditions as either day or night, without taking into account more complex effects such as those brought on by direct sunlight (i.e., glare) at dawn and dusk. Our findings aim to help close this knowledge gap.

Real-world applications

This study’s innovative approach to analysing cycling near misses has several far-reaching applications that could transform urban cycling safety. Primarily, by providing real-time analysis, the framework enables immediate responses to risky situations, which can help city planners and traffic authorities to swiftly implement safety measures and adjust traffic regulations as needed in a given location. This proactive approach can significantly reduce the incidence of cycling-related accidents and support the expansion of cycling as a commuting mode. Moreover, the detailed insights gained from the study allow for more informed decisions in urban planning. By identifying specific risk factors and risky areas, planners can design cycling paths and urban layouts that minimise these risks, potentially including features like protected bike lanes and improved lighting at critical intersections. In addition to infrastructure planning, the framework supports the creation of dynamic and informative awareness campaigns. These campaigns can use data from the study to highlight specific behaviours that lead to near misses and advocate for best practices among cyclists and motorists alike, such as the use of proper signalling and the importance of maintaining a safe following distance. Another key application of this study is the development of advanced risk mapping. Such maps not only show safer routes for cyclists but also integrate real-time data to update risk levels based on factors like time of day, weather conditions, and traffic volume. This can empower cyclists with the information needed to make safer travel decisions. Finally, the framework developed in this study could also serve as a model for other modes of transportation (i.e. motorcycles), where similar near-miss analysis frameworks could be implemented to enhance overall traffic safety and efficiency. 
By extending these methodologies beyond cycling, the potential benefits of this research could contribute broadly to smarter, safer urban mobility solutions.

Limitations

This research presented new approaches and outcomes for understanding the contributions of risk factors such as the counts of road users and visual, weather, or surface conditions to the occurrence of cycling near misses. However, there are still limitations that need to be addressed in future work when it comes to assessing the cause and effect of the stated subject. First, data representation and distribution: finding observation points that represent various types of events and conditions in the scope of the stated subject remains a critical issue for understanding and generalising the measured causes and effects. In this vein, future naturalistic studies need to include a representative sample of data covering different types of near misses and different visual, weather, and physical conditions. Second, addressing behavioural factors represents another limitation. As with the representativeness of scene types and conditions, the representation of strata that belong to different socioeconomic structures needs to be considered.

Methods and materials

For cycling near misses video streams, we utilised a subset from the dataset provided by²². The dataset contains 46,567 sequential frames, representing 209 unique near miss videos with an average duration of 1.3 s. The dataset includes a refined selection of video clips that capture a broad range of environmental, temporal, and visual contexts for urban cycling scenarios. This study seeks to expand upon previous research by exploring the dynamics and risk factors associated with cycling near misses, going beyond mere detection of these events as outlined in the previous study²². The dataset focuses on the characteristics of the near misses, such as timing, environmental conditions, and interactions with various road users. By using sophisticated statistical tools and machine learning algorithms, the research identifies patterns and trends that could inform safer urban planning and cycling infrastructure. Moreover, the paper evaluates the effectiveness of existing models in detecting and classifying different types of near-miss events and suggests improvements based on the insights gained from the secondary dataset.

Proposed framework

There are different approaches for integrating different tasks which depend on the availability of multi-label data, the ability of fusing data of different input parameters, or the availability of computational resources. Multitask and ensemble learning are two crucial approaches for learning multiple tasks⁴⁵. Multitask learning refers to the simultaneous training of several tasks of the same input, in which tasks can share intermediate-level representation in some shared layers. This approach aims to improve generalisation by pooling the examples outputted by several tasks⁴⁵. On the other hand, ensemble learning refers to combining multiple models to solve a given problem. There are different purposes of ensemble learning, most commonly, the bootstrap aggregating (or bagging) technique⁴⁵. In this approach, several models are trained differently for a given task and combined to reduce generalisation errors. Ensemble learning, however, is also used for other purposes such as data fusion or incremental learning⁴⁶. Certain problems can be too difficult for a given classifier to solve or too computationally expensive to conduct, in which case the divide-and-conquer approach can be utilised through incremental learning. Accordingly, ensemble learning seems suited to the diversity of computational tasks required to recognise cycling near misses and their risk factors. Different tasks can be learned by the representation of the input incrementally. This approach will allow flexibility in how the input data can be used and organised for each given task and minimise the computational requirements of training several models for various tasks at once. It would also allow modification and further development, at a later stage, of any given model without affecting the other assembled tasks.

The introduced framework is built on ensemble learning with a single input of video streams. The framework produces three outcomes: (1) critical event detection (in this case, near misses), (2) a list of detected risk factors and objects, and (3) causal inference for the detected factors on the detected critical event. The pipeline is fully coded in Python. After training, testing, and validation, the pre-trained deep learning models are utilised for analysing future scenes as a pragmatic computer vision tool. For the objectives of this research, there are multiple advantages to selecting ensemble learning. During the training phase, this approach allows various tasks to be trained separately based on their input and computational requirements or the availability of data, which might not be possible with other approaches such as multitask learning. At inference, it allows the single input to be treated differently throughout the pipeline as either single-frame images or sequential images based on the specific tasks. In the post-production phase, ensemble learning allows the pipeline of the framework to be modified or expanded for a given task without affecting the other models in the pipeline.
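The inference-time behaviour described above, one video input consumed either frame by frame or as short sequences by independent models, can be sketched as follows. The function `run_pipeline`, the `window` parameter, and the stub models are all hypothetical placeholders; the real models are deep networks and are not reproduced here.

```python
from typing import Any, Callable, Dict, List, Sequence


def run_pipeline(
    frames: Sequence[Any],
    frame_models: Dict[str, Callable[[Any], Any]],
    sequence_models: Dict[str, Callable[[Sequence[Any]], Any]],
    window: int,
) -> List[Dict[str, Any]]:
    """Apply independent models to one input and merge outputs per frame."""
    records: List[Dict[str, Any]] = [{} for _ in frames]
    # Single-frame tasks (e.g. weather or object detection) see one image.
    for name, model in frame_models.items():
        for i, frame in enumerate(frames):
            records[i][name] = model(frame)
    # Sequential tasks (e.g. near miss detection) see a trailing clip.
    for name, model in sequence_models.items():
        for i in range(len(frames)):
            clip = frames[max(0, i - window + 1): i + 1]
            records[i][name] = model(clip)
    return records


# Stub models standing in for trained networks; frames are plain numbers.
toy_frames = [0, 5, 9, 2]
toy_records = run_pipeline(
    toy_frames,
    frame_models={"bright": lambda frame: frame > 4},
    sequence_models={"clip_length": lambda clip: len(clip)},
    window=3,
)
```

Because each model is only a dictionary entry, one can be swapped or added without touching the others, which is the modularity argument made for ensemble learning above. The per-frame records are the kind of table the statistical phase would then consume.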

Figure 4 shows the overall workflow of the proposed pipeline when a video stream is received as input. First, phases I and II extract risk factors and agents (pedestrians, cycles, vehicles etc.), respectively, while phase III detects instant actions (near misses) in parallel. The outputs of all preceding phases are then fed into phase IV, where causal inference is performed. The four phases are described in detail in the following subsections:

Phase I: Extracting risk factors from the environment

This phase tackles the different factors related to the environment that may influence the safety of the cyclist. Alongside image classification, understanding the overall gist of a scene is crucial for understanding the built environment⁴⁷, and few studies have been done in this area. This includes sensing qualitative measures of the built environment that may contribute to near misses, such as road infrastructure, lighting, and weather conditions. Weather and visual conditions are often addressed individually. WeatherNet⁴⁸ introduces a novel framework to automatically extract this information from street-level images relying on deep learning and computer vision using a unified method without any pre-defined constraints in the processed images (i.e., pre-determined field of view, angle, positioning, or cropping). The WeatherNet model comprises four ResNet models to extract various weather and visual conditions such as dawn/dusk, day, and night for the time of day; glare for lighting conditions; and clear, rainy, snowy, and foggy for weather conditions. Moreover, wet road conditions, combined with other factors related to visibility, weather, and/or physical conditions, may contribute to many risky situations and instant events when it comes to mobility in a complex environment. Whether driving, cycling, or even walking, a wet surface may cause potential near misses or serious incidents. The classification of the road is often interpreted based on the perceived weather and precipitation conditions. However, there may be cases where the ground is wet enough to cause a critical event while the sun is shining, and conversely, there may be rainy days where the ground is not yet wet. To tackle this subtle issue, we trained a model similar to WeatherNet to detect whether a given road is wet or dry. Additionally, we trained a model to detect the presence of a cycling lane. We followed the same training implementations suggested by⁴⁸.

Phase II: Detecting and tracking objects

We introduce simultaneous object detection and tracking of road users to the overall framework. The phase consists of two main models: (1) object detection and (2) multi-object tracking. To detect road users (i.e. people, cars, trucks, buses, motorcycles, and bicycles) from extracted scenes, we used the You Only Look Once (YOLO) V5 method⁴⁹, trained on the COCO dataset⁵⁰. After object detection, we utilised the Deep Simple Online and Realtime Tracking (DeepSORT) method⁵¹. The SORT method is suitable for online object tracking because (1) its speed allows fast computation without a huge drop in Frames Per Second (FPS), and (2) it relies on simple techniques such as the Kalman filter, which makes it easy to implement online without previous training. In the case of tracking, the SORT method is evaluated on Multi-Object Tracking (MOT) benchmark datasets⁵².
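The statistical analysis uses per-frame counts of each road user type, so detector output has to be reduced to a count vector. A minimal sketch, assuming each detection arrives as a `(label, confidence)` pair using COCO class names; the confidence threshold and the function name `count_road_users` are illustrative assumptions, not the study's settings.

```python
from collections import Counter

# COCO class names for the road users listed in the text.
ROAD_USER_CLASSES = ("person", "bicycle", "car", "motorcycle", "bus", "truck")


def count_road_users(detections, min_confidence=0.5):
    """Count detections per road user class in one frame.

    `detections` is an iterable of (label, confidence) pairs.
    ASSUMPTION: 0.5 is an arbitrary threshold used only for illustration.
    """
    counts = Counter(
        label for label, confidence in detections
        if confidence >= min_confidence and label in ROAD_USER_CLASSES
    )
    return {name: counts.get(name, 0) for name in ROAD_USER_CLASSES}


# Made-up detections for a single frame.
frame_detections = [
    ("car", 0.91), ("car", 0.62), ("person", 0.88),
    ("car", 0.31),            # below threshold, ignored
    ("traffic light", 0.95),  # not a road user, ignored
]
frame_counts = count_road_users(frame_detections)
```

Every class appears in the result even when its count is zero, which keeps the feature vector the same length for every frame.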

Phase III: Detecting instant actions

In this phase, we utilise a method called CyclingNet²² for detecting cycling near misses from video streams generated by a front-mounted camera on a bike. CyclingNet is a deep computer vision model based on a convolutional structure embedded with self-attention bidirectional long short-term memory (LSTM) blocks that aim to understand near misses from both sequential images of scenes and their optical flows. The model is trained on scenes of both safe rides and near misses. Action recognition in this phase relies on the CyclingNet model. For further details of how the model is developed and trained, see²².

Phase IV: Causal inference

After extracting a combination of risk factors, we aim to understand: (1) the cause and effect of individual risk factors on the detected events, and (2) the causality of these risk factors in the detected events over time. Accordingly, this phase relies on statistical modelling techniques to uncover the causes and the effects of each extracted factor on the detected events. It includes two types of analysis, described in the following subsections.

Logistic regression

We use the detected variable corresponding to critical events as the dependent variable, with detected objects and risk factors being independent or control variables in a logistic regression model. The utility function of the near miss category $i$ in the occurrence of $j$ is computed as:

$$\begin{aligned} v_{ij} = \varepsilon + \sum _{k \in T} b_k x_{ijk} \end{aligned}$$

(1)

where $x_{ijk}$ represents the attribute $k$ for point $j$ on near miss occurrence of $i$, $b_k$ is a coefficient in the utility function, $T$ represents the set of attributes, $\varepsilon$ represents the stochastic part of the utility function.

The coefficients of the model are estimated by maximum likelihood, whereas the stochastic part $\varepsilon$ is assumed to follow a double exponential distribution. The log-likelihood of the model for the actual occurrence of near misses can be expressed as:

$$\begin{aligned} \log L = \sum _{i=1}^{N} \sum _{j=1}^{J} Y_{ij} \ln P_i(Y=j|x,\beta ) \end{aligned}$$

(2)

where:

$$\begin{aligned} P_i(Y=j|x,\beta ) = \frac{\exp (v_{ij}(x_{ij,b}))}{\sum _{h=1}^{J} \exp (v_{ih}(x_{ih,b}))} \end{aligned}$$

(3)

where $Y$ is the binary dependent variable, $x$ represents the independent variables, $v_{ij}$ is the utility function for the $j$-th alternative of the $i$-th choice, $N$ represents the number of choice occasions, $J$ represents the number of alternatives, $P_i$ represents the predicted probability of a near miss on occasion $i$, and $\beta$ represents the parameter vector of the model.
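To make Eqs. 1 to 3 concrete, the following minimal sketch computes the deterministic utilities, the choice probabilities and the log-likelihood in plain Python. The function names and the data layout (one attribute vector per alternative) are assumptions made for illustration, not the author's implementation, and the stochastic term of Eq. 1 is not simulated.

```python
import math
from typing import List, Sequence


def utilities(x: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Deterministic part of Eq. 1: v_j = sum_k b_k * x_jk per alternative j."""
    return [sum(bk * xjk for bk, xjk in zip(b, xj)) for xj in x]


def choice_probabilities(v: Sequence[float]) -> List[float]:
    """Eq. 3: exp(v_j) normalised over all alternatives.

    The maximum is subtracted first for numerical stability; this does not
    change the resulting probabilities.
    """
    m = max(v)
    exps = [math.exp(vj - m) for vj in v]
    total = sum(exps)
    return [e / total for e in exps]


def log_likelihood(X: Sequence[Sequence[Sequence[float]]],
                   y: Sequence[int],
                   b: Sequence[float]) -> float:
    """Eq. 2: sum over occasions of the log-probability of the observed outcome.

    X[i][j] is the attribute vector of alternative j on occasion i and
    y[i] is the index of the alternative that was observed (hypothetical layout).
    """
    ll = 0.0
    for x_i, y_i in zip(X, y):
        p = choice_probabilities(utilities(x_i, b))
        ll += math.log(p[y_i])
    return ll
```

With two alternatives (safe ride, near miss) and the safe ride given an all-zero attribute vector, the near-miss probability reduces to the familiar logistic function of its utility. Maximum-likelihood estimation would then search for the `b` that maximises `log_likelihood`.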

We also include different parametric and non-parametric tests (e.g. the t-test) to determine the strength and significance of the results.
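As a small illustration of such a test, the sketch below computes a two-sample t statistic with the standard library. The source does not say which variant of the t-test was used; Welch's unequal-variance form is an assumption here, and `welch_t` is a hypothetical helper name.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and its approximate degrees of freedom.

    Assumption: unequal-variance (Welch) form; the paper only names "t-test".
    """
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

For example, `a` could hold the per-frame vehicle counts in near-miss scenes and `b` the counts in safe scenes; the statistic is then compared against a t distribution with `df` degrees of freedom.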

Granger causality

Granger causality²⁸,⁵³ is a statistical approach for investigating predictive causality between two variables in a time series dataset, i.e. whether one time series is useful in predicting another. Unlike the analysis of the general cause and effect of individual factors, causality in this sense, or the 'Granger-cause', focuses on whether a particular variable comes before another in time series data. The main hypothesis is that if a time series $X_1$ Granger-causes a time series $X_2$, then the past values of $X_1$ should contain information that helps predict $X_2$. To avoid the post hoc fallacy (given that an event x is followed by an event y, event x must have caused event y), Granger causality aims only to find predictive causality, whereas true causality is rather a philosophical argument. Because the variables are extracted from sequential frames, each variable can be treated as a time series, and this approach can determine whether any risk factor Granger-causes near misses based on the time lag between the occurrence of a near miss and the preceding presence of a given risk factor. To compute Granger causality, the variables were transformed to stationary series, ensuring that the statistical properties (mean, variance, and autocorrelation) of the variables do not change over time.
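The idea can be sketched as a comparison of two least-squares regressions: one predicting a series from its own past, and one that also uses the past of the candidate cause. The code below is a minimal pure-Python illustration, not the paper's procedure. The source does not state how stationarity was obtained or which test statistic was used, so first differencing and the F statistic are assumptions, and all function names are hypothetical.

```python
from typing import List, Sequence


def difference(series: Sequence[float]) -> List[float]:
    """First difference, a common (not guaranteed) route to stationarity."""
    return [b - a for a, b in zip(series, series[1:])]


def solve(A: List[List[float]], y: List[float]) -> List[float]:
    """Solve A z = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (M[r][n] - sum(M[r][k] * z[k] for k in range(r + 1, n))) / M[r][r]
    return z


def rss(rows: List[List[float]], target: List[float]) -> float:
    """Residual sum of squares of an OLS fit via the normal equations."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * t for r, t in zip(rows, target)) for i in range(k)]
    beta = solve(XtX, Xty)
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, target))


def granger_f(x: Sequence[float], y: Sequence[float], lag: int) -> float:
    """F statistic for 'x Granger-causes y' using `lag` past values.

    x and y must have equal length and should already be stationary.
    """
    times = range(lag, len(y))
    target = [y[t] for t in times]
    # Restricted model: intercept plus y's own lags
    own = [[1.0] + [y[t - l] for l in range(1, lag + 1)] for t in times]
    # Unrestricted model: additionally the lags of x
    full = [row + [x[t - l] for l in range(1, lag + 1)]
            for row, t in zip(own, times)]
    rss_r, rss_u = rss(own, target), rss(full, target)
    dof = len(target) - (2 * lag + 1)
    return ((rss_r - rss_u) / lag) / (rss_u / dof)
```

A large F value for, say, `granger_f(glare, near_miss, lag)` would indicate that past glare improves the prediction of near misses beyond what past near misses already explain; repeating the call over several values of `lag` mirrors the varied time lags examined in the paper.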

Framework stringency index

The performance of each model in the pipeline is evaluated with different metrics and loss functions, depending on the type and scope of the model's task. The models therefore differ not only in their evaluation metrics but also in their resulting accuracies and precisions. Moreover, as shown in Fig. 4, the relations between these models vary: some models run consecutively, while others run in parallel with other phases. The goal of this research is to provide a stable framework to be used as a computer vision tool for the detection of near misses, risk factors, and their effects on near misses. This makes it challenging to evaluate a pipeline of mixed models and different ensemble techniques as a whole. Traditionally, performance can be measured as the sum of the losses of each model, when the models are evaluated similarly and hold the same weights of utilisation in the entire pipeline. Given that we aim to develop a verified pipeline of different pre-trained models, we introduce a new stringency index that indicates the performance of the entire framework on a given input, based on three aspects: (1) the individual loss of each model, (2) the number of outputs of each model, and (3) the weight of the utilisation of each model in the framework. The framework Stringency Index (SI) is defined as:

$$\begin{aligned} SI = \sum _{f=1}^{t} \sum _{i=1}^{n} \sum _{o=1}^{j} \left( \frac{\beta _i \cdot P_{i,o}}{t} \right) \end{aligned}$$

(4)

where $f$ indexes the sequential frames, $i$ the models and $o$ the outputs of a model; $j$ denotes the number of outputs per model, $n$ denotes the number of models in the framework, $t$ represents the total number of sequential frames, $P_{i,o}$ represents the estimated average precision between the predicted and actual value for a single output $o$ of a given model $i$, and $\beta$ represents the normalised statistical weight of a given risk factor on a given task of the detection of a near miss.
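One way to read Eq. 4 is as a frame-averaged sum of weighted precisions over every output of every model. The sketch below follows that reading; the nested data layout, the per-model (rather than per-factor) weights and the function name are assumptions made for illustration only.

```python
from typing import Dict, List


def stringency_index(precision: List[Dict[str, List[float]]],
                     weight: Dict[str, float]) -> float:
    """Frame-averaged sum of weighted precisions over all model outputs.

    precision[f][model] lists the precision of each output of `model` at
    frame f; weight[model] is the normalised weight (beta in Eq. 4).
    Both structures are hypothetical.
    """
    t = len(precision)
    total = 0.0
    for frame in precision:                    # sum over the t frames
        for model, outputs in frame.items():   # sum over the n models
            for p in outputs:                  # sum over that model's outputs
                total += weight[model] * p / t
    return total
```

For instance, with two frames, a weather model with two outputs of precision 0.8 and 0.6, an object model with one output of precision 0.9 and equal weights of 0.5, the index evaluates to 0.5 · (0.8 + 0.6 + 0.9) = 1.15.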

Remarks

The general goals of the introduced framework are to detect critical issues in cities, such as cycling near misses, while extracting their risk factors and the effects of those factors on these critical events. This paper introduces a framework stringency index that evaluates the overall methodology, in addition to the evaluation metrics applied to the individual methods and models. The importance of this index lies in weighing the individual models by their function and utility in the overall methodology, as well as by the number of outputs that each sub-method contributes to it. Last, the paper highlights the flexibility of the introduced pipeline, which can accommodate future adaptations, either by refining existing methods or by introducing new ones.

Data availability

The raw video dataset is provided by²². All processed data are available from the corresponding author on reasonable request.

References

Aldred, R. Cycling near misses: Their frequency, impact, and prevention. Transp. Res. Part Policy Pract. 90, 69–83. https://doi.org/10.1016/j.tra.2016.04.016 (2016).
Ibrahim, M. R., Haworth, J., Christie, N., Cheng, T. & Hailes, S. Cycling near misses: A review of the current methods, challenges and the potential of an AI-embedded system. Transp. Rev. https://doi.org/10.1080/01441647.2020.1840456 (2020).
Dozza, M., Schwab, A. & Wegman, F. Safety science special issue on cycling safety. Saf. Sci. 92, 262–263. https://doi.org/10.1016/j.ssci.2016.06.009 (2017).
de Hartog, J. J., Boogaard, H., Nijland, H. & Hoek, G. Do the health benefits of cycling outweigh the risks?. Environ. Health Perspect. 118(8), 1109–1116. https://doi.org/10.1289/ehp.0901747 (2010).
Juhra, C. et al. Bicycle accidents—Do we only see the tip of the iceberg?. Injury 43(12), 2026–2034. https://doi.org/10.1016/j.injury.2011.10.016 (2012).
Pucher, J., Dill, J. & Handy, S. Infrastructure, programs, and policies to increase bicycling: An international review. Prev. Med. 50, S106–S125. https://doi.org/10.1016/j.ypmed.2009.07.028 (2010).
Steinbach, R., Green, J., Datta, J. & Edwards, P. Cycling and the city: A case study of how gendered, ethnic and class identities can shape healthy transport choices. Soc. Sci. Med. 72(7), 1123–1130. https://doi.org/10.1016/j.socscimed.2011.01.033 (2011).
Savan, B., Cohlmeyer, E. & Ledsham, T. Integrated strategies to accelerate the adoption of cycling for transportation. Transp. Res. Part F Traffic Psychol. Behav. 46, 236–249. https://doi.org/10.1016/j.trf.2017.03.002 (2017).
TfL. Cycling action plan: Making London the world’s best big city for cycling. (2018).
Blaizot, S., Papon, F., Haddak, M. M. & Amoros, E. Injury incidence rates of cyclists compared to pedestrians, car occupants and powered two-wheeler riders, using a medical registry and mobility data, Rhône County, France. Accid. Anal. Prev. 58, 35–45. https://doi.org/10.1016/j.aap.2013.04.018 (2013).
Aldred, R. & Goodman, A. Predictors of the frequency and subjective experience of cycling near misses: Findings from the first two years of the UK Near Miss Project. Accid. Anal. Prev. 110, 161–170. https://doi.org/10.1016/j.aap.2017.09.015 (2018).
Branion-Calles, M., Nelson, T. & Winters, M. Comparing crowdsourced near-miss and collision cycling data and official bike safety reporting. Transp. Res. Rec. J. Transp. Res. Board 2662(1), 1. https://doi.org/10.3141/2662-01 (2017).
Aldred, R. & Crosweller, S. Investigating the rates and impacts of near misses and related incidents among UK cyclists. J. Transp. Health 2(3), 379–393. https://doi.org/10.1016/j.jth.2015.05.006 (2015).
De Rome, L. et al. Bicycle crashes in different riding environments in the Australian capital territory. Traffic Inj. Prev. 15(1), 81–88. https://doi.org/10.1080/15389588.2013.781591 (2014).
Winters, M. & Branion-Calles, M. Cycling safety: Quantifying the under reporting of cycling incidents in Vancouver, British Columbia. J. Transp. Health 7, 48–53. https://doi.org/10.1016/j.jth.2017.02.010 (2017).
Gatersleben, B. & Haddad, H. Who is the typical bicyclist?. Transp. Res. Part F Traffic Psychol. Behav. 13(1), 41–48. https://doi.org/10.1016/j.trf.2009.10.003 (2010).
Sanders, R. L. Perceived traffic risk for cyclists: The impact of near miss and collision experiences. Accid. Anal. Prev. 75, 26–34. https://doi.org/10.1016/j.aap.2014.11.004 (2015).
Beck, B. et al. Bicycling crash characteristics: An in-depth crash investigation study. Accid. Anal. Prev. 96, 219–227. https://doi.org/10.1016/j.aap.2016.08.012 (2016).
Imprialou, M. & Quddus, M. Crash data quality for road safety research: Current state and future directions. Accid. Anal. Prev. https://doi.org/10.1016/j.aap.2017.02.022 (2017).
Teschke, K. et al. Bicycling crash circumstances vary by route type: A cross-sectional analysis. BMC Public Health 14, 1. https://doi.org/10.1186/1471-2458-14-1205 (2014).
Ibrahim, M. R., Haworth, J. & Cheng, T. Understanding cities with machine eyes: A review of deep computer vision in urban analytics. Cities 96, 102481. https://doi.org/10.1016/j.cities.2019.102481 (2020).
Ibrahim, M. R., Haworth, J., Christie, N. & Cheng, T. CyclingNet: Detecting cycling near misses from video streams in complex urban scenes with deep learning. IET Intell. Transp. Syst. https://doi.org/10.1049/itr2.12101 (2021).
Frey, B. B. Pearson correlation coefficient. In The SAGE Encyclopedia of Educational Research, Measurement, and Evaluation 91320 (SAGE Publications, Inc., 2018).
Hoffman, J. I. E. Comparison of two groups. In Biostatistics for Medical and Biomedical Practitioners, 337–362 https://doi.org/10.1016/B978-0-12-802387-7.00022-6 (Elsevier, 2015).
Smalheiser, N. R. Null hypothesis statistical testing and the t-test. In Data Literacy, 127–136. https://doi.org/10.1016/B978-0-12-811306-6.00009-9 (Elsevier, 2017).
Ben-Akiva, M. et al. Modeling methods for discrete choice analysis. Mark. Lett. 8(3), 273–286 (1997).
Schroeder, D. A. Discrete choice models. Account. Causal Eff., 77–95 (2010).
Bahadori, M. T. & Liu, Y. Granger causality analysis in irregular time series. In Proceedings of the 2012 SIAM International Conference on Data Mining, Society for Industrial and Applied Mathematics, 660–671. https://doi.org/10.1137/1.9781611972825.57 (2012).
Parkin, J. & Meyers, C. The effect of cycle lanes on the proximity between motor traffic and cycle traffic. Accid. Anal. Prev. 42(1), 159–165. https://doi.org/10.1016/j.aap.2009.07.018 (2010).
Beck, B. et al. How much space do drivers provide when passing cyclists? Understanding the impact of motor vehicle and infrastructure characteristics on passing distance. Accid. Anal. Prev. 128, 253–260. https://doi.org/10.1016/j.aap.2019.03.007 (2019).
Branion-Calles, M., Nelson, T. & Winters, M. Comparing crowdsourced near-miss and collision cycling data and official bike safety reporting. Transp. Res. Rec. J. Transp. Res. Board 2662(1), 1–11. https://doi.org/10.3141/2662-01 (2017).
Gustafsson, L. & Archer, J. A naturalistic study of commuter cyclists in the greater Stockholm area. Accid. Anal. Prev. 58, 286–298. https://doi.org/10.1016/j.aap.2012.06.004 (2013).
Nelson, T. A., Denouden, T., Jestico, B., Laberee, K. & Winters, M. BikeMaps.org: A global tool for collision and near miss mapping. Front. Public Health https://doi.org/10.3389/fpubh.2015.00053 (2015).
Schleinitz, K., Petzoldt, T., Franke-Bartholdt, L., Krems, J. F. & Gehlert, T. Conflict partners and infrastructure use in safety critical events in cycling—Results from a naturalistic cycling study. Transp. Res. Part F Traffic Psychol. Behav. 31, 99–111. https://doi.org/10.1016/j.trf.2015.04.002 (2015).
Dozza, M., Werneke, J. & Fernandez, A. Piloting the Naturalistic Methodology on Bicycles, 11 (2012).
Vanparijs, J., Int Panis, L., Meeusen, R. & de Geus, B. Exposure measurement in bicycle safety analysis: A review of the literature. Accid. Anal. Prev. 84, 9–19. https://doi.org/10.1016/j.aap.2015.08.007 (2015).
Johnson, M., Newstead, S., Oxley, J. & Charlton, J. Cyclists and open vehicle doors: Crash characteristics and risk factors. Saf. Sci. 59, 135–140. https://doi.org/10.1016/j.ssci.2013.04.010 (2013).
Chaurand, N. & Delhomme, P. Cyclists and drivers in road interactions: A comparison of perceived crash risk. Accid. Anal. Prev. 9 (2013).
Fuller, D., Gauvin, L., Morency, P., Kestens, Y. & Drouin, L. The impact of implementing a public bicycle share program on the likelihood of collisions and near misses in Montreal, Canada. Prev. Med. 57(6), 920–924. https://doi.org/10.1016/j.ypmed.2013.05.028 (2013).
Lawson, A. R., Pakrashi, V., Ghosh, B. & Szeto, W. Y. Perception of safety of cyclists in Dublin City. Accid. Anal. Prev. 50, 499–511. https://doi.org/10.1016/j.aap.2012.05.029 (2013).
Lehtonen, E., Havia, V., Kovanen, A., Leminen, M. & Saure, E. Evaluating bicyclists’ risk perception using video clips: Comparison of frequent and infrequent city cyclists. Transp. Res. Part F Traffic Psychol. Behav. 41, 195–203. https://doi.org/10.1016/j.trf.2015.04.006 (2016).
Paschalidis, E., Basbas, S., Politis, I. & Prodromou, M. Put the blame on. . .others!: The battle of cyclists against pedestrians and car drivers at the urban environment. A cyclists’ perception study, 18 (2016).
Poulos, R. G., Hatfield, J., Rissel, C., Grzebieta, R. & McIntosh, A. S. Exposure-based cycling crash, near miss and injury rates: The Safer Cycling Prospective Cohort Study protocol: Figure 1. Inj. Prev. 18(1), e1. https://doi.org/10.1136/injuryprev-2011-040160 (2012).
Vansteenkiste, P., Zeuwts, L., Cardon, G. & Lenoir, M. A hazard-perception test for cycling children: An exploratory study. Transp. Res. Part F Traffic Psychol. Behav. 41, 182–194. https://doi.org/10.1016/j.trf.2016.05.001 (2016).
Goodfellow, I., Bengio, Y. & Courville, A. Deep learning. In Adaptive Computation and Machine Learning Series (The MIT Press, 2017).
Parikh, D. & Polikar, R. An ensemble-based incremental learning approach to data fusion. IEEE Trans. Syst. Man Cybern. Part B Cybern. 37(2), 437–450. https://doi.org/10.1109/TSMCB.2006.883873 (2007).
Oliva, A. & Torralba, A. Chapter 2 Building the gist of a scene: The role of global image features in recognition. In Progress in Brain Research, 23–36 https://doi.org/10.1016/S0079-6123(06)55002-2 (Elsevier, 2006).
Ibrahim, M. R., Haworth, J. & Cheng, T. WeatherNet: Recognising weather and visual conditions from street-level images using deep residual learning. ISPRS Int. J. Geo-Inf. 8(12), 12. https://doi.org/10.3390/ijgi8120549 (2019).
Jocher, G. ultralytics/yolov5. GitHub. https://github.com/ultralytics/yolov5 (2020).
Lin, T.-Y. et al. Microsoft COCO: Common objects in context. In Computer Vision—ECCV 2014, 740–755. https://doi.org/10.1007/978-3-319-10602-1_48 (Springer, 2014).
Wojke, N. & Bewley, A. Deep cosine metric learning for person re-identification. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), https://doi.org/10.1109/WACV.2018.00087 (2018).
Leal-Taixé, L., Milan, A., Reid, I., Roth, S., & Schindler, K. MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking. ArXiv150401942 Cs, (Accessed 05 Aug 2015) http://arxiv.org/abs/1504.01942 (2020).
White, H. & Lu, X. Granger causality and dynamic structural systems. J. Financ. Econom. 8(2), 193–243. https://doi.org/10.1093/jjfinec/nbq006 (2010).

Author information

Authors and Affiliations

Institute of Spatial Data Science, University of Leeds, Woodhouse, Leeds, LS2 9JT, UK
Mohamed Ibrahim
Leeds Institute for Data Analytics (LIDA), University of Leeds, Woodhouse, Leeds, LS2 9JT, UK
Mohamed Ibrahim

Authors

Mohamed Ibrahim

Contributions

The author has created all aspects of the research paper.

Corresponding author

Correspondence to Mohamed Ibrahim.

Ethics declarations

Competing interests

The author declares no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Ibrahim, M. Computer vision and statistical insights into cycling near miss dynamics. Sci Rep 14, 21151 (2024). https://doi.org/10.1038/s41598-024-70733-8

Received: 13 September 2023
Accepted: 20 August 2024
Published: 10 September 2024
Version of record: 10 September 2024
DOI: https://doi.org/10.1038/s41598-024-70733-8
