Learning collision risk proactively from naturalistic driving data at scale

A preprint version of the article is available at arXiv.

Abstract

Accurately and proactively alerting drivers or automated systems to emerging collisions is crucial for road safety, particularly in highly interactive and complex urban environments. Existing methods require labour-intensive annotation of sparse risk, struggle to consider varying contextual factors or are tailored to limited scenarios. Here we present the generalized surrogate safety measure (GSSM), a data-driven approach that learns collision risk from naturalistic driving without the need for crash or risk labels. Trained on diverse datasets and evaluated on 2,591 real-world crashes and near-crashes, a basic GSSM using only instantaneous motion kinematics achieves an area under the precision–recall curve of 0.9 and secures a median time advance of 2.6 s to prevent potential collisions. Incorporating more interaction patterns and contextual factors provides further performance gains. Across interaction scenarios, such as rear end, merging and turning, GSSM consistently outperforms existing baselines in terms of accuracy and timeliness. These results establish GSSM as a scalable, context-aware and generalizable foundation for identifying risky interactions before they become unavoidable and support proactive safety in autonomous driving systems and traffic incident management.
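For readers unfamiliar with the headline metric, the area under the precision–recall curve can be illustrated with a minimal sketch. This is not the authors' evaluation code: the function names, the step-wise (average precision) summation and the toy labels and scores below are all assumptions made purely for illustration.

```python
from typing import List, Tuple


def precision_recall_points(labels: List[int], scores: List[float]) -> List[Tuple[float, float]]:
    """Return (recall, precision) pairs, one per distinct score threshold,
    sweeping the threshold from the highest score downwards."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = []
    tp = fp = 0
    i = 0
    while i < len(order):
        threshold = scores[order[i]]
        # Consume every sample tied at this score before emitting a point.
        while i < len(order) and scores[order[i]] == threshold:
            if labels[order[i]] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((tp / total_pos, tp / (tp + fp)))
    return points


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Step-wise area under the precision-recall curve."""
    area = 0.0
    prev_recall = 0.0
    for recall, precision in precision_recall_points(labels, scores):
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


# Hypothetical data: 1 marks a dangerous moment, 0 a safe one;
# scores are made-up risk estimates, not GSSM outputs.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
print(round(toy_ap, 4))  # 0.9167
```

Because precision is computed only over samples flagged as risky, this metric is less flattered by a large majority of safe samples than the area under the receiver operating characteristic curve, which is one common reason to report it when dangerous events are rare.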

Fig. 1: Statistics of traffic accidents highlight the necessity to improve urban traffic safety.
Fig. 2: Evaluation of the effectiveness and scalability of GSSM.
Fig. 3: Top-ranked factors in risk quantification by GSSM.

Data availability

All data used in this study are either publicly available or accessible under controlled conditions. The two subsets of the SHRP2 NDS, ‘A Study on the Factors That Affect the Occurrence of Crashes and Near-Crashes’ (ref. 72) and ‘Research of Driver Assistant System’ (ref. 73), can be accessed following the instructions at https://github.com/Yiru-Jiao/GSSM. The highD dataset can be freely applied for at https://www.highd-dataset.com. The ArgoverseHV dataset can be accessed following the guidelines at https://github.com/RomainLITUD/conflict_resolution_dataset. The resulting data in this study, such as trained models, loss logs and evaluation results, are compressed as two zip files in the ./PreparedData/ and ./ResultData/ folders. They can be downloaded from https://doi.org/10.4121/9caa1e6c-9abd-4e36-ae28-c9ea4542d940 (ref. 74).

Code availability

The code for this research is open source and available via GitHub at https://github.com/Yiru-Jiao/GSSM. A permanent record of the code repository for this paper is available via Zenodo at https://doi.org/10.5281/zenodo.17099863 (ref. 75).

References

  1. Global Status Report on Road Safety 2023 (World Health Organization, 2023).

  2. Policies on Spatial Distribution and Urbanization Have Broad Impacts on Sustainable Development Technical Report 2020/2 (United Nations, 2020).

  3. Saul, H., Junghans, M., Dotzauer, M. & Gimm, K. Online risk estimation of critical and non-critical interactions between right-turning motorists and crossing cyclists by a decision tree. Accid. Anal. Prev. 163, 106449 (2021).

  4. Papadimitriou, E. et al. Review and ranking of crash risk factors related to the road infrastructure. Accid. Anal. Prev. 125, 85–97 (2019).

  5. Jakobsen, M. D. et al. Influence of occupational risk factors for road traffic crashes among professional drivers: systematic review. Transp. Rev. 43, 533–563 (2022).

  6. Horrey, W. J., Lesch, M. F., Dainoff, M. J., Robertson, M. M. & Noy, Y. I. On-board safety monitoring systems for driving: review, knowledge gaps, and framework. J. Saf. Res. 43, 49–58 (2012).

  7. Chang, A., Saunier, N. & Laureshyn, A. Proactive Methods for Road Safety Analysis (SAE, 2017); https://doi.org/10.4271/wp-0005.

  8. Arun, A., Haque, M. M., Washington, S., Sayed, T. & Mannering, F. A systematic review of traffic conflict-based safety measures with a focus on application context. Anal. Methods Accid. Res. 32, 100185 (2021).

  9. Wang, C., Xie, Y., Huang, H. & Liu, P. A review of surrogate safety measures and their applications in connected and automated vehicles safety modeling. Accid. Anal. Prev. 157, 106157 (2021).

  10. Westhofen, L. et al. Criticality metrics for automated driving: a review and suitability analysis of the state of the art. Arch. Comput. Methods Eng. 30, 1–35 (2022).

  11. Hayward, J. C. Near miss determination through use of a scale of danger. In Proc. 51st Annual Meeting of the Highway Research Board 24–34 (National Academies, 1972).

  12. Cooper, D. F. & Ferguson, N. Traffic studies at T-junctions—a conflict simulation record. Traffic Eng. Control 17, 306–309 (1976).

  13. Cooper, P. Reports from group discussions. In Proc. 1st Workshop on Traffic Conflicts (eds Amundsen, F. H. & Hydén, C.) 118–136 (Institute of Transport Economics (TØI) and Lund Institute of Technology (LTH), 1977).

  14. Hydén, C. The Development of a Method for Traffic Safety Evaluation: The Swedish Traffic Conflicts Technique. PhD thesis (1987).

  15. Wang, J., Wu, J. & Li, Y. The driving safety field based on driver-vehicle-road interactions. IEEE Trans. Intell. Transp. Syst. 16, 2203–2214 (2015).

  16. Mullakkal-Babu, F. A., Wang, M., He, X., van Arem, B. & Happee, R. Probabilistic field approach for motorway driving risk assessment. Transp. Res. Part C 118, 102716 (2020).

  17. Kolekar, S., de Winter, J. & Abbink, D. Human-like driving behaviour emerges from a risk-based driver model. Nat. Commun. 11, 4850 (2020).

  18. Papadimitriou, E. Road Safety Thematic Report—Main Factors Causing Fatal Crashes (European Commission, 2024).

  19. Dey, A. K. Understanding and using context. Pers. Ubiquitous Comput. 5, 4–7 (2001).

  20. Lefèvre, S., Vasquez, D. & Laugier, C. A survey on motion prediction and risk assessment for intelligent vehicles. ROBOMECH J. 1, 1 (2014).

  21. Dahl, J., de Campos, G. R., Olsson, C. & Fredriksson, J. Collision avoidance: a literature review on threat-assessment techniques. IEEE Trans. Intell. Veh. 4, 101–113 (2019).

  22. Althoff, M. & Dolan, J. M. Online verification of automated road vehicles using reachability analysis. IEEE Trans. Robot. 30, 903–918 (2014).

  23. Kim, J. & Kum, D. Collision risk assessment algorithm via lane-based probabilistic motion prediction of surrounding vehicles. IEEE Trans. Intell. Transp. Syst. 19, 2965–2976 (2018).

  24. Mathiesen, F. B., Romao, L., Calvert, S. C., Laurenti, L. & Abate, A. A data-driven approach for safety quantification of non-linear stochastic systems with unknown additive noise distribution. Preprint at https://arxiv.org/abs/2410.06662 (2024).

  25. McAllister, R. et al. Concrete problems for autonomous vehicle safety: advantages of Bayesian deep learning. In Proc. 26th International Joint Conference on Artificial Intelligence 4745–4753 (2017).

  26. Li, D. et al. Safe motion planning for autonomous vehicles by quantifying uncertainties of deep learning-enabled environment perception. IEEE Trans. Intell. Veh. 9, 2318–2332 (2024).

  27. Kataoka, H., Suzuki, T., Oikawa, S., Matsui, Y. & Satoh, Y. Drive video analysis for the detection of traffic near-miss incidents. In Proc. 2018 IEEE International Conference on Robotics and Automation (ICRA) 3421–3428 (IEEE, 2018).

  28. Fang, J., Qiao, J., Xue, J. & Li, Z. Vision-based traffic accident detection and anticipation: a survey. IEEE Trans. Circuits Syst. Video Technol. 34, 1983–1999 (2024).

  29. Patera, P., Chen, Y.-T. & Fang, W.-H. A multi-modal architecture with spatio-temporal-text adaptation for video-based traffic accident anticipation. IEEE Trans. Circuits Syst. Video Technol. https://doi.org/10.1109/tcsvt.2025.3552895 (2025).

  30. Kumamoto, Y., Ohtani, K., Suzuki, D., Yamataka, M. & Takeda, K. AAT-DA: Accident anticipation transformer with driver attention. In Proc. Winter Conference on Applications of Computer Vision (WACV) Workshops 1142–1151 (CVF, 2025).

  31. Liu, H. X. & Feng, S. Curse of rarity for autonomous vehicles. Nat. Commun. 15, 4808 (2024).

  32. Summala, H. Brake reaction times and driver behavior analysis. Transp. Hum. Factors 2, 217–226 (2000).

  33. Markkula, G., Engström, J., Lodin, J., Bärgman, J. & Victor, T. A farewell to brake reaction times? Kinematics-dependent brake response in naturalistic rear-end emergencies. Accid. Anal. Prev. 95, 209–226 (2016).

  34. Davis, J. & Goadrich, M. The relationship between precision–recall and ROC curves. In Proc. 23rd International Conference on Machine Learning 233–240 (2006).

  35. Krajewski, R., Bock, J., Kloeker, L. & Eckstein, L. The highD dataset: a drone dataset of naturalistic vehicle trajectories on German highways for validation of highly automated driving systems. In Proc. IEEE 21st International Conference on Intelligent Transportation Systems (ITSC) 2118–2125 (IEEE, 2018).

  36. Wilson, B. et al. Argoverse 2: next generation datasets for self-driving perception and forecasting. In 35th Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2) https://openreview.net/pdf?id=vKQGe36av4k (2021).

  37. Li, G., Jiao, Y., Calvert, S. C. & van Lint, J. H. Lateral conflict resolution data derived from Argoverse-2: analysing safety and efficiency impacts of autonomous vehicles at intersections. Transp. Res. Part C 167, 104802 (2024).

  38. Hankey, J. M., Perez, M. A. & McClafferty, J. A. Description of the SHRP 2 Naturalistic Database and the Crash, Near-crash, and Baseline Data Sets (Virginia Tech Transportation Institute, 2016).

  39. Thomas, P., Morris, A., Talbot, R. & Fagerlind, H. Identifying the causes of road crashes in Europe. In Proc. AAAM 57th Annual Conference, Vol. 57 13–22 (Association for the Advancement of Automotive Medicine, 2013).

  40. Wang, J. et al. Driving risk assessment using near-crash database through data mining of tree-based model. Accid. Anal. Prev. 84, 54–64 (2015).

  41. Nahata, R., Omeiza, D., Howard, R. & Kunze, L. Assessing and explaining collision risk in dynamic environments for autonomous driving safety. In 2021 IEEE International Intelligent Transportation Systems Conference (ITSC) 223–230 (IEEE, 2021).

  42. Chen, J. et al. A driving risk assessment framework considering driver’s fatigue state and distraction behavior. IEEE Trans. Intell. Transp. Syst. 25, 20120–20136 (2024).

  43. Yao, Y., Xu, M., Wang, Y., Crandall, D. J. & Atkins, E. M. Unsupervised traffic accident detection in first-person videos. In Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (2019).

  44. Fang, J., Qiao, J., Bai, J., Yu, H. & Xue, J. Traffic accident detection via self-supervised consistency learning in driving scenarios. IEEE Trans. Intell. Transp. Syst. 23, 9601–9614 (2022).

  45. Yao, Y. et al. DoTA: unsupervised detection of traffic anomaly in driving videos. IEEE Trans. Pattern Anal. Mach. Intell. https://doi.org/10.1109/TPAMI.2022.3150763 (2022).

  46. Li, C. et al. TTC4MCP: monocular collision prediction based on self-supervised TTC estimation. In 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) 244–250 (IEEE, 2023).

  47. Laureshyn, A., Svensson, A. & Hydén, C. Evaluation of traffic safety, based on micro-level behavioural data: theoretical framework and first implementation. Accid. Anal. Prev. 42, 1637–1646 (2010).

  48. Schiff, W. & Detwiler, M. L. Information used in judging impending collision. Perception 8, 647–658 (1979).

  49. Teigen, K. H. The proximity heuristic in judgments of accident probabilities. Br. J. Psychol. 96, 423–440 (2005).

  50. Camara, F. & Fox, C. Space invaders: pedestrian proxemic utility functions and trust zones for autonomous vehicle interactions. Int. J. Soc. Robotics 13, 1929–1949 (2020).

  51. Jiao, Y., Calvert, S. C., van Cranenburgh, S. & van Lint, H. A unified probabilistic approach to traffic conflict detection. Anal. Methods Accid. Res. 45, 100369 (2025).

  52. Jiao, Y., Calvert, S. C., van Cranenburgh, S. & van Lint, H. Inferring vehicle spacing in urban traffic from trajectory data. Transp. Res. Part C 155, 104289 (2023).

  53. Meng, Q. & Qu, X. Estimation of rear-end vehicle crash frequencies in urban road tunnels. Accid. Anal. Prev. 48, 254–263 (2012).

  54. Pawar, D. S. & Patil, G. R. Critical gap estimation for pedestrians at uncontrolled mid-block crossings on high-speed arterials. Saf. Sci. 86, 295–303 (2016).

  55. Anwari, N., Abdel-Aty, M., Goswamy, A. & Zheng, O. Investigating surrogate safety measures at midblock pedestrian crossings using multivariate models with roadside camera data. Accid. Anal. Prev. 192, 107233 (2023).

  56. Erion, G., Janizek, J. D., Sturmfels, P., Lundberg, S. M. & Lee, S.-I. Improving performance of deep learning models with axiomatic attribution priors and expected gradients. Nat. Mach. Intell. 3, 620–631 (2021).

  57. Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Proc. 34th International Conference on Machine Learning (ICML), Vol. 70, 3319–3328 (2017).

  58. Rumelhart, D. E., Hinton, G. E. & Williams, R. J. Learning representations by back-propagating errors. Nature 323, 533–536 (1986).

  59. Hochreiter, S. & Schmidhuber, J. Long short-term memory. Neural Comput. 9, 1735–1780 (1997).

  60. Vaswani, A. et al. Attention is all you need. In Proc. Advances in Neural Information Processing Systems, Vol. 30 (eds Guyon, I. et al.) 5998–6008 (Curran Associates, 2017).

  61. LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).

  62. Hendrycks, D. & Gimpel, K. Gaussian error linear units (GELUs). Preprint at https://arxiv.org/abs/1606.08415 (2016).

  63. Yu, F. X. X., Suresh, A. T., Choromanski, K. M., Holtmann-Rice, D. N. & Kumar, S. Orthogonal random features. In Proc. Advances in Neural Information Processing Systems, Vol. 29 (eds Lee, D. et al.) (Curran Associates, 2016).

  64. Venthuruthiyil, S. P. & Chunchu, M. Anticipated collision time (ACT): a two-dimensional surrogate safety indicator for trajectory-based proactive safety assessment. Transp. Res. Part C 139, 103655 (2022).

  65. Guo, H., Xie, K. & Keyvan-Ekbatani, M. Modeling driver’s evasive behavior during safety-critical lane changes: two-dimensional time-to-collision and deep reinforcement learning. Accid. Anal. Prev. 186, 107063 (2023).

  66. Cheng, H. et al. Emergency index (EI): a two-dimensional surrogate safety measure considering vehicles’ interaction depth. Transp. Res. Part C 171, 104981 (2025).

  67. Eggert, J. & Puphal, T. Continuous risk measures for ADAS and AD. In Proc. Future Active Safety Technology Symposium 1–8 (Society of Automotive Engineers of Japan, 2017).

  68. Puphal, T., Probst, M. & Eggert, J. Probabilistic uncertainty-aware risk spot detector for naturalistic driving. IEEE Trans. Intell. Veh. 4, 406–415 (2019).

  69. de Gelder, E. et al. PRISMA: a novel approach for deriving probabilistic surrogate safety measures for risk evaluation. Accid. Anal. Prev. 192, 107273 (2023).

  70. Antin, J. F. et al. Second Strategic Highway Research Program Naturalistic Driving Study methods. Saf. Sci. 119, 2–10 (2019).

  71. Jiao, Y. & Calvert, S. Bird’s Eye View Trajectory Reconstruction of Naturalistic Crashes and Near-crashes in the SHRP2 NDS (Virginia Tech Transportation Institute, 2025).

  72. Sears, E. et al. A Study on the Factors that Affect the Occurrence of Crashes and Near-crashes (Virginia Tech Transportation Institute, 2019).

  73. Layman, C. K., Perez, M. A., Sugino, T. & Eggert, J. Research of Driver Assistant System (Virginia Tech Transportation Institute, 2019).

  74. Jiao, Y., Calvert, S., van Cranenburgh, S. & van Lint, H. Data underlying the publication: Learning collision risk proactively from naturalistic driving data at scale. 4TU.ResearchData https://doi.org/10.4121/9caa1e6c-9abd-4e36-ae28-c9ea4542d940 (2025).

  75. Jiao, Y. Yiru-jiao/gssm: First official release. Zenodo http://zenodo.org/records/17591050 (2025).

  76. Global Burden of Disease Study 2021 (GBD 2021) Results (Institute for Health Metrics and Evaluation, 2022).

Acknowledgements

This work is supported by the TU Delft AI Labs programme. We acknowledge the use of computational resources of the DelftBlue supercomputer provided by the Delft High Performance Computing Centre (https://www.tudelft.nl/dhpc). We extend our sincere gratitude to the researchers and organizations that collected, created, cleaned and curated the high-quality datasets for research use. We thank G. Li for sharing his knowledge on training neural networks. The raw data were accessed under SHRP2 Data Use Licence SHRP2-DUL-A-5-24-746, issued by the Virginia Tech Transportation Institute. The findings and conclusions presented here are those of the authors and do not necessarily represent the views of the Virginia Tech Transportation Institute, the Transportation Research Board, the National Academies or the Federal Highway Administration.

Author information

Contributions

Conceptualization: Y.J. Data curation: Y.J., S.C.C. Methodology: Y.J. Investigation: Y.J., S.C.C., S.v.C., H.v.L. Formal analysis: Y.J. Software: Y.J. Funding acquisition: S.C.C., S.v.C. Resources: S.C.C., H.v.L. Supervision: S.C.C., S.v.C., H.v.L. Writing—original draft: Y.J. Writing—review and editing: Y.J., S.C.C., S.v.C., H.v.L.

Corresponding author

Correspondence to Yiru Jiao (焦艺茹).

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Machine Intelligence thanks Kay Gimm and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Statistics of original and processed events in the SHRP2 NDS.

a, Numbers of crashes, near-crashes and safe baselines that are recorded, reconstructed and used in the test. The shaded areas indicate events that could not be reconstructed. b, Distribution of event types that are recorded, reconstructed and used in the test. c, Distribution of weather and road surface conditions in the test events. d, Distribution of lighting conditions in the test events. e, Distribution of traffic conditions in the test events. LOS stands for level of service; detailed definitions are given in Extended Data Table 1.

Extended Data Fig. 2 Performance comparison of GSSM and other existing methods in alerting different types of safety-critical events.

a, Types of crashes and near-crashes with determined conflicting objects. b, Receiver operating characteristic curves, precision–recall curves and accuracy–timeliness curves (ATCs) for different types of events. In the ATC plots, the shaded bands represent 99% confidence intervals for median time to impact. The GSSM under comparison is trained on the SafeBaseline data and uses contextual information of instantaneous motion kinematics, environmental conditions and historical kinematics in the past 2.5 seconds.

Extended Data Fig. 3 Separation of safe and dangerous periods for each safety-critical event in the test set.

Extended Data Fig. 4 Safety pyramid that conceptualises the evolution from safe interactions to unsafe interactions up to crashes.

Extended Data Table 1 Contextual information considered in this study
Extended Data Table 2 Hyperparameters set in this paper’s experiments
Extended Data Table 3 Two-dimensional Surrogate Safety Measures (2D SSMs) used as baseline methods

Supplementary information

Supplementary Information

Supplementary Sections 1–3, Fig. 1 and Tables 1–3.

Reporting Summary

Peer Review File

Supplementary Video 1

Visualization of the dynamic evolution and quantification of collision risk in ten safety-critical events.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Jiao, Y., Calvert, S.C., van Cranenburgh, S. et al. Learning collision risk proactively from naturalistic driving data at scale. Nat Mach Intell 8, 337–350 (2026). https://doi.org/10.1038/s42256-026-01189-w

  • DOI: https://doi.org/10.1038/s42256-026-01189-w
