Real-world road damage dataset with potholes, cracks, and maintenance holes

Giordani, Enrico; Arcioni, Lorenzo; Gil-Martín, Manuel; Foresti, Gian Luca; Marini, Marco Raoul

doi:10.1038/s41598-026-46679-4

Article
Open access
Published: 01 April 2026

Enrico Giordani¹, Lorenzo Arcioni¹, Manuel Gil-Martín², Gian Luca Foresti³ & Marco Raoul Marini¹

Scientific Reports (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Road surface deterioration poses significant challenges to transportation safety, infrastructure longevity, and timely maintenance planning. Existing street-view datasets are often limited by wide-angle distortions that reduce geometric fidelity and hinder reliable damage analysis. This paper introduces the Road Damage Dataset: Potholes, Cracks, and Manholes, a novel dataset designed for robust detection of road-surface damage in urban and rural settings. The dataset was captured using two consumer-grade devices, acquiring diverse views that mimic real-world deployment situations. It contains high-resolution images with three major and often co-occurring road-damage classes: potholes, cracks, and maintenance holes. It includes 2009 hand-labeled images containing 1261 potholes, 2519 cracks, and 957 maintenance holes with verified bounding boxes. All images were post-processed to improve visual quality and remove sensitive information. The dataset includes several districts in Rome (Italy) and nearby semi-urban and rural towns such as Sacrofano, offering more environmental heterogeneity than many existing datasets. Thanks to its varied capture circumstances, viewing angles, and scene contexts, this dataset supports the development of generalizable models for real-world road-damage detection.
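The instance counts reported above imply a moderately imbalanced label distribution, which can be worth quantifying before training a detector. The following is a minimal sketch in Python that uses only the numbers stated in the abstract. The dictionary keys are hypothetical class names, and the inverse-frequency weighting is an illustrative heuristic, not something the authors describe.

```python
# Instance and image counts as reported in the abstract.
# The key names are hypothetical labels chosen for this sketch.
counts = {"pothole": 1261, "crack": 2519, "maintenance_hole": 957}
num_images = 2009

# Total number of annotated bounding boxes across the three classes.
total = sum(counts.values())

# Fraction of all annotations that belongs to each class.
shares = {name: n / total for name, n in counts.items()}

# Average number of annotated objects per image.
per_image = total / num_images

# Ratio between the most and the least frequent class.
imbalance = max(counts.values()) / min(counts.values())

# Illustrative "balanced" weights: total / (num_classes * class_count).
# This is an assumption for demonstration, not the paper's training setup.
weights = {name: total / (len(counts) * n) for name, n in counts.items()}

for name in counts:
    print(f"{name}: share={shares[name]:.3f}, weight={weights[name]:.3f}")
print(f"total={total}, per_image={per_image:.2f}, imbalance={imbalance:.2f}")
```

Cracks account for a little over half of all annotations, so a per-class evaluation is likely more informative than a single aggregate score.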

Data availability

The datasets generated and analysed during the current study are available in the Road Damage Dataset - Potholes, Cracks, and Manholes repositories on Zenodo⁴¹ and on Kaggle at https://www.kaggle.com/datasets/lorenzoarcioni/road-damage-dataset-potholes-cracks-and-manholes. Example code for loading and using the dataset is available at https://github.com/lorenzo-arcioni/Road-Damage-Detection-Dataset-Analysis. The dataset is released under the Creative Commons Attribution 4.0 International license and is freely accessible without registration requirements. The dataset used for training YOLO models is available at https://www.kaggle.com/datasets/lorenzoarcioni/pothole-test. The image annotation tool is available at https://github.com/bnsreenu/digitalsreeni-image-annotator.
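Because a YOLO training set is mentioned, the bounding boxes may be stored in the common YOLO text layout, one object per line as `class x_center y_center width height` with coordinates normalized to the image size. That layout is an assumption here, and so is the class-ID mapping below; the linked repositories should be checked for the actual format. Under those assumptions, a minimal Python sketch for converting such labels to pixel coordinates could look like this:

```python
from dataclasses import dataclass

# Hypothetical class-ID mapping; the real IDs must come from the dataset itself.
CLASS_NAMES = {0: "pothole", 1: "crack", 2: "manhole"}


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_labels(text: str, img_w: int, img_h: int) -> list[Box]:
    """Convert YOLO-style label text into pixel-space boxes.

    Each non-empty line is assumed to hold five whitespace-separated
    fields: class ID, then normalized center x, center y, width, height.
    """
    boxes = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != 5:
            raise ValueError(f"line {line_no}: expected 5 fields, got {len(parts)}")
        class_id = int(parts[0])
        cx, cy, w, h = (float(v) for v in parts[1:])
        boxes.append(
            Box(
                label=CLASS_NAMES.get(class_id, f"class_{class_id}"),
                x_min=(cx - w / 2) * img_w,
                y_min=(cy - h / 2) * img_h,
                x_max=(cx + w / 2) * img_w,
                y_max=(cy + h / 2) * img_h,
            )
        )
    return boxes


if __name__ == "__main__":
    sample = "0 0.5 0.5 0.2 0.4\n1 0.25 0.75 0.5 0.5\n"
    for box in parse_yolo_labels(sample, img_w=1000, img_h=500):
        print(box)
```

Unknown class IDs are kept under a generic `class_<id>` name rather than dropped, so a mismatch between the assumed mapping and the real one stays visible.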

Code availability

The Python code used for the curation, analysis, training, and validation of the dataset is publicly available on GitHub, ensuring reproducibility and enabling further methodological and applied research. The repository is accessible at: https://github.com/lorenzo-arcioni/Road-Damage-Detection-Dataset-Analysis.

References

1. Arya, D. et al. Global road damage detection: State-of-the-art solutions. In 2020 IEEE Int. Conf. Big Data (Big Data), 5533–5539, https://doi.org/10.1109/BigData50022.2020.9377790 (2020).
2. Cui, L., Qi, Z., Chen, Z., Meng, F. & Shi, Y. Pavement distress detection using random decision forests. In Data Science (eds Zhang, C. et al.) 95–102 (Springer International Publishing, 2015).
3. Stricker, R., Eisenbach, M., Sesselmann, M., Debes, K. & Gross, H.-M. Improving visual road condition assessment by extensive experiments on the extended gaps dataset. In 2019 International Joint Conference on Neural Networks (IJCNN), 1–8, https://doi.org/10.1109/IJCNN.2019.8852257 (2019).
4. Arya, D., Maeda, H., Ghosh, S. K., Toshniwal, D. & Sekimoto, Y. Rdd2022: A multi-national image dataset for automatic road damage detection (2022). arxiv:2209.08538.
5. Maniat, M., Camp, C. V. & Kashani, A. R. Deep learning-based visual crack detection using google street view images. Neural Comput. Appl. 33, 14565–14582. https://doi.org/10.1007/s00521-021-06098-0 (2021).
6. Lei, X., Liu, C., Li, L. & Wang, G. Automated pavement distress detection and deterioration analysis using street view map. IEEE Access 8, 76163–76172. https://doi.org/10.1109/ACCESS.2020.2989028 (2020).
7. Ren, M., Zhang, X., Zhi, X., Wei, Y. & Feng, Z. An annotated street view image dataset for automated road damage detection. Sci. Data 11, 407. https://doi.org/10.1038/s41597-024-03263-7 (2024).
8. Arya, D., Maeda, H., Ghosh, S. K., Toshniwal, D. & Sekimoto, Y. Rdd 2020: An annotated image dataset for automatic road damage detection using deep learning. Data Brief 36, 107133. https://doi.org/10.1016/j.dib.2021.107133 (2021).
9. Kortmann, F. et al. Detecting various road damage types in global countries utilizing faster r-cnn. In 2020 IEEE International Conference on Big Data (Big Data), 5563–5571, https://doi.org/10.1109/BigData50022.2020.9378245 (2020).
10. Vishwakarma, R. & Vennelakanti, R. Cnn model & tuning for global road damage detection. In 2020 IEEE International Conference on Big Data (Big Data), 5609–5615, https://doi.org/10.1109/BigData50022.2020.9377902 (2020).
11. Pham, V., Pham, C. & Dang, T. Road damage detection and classification with detectron2 and faster r-cnn. In 2020 IEEE International Conference on Big Data (Big Data), 5592–5601, https://doi.org/10.1109/BigData50022.2020.9378027 (2020).
12. Lin, C. et al. Da-rdd: Toward domain adaptive road damage detection across different countries. IEEE Trans. Intell. Transp. Syst. 24, 3091–3103. https://doi.org/10.1109/TITS.2022.3221067 (2023).
13. Kapp, A., Hoffmann, E., Weigmann, E. & Mihaljević, H. Streetsurfacevis: A dataset of crowdsourced street-level imagery annotated by road surface type and quality. Sci. Data 12, 92. https://doi.org/10.1038/s41597-024-04295-9 (2025).
14. Yin, T., Zhang, W., Kou, J. & Liu, N. Promoting automatic detection of road damage: A high-resolution dataset, a new approach, and a new evaluation criterion. IEEE Trans. Autom. Sci. Eng. 22, 2472–2484. https://doi.org/10.1109/TASE.2024.3379945 (2025).
15. Yang, H. et al. A large-scale image repository for automated pavement distress analysis and degradation trend prediction. Sci. Data 12, 1426. https://doi.org/10.1038/s41597-025-05748-5 (2025).
16. Zhang, H. et al. A new road damage detection baseline with attention learning. Appl. Sci. https://doi.org/10.3390/app12157594 (2022).
17. Pham, V., Nguyen, D. & Donan, C. Road damage detection and classification with yolov7. In 2022 IEEE International Conference on Big Data (Big Data), 6416–6423, https://doi.org/10.1109/BigData55660.2022.10020856 (2022).
18. Alfarrarjeh, A., Trivedi, D., Kim, S. H. & Shahabi, C. A deep learning approach for road damage detection from smartphone images. In 2018 IEEE International Conference on Big Data (Big Data), 5201–5204, https://doi.org/10.1109/BigData.2018.8621899 (2018).
19. Arya, D. et al. Deep learning-based road damage detection and classification for multiple countries. Autom. Constr. 132, 103935. https://doi.org/10.1016/j.autcon.2021.103935 (2021).
20. Guo, G. & Zhang, Z. Road damage detection algorithm for improved yolov5. Sci. Rep. 12, 15523. https://doi.org/10.1038/s41598-022-19674-8 (2022).
21. Pang, Z. et al. Road surface classification with texture-feature-embedded resnet for the active suspension systems in complex environments. Adv. Eng. Inform. 71, 104280. https://doi.org/10.1016/j.aei.2025.104280 (2026).
22. Liu, Y. et al. A non-destructive automatic pavement damage detection scheme based on end-to-end neural networks with multi-level attention mechanism. Eng. Appl. Artif. Intell. 156, 111246. https://doi.org/10.1016/j.engappai.2025.111246 (2025).
23. Yenni, H. et al. Mycd: Integration of yolo-cnn and densenet for real-time road damage detection based on field images. J. Appl. Data Sci. 7, 384–395. https://doi.org/10.47738/jads.v7i1.1040 (2025).
24. Arcioni, L. & Giordani, E. Road damage dataset collection route map (2025). https://www.google.com/maps/d/viewer?mid=1WrrMPBqnh6v_GnQmfvKJWaq3R0YPD78&usp=sharing.
25. Bhattiprolu, S. Digitalsreeni image annotator (2024). https://github.com/bnsreenu/digitalsreeni-image-annotator.
26. Ravi, N. et al. Sam 2: Segment anything in images and videos (2024). arxiv:2408.00714.
27. Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You only look once: Unified, real-time object detection. Preprint at arxiv:1506.02640 (2016).
28. Jocher, G. & Qiu, J. Ultralytics yolo11 (2024).
29. Snoek, J., Larochelle, H. & Adams, R. P. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems Vol. 25 (eds Pereira, F. et al.) (Curran Associates, Inc., 2012).
30. Jocher, G., Chaurasia, A. & Qiu, J. Ultralytics yolov8 (2023).
31. Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks (2019). arxiv:1801.04381.
32. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition (2015). arxiv:1409.1556.
33. Liu, W. et al. SSD: Single Shot MultiBox Detector 21–37 (Springer International Publishing, 2016).
34. Ren, S., He, K., Girshick, R. & Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks (2016). arxiv:1506.01497.
35. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition (2015). arxiv:1512.03385.
36. Lin, T.-Y. et al. Feature pyramid networks for object detection (2017). arxiv:1612.03144.
37. Kaggle: Your Machine Learning and Data Science Community. https://www.kaggle.com/.
38. Biewald, L. Experiment tracking with weights and biases (2020). Software available from wandb.com.
39. Lin, T. et al. Microsoft COCO: common objects in context. CoRR abs/1405.0312 (2014). arxiv:1405.0312.
40. Khosravian, A., Amirkhani, A., Kashiani, H. & Masih-Tehrani, M. Generalizing state-of-the-art object detectors for autonomous vehicles in unseen environments. Expert Syst. Appl. 183, 115417. https://doi.org/10.1016/j.eswa.2021.115417 (2021).
41. Giordani, E., Arcioni, L., Gil-Martín, M. & Marini, M. R. Road damage dataset: Potholes, cracks and manholes, https://doi.org/10.5281/zenodo.17834373 (2025).

Acknowledgements

Special thanks are extended to Kaggle for offering accessible GPU resources.

Funding

This work was partially supported by project SERICS (PE00000014) under the MUR National Recovery and Resilience Plan (PNRR) funded by the European Union – Next Generation EU, Mission 4, CUP G23C24000790006 (2024-25). This paper was partially supported by the University of Udine project “Piano Strategico Dipartimentale on Artificial Intelligence” (PSD-AI) (2022-25). This research was partially supported by “Ayudas para estancias de movilidad en el extranjero José Castillejo para jóvenes doctores” from the Ministerio de Ciencia, Innovación y Universidades of Spain. Moreover, it was supported by the ASTOUND project (101071191 HORIZON-EIC-2021-PATHFINDERCHALLENGES-01) funded by the European Commission. In addition, this work was supported by the Spanish Ministry of Science and Innovation through the projects BeWord, GOMINOLA, and TRUSTBOOST (PID2021-126061OB-C43; PID2020-118112RB-C21 and PID2020-118112RB-C22; PID2023-150584OB-C21 and PID2023-150584OB-C22), funded by MCIN/AEI/10.13039/501100011033 and by the European Union “NextGenerationEU/PRTR”.

Author information

These authors contributed equally: Enrico Giordani and Lorenzo Arcioni.

Authors and Affiliations

VisionLab, Department of Computer Science, Sapienza University, Rome, 00198, Italy
Enrico Giordani, Lorenzo Arcioni & Marco Raoul Marini
Grupo de Tecnología del Habla y Aprendizaje Automático (THAU Group), Information Processing and Telecommunications Center, E.T.S.I. de Telecomunicación, Universidad Politécnica de Madrid (UPM), Madrid, 28040, Spain
Manuel Gil-Martín
Department of Computer Science, Mathematics and Physics, University of Udine, Via delle Scienze 206, Udine, UD 33100, Italy
Gian Luca Foresti

Contributions

L.A., E.G., and M.R.M. conceived the methodology; L.A. and E.G. conducted the data collection and performed the baseline experiments; G.L.F. supervised the work; and M.G.-M. wrote the original draft of the manuscript. All authors validated and formally analyzed the dataset and reviewed the final manuscript.

Corresponding author

Correspondence to Marco Raoul Marini.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Giordani, E., Arcioni, L., Gil-Martín, M. et al. Real-world road damage dataset with potholes, cracks, and maintenance holes. Sci Rep (2026). https://doi.org/10.1038/s41598-026-46679-4

Received: 19 December 2025
Accepted: 27 March 2026
Published: 01 April 2026
DOI: https://doi.org/10.1038/s41598-026-46679-4
