Abstract
Road surface deterioration poses significant challenges to transportation safety, infrastructure longevity, and timely maintenance planning. Existing street-view datasets are often limited by wide-angle distortions that reduce geometric fidelity and hinder reliable damage analysis. This paper introduces the Road Damage Dataset: Potholes, Cracks, and Manholes, a novel dataset designed for robust detection of road-surface damage in urban and rural settings. The dataset was captured using two consumer-grade devices, acquiring diverse views that mimic real-world deployment situations. It contains high-resolution images with three major and often co-occurring road-damage classes: potholes, cracks, and maintenance holes. It includes 2009 hand-labeled images containing 1261 potholes, 2519 cracks, and 957 maintenance holes with verified bounding boxes. All images were post-processed to improve visual quality and remove sensitive information. The dataset includes several districts in Rome (Italy) and nearby semi-urban and rural towns such as Sacrofano, offering more environmental heterogeneity than many existing datasets. Thanks to its varied capture circumstances, viewing angles, and scene contexts, this dataset supports the development of generalizable models for real-world road-damage detection.
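The counts reported above imply a noticeable class imbalance, which matters when choosing loss weights or sampling strategies. The following minimal sketch simply redoes that arithmetic from the figures in the abstract; the dictionary keys and the function name are illustrative and not taken from the authors' code.

```python
# Counts as reported in the abstract.
NUM_IMAGES = 2009
BOX_COUNTS = {"pothole": 1261, "crack": 2519, "maintenance_hole": 957}


def class_summary(counts, num_images):
    """Return total boxes, mean boxes per image, and per-class share."""
    total = sum(counts.values())
    shares = {name: n / total for name, n in counts.items()}
    return total, total / num_images, shares


total, per_image, shares = class_summary(BOX_COUNTS, NUM_IMAGES)
print(f"total boxes: {total}")
print(f"mean boxes per image: {per_image:.2f}")
for name, share in shares.items():
    print(f"{name}: {share:.1%}")
```

Under these figures, cracks make up a little over half of all annotated boxes, so a detector trained without any rebalancing may favor that class.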
Data availability
The datasets generated and analysed during the current study are available in the Road Damage Dataset - Potholes, Cracks, and Manholes repositories on Zenodo (ref. 41; https://doi.org/10.5281/zenodo.17834373) and on Kaggle at https://www.kaggle.com/datasets/lorenzoarcioni/road-damage-dataset-potholes-cracks-and-manholes. Example code for loading and using the dataset is available at https://github.com/lorenzo-arcioni/Road-Damage-Detection-Dataset-Analysis. The dataset is released under the Creative Commons Attribution 4.0 International license and is freely accessible without registration requirements. The dataset used for training YOLO models is available at https://www.kaggle.com/datasets/lorenzoarcioni/pothole-test. The image annotation tool is available at https://github.com/bnsreenu/digitalsreeni-image-annotator.
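As a rough illustration of how bounding-box labels for YOLO training are often consumed, the sketch below parses label text in the widely used YOLO layout (one line per box: class id, then center x, center y, width, and height, all normalized to the image size) and converts it to pixel corners. It is an assumption that the released files follow this layout, and the class-id mapping, the `Box` type, and `parse_yolo_labels` are hypothetical names introduced here; the authors' repository is the authoritative reference for the actual format.

```python
from dataclasses import dataclass

# Hypothetical id-to-name mapping; check the dataset's own class file.
CLASS_NAMES = {0: "pothole", 1: "crack", 2: "manhole"}


@dataclass
class Box:
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def parse_yolo_labels(text, img_w, img_h, names=CLASS_NAMES):
    """Convert normalized 'cls xc yc w h' lines into pixel-space boxes."""
    boxes = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 5:
            continue  # skip blank or malformed lines
        cls = int(parts[0])
        xc, yc, w, h = (float(v) for v in parts[1:])
        boxes.append(
            Box(
                label=names.get(cls, str(cls)),
                x_min=(xc - w / 2) * img_w,
                y_min=(yc - h / 2) * img_h,
                x_max=(xc + w / 2) * img_w,
                y_max=(yc + h / 2) * img_h,
            )
        )
    return boxes


# Made-up label text for a 1000x800 image, purely for demonstration.
sample = "0 0.5 0.5 0.2 0.1\n1 0.25 0.75 0.5 0.1\n"
boxes = parse_yolo_labels(sample, img_w=1000, img_h=800)
for box in boxes:
    print(box)
```

Keeping the conversion separate from file reading makes it easy to test on small strings before pointing it at real label files.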
Code availability
The Python code used for the curation, analysis, training, and validation of the dataset is publicly available on GitHub, ensuring reproducibility and enabling further methodological and applied research. The repository is accessible at: https://github.com/lorenzo-arcioni/Road-Damage-Detection-Dataset-Analysis.
References
Arya, D. et al. Global road damage detection: State-of-the-art solutions. In 2020 IEEE International Conference on Big Data (Big Data), 5533–5539. https://doi.org/10.1109/BigData50022.2020.9377790 (2020).
Cui, L., Qi, Z., Chen, Z., Meng, F. & Shi, Y. Pavement distress detection using random decision forests. In Data Science (eds Zhang, C. et al.) 95–102 (Springer International Publishing, 2015).
Stricker, R., Eisenbach, M., Sesselmann, M., Debes, K. & Gross, H.-M. Improving visual road condition assessment by extensive experiments on the extended gaps dataset. In 2019 International Joint Conference on Neural Networks (IJCNN), 1–8. https://doi.org/10.1109/IJCNN.2019.8852257 (2019).
Arya, D., Maeda, H., Ghosh, S. K., Toshniwal, D. & Sekimoto, Y. Rdd2022: A multi-national image dataset for automatic road damage detection (2022). arxiv:2209.08538.
Maniat, M., Camp, C. V. & Kashani, A. R. Deep learning-based visual crack detection using google street view images. Neural Comput. Appl. 33, 14565–14582. https://doi.org/10.1007/s00521-021-06098-0 (2021).
Lei, X., Liu, C., Li, L. & Wang, G. Automated pavement distress detection and deterioration analysis using street view map. IEEE Access 8, 76163–76172. https://doi.org/10.1109/ACCESS.2020.2989028 (2020).
Ren, M., Zhang, X., Zhi, X., Wei, Y. & Feng, Z. An annotated street view image dataset for automated road damage detection. Sci. Data 11, 407. https://doi.org/10.1038/s41597-024-03263-7 (2024).
Arya, D., Maeda, H., Ghosh, S. K., Toshniwal, D. & Sekimoto, Y. Rdd 2020: An annotated image dataset for automatic road damage detection using deep learning. Data Brief 36, 107133. https://doi.org/10.1016/j.dib.2021.107133 (2021).
Kortmann, F. et al. Detecting various road damage types in global countries utilizing faster r-cnn. In 2020 IEEE International Conference on Big Data (Big Data), 5563–5571. https://doi.org/10.1109/BigData50022.2020.9378245 (2020).
Vishwakarma, R. & Vennelakanti, R. Cnn model & tuning for global road damage detection. In 2020 IEEE International Conference on Big Data (Big Data), 5609–5615. https://doi.org/10.1109/BigData50022.2020.9377902 (2020).
Pham, V., Pham, C. & Dang, T. Road damage detection and classification with detectron2 and faster r-cnn. In 2020 IEEE International Conference on Big Data (Big Data), 5592–5601. https://doi.org/10.1109/BigData50022.2020.9378027 (2020).
Lin, C. et al. Da-rdd: Toward domain adaptive road damage detection across different countries. IEEE Trans. Intell. Transp. Syst. 24, 3091–3103. https://doi.org/10.1109/TITS.2022.3221067 (2023).
Kapp, A., Hoffmann, E., Weigmann, E. & Mihaljević, H. Streetsurfacevis: A dataset of crowdsourced street-level imagery annotated by road surface type and quality. Sci. Data 12, 92. https://doi.org/10.1038/s41597-024-04295-9 (2025).
Yin, T., Zhang, W., Kou, J. & Liu, N. Promoting automatic detection of road damage: A high-resolution dataset, a new approach, and a new evaluation criterion. IEEE Trans. Autom. Sci. Eng. 22, 2472–2484. https://doi.org/10.1109/TASE.2024.3379945 (2025).
Yang, H. et al. A large-scale image repository for automated pavement distress analysis and degradation trend prediction. Sci. Data 12, 1426. https://doi.org/10.1038/s41597-025-05748-5 (2025).
Zhang, H. et al. A new road damage detection baseline with attention learning. Appl. Sci. https://doi.org/10.3390/app12157594 (2022).
Pham, V., Nguyen, D. & Donan, C. Road damage detection and classification with yolov7. In 2022 IEEE International Conference on Big Data (Big Data), 6416–6423. https://doi.org/10.1109/BigData55660.2022.10020856 (2022).
Alfarrarjeh, A., Trivedi, D., Kim, S. H. & Shahabi, C. A deep learning approach for road damage detection from smartphone images. In 2018 IEEE International Conference on Big Data (Big Data), 5201–5204. https://doi.org/10.1109/BigData.2018.8621899 (2018).
Arya, D. et al. Deep learning-based road damage detection and classification for multiple countries. Autom. Constr. 132, 103935. https://doi.org/10.1016/j.autcon.2021.103935 (2021).
Guo, G. & Zhang, Z. Road damage detection algorithm for improved yolov5. Sci. Rep. 12, 15523. https://doi.org/10.1038/s41598-022-19674-8 (2022).
Pang, Z. et al. Road surface classification with texture-feature-embedded resnet for the active suspension systems in complex environments. Adv. Eng. Inform. 71, 104280. https://doi.org/10.1016/j.aei.2025.104280 (2026).
Liu, Y. et al. A non-destructive automatic pavement damage detection scheme based on end-to-end neural networks with multi-level attention mechanism. Eng. Appl. Artif. Intell. 156, 111246. https://doi.org/10.1016/j.engappai.2025.111246 (2025).
Yenni, H. et al. Mycd: Integration of yolo-cnn and densenet for real-time road damage detection based on field images. J. Appl. Data Sci. 7, 384–395. https://doi.org/10.47738/jads.v7i1.1040 (2025).
Arcioni, L. & Giordani, E. Road damage dataset collection route map (2025). https://www.google.com/maps/d/viewer?mid=1WrrMPBqnh6v_GnQmfvKJWaq3R0YPD78&usp=sharing.
Bhattiprolu, S. Digitalsreeni image annotator (2024). https://github.com/bnsreenu/digitalsreeni-image-annotator.
Ravi, N. et al. Sam 2: Segment anything in images and videos (2024). arxiv:2408.00714.
Redmon, J., Divvala, S., Girshick, R. & Farhadi, A. You only look once: Unified, real-time object detection. Preprint at arxiv:1506.02640 (2016).
Jocher, G. & Qiu, J. Ultralytics yolo11 (2024).
Snoek, J., Larochelle, H. & Adams, R. P. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems Vol. 25 (eds Pereira, F. et al.) (Curran Associates, Inc., 2012).
Jocher, G., Chaurasia, A. & Qiu, J. Ultralytics yolov8 (2023).
Sandler, M., Howard, A., Zhu, M., Zhmoginov, A. & Chen, L.-C. Mobilenetv2: Inverted residuals and linear bottlenecks (2019). arxiv:1801.04381.
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition (2015). arxiv:1409.1556.
Liu, W. et al. SSD: Single Shot MultiBox Detector 21–37 (Springer International Publishing, 2016).
Ren, S., He, K., Girshick, R. & Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks (2016). arxiv:1506.01497.
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition (2015). arxiv:1512.03385.
Lin, T.-Y. et al. Feature pyramid networks for object detection (2017). arxiv:1612.03144.
Kaggle: Your Machine Learning and Data Science Community — kaggle.com. https://www.kaggle.com/.
Biewald, L. Experiment tracking with weights and biases (2020). Software available from wandb.com.
Lin, T. et al. Microsoft COCO: common objects in context. CoRR abs/1405.0312 (2014). arxiv:1405.0312.
Khosravian, A., Amirkhani, A., Kashiani, H. & Masih-Tehrani, M. Generalizing state-of-the-art object detectors for autonomous vehicles in unseen environments. Expert Syst. Appl. 183, 115417. https://doi.org/10.1016/j.eswa.2021.115417 (2021).
Giordani, E., Arcioni, L., Gil-Martín, M. & Marini, M. R. Road damage dataset: Potholes, cracks and manholes. https://doi.org/10.5281/zenodo.17834373 (2025).
Acknowledgements
Special thanks are extended to Kaggle for offering accessible GPU resources.
Funding
This work was partially supported by project SERICS (PE00000014) under the MUR National Recovery and Resilience Plan (PNRR) funded by the European Union – Next Generation EU, Mission 4, CUP G23C24000790006 (2024-25). It was also partially supported by the University of Udine project “Piano Strategico Dipartimentale on Artificial Intelligence” (PSD-AI) (2022-25), and by “Ayudas para estancias de movilidad en el extranjero José Castillejo para jóvenes doctores” from the Ministerio de Ciencia, Innovación y Universidades of Spain. Moreover, it was supported by the ASTOUND project (101071191 HORIZON-EIC-2021-PATHFINDERCHALLENGES-01) funded by the European Commission, and by the Spanish Ministry of Science and Innovation through the projects BeWord, GOMINOLA, and TRUSTBOOST (PID2021-126061OB-C43, PID2020-118112RB-C21 and PID2020-118112RB-C22, PID2023-150584OB-C21 and PID2023-150584OB-C22), funded by MCIN/AEI/10.13039/501100011033 and by the European Union “NextGenerationEU/PRTR”.
Author information
Contributions
L.A., E.G., and M.R.M. conceived the methodology; L.A. and E.G. conducted the data collection and performed the baseline experiments; G.L.F. supervised the work; and M.G.-M. wrote the original draft of the manuscript. All authors validated and formally analyzed the dataset and reviewed the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Giordani, E., Arcioni, L., Gil-Martín, M. et al. Real-world road damage dataset with potholes, cracks, and maintenance holes. Sci Rep (2026). https://doi.org/10.1038/s41598-026-46679-4
DOI: https://doi.org/10.1038/s41598-026-46679-4