A Large-Scale Image Repository for Automated Pavement Distress Analysis and Degradation Trend Prediction

Yang, Hanlin; Cao, Jinpu; Wan, Jun; Gao, Qian; Liu, Chenglong; Fischer, Martin; Du, Yuchuan; Li, Yishun; Jain, Pooja; Wu, Difei

doi:10.1038/s41597-025-05748-5

Download PDF

Data Descriptor
Open access
Published: 14 August 2025

A Large-Scale Image Repository for Automated Pavement Distress Analysis and Degradation Trend Prediction

Hanlin Yang¹,
Jinpu Cao²,
Jun Wan³,
Qian Gao¹,
Chenglong Liu ORCID: orcid.org/0000-0002-8421-7017¹,
Martin Fischer²,
Yuchuan Du¹,
Yishun Li¹,
Pooja Jain⁴ &
…
Difei Wu¹

Scientific Data volume 12, Article number: 1426 (2025) Cite this article

4624 Accesses
6 Citations
1 Altmetric
Metrics details

Subjects

An Author Correction to this article was published on 09 October 2025

This article has been updated

Abstract

In recent years, automated detection technologies for large-scale pavement distress have become a focal point of research in the transportation sector. With the rapid advancement of deep learning technologies, data-driven artificial intelligence recognition algorithms have gradually emerged as the industry mainstream. The effectiveness of such algorithms largely depends on the reliability and quantity of the samples. However, existing datasets exhibit significant shortcomings in terms of sample size, category diversity, and support for distress tracking. In this study, a large-scale image dataset was meticulously constructed. This dataset includes 51012 road images for pavement distress identification and 8928 images for long-term tracking of pavement distress. Using this dataset, six mature object detection algorithms were trained and evaluated, with the results demonstrating the performance of these algorithms. To the best of our knowledge, this is the first large-scale pavement distress dataset that includes long-term tracking of pavement distress, providing reliable data support for dynamic tracking and monitoring of pavement distress as well as for optimizing road maintenance strategies.

Leveraging a deep learning generative model to enhance recognition of minor asphalt defects

Article Open access 21 November 2024

An annotated street view image dataset for automated road damage detection

Article Open access 22 April 2024

Surrogate measurement of traffic safety with insights from pavement condition, sensor fusion, and machine learning

Article Open access 09 December 2025

Background & Summary

Pavement performance evaluation is a prerequisite for road maintenance and has a significant impact on driving safety and user comfort. Pavement distresses such as cracks, potholes, and alligatoring are inevitable throughout the lifespan of the pavement¹. Timely and reliable pavement distress detection effectively improves the timeliness and targeting of maintenance activities, preventing further deterioration of pavement performance. Automatic pavement distress detection has always been one of the most popular research topics in the field. Many advanced technologies and algorithms have been invented to quickly identify and locate pavement distresses. Regular distress inspections can assess the current state of the pavement condition, typically conducted monthly or annually. However, due to insufficient measurement cycles, tracking the deterioration trends of specific distresses becomes challenging. Meanwhile, traditional methods rely on specialized detection equipment and vehicles, which to some extent limits the coverage and update frequency of detection data. Since pavement performance is easily affected by traffic loads and weather conditions, high-frequency and large-scale pavement distress collection is necessary for maintenance planning and decision-making.

Pavement distress detection technology has evolved from manual surveys to semi-automation and finally to full automation. Based on the data collection method, the mainstream pavement distress detection technologies can be categorized into two types: (a) laser scanning techniques and (b) RGB image-based techniques. Other detection methods, such as those based on vibration or infrared technology, are only implemented in specific scenarios². Traditional Laser Road Imaging Systems (LRIS) use lasers and line-scan cameras to illuminate the pavement³. 3D measurement vehicles capable of obtaining dense point cloud data and high-resolution images containing more information (elevation and intensity) than RGB images can detect precise distresses with a precision of 1 mm⁴. Ghosh and Smadi⁵ applied deep convolutional neural networks to automatically locate and identify various distresses from high-resolution 3D images. Inzerillo et al.⁶ applied Structure from Motion (SfM) technology using dense point cloud data collected by Unmanned Aerial Vehicles (UAVs) to reconstruct the 3D morphology of pavement distresses. However, limited by the resolution of UAV data, this method is only suitable for the detection and reconstruction of larger and more obvious distresses. Laser-based detection technologies provide dense and precise 3D information for pavement distresses, and various advanced algorithms have been developed for automatic identification of distress types and calculation of their attributes. However, the high cost of laser vehicles and the substantial computational requirements for data processing limit their application in large-scale and high-frequency pavement distress inspections.

Compared to laser-based methods, RGB image-based pavement distress detection has practical advantages due to its low cost and simple installation. Extensive research and technology have been conducted on pavement distress collection, which can be classified into three types: (i) manual inspection, (ii) semi-automatic methods, and (iii) automatic inspection. Manual inspection primarily involves personnel subjectively identifying and assessing the site or photographed images. This method is easy to implement but requires a significant amount of labor and is inefficient. For semi-automatic methods, pavement distress images are first processed by computer vision algorithms in software, followed by manual adjustment and marking through human intervention⁷. With the rapid development of artificial intelligence, automatic detection based on RGB images has become the most commonly used method. Traditional texture analysis methods, such as wavelet decomposition^8,9, edge detection¹⁰, etc., have been widely used in pavement distress detection. These methods are stable and effective. However, due to the various forms of pavement distress and complex testing environments, texture-based algorithms exhibit low robustness and accuracy in different scenarios¹¹. Emerging deep learning algorithms have provided new insights for pavement distress detection. Du et al.¹² applied a YOLO pre-trained convolutional neural network to automatically identify and classify pavement distresses, achieving an accuracy rate of 73.64%. A significant amount of research has proven that applying deep learning to pavement distress detection yields excellent performance.

The main contributions of this paper are as follows:

1.
We first propose a large-scale pavement distress dataset comprising 59940 images, covering diverse road environments across multiple cities. Figure 1 illustrates examples of pavement distress annotations from our dataset. Table 1 provides a comparative summary highlighting the advantages of the proposed dataset over others. We validate the dataset using multiple object detection algorithms and provide baseline results.
Fig. 1
Full size image
Examples from PaveTrack_OD from different countries.
Table 1 Comparison of the proposed dataset with other datasets.
Full size table
2.
We introduce a high-frequency tracking dataset focusing on three main types of pavement distresses. This is the first large-scale pavement tracking dataset, encompassing continuous monitoring of 165 road locations over a period of approximately six months. Figure 2 demonstrates examples of continuous tracking of pavement distress.
Fig. 2
Full size image
An example for distress continuous tracking from PaveTrack_PD.
3.
We present a benchmark pavement distress tracking algorithm framework designed to achieve matching and tracking of pavement distresses.

Methods

The data preparation process comprises four critical stages, as illustrated in Fig. 3: data acquisition, data cleansing, data desensitization, and data annotation. The dataset intended for object detection was primarily collected from relevant cities such as Shanghai, China; Menlo Park and Palo Alto in the San Francisco Bay Area, USA. Specialized software was employed to capture and annotate road images, aiming to identify and document the pavement distress conditions depicted in the images. The operational procedures for these two phases are described in detail below.

Data collection

The data for this study was acquired using different methods tailored for various countries. For data acquisition in the Chinese region, a mobile vehicle equipped with industrial camera was employed to capture road surface images. As illustrated in Fig. 4(a), the industrial camera was installed according to a specific layout. For data acquisition within the United States, a mobile vehicle was equipped with dual commercial imaging devices (GoPro 13) to record the condition of the road surface. As illustrated in Fig. 4(b), the imaging apparatus was configured with specific geometric considerations. The first device was placed at an oblique angle relative to the horizontal plane, facilitating the capture of a comprehensive view of the surrounding road surface. Concurrently, the second imaging device was affixed to an extension apparatus with its optical axis approximately perpendicular to the horizontal plane, enabling detailed documentation of the proximate road surface immediately posterior to the vehicle. This dual-camera configuration was designed to simultaneously document both the macroscopic spatial distribution of pavement distress and the microscopic morphological characteristics of localized deterioration features.

Data cleaning

During the data cleansing phase, to comprehensively and accurately capture widely distributed distress features, we rigorously screened the image dataset by eliminating: (1) images devoid of defect information, and (2) images with poor imaging quality caused by adverse weather conditions (including camera lens condensation due to heavy rainfall and ineffective feature acquisition in high-illumination environments), sensor malfunctions, or improper capture rates. This ensured that subsequent analysis was conducted on the basis of high-quality image data.

Data desensitization

The data desensitization is divided into two parts: firstly, drawing inspiration from De Kerf’s concept¹³, we employed selective blurring techniques to anonymize identifiable information such as traffic signs, license plates, and human faces (as shown in Fig. 5) while ensuring data validity, thereby effectively protecting privacy. Secondly, in the pavement distress tracking data section, we utilized a clustering algorithm to merge images with proximate geographical coordinates into a single folder, thereby anonymizing the GPS data of the images to facilitate subsequent analysis and tracking tasks.

Data annotation

The dataset in this study consists of two components: the first component is the annotation of pavement distress, wherein all identified instances of distress are annotated by enclosing them with bounding boxes and classified by appending the appropriate category labels. The second component involves long-term photographic tracking of certain pavement distresses, accompanied by refined segmentation to evaluate changes in the morphology and area of the distresses.

Data Records

The dataset has been made publicly available via Science Databank¹⁴ and can be accessed at https://doi.org/10.57760/sciencedb.20383. It is divided into PaveTrack_OD for pavement distress detection and PaveTrack_PD for pavement distress tracking and image matching. The specifics of each dataset are illustrated in Fig. 6. For PaveTrack_OD, it is pre-divided into training, testing, and validation subsets at a 70/15/15 percentage split, facilitating immediate use in machine learning workflows. Due to uneven numbers of defect types, we used stratified sampling to maintain similar defect ratios across all dataset splits. Annotation data is provided in YOLO format. As for PaveTrack_PD, we utilize Excel to track objects in each image, employing xywh to annotate the areas where pavement distress is located, alongside providing detailed mask segmentations for assessing distress changes.

Dataset statistics

In Table 2, we present the basic statistical data of the first part of the dataset, along with clear and concise definitions for each category. For the Chinese data, we categorize it into 10 key categories (as illustrated in Fig. 1(a)): including six traditional distress types—crack, patched crack, pothole, patched pothole, alligator cracks, and patched alligator cracks—along with four non-traditional pavement distresses: manhole, street waste, speed bump, and puddle. For the American dataset, we categorize road defects into six distinct classes (as illustrated in Fig. 1(b)): crack, patched crack, pothole, patched pothole, clay-patched crack, and manhole. The clay-patched crack category specifically refers to cracks that have been repaired using clay rather than conventional asphalt or concrete materials, which serves to enhance the diversity of patched crack samples in our dataset. This comprehensive overview is crucial for understanding the scope and distribution of our data.

Table 2 Overview of the first segment of the dataset categorized by type.

Full size table

The statistics of the number of distresses instances in the dataset are calculated by counting the number of pavement distress occurrences in each image, as shown in Fig. 7. Most images contain only a small number of distresses, while the image with the highest count may contain up to 29 distress instances. The results indicate that there is a significant variation in the number of pavement distress across different roads.

For the second part of the dataset, our focus is on the variations in transverse cracks, alligator cracks, and potholes. In Shanghai, China, we have gathered data on the cracking conditions at 165 locations, which include 40 instances of transverse cracks, 57 instances of alligator cracks, and 68 instances of potholes. The specific distribution of these conditions is depicted in Fig. 8. It can be observed that during the tracking period, potholes are more likely to receive maintenance, whereas transverse cracks are less likely to be maintained. Additionally, pavement damages located on principal arterials are more likely to garner attention from the maintenance departments.

Technical Validation

Our dataset was meticulously planned through the deployment of logistics vehicles equipped with cameras, ensuring the capture of a variety of images under different environmental conditions and locations. These varying environmental conditions are primarily related to weather, especially the position and intensity of the sun. Changes in sunlight intensity, angle, and shadows alter the appearance of objects in images, affecting contrast, color, and texture. This dataset encompasses these diverse conditions, enabling it to represent real-life scenarios. To protect privacy while preserving the dataset’s efficacy for pavement distress detection, specific alterations have been implemented. This includes the use of advanced repair techniques and selective blurring to effectively anonymize identifiable features, ensuring that the resulting images remain highly relevant for analytical purposes.

Performance of object detection algorithms using PaveTrack_OD

Our dataset is employed for training and assessing the performance of seven predominant object detection algorithms, including Faster-RCNN¹⁵, YOLOv5¹⁶, YOLOv8¹⁷, YOLOX¹⁸, YOLOv11¹⁹, and RT-DETRv2²⁰. The performance of the models is evaluated using four critical metrics: precision, recall, mAP50, and FLOPS. These metrics collectively provide a multifaceted view of the model’s accuracy and efficiency in correctly identifying and classifying each category within the dataset. For the dataset collected in China, RT-DETRv2 showed excellent detection performance with mAP50 of 0.593. As shown in Table 3, YOLOv11 follows closely with an mAP50 of 0.569. These results underscore the efficacy of our dataset in conjunction with deep learning algorithms for the detection of pavement distresses.

Table 3 Performance Comparison of Object Detection Methods (China Dataset).

Full size table

For the dataset collected in the United States, the models’ performances differ from those observed in the Chinese dataset due to variations in volume and distribution. As evidenced in Table 4, YOLOv8 achieves a mAP50 of 0.561, significantly outperforming other models. Faster-RCNN, despite being proposed several years ago as a classical object detection method, maintains the highest precision of 0.782 in this dataset.

Table 4 Performance Comparison of Object Detection Methods (United States Dataset).

Full size table

Baseline methods and results for pavement distress matching and tracking using PaveTrack_PD

For the second part of the dataset, we designed a three-step matching algorithm to filter a large number of pavement distresses, with the specific design as follows:

Step 1 (GPS Clustering): This step involves collecting images from the same location. To overcome the potential offset in GPS data due to tall buildings, this study employs an improved K-means algorithm that incorporates a filtering mechanism for outlier data during the clustering process. This enhanced algorithm is better equipped to handle outliers in GPS data, thereby improving the accuracy and efficiency of clustering. By filtering out outliers, the algorithm can more accurately determine cluster centers, resulting in clustering results that more closely reflect actual conditions. Considering the positioning errors of GPS in urban environments with building obstacles and multi-lane roads, images within a range of 5 to 20 meters are clustered to facilitate matching at each location.

Step 2 (Background Matching): To accurately identify and match images that are close in GPS coordinates but have visual content differences, we need to match different scenes based on the background features of the images. The SuperPoint algorithm is first used to detect keypoints and extract descriptors for the images to be matched. The extracted keypoints and descriptors are then input into the SuperGlue algorithm for matching. Through SuperGlue’s graph attention mechanism, the similarity between keypoints is learned, establishing reliable matching correspondences. Based on the matching results, the similarity between two images is evaluated to achieve scene matching.

Step 3 (Adjacent Local Area Matching): The SuperGlue network provides pixel-wise matches between two images, while the image recognition algorithm draws bounding boxes for each image. If the pixels within two bounding boxes match, there is a duplicate pavement distress in both images. However, due to different shooting angles, unremarkable features, or unusual weather conditions, the same distress in two images may not share matching features. Repaired defects and potholes are easily matched due to their more pronounced features, but the original defects are not prominent throughout the image and are difficult to match directly using SuperGlue. To address this, we designed an algorithm that matches specific pavement distresses by comparing adjacent local areas within the images. Specifically, we can extract local areas around the distress and then use a feature matching algorithm to compare the similarity of these regions. By calculating the relative position and orientation of the local areas, we can determine whether the two distresses are the same. This method helps us accurately match defects in two images, even if their shapes and sizes differ.

Pavement performance degradation can also be observed through a designed matching framework, which illustrates five common degradation scenarios.

Case 1 (Fig. 9) illustrates the onset of pavement distress, where, within a span of 20 days, the initially smooth pavement developed cracks. This demonstrates that distress does not accumulate gradually but occurs abruptly.

Case 2 (Fig. 10) depicts a scenario where no significant deterioration occurred in the pavement. Over a four-month monitoring period, the crack at this location remained in its initial state.

Case 3 (Fig. 11) showcases an instance of crack propagation, likely due to the combined effects of water infiltration and heavy vehicular loading. After four months, the length of the crack expanded to nearly double its original size.

Case 4 (Fig. 12) presents a situation where a pothole was repaired. If the pothole becomes a repaired defect, it typically indicates that the road maintenance unit has intervened. This approach enables us to track changes in pavement damage status at high frequency, optimizing maintenance timing.

Case 5 (Fig. 13) demonstrates the progression of a crack into a pothole, indicating that pavement distress is undergoing deterioration.

Annotation validation

The annotation process was a collaborative effort among team members, with images fairly distributed among the initial annotators. Dr. Liu Chenglong played a pivotal role in reviewing and refining these annotations to ensure accuracy. Following annotation, the images were anonymized and further processed using inpainting techniques to maintain the integrity of the dataset without compromising network performance.

Influence of the inpainting process

To evaluate the impact of our anonymization techniques on model performance, we conducted a comparative analysis. Initially, the model was trained using the original dataset without anonymization. The accuracy of the model was validated against different validation sets to establish a performance baseline. Subsequently, the same model was evaluated using the anonymized dataset, ensuring consistency in testing conditions. Notably, the comparative analysis revealed no significant differences in validation results between the anonymized and non-anonymized datasets. This finding indicates that our inpainting-based anonymization process does not impair the model’s ability to detect pavement distress, confirming the integrity of our data processing methodology.

Code availability

The data generated for this study is independent of any encoding process. In order to ensure reproducibility and facilitate further research, the frameworks employed for the training and testing of object detection algorithms are MMDetection, PaddleDetection and Ultralytics, provided by https://github.com/open-mmlab/mmdetection, https://github.com/PaddlePaddle/PaddleDetection and https://github.com/ultralytics/ultralytics, respectively.

Change history

09 October 2025
A Correction to this paper has been published: https://doi.org/10.1038/s41597-025-06081-7

References

Cafiso, S., D’Agostino, C., Delfino, E. & Montella, A. From manual to automatic pavement distress detection and classification. In 2017 5th IEEE International Conference on Models and Technologies for Intelligent Transportation Systems (MT-ITS) (pp. 433–438). IEEE (2017).
Coenen, T. B. J. & Golroo, A. A review on automated pavement distress detection methods. Cogent Engineering 4(1), 1374822 (2017).
Article Google Scholar
Yu, X. & Salari, E. Pavement pothole detection and severity measurement using laser imaging. In 2011 IEEE International Conference on Electro/Information Technology (pp. 1–5). IEEE (2011).
Tsai, Y. J. & Wang, Z. Development of an asphalt pavement raveling detection algorithm using emerging 3D laser technology and macrotexture analysis. Technical Report (2015).
Ghosh, R. & Smadi, O. Automated detection and classification of pavement distresses using 3D pavement surface images and deep learning. Transportation Research Record 2675(9), 1359–1374 (2021).
Article Google Scholar
Inzerillo, L., Di Mino, G. & Roberts, R. Image-based 3D reconstruction using traditional and UAV datasets for analysis of road pavement distress. Automation in Construction 96, 457–469 (2018).
Article Google Scholar
McGhee, K. H. Automated pavement distress collection techniques (Vol. 334). Transportation Research Board (2004).
Wang, K. C. P., Li, Q. & Gong, W. Wavelet-based pavement distress image edge detection with a trous algorithm. Transportation Research Record 2024(1), 73–81 (2007).
Article MathSciNet Google Scholar
Zhou, J., Huang, P. S. & Chiang, F. P. Wavelet-based pavement distress detection and evaluation. Optical Engineering 45(2), 027007 (2006).
Article ADS Google Scholar
Tsai, Y. C., Kaul, V. & Mersereau, R. M. Critical assessment of pavement distress segmentation methods. Journal of Transportation Engineering 136(1), 11–19 (2010).
Article Google Scholar
Du, Z., Yuan, J., Xiao, F. & Hettiarachchi, C. Application of image technology on pavement distress detection: A review. Measurement 184, 109900 (2021).
Article Google Scholar
Du, Y. et al. Pavement distress detection and classification based on YOLO network. International Journal of Pavement Engineering 22(13), 1659–1672 (2021).
Article Google Scholar
De Kerf, T., Sels, S., Samsonova, S. & Vanlanduit, S. A dataset of drone-captured, segmented images for oil spill detection in port environments. Scientific Data 11(1), 1180 (2024).
Article PubMed PubMed Central Google Scholar
Yang, H. et al. PaveTrack: A Large-Scale Image Repository for Automated Pavement Distress Analysis and Degradation Trend Prediction[DS/OL]. V3. Science Data Bank, [2025-07-24]. https://doi.org/10.57760/sciencedb.20383 (2025).
Ren, S., He, K., Girshick, R. & Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Transactions on Pattern Analysis and Machine Intelligence 39(6), 1137–1149 (2016).
Article ADS PubMed Google Scholar
Jocher, G. et al. ultralytics/yolov5: v7.0-yolov5 sota realtime instance segmentation. Zenodo (2022).
Varghese, R. & Sambath, M. YOLOv8: A novel object detection algorithm with enhanced performance and robustness. In 2024 International Conference on Advances in Data Engineering and Intelligent Computing Systems (ADICS) (pp. 1–6). IEEE (2024).
Ge, Z. Yolox: Exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430 (2021).
Khanam, R. & Hussain, M. Yolov11: An overview of the key architectural enhancements. arXiv preprint arXiv: 2410.17725 (2024).
Lv, W. et al. Rt-detrv2: Improved baseline with bag-of-freebies for real-time detection transformer. arXiv preprint arXiv:2407.17140 (2024).
Shi, Y., Cui, L., Qi, Z., Meng, F. & Chen, Z. Automatic road crack detection using random structured forests. IEEE Transactions on Intelligent Transportation Systems 17(12), 3434–3445 (2016).
Article Google Scholar
Cui, L., Qi, Z., Chen, Z., Meng, F. & Shi, Y. Pavement distress detection using random decision forests. In Data Science: Second International Conference, ICDS 2015 (pp. 95-102). Springer (2015).
Zou, Q. et al. Deepcrack: Learning hierarchical convolutional features for crack detection. IEEE Transactions on Image Processing 28(3), 1498–1512 (2018).
Article ADS MathSciNet Google Scholar
Yang, F. et al. Feature pyramid and hierarchical boosting network for pavement crack detection. IEEE Transactions on Intelligent Transportation Systems 21(4), 1525–1535 (2019).
Article Google Scholar
Xu, Z. et al. Pavement crack detection from CCD images with a locally enhanced transformer network. International Journal of Applied Earth Observation and Geoinformation 110, 102825 (2022).
Article Google Scholar
Guo, F., Qian, Y., Liu, J. & Yu, H. Pavement crack detection based on transformer network. Automation in Construction 145, 104646 (2023).
Article Google Scholar
Stricker, R., Eisenbach, M., Sesselmann, M., Debes, K. & Gross, H. M. Improving visual road condition assessment by extensive experiments on the extended gaps dataset. In 2019 International Joint Conference on Neural Networks (IJCNN) (pp. 1–8). IEEE (2019).
Maeda, H., Sekimoto, Y., Seto, T., Kashiyama, T. & Omata, H. Road damage detection and classification using deep neural networks with smartphone images. Computer-Aided Civil and Infrastructure Engineering 33(12), 1127–1141 (2018).
Article Google Scholar
Maeda, H., Kashiyama, T., Sekimoto, Y., Seto, T. & Omata, H. Generative adversarial network for road damage detection. Computer-Aided Civil and Infrastructure Engineering 36(1), 47–60 (2021).
Article Google Scholar
Arya, D., Maeda, H., Ghosh, S. K., Toshniwal, D. & Sekimoto, Y. RDD2020: An annotated image dataset for automatic road damage detection using deep learning. Data in Brief 36, 107133 (2021).
Article CAS PubMed PubMed Central Google Scholar
Arya, D., Maeda, H., Ghosh, S. K., Toshniwal, D. & Sekimoto, Y. RDD2022: A multi-national image dataset for automatic road damage detection. Geoscience Data Journal 11(4), 846–862 (2024).
ADS Google Scholar
Zhang, H. et al. A new road damage detection baseline with attention learning. Applied Sciences 12(15), 7594 (2022).
Article CAS Google Scholar
Ren, M., Zhang, X., Zhi, X., Wei, Y. & Feng, Z. An annotated street view image dataset for automated road damage detection. Scientific Data 11(1), 407 (2024).
Article PubMed PubMed Central Google Scholar
KAYA, Ö. & Çodur, M. Y. N-RDD2024: Road damage and defects. Mendeley Data, 4 (2024).

Download references

Acknowledgements

This work was funded by research grants from the National Natural Science Foundation of China (52472327), the National Key Research and Development Program of China (SQ2023YFB2600166) and Shanghai Science and Technology Innovation Action Plan (23QB1404900).

Author contuibutions

The authors confirm contribution to the paper as follows: study conception and design: H.Y., C.L., Y.D.; collection of data and implementation of the experiments: H.Y., Y.L., J.C., D.W., M.F., P.J.; analysis and interpretation of results: H.Y., C.L., J.C., W.J., M.F., P.J.; draft manuscript preparation: H.Y., J.C. All authors reviewed the results and approved the final version of the manuscript.

Author information

Authors and Affiliations

Key Laboratory of Road and Traffic Engineering of the Ministry of Education, Tongji University, 4800 Cao’an Road, Shanghai, 201804, P. R. of China
Hanlin Yang, Qian Gao, Chenglong Liu, Yuchuan Du, Yishun Li & Difei Wu
Civil and Environmental Engineering, Stanford University, 473 Via Ortega, Stanford, 94305, USA
Jinpu Cao & Martin Fischer
School of Transportation Engineering, Chang’an University, Shangyuan Road, Xi’an, 710018, P. R. of China
Jun Wan
The Williams Sale Partnership Ltd, Seattle, Washington, 98154, USA
Pooja Jain

Authors

Hanlin Yang
View author publications
Search author on:PubMed Google Scholar
Jinpu Cao
View author publications
Search author on:PubMed Google Scholar
Jun Wan
View author publications
Search author on:PubMed Google Scholar
Qian Gao
View author publications
Search author on:PubMed Google Scholar
Chenglong Liu
View author publications
Search author on:PubMed Google Scholar
Martin Fischer
View author publications
Search author on:PubMed Google Scholar
Yuchuan Du
View author publications
Search author on:PubMed Google Scholar
Yishun Li
View author publications
Search author on:PubMed Google Scholar
Pooja Jain
View author publications
Search author on:PubMed Google Scholar
Difei Wu
View author publications
Search author on:PubMed Google Scholar

Corresponding authors

Correspondence to Chenglong Liu or Yuchuan Du.

Ethics declarations

Competing interests

The authors declare that they have no competing interests or personal relationships that could have influenced the work reported in this paper.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Yang, H., Cao, J., Wan, J. et al. A Large-Scale Image Repository for Automated Pavement Distress Analysis and Degradation Trend Prediction. Sci Data 12, 1426 (2025). https://doi.org/10.1038/s41597-025-05748-5

Download citation

Received: 21 March 2025
Accepted: 01 August 2025
Published: 14 August 2025
Version of record: 14 August 2025
DOI: https://doi.org/10.1038/s41597-025-05748-5