Background & Summary

Pavement performance evaluation is a prerequisite for road maintenance and has a significant impact on driving safety and user comfort. Pavement distresses such as cracks, potholes, and alligator cracks are inevitable throughout the lifespan of a pavement [1]. Timely and reliable distress detection makes maintenance activities better timed and better targeted, preventing further deterioration of pavement performance. Automatic pavement distress detection has long been an active research topic, and many technologies and algorithms have been developed to identify and locate distresses quickly. Regular distress inspections, typically conducted monthly or annually, assess the current pavement condition. However, because these inspection cycles are infrequent, tracking the deterioration trend of a specific distress is challenging. Moreover, traditional methods rely on specialized detection equipment and vehicles, which limits the coverage and update frequency of the detection data. Since pavement performance is easily affected by traffic loads and weather conditions, high-frequency and large-scale pavement distress collection is necessary for maintenance planning and decision-making.

Pavement distress detection technology has evolved from manual surveys to semi-automation and finally to full automation. Based on the data collection method, the mainstream detection technologies fall into two types: (a) laser scanning techniques and (b) RGB image-based techniques. Other detection methods, such as those based on vibration or infrared technology, are only used in specific scenarios [2]. Traditional Laser Road Imaging Systems (LRIS) use lasers and line-scan cameras to illuminate and image the pavement [3]. 3D measurement vehicles acquire dense point cloud data and high-resolution images that carry more information (elevation and intensity) than RGB images, and can detect distresses with a precision of 1 mm [4]. Ghosh and Smadi [5] applied deep convolutional neural networks to automatically locate and identify various distresses in high-resolution 3D images. Inzerillo et al. [6] applied Structure from Motion (SfM) to dense point cloud data collected by Unmanned Aerial Vehicles (UAVs) to reconstruct the 3D morphology of pavement distresses. However, limited by the resolution of UAV data, this method is only suitable for detecting and reconstructing larger and more obvious distresses. Laser-based detection technologies provide dense and precise 3D information about pavement distresses, and various algorithms have been developed to identify distress types and calculate their attributes automatically. However, the high cost of laser vehicles and the substantial computational requirements of data processing limit their use in large-scale and high-frequency pavement distress inspections.

Compared to laser-based methods, RGB image-based pavement distress detection has practical advantages due to its low cost and simple installation. Extensive research has been conducted on pavement distress collection, which can be classified into three types: (i) manual inspection, (ii) semi-automatic methods, and (iii) automatic inspection. Manual inspection relies on personnel subjectively assessing the site or photographed images. This method is easy to implement but labor-intensive and inefficient. In semi-automatic methods, pavement distress images are first processed by computer vision algorithms and then adjusted and marked manually [7]. With the rapid development of artificial intelligence, automatic detection based on RGB images has become the most commonly used method. Traditional texture analysis methods, such as wavelet decomposition [8,9] and edge detection [10], have been widely used in pavement distress detection. These methods can be stable and effective; however, because pavement distress takes many forms and testing environments are complex, texture-based algorithms exhibit low robustness and accuracy across different scenarios [11]. Emerging deep learning algorithms have provided new approaches to pavement distress detection. Du et al. [12] applied a pre-trained YOLO convolutional neural network to automatically identify and classify pavement distresses, achieving an accuracy of 73.64%. A substantial body of research has shown that deep learning performs well in pavement distress detection.

The main contributions of this paper are as follows:

  1. We propose a large-scale pavement distress dataset comprising 59,940 images, covering diverse road environments across multiple cities. Figure 1 shows examples of pavement distress annotations from our dataset. Table 1 provides a comparative summary highlighting the advantages of the proposed dataset over others. We validate the dataset using multiple object detection algorithms and provide baseline results.

    Fig. 1 Examples from PaveTrack_OD from different countries.

    Table 1 Comparison of the proposed dataset with other datasets.
  2. We introduce a high-frequency tracking dataset focusing on three main types of pavement distress. This is the first large-scale pavement tracking dataset, encompassing continuous monitoring of 165 road locations over a period of approximately six months. Figure 2 shows examples of continuous tracking of pavement distress.

    Fig. 2 An example of continuous distress tracking from PaveTrack_PD.

  3. We present a benchmark pavement distress tracking framework designed to match and track pavement distresses.

Methods

The data preparation process comprises four stages, as illustrated in Fig. 3: data acquisition, data cleaning, data desensitization, and data annotation. The dataset intended for object detection was primarily collected in Shanghai, China, and in Menlo Park and Palo Alto in the San Francisco Bay Area, USA. Specialized software was used to capture and annotate road images in order to identify and document the pavement distress conditions they depict. The procedures for these stages are described in detail below.

Fig. 3 Flowchart of PaveTrack generation process.

Data collection

The data for this study was acquired using different methods in each country. In China, a mobile vehicle equipped with an industrial camera captured road surface images. As illustrated in Fig. 4(a), the industrial camera was installed according to a specific layout. In the United States, a mobile vehicle was equipped with two commercial imaging devices (GoPro 13) to record the condition of the road surface. As illustrated in Fig. 4(b), the cameras were configured with specific geometric considerations. The first device was placed at an oblique angle relative to the horizontal plane to capture a comprehensive view of the surrounding road surface. The second device was attached to an extension arm with its optical axis approximately perpendicular to the horizontal plane, enabling detailed documentation of the road surface immediately behind the vehicle. This dual-camera configuration was designed to document both the large-scale spatial distribution of pavement distress and the fine morphological characteristics of localized deterioration.

Fig. 4 Data collection method from different countries.

Data cleaning

During the data cleaning phase, we screened the image dataset so that the retained images capture widely distributed distress features comprehensively and accurately. We eliminated: (1) images devoid of defect information, and (2) images of poor quality caused by adverse weather conditions (including camera lens condensation during heavy rainfall and ineffective feature acquisition under strong illumination), sensor malfunctions, or improper capture rates. This ensured that subsequent analysis rests on high-quality image data.

Data desensitization

Data desensitization has two parts. First, drawing on De Kerf’s concept [13], we applied selective blurring to anonymize identifiable information such as traffic signs, license plates, and human faces (as shown in Fig. 5) while preserving data validity, thereby protecting privacy. Second, for the pavement distress tracking data, we used a clustering algorithm to merge images with nearby geographical coordinates into a single folder, which anonymizes the GPS data of the images and facilitates subsequent analysis and tracking tasks.
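As an illustration of selective blurring, the sketch below box-blurs only the pixels inside a given rectangle of a grayscale image. It is a minimal example under stated assumptions, not the authors' pipeline: the function name `blur_region`, the box-filter choice, the `radius` default, and the list-of-rows image representation are all hypothetical, and the source does not say how sensitive regions were located.

```python
def blur_region(image, box, radius=2):
    """Return a copy of `image` with the pixels inside `box` box-blurred.

    image:  2D list of grayscale values (rows of ints)
    box:    (x0, y0, x1, y1) in pixel coordinates, x1 and y1 exclusive
    radius: half-width of the square averaging window (assumed default)
    """
    height, width = len(image), len(image[0])
    x0, y0, x1, y1 = box
    out = [row[:] for row in image]  # leave the input untouched
    for y in range(max(0, y0), min(height, y1)):
        for x in range(max(0, x0), min(width, x1)):
            # Average over the window, clipped to the image borders.
            ys = range(max(0, y - radius), min(height, y + radius + 1))
            xs = range(max(0, x - radius), min(width, x + radius + 1))
            total = sum(image[j][i] for j in ys for i in xs)
            out[y][x] = total // (len(ys) * len(xs))
    return out


image = [[0] * 5 for _ in range(5)]
image[2][2] = 90  # a single bright pixel standing in for sensitive detail
blurred = blur_region(image, (1, 1, 4, 4), radius=1)
```

Pixels outside the rectangle keep their original values, which is the property that lets the pavement area remain usable for detection.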

Fig. 5 Data processing for privacy protection (traffic signs, license plates, facial recognition).

Data annotation

The annotations in this study have two components. In the first, every identified distress instance is enclosed in a bounding box and assigned the appropriate category label. The second involves long-term photographic tracking of selected pavement distresses, accompanied by fine-grained segmentation to evaluate changes in the morphology and area of the distresses.
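To make the idea of evaluating area change from segmentation concrete, the following sketch compares the foreground pixel counts of two binary masks of the same distress taken on different dates. The function names and the rows-of-0/1 mask representation are assumptions for illustration, and the comparison is only meaningful if the two masks are at a comparable scale (for example, after image registration), which this sketch does not perform.

```python
def mask_area(mask):
    """Count foreground pixels in a binary mask given as rows of 0/1."""
    return sum(1 for row in mask for value in row if value)


def relative_area_change(mask_before, mask_after):
    """Relative change in distress area between two masks.

    Returns (after - before) / before, so 1.0 means the area doubled.
    Assumes both masks share the same pixel scale.
    """
    before = mask_area(mask_before)
    after = mask_area(mask_after)
    if before == 0:
        raise ValueError("reference mask contains no distress pixels")
    return (after - before) / before


earlier = [[0, 1, 0],
           [0, 1, 0]]
later = [[0, 1, 1],
         [0, 1, 1]]
change = relative_area_change(earlier, later)
```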

Data Records

The dataset is publicly available via Science Databank [14] and can be accessed at https://doi.org/10.57760/sciencedb.20383. It is divided into PaveTrack_OD for pavement distress detection and PaveTrack_PD for pavement distress tracking and image matching. The structure of each part is illustrated in Fig. 6. PaveTrack_OD is pre-divided into training, testing, and validation subsets at a 70/15/15 percentage split, so it can be used directly in machine learning workflows. Because the defect types are unevenly represented, we used stratified sampling to maintain similar defect ratios across all splits. Annotation data is provided in YOLO format. For PaveTrack_PD, we use Excel files to track the objects in each image, with xywh boxes marking the areas where pavement distress is located, and we provide detailed segmentation masks for assessing distress changes.
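A stratified 70/15/15 split can be sketched as follows. This is a simplified illustration, not the authors' procedure: it assumes each image has been assigned a single stratum label (for instance its dominant defect type), whereas the source does not state how images containing several defect types were handled. The function name and the seed are hypothetical.

```python
import random
from collections import defaultdict


def stratified_split(items, ratios=(0.70, 0.15, 0.15), seed=0):
    """Split (image_id, stratum) pairs into train, test, and validation lists.

    Each stratum is shuffled and cut separately, so every split keeps
    roughly the same class proportions as the full dataset.
    """
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for image_id, stratum in items:
        by_stratum[stratum].append(image_id)

    train, test, val = [], [], []
    for stratum in sorted(by_stratum):  # sorted for reproducibility
        ids = by_stratum[stratum]
        rng.shuffle(ids)
        n_train = round(len(ids) * ratios[0])
        n_test = round(len(ids) * ratios[1])
        train += ids[:n_train]
        test += ids[n_train:n_train + n_test]
        val += ids[n_train + n_test:]  # remainder goes to validation
    return train, test, val


items = [(f"c{i}", "crack") for i in range(100)]
items += [(f"p{i}", "pothole") for i in range(20)]
train, test, val = stratified_split(items)
```

With 100 crack images and 20 pothole images, each split keeps the 5:1 ratio between the two strata.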

Fig. 6 Directory structure of PaveTrack.
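Since the detection annotations are provided in YOLO format, a reader may want to convert a label line into pixel coordinates. The sketch below assumes the common YOLO text convention of one box per line, written as a class index followed by the box centre, width, and height normalized to the image size; the function name and the example image size are illustrative, and the mapping from class index to category name is not given here.

```python
def yolo_line_to_pixel_box(line, image_width, image_height):
    """Convert one YOLO label line to (class_id, x_min, y_min, x_max, y_max).

    The line is assumed to read "class x_center y_center width height",
    with the four box values normalized to the range [0, 1].
    """
    fields = line.split()
    class_id = int(fields[0])
    xc, yc, w, h = (float(value) for value in fields[1:5])
    x_min = (xc - w / 2) * image_width
    y_min = (yc - h / 2) * image_height
    x_max = (xc + w / 2) * image_width
    y_max = (yc + h / 2) * image_height
    return class_id, x_min, y_min, x_max, y_max


box = yolo_line_to_pixel_box("2 0.5 0.5 0.25 0.5", 1920, 1080)
```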

Dataset statistics

Table 2 presents basic statistics for the first part of the dataset, along with concise definitions of each category. The Chinese data is divided into 10 categories (as illustrated in Fig. 1(a)): six traditional distress types (crack, patched crack, pothole, patched pothole, alligator crack, and patched alligator crack) and four non-traditional categories (manhole, street waste, speed bump, and puddle). The American data is divided into six classes (as illustrated in Fig. 1(b)): crack, patched crack, pothole, patched pothole, clay-patched crack, and manhole. The clay-patched crack category refers specifically to cracks repaired with clay rather than conventional asphalt or concrete materials, which adds diversity to the patched crack samples in our dataset. This overview clarifies the scope and distribution of our data.

Table 2 Overview of the first segment of the dataset categorized by type.

The number of distress instances is computed by counting the pavement distress occurrences in each image, as shown in Fig. 7. Most images contain only a few distresses, while the image with the highest count contains 29 distress instances. The results indicate that the number of pavement distresses varies significantly across roads.
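The per-image counting described above can be sketched in a few lines. The example assumes the labels have already been read into a dictionary that maps each image name to its list of label lines (one line per annotated box); that structure and the function names are assumptions for illustration.

```python
from collections import Counter


def instances_per_image(labels_by_image):
    """Map each image name to its number of annotated distress instances.

    labels_by_image: dict of image name -> list of label lines,
    where each non-empty line describes one bounding box.
    """
    return {name: sum(1 for line in lines if line.strip())
            for name, lines in labels_by_image.items()}


def count_histogram(counts):
    """Return {instances per image: number of images}, sorted by key."""
    return dict(sorted(Counter(counts.values()).items()))


labels = {
    "a.jpg": ["0 0.5 0.5 0.1 0.1", "1 0.2 0.2 0.1 0.1"],
    "b.jpg": ["0 0.5 0.5 0.1 0.1"],
    "c.jpg": ["3 0.1 0.1 0.1 0.1", ""],  # blank lines are ignored
}
counts = instances_per_image(labels)
histogram = count_histogram(counts)
```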

Fig. 7 Distribution of the number of distresses per image.

For the second part of the dataset, we focus on changes in transverse cracks, alligator cracks, and potholes. In Shanghai, China, we gathered data on distress conditions at 165 locations, comprising 40 transverse cracks, 57 alligator cracks, and 68 potholes. Their distribution is shown in Fig. 8. During the tracking period, potholes were more likely to receive maintenance, whereas transverse cracks were less likely to be maintained. In addition, pavement damage on principal arterials was more likely to attract the attention of maintenance departments.

Fig. 8 Statistics on the number of tracking instances included in PaveTrack_PD.

Technical Validation

Our dataset was collected by deploying logistics vehicles equipped with cameras, which captured images under a variety of environmental conditions and at a variety of locations. The varying environmental conditions are primarily related to weather, especially the position and intensity of the sun. Changes in sunlight intensity, angle, and shadows alter the appearance of objects in images, affecting contrast, color, and texture. The dataset encompasses these diverse conditions, enabling it to represent real-life scenarios. To protect privacy while preserving the dataset’s usefulness for pavement distress detection, specific alterations were applied: inpainting and selective blurring anonymize identifiable features while keeping the resulting images relevant for analysis.

Performance of object detection algorithms using PaveTrack_OD

Our dataset is used to train and assess seven predominant object detection algorithms, including Faster-RCNN [15], YOLOv5 [16], YOLOv8 [17], YOLOX [18], YOLOv11 [19], and RT-DETRv2 [20]. Model performance is evaluated using four metrics: precision, recall, mAP50, and FLOPS. Together, these metrics describe each model’s accuracy and computational cost in identifying and classifying each category in the dataset. For the dataset collected in China, RT-DETRv2 showed excellent detection performance with an mAP50 of 0.593, and YOLOv11 followed closely with an mAP50 of 0.569 (Table 3). These results underscore the efficacy of our dataset in conjunction with deep learning algorithms for the detection of pavement distresses.
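For readers less familiar with these metrics, the sketch below computes precision and recall at an IoU threshold of 0.5 for one class in one image, using greedy matching in order of descending confidence. It is a simplified illustration rather than the evaluation code behind Tables 3 and 4: mAP50 additionally averages precision over recall levels and over classes, which is not done here, and the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching of predictions to ground-truth boxes.

    predictions:  list of (confidence, box) for a single class and image
    ground_truth: list of boxes for the same class and image
    """
    matched = set()
    true_positives = 0
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box can be matched only once
            score = iou(box, gt)
            if score > best_iou:
                best_iou, best_idx = score, idx
        if best_idx is not None and best_iou >= iou_threshold:
            matched.add(best_idx)
            true_positives += 1
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
preds = [(0.9, (0, 0, 10, 10)), (0.8, (50, 50, 60, 60)), (0.7, (0, 0, 10, 5))]
result = precision_recall(preds, truth)
```

In this example only the first prediction counts as a true positive: the second overlaps nothing, and the third overlaps a ground-truth box that has already been matched.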

Table 3 Performance Comparison of Object Detection Methods (China Dataset).

For the dataset collected in the United States, model performance differs from that observed on the Chinese dataset owing to differences in data volume and distribution. As shown in Table 4, YOLOv8 achieves an mAP50 of 0.561, outperforming the other models. Faster-RCNN, although a classical object detection method proposed several years ago, attains the highest precision on this dataset at 0.782.

Table 4 Performance Comparison of Object Detection Methods (United States Dataset).

Baseline methods and results for pavement distress matching and tracking using PaveTrack_PD

For the second part of the dataset, we designed a three-step matching algorithm to filter a large number of pavement distresses, with the specific design as follows:

Step 1 (GPS Clustering): This step groups images taken at the same location. To overcome the GPS offsets caused by tall buildings, this study employs an improved K-means algorithm that filters outliers during the clustering process. By filtering out outliers, the algorithm determines cluster centers more accurately, so the clustering results reflect actual conditions more closely. Considering GPS positioning errors in urban environments with building obstructions and multi-lane roads, images within a range of 5 to 20 meters are clustered to facilitate matching at each location.
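The source does not specify how the improved K-means filters outliers, so the following is only one plausible sketch: a K-means variant in which points farther than a maximum radius from their nearest centre are labelled as outliers and excluded from the centre update. The function names, the 20 m default (taken from the upper end of the range quoted above), the caller-supplied initial centres, and the equirectangular projection to local metres are all assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


def to_local_metres(lat, lon, lat0, lon0):
    """Project (lat, lon) to metres around (lat0, lon0); fine for short distances."""
    x = math.radians(lon - lon0) * EARTH_RADIUS_M * math.cos(math.radians(lat0))
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def assign(points, centres, max_radius_m):
    """Label each point with its nearest centre, or -1 if it is too far away."""
    labels = []
    for p in points:
        dists = [math.dist(p, c) for c in centres]
        nearest = min(range(len(centres)), key=dists.__getitem__)
        labels.append(nearest if dists[nearest] <= max_radius_m else -1)
    return labels


def trimmed_kmeans(points, centres, max_radius_m=20.0, iterations=10):
    """K-means in which outliers (label -1) do not influence the centres."""
    centres = [tuple(c) for c in centres]
    for _ in range(iterations):
        labels = assign(points, centres, max_radius_m)
        for k in range(len(centres)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                centres[k] = (sum(p[0] for p in members) / len(members),
                              sum(p[1] for p in members) / len(members))
    return centres, assign(points, centres, max_radius_m)


# Points in local metres; the last one mimics a GPS fix displaced by buildings.
points = [(0, 0), (2, 0), (0, 2), (100, 100), (102, 100), (500, 500)]
centres, labels = trimmed_kmeans(points, [(1, 1), (101, 101)])
```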

Step 2 (Background Matching): To accurately identify and match images that are close in GPS coordinates but have visual content differences, we need to match different scenes based on the background features of the images. The SuperPoint algorithm is first used to detect keypoints and extract descriptors for the images to be matched. The extracted keypoints and descriptors are then input into the SuperGlue algorithm for matching. Through SuperGlue’s graph attention mechanism, the similarity between keypoints is learned, establishing reliable matching correspondences. Based on the matching results, the similarity between two images is evaluated to achieve scene matching.
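The source does not define how image similarity is computed from the matching results. One simple possibility, shown here purely as an assumption, is the fraction of keypoints that received a confident match. The sketch takes the matcher output as a plain list of (index in image A, index in image B, confidence) tuples and does not call SuperPoint or SuperGlue; both threshold values are hypothetical.

```python
def scene_similarity(matches, num_keypoints_a, num_keypoints_b,
                     min_confidence=0.5):
    """Fraction of keypoints (of the sparser image) with a confident match.

    matches: list of (index_a, index_b, confidence) tuples produced by
    some keypoint matcher; low-confidence matches are ignored.
    """
    confident = [m for m in matches if m[2] >= min_confidence]
    denominator = min(num_keypoints_a, num_keypoints_b)
    return len(confident) / denominator if denominator else 0.0


def is_same_scene(matches, num_keypoints_a, num_keypoints_b,
                  min_similarity=0.2):
    """Decide whether two images show the same scene (threshold is assumed)."""
    score = scene_similarity(matches, num_keypoints_a, num_keypoints_b)
    return score >= min_similarity


matches = [(0, 3, 0.9), (1, 4, 0.4), (2, 5, 0.8)]
score = scene_similarity(matches, num_keypoints_a=10, num_keypoints_b=8)
```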

Step 3 (Adjacent Local Area Matching): The SuperGlue network provides keypoint-level matches between two images, while the object detection algorithm draws bounding boxes in each image. If the keypoints within two bounding boxes match, the same pavement distress appears in both images. However, due to different shooting angles, unremarkable features, or unusual weather conditions, the same distress in two images may not share matching features. Repaired defects and potholes are easily matched because their features are more pronounced, but unrepaired defects are not prominent within the image and are difficult to match directly using SuperGlue. To address this, we designed an algorithm that matches specific pavement distresses by comparing adjacent local areas within the images. Specifically, we extract local areas around the distress and then use a feature matching algorithm to compare the similarity of these regions. By calculating the relative position and orientation of the local areas, we determine whether the two distresses are the same. This method helps match defects across two images even if their shapes and sizes differ.
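The idea of widening the comparison from the bounding box itself to its adjacent local area can be sketched as follows. This is an illustrative simplification, not the authors' algorithm: it only counts keypoint matches whose two endpoints fall inside the (optionally expanded) boxes and leaves out the relative position and orientation check described above. The function names and the margin value are assumptions.

```python
def expand_box(box, margin):
    """Grow an (x_min, y_min, x_max, y_max) box by `margin` pixels per side."""
    x0, y0, x1, y1 = box
    return (x0 - margin, y0 - margin, x1 + margin, y1 + margin)


def contains(box, point):
    """True if the (x, y) point lies inside the box, borders included."""
    return box[0] <= point[0] <= box[2] and box[1] <= point[1] <= box[3]


def count_shared_matches(matches, box_a, box_b, margin=0.0):
    """Count matches that start in box_a's area and end in box_b's area.

    matches: list of ((x_a, y_a), (x_b, y_b)) matched keypoint coordinates
    margin:  extra pixels around each box, i.e. the adjacent local area
    """
    area_a = expand_box(box_a, margin)
    area_b = expand_box(box_b, margin)
    return sum(1 for point_a, point_b in matches
               if contains(area_a, point_a) and contains(area_b, point_b))


matches = [((10, 10), (12, 11)),
           ((55, 50), (58, 52)),      # just outside both boxes
           ((200, 200), (205, 199))]  # background match far away
box_a = (0, 0, 50, 50)
box_b = (0, 0, 50, 50)
strict = count_shared_matches(matches, box_a, box_b)
relaxed = count_shared_matches(matches, box_a, box_b, margin=10)
```

A weakly textured crack may yield few matches inside its own box, so including nearby pavement texture gives the comparison more evidence to work with.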

The matching framework also makes changes in pavement performance observable over time. The five cases below illustrate common scenarios.

Case 1 (Fig. 9) illustrates the onset of pavement distress: within a span of 20 days, the initially smooth pavement developed cracks. This suggests that distress can appear abruptly rather than only accumulating gradually.

Fig. 9 Successful pavement distress matching case 1.

Case 2 (Fig. 10) depicts a scenario where no significant deterioration occurred in the pavement. Over a four-month monitoring period, the crack at this location remained in its initial state.

Fig. 10 Successful pavement distress matching case 2.

Case 3 (Fig. 11) showcases an instance of crack propagation, likely due to the combined effects of water infiltration and heavy vehicular loading. After four months, the length of the crack expanded to nearly double its original size.

Fig. 11 Successful pavement distress matching case 3.

Case 4 (Fig. 12) presents a situation where a pothole was repaired. A pothole that becomes a repaired defect typically indicates that the road maintenance unit has intervened. Tracking such changes in pavement damage status at high frequency can help optimize maintenance timing.

Fig. 12 Successful pavement distress matching case 4.

Case 5 (Fig. 13) shows a crack progressing into a pothole, indicating that the distress is deteriorating.

Fig. 13 Successful pavement distress matching case 5.

Annotation validation

The annotation process was a collaborative effort among team members, with images fairly distributed among the initial annotators. Dr. Liu Chenglong played a pivotal role in reviewing and refining these annotations to ensure accuracy. Following annotation, the images were anonymized and further processed using inpainting techniques to maintain the integrity of the dataset without compromising network performance.

Influence of the inpainting process

To evaluate the impact of our anonymization techniques on model performance, we conducted a comparative analysis. Initially, the model was trained using the original dataset without anonymization. The accuracy of the model was validated against different validation sets to establish a performance baseline. Subsequently, the same model was evaluated using the anonymized dataset, ensuring consistency in testing conditions. Notably, the comparative analysis revealed no significant differences in validation results between the anonymized and non-anonymized datasets. This finding indicates that our inpainting-based anonymization process does not impair the model’s ability to detect pavement distress, confirming the integrity of our data processing methodology.