Background & Summary

Road surface type impacts driving comfort1,2,3, traffic safety4, and socio-economic development5,6. For example, improvements in road surface types have been shown to significantly boost the rural economic transformation in the Java-Bali region5. Some studies have also found a positive correlation between road condition improvements and GDP growth in sub-Saharan Africa6. Road surface type has also been included by the United Nations as one of the indicators for assessing the progress of the Sustainable Development Goals (SDGs). Specifically, SDG 9.1.1 refers to the “proportion of rural population living within 2 kilometers of a road that is passable all year-round”7,8,9. Generally, paved roads ensure year-round accessibility, while unpaved roads may be subject to seasonal disruptions due to adverse weather conditions8. Therefore, identifying road surface types is crucial for improving transportation infrastructure9.

Various techniques have been developed for identifying road surface types. For instance, Tai et al. proposed a motorcycle-based pavement evaluation system that utilizes a smartphone’s triaxial accelerometer, employing both supervised and unsupervised machine learning methods to detect road anomalies and assess pavement quality10. Similarly, Takahashi et al. developed a bicycle-based road condition monitoring system, which classifies road surface signals captured by smartphones into categories such as damage or flatness11. Menegazzo et al. developed a CNN-based deep neural network model that uses accelerometer and gyroscope data from different vehicles, drivers, and environments to classify road surfaces into categories such as asphalt and dirt roads12. Lee et al. introduced a real-time pavement classification algorithm that uses deep neural networks and triaxial accelerometers to distinguish between asphalt, gravel, and unpaved roads13. Al-Mansour also developed a low-cost pavement maintenance management system, using inexpensive sensors embedded in smartphones to classify road surfaces as “Good,” “Fair,” “Poor,” or “Very Poor”14.

Remote sensing technologies have also been widely used to identify road surface types. Dobson et al. used aerial images collected from remote-controlled helicopters or manned fixed-wing aircraft to create three-dimensional models of road segments, detecting potholes and cracks in the pavement15. Brooks et al. introduced an object-based classification algorithm that uses optical aerial imagery with a resolution of 0.3 meters to classify road surfaces as either paved or unpaved16. Thiede et al. used high-resolution remote sensing imagery (0.5 meters) to identify unpaved roads in South Africa’s Gauteng and Northwest provinces17. Brewer et al. employed a convolutional neural network (CNN) algorithm to predict road surface types using 0.3-meter resolution remote sensing images18. Recently, Zhou et al. proposed a method combining OpenStreetMap data with high-resolution satellite imagery, leveraging the VGG-16 model to identify paved and unpaved roads. They also developed the first dataset of road surface types in Kenya, Africa19.

Despite these advancements, existing studies still have limitations. First, identifying road surface types with mobile devices mounted on vehicles is time-consuming and labor-intensive, because this method can only collect surface data one road at a time. Second, although high-resolution remote sensing data (e.g., with a resolution finer than 1 meter) can be used to identify road surface types over large-scale road networks, such as at the national or regional level, vegetation in densely covered areas may obstruct the view, reducing the accuracy of surface type identification.

In recent years, open street view data (e.g., from Mapillary20 and OpenStreetCam21) has become increasingly available. These crowdsourced street views, contributed by volunteers from different countries and regions22,23, have proven to be valuable for identifying road surface types. Thanks to their open access and global coverage, crowdsourced street view data is considered one of the most important data sources for this task. For example, Lee et al. proposed an AdapNet-based semantic segmentation model that can segment road pixels from Mapillary crowdsourced street view and classify them as either paved or unpaved roads24. Despite this progress, deep learning models for road surface type identification require large training datasets. To the best of our knowledge, there is currently a lack of publicly available datasets specifically designed for identifying road surface types from crowdsourced street views.

Therefore, this study aims to develop the first comprehensive Mapillary crowdsourced street view dataset that includes a large number of samples labeled with road surface types. The main contributions of this study are as follows:

  1) The first dataset containing 200,000 Mapillary street images has been developed, all of which were collected from Africa. These images are classified into three categories: ‘Paved,’ ‘Unpaved,’ and ‘Unknown.’

  2) This study demonstrates the effectiveness of deep learning models trained on the aforementioned dataset. It also shows that the models can accurately classify road surface types in six sub-regions of Africa, with classification metrics all exceeding 0.95.

This study focuses on the African region due to the generally low rate of road paving compared to other parts of the world25. For example, an existing study has reported that 70% of roads in sub-Saharan Africa remain unpaved26, resulting in poor transportation and inefficient trade, which in turn hinders local socio-economic development. Therefore, our classified street view dataset may provide valuable data support for improving transportation infrastructure in Africa.

Methods

Overview

This study developed a dataset consisting of 200,000 street view images classified by different road surface types. The main steps involved are as follows:

First, Africa was divided into 100 km × 100 km grids, and Mapillary street view images along with their location data were randomly selected from each grid as candidates. Then, road data for Africa was obtained and matched with the street view data, retaining only those street views that corresponded to roads with road surface type labels. Next, street view images with different road surface types were randomly and evenly sampled from the candidate set, and visually reclassified to produce the final dataset. Finally, deep learning models were trained on the labeled street view dataset, and the trained models were applied to different sub-regions of Africa to validate the effectiveness and usability of the dataset. The workflow is shown in Fig. 1.

Fig. 1 The workflow of our approach.

Data collection

Mapillary street view

In this study, 3,440 grids of 100 km × 100 km were created across Africa. The Mapillary API (https://www.mapillary.com/developer/api-documentation) was then used to randomly retrieve street view images from each grid along with their corresponding location information (represented by latitude and longitude coordinates), resulting in a total of 907,963 street view images.
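The gridding step can be illustrated with a minimal sketch. It assumes the study area is described by a bounding box in a projected coordinate system measured in meters; the function name `make_grid` and the example extent are hypothetical, and the paper does not state which projection or tooling was used to build the 3,440 grids.

```python
from typing import Iterator, Tuple

# (min_x, min_y, max_x, max_y) in meters of an assumed projected coordinate system
Cell = Tuple[float, float, float, float]


def make_grid(min_x: float, min_y: float, max_x: float, max_y: float,
              cell_size: float = 100_000.0) -> Iterator[Cell]:
    """Tile a bounding box with square cells (100 km by default).

    Cells on the upper and right edges are clipped to the bounding box.
    """
    x = min_x
    while x < max_x:
        y = min_y
        while y < max_y:
            yield (x, y, min(x + cell_size, max_x), min(y + cell_size, max_y))
            y += cell_size
        x += cell_size


# Hypothetical 250 km x 150 km extent: 3 columns x 2 rows of cells.
cells = list(make_grid(0.0, 0.0, 250_000.0, 150_000.0))
print(len(cells))
```

Each cell's bounds could then serve as the spatial filter for an image query, after conversion to whatever coordinates the image service expects.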

The main advantages of using Mapillary street view data include: 1) The data is freely accessible. 2) In addition to street view images, Mapillary provides location information for each image, which facilitates the visualization of the spatial distribution of road surface types. 3) Mapillary data has global coverage20, making it suitable for large-scale studies at national and regional levels. In contrast, Google Street View data is only available in certain countries or regions.

Road data

This study utilized OpenStreetMap (OSM) road data, specifically for roads that are accessible to motor vehicles (as shown in Table 1). The data can be obtained from the third-party platform Geofabrik (http://download.geofabrik.de/index.html). OSM data has the advantages of global coverage and open accessibility27,28. More importantly, OSM road data includes tags (e.g., surface = paved) that record the road surface type, providing a reference for classifying road surface types in Mapillary street view images.

Table 1 Road classifications based on four-wheeled vehicle passability.

Data processing

Data matching

In theory, all Mapillary street view images could be used to create the dataset. However, some images may not have been taken on roads, and even for those captured on roadways, the corresponding OSM road data may either lack surface type tags or contain erroneous entries due to mislabeling (e.g., surface = ???, yes, or 7387.92 m2). Therefore, the OSM road data were first filtered to retain only road segments with clearly defined surface type tags. Then, for each retrieved street view image, the distance to the nearest road was calculated. If the distance was less than a certain threshold (10 meters), the street view image was considered to be on a road. Otherwise, the image was considered to be off-road. We set the threshold at 10 meters because existing studies suggest that the positioning accuracy of most mobile phones is around 10 meters29. As a result, 709,463 Mapillary street view images were obtained from Africa, all of which were associated with roads that included surface type labels.
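The matching rule described above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: roads are represented as hypothetical dictionaries with a `surface` tag and a list of (longitude, latitude) vertices, the set `KNOWN_SURFACE_TAGS` is an assumed whitelist (the paper does not list the exact tags it retained), and distances are approximated on a local tangent plane, which is adequate at the 10-meter scale.

```python
import math

EARTH_RADIUS_M = 6_371_000.0
# Assumed whitelist of "clearly defined" tags; not taken from the paper.
KNOWN_SURFACE_TAGS = {"paved", "asphalt", "concrete", "unpaved", "dirt", "gravel", "ground"}


def to_local_xy(lon, lat, lon0, lat0):
    """Project lon/lat (degrees) to meters on a tangent plane centered at (lon0, lat0)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def point_segment_distance_m(p, a, b):
    """Distance in meters from point p to segment a-b; all are (lon, lat) in degrees."""
    # The point p becomes the origin of the local plane.
    ax, ay = to_local_xy(a[0], a[1], p[0], p[1])
    bx, by = to_local_xy(b[0], b[1], p[0], p[1])
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(ax, ay)
    # Parameter of the closest point on the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, -(ax * dx + ay * dy) / seg_len_sq))
    return math.hypot(ax + t * dx, ay + t * dy)


def match_image(image_lonlat, roads, threshold_m=10.0):
    """Return the surface tag of the nearest tagged road within threshold_m, else None."""
    best_dist, best_tag = float("inf"), None
    for road in roads:
        if road["surface"] not in KNOWN_SURFACE_TAGS:
            continue  # drop missing or malformed tags such as "???" or "yes"
        pts = road["coords"]
        for a, b in zip(pts, pts[1:]):
            d = point_segment_distance_m(image_lonlat, a, b)
            if d < best_dist:
                best_dist, best_tag = d, road["surface"]
    return best_tag if best_dist < threshold_m else None


roads = [
    {"surface": "asphalt", "coords": [(36.800, 0.0), (36.810, 0.0)]},
    {"surface": "???", "coords": [(36.800, 0.0005), (36.810, 0.0005)]},
]
on_road = match_image((36.805, 0.00005), roads)   # about 5.6 m from the asphalt road
off_road = match_image((36.805, 0.00030), roads)  # about 33 m from the nearest tagged road
```

For a continent-scale road network, a spatial index would normally replace the brute-force loop; the loop is kept here only to make the rule explicit.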

Data sampling

From the matched 709,463 Mapillary street view images, we randomly and evenly sampled images representing various road surface types. After reviewing these images, the following were observed: (1) Paved roads in Africa are predominantly asphalt, while unpaved roads are mostly dirt roads; (2) Some street view images, due to low resolution or because they were not taken on roads, could not be visually interpreted for their surface type. Therefore, these street view images were classified into three categories: “Paved,” “Unpaved,” and “Unknown.” The specific steps are as follows:

First, the tags in the OSM road data were reclassified into two categories: “Paved” and “Unpaved” (as shown in Table 2). Then, 100,000 street view images were randomly selected from the matched “Paved” roads and 100,000 images from the matched “Unpaved” roads, for a total of 200,000 images.

Table 2 Reclassification of OSM road surface tags.
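The reclassification and balanced sampling can be sketched as below. The tag sets are assumptions chosen from common OSM `surface` values for illustration only; the mapping actually used is the one given in Table 2. The helper names `reclassify` and `balanced_sample` are likewise hypothetical.

```python
import random

# Assumed mapping for illustration; Table 2 defines the actual reclassification.
PAVED_TAGS = {"paved", "asphalt", "concrete", "paving_stones", "sett"}
UNPAVED_TAGS = {"unpaved", "dirt", "ground", "gravel", "sand", "earth", "compacted"}


def reclassify(surface_tag):
    """Map a raw OSM surface tag to 'Paved', 'Unpaved', or None if unrecognized."""
    tag = surface_tag.strip().lower()
    if tag in PAVED_TAGS:
        return "Paved"
    if tag in UNPAVED_TAGS:
        return "Unpaved"
    return None


def balanced_sample(matched, n_per_class, seed=0):
    """Draw n_per_class image IDs per class from (image_id, surface_tag) pairs.

    random.Random.sample raises ValueError if a class has fewer than
    n_per_class candidates.
    """
    rng = random.Random(seed)
    by_class = {"Paved": [], "Unpaved": []}
    for image_id, tag in matched:
        label = reclassify(tag)
        if label is not None:
            by_class[label].append(image_id)
    return {label: rng.sample(ids, n_per_class) for label, ids in by_class.items()}


# Toy candidate set; the study drew 100,000 images per class.
matched = [(i, "asphalt") for i in range(5)] + [(i, "dirt") for i in range(5, 10)] + [(10, "???")]
sample = balanced_sample(matched, n_per_class=3)
```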

Next, recognizing that surface type tags in the OSM road data may contain errors, three experts with backgrounds in GIS and transportation were assigned to visually interpret and classify the 200,000 extracted street view images into three categories: “Paved,” “Unpaved,” and “Unknown.” To validate the accuracy of visual interpretation, the Fleiss’ Kappa coefficient was employed for inter-rater consistency assessment30. The calculated value of 0.995 indicates a high level of agreement among the three experts. Lastly, the final classification label for each street view image was determined using a majority voting mechanism. Table 3 and Fig. 2 present the number of street view images in each category after visual classification, along with representative examples of each category.
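For readers who wish to reproduce the agreement check, the following is a minimal sketch of Fleiss' Kappa and majority voting for a fixed number of raters per image. The function names are hypothetical, the toy ratings are not taken from the dataset, and returning `None` on a three-way tie is an assumption, since the paper does not state how such ties were resolved.

```python
from collections import Counter


def fleiss_kappa(ratings, categories):
    """Fleiss' Kappa for ratings given as one tuple of labels per item.

    Every item must be rated by the same number of raters. The result is
    undefined (division by zero) if all ratings fall in a single category.
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    totals = Counter()
    agreement_sum = 0.0
    for item in ratings:
        counts = Counter(item)
        totals.update(counts)
        # Per-item agreement P_i = (sum_j n_ij^2 - n) / (n (n - 1))
        agreement_sum += (sum(c * c for c in counts.values()) - n_raters) / (
            n_raters * (n_raters - 1))
    p_bar = agreement_sum / n_items
    # Chance agreement P_e = sum_j p_j^2, with p_j the overall share of category j
    p_e = sum((totals[c] / (n_items * n_raters)) ** 2 for c in categories)
    return (p_bar - p_e) / (1.0 - p_e)


def majority_vote(item):
    """Return the label chosen by more than half of the raters, else None."""
    label, count = Counter(item).most_common(1)[0]
    return label if count > len(item) / 2 else None


CATEGORIES = ["Paved", "Unpaved", "Unknown"]
toy_ratings = [
    ("Paved", "Paved", "Paved"),
    ("Unpaved", "Unpaved", "Unpaved"),
    ("Paved", "Paved", "Unpaved"),
    ("Unknown", "Unknown", "Unknown"),
]
kappa = fleiss_kappa(toy_ratings, CATEGORIES)
final_labels = [majority_vote(item) for item in toy_ratings]
```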

Table 3 Distribution of Mapillary Street View Images in Each Category After Visual Interpretation.

Fig. 2 Examples to illustrate the classification (including ‘Paved’, ‘Unpaved’ and ‘Unknown’) of street views (Data source: https://www.mapillary.com).

Data usability evaluation

Although the accuracy of our dataset has been verified using the Fleiss’ Kappa coefficient, it is also necessary to assess the usability of this dataset. Therefore, deep learning models were further employed to assess its usability for road surface classification. Specifically, this study considers two scenarios:

  1) Scenario 1: Training different deep learning models using the developed dataset of 200,000 street view images and validating the models’ performance. Several classic deep learning models were employed, including VGG-1631, ResNet-5032, Swin Transformer33, Yolo v734, and ConvNeXt35. These models have been widely used in image recognition, object detection, and other fields, and generally outperform traditional machine learning models, such as decision trees and random forests, in classification tasks36,37,38,39,40,41. In addition, existing studies have shown that these models can perform better than some traditional convolutional neural network models (e.g., AlexNet and GoogLeNet) on datasets like ImageNet42,43.

    For each model, we randomly sampled 5,500 images per class (‘Paved’, ‘Unpaved’, ‘Unknown’), yielding 16,500 total images. These were split into 80% training and 20% validation sets. All models were implemented in PyTorch and trained on a workstation equipped with an NVIDIA P100 GPU and 16 GB of RAM. After extensive tuning, we set the learning rate to 0.0001, batch size to 64, and number of epochs to 100. Model performance was then evaluated using accuracy, precision, recall and F1-score44,45.

  2) Scenario 2: Applying the trained models to sub-regions to validate their usability.

To further validate the usability of the street view dataset, the optimal model from Scenario 1 was applied to six 10 × 10 km road network sub-regions to identify the road surface types in each sub-region (as shown in Fig. 3). For each sub-region, all Mapillary street view images and their corresponding location data were first retrieved. Then, the optimal model from Scenario 1 was used to classify the road surface types of all street view images within each sub-region. The classification accuracy for each sub-region was validated using metrics such as accuracy, precision, recall, and F1-score.
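Both scenarios rely on the same bookkeeping: a balanced sample split 80/20 for training and validation, and the four metrics computed from true and predicted labels. The sketch below illustrates this in plain Python. It makes two assumptions that the paper does not confirm: the split is performed within each class, and precision, recall, and F1-score are macro-averaged over classes. The function names are hypothetical.

```python
import random


def split_per_class(ids_by_class, n_per_class=5500, train_frac=0.8, seed=0):
    """Sample n_per_class IDs per class and split each class into train/validation."""
    rng = random.Random(seed)
    train, val = [], []
    for label, ids in ids_by_class.items():
        chosen = rng.sample(ids, n_per_class)
        cut = int(round(n_per_class * train_frac))
        train += [(image_id, label) for image_id in chosen[:cut]]
        val += [(image_id, label) for image_id in chosen[cut:]]
    return train, val


def classification_metrics(y_true, y_pred, labels):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy data: 10 image IDs per class instead of 5,500.
toy_ids = {
    "Paved": list(range(10)),
    "Unpaved": list(range(10, 20)),
    "Unknown": list(range(20, 30)),
}
train_set, val_set = split_per_class(toy_ids, n_per_class=10)

toy_metrics = classification_metrics(
    ["Paved", "Paved", "Unpaved", "Unpaved"],
    ["Paved", "Unpaved", "Unpaved", "Unpaved"],
    labels=["Paved", "Unpaved"],
)
```

With the paper's settings (5,500 images per class and an 80% training share), the same split yields 4,400 training and 1,100 validation images per class.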

Fig. 3 Locations of the six tested road networks in Africa and the distribution of roads covered by Mapillary street view data in each region.

The selection of these six sub-regions is based on their location in different countries and regions across Africa, covering both urban (Fig. 3a,c,d) and rural (Fig. 3b,e,f) road network patterns. This selection helps minimize subjectivity that may arise from focusing on a single type of study area. More importantly, these sub-regions provide a relatively rich set of Mapillary street view images.

Data Records

This dataset includes Mapillary street view images and corresponding location data from Africa46. The street view images are saved in JPG format, with each image assigned a unique ID number, totaling 200,000 images. The location data is represented as point vector data in Esri Shapefile format, where each point includes a unique street view ID, longitude, latitude, and road surface type (as shown in Table 4), with a total of 200,000 points. All data are projected in the WGS 84 / Pseudo-Mercator coordinate system (EPSG: 3857) (Fig. 4). The dataset is publicly accessible on Figshare and is released under a CC-BY 4.0 license. Both the dataset and metadata can be retrieved from the corresponding link46. Additionally, all data were checked against their descriptions to ensure that they are consistent with the provided metadata and classification details.
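Because each point stores longitude and latitude in degrees while EPSG: 3857 coordinates are expressed in meters, users may need to convert between the two. The sketch below uses the standard spherical Web Mercator formulas; the function names are hypothetical, and it is offered as a convenience rather than as part of the dataset's tooling.

```python
import math

WEB_MERCATOR_RADIUS_M = 6378137.0  # sphere radius used by EPSG:3857


def lonlat_to_web_mercator(lon, lat):
    """Convert WGS 84 longitude/latitude in degrees to EPSG:3857 x/y in meters."""
    x = WEB_MERCATOR_RADIUS_M * math.radians(lon)
    y = WEB_MERCATOR_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_lonlat(x, y):
    """Convert EPSG:3857 x/y in meters back to longitude/latitude in degrees."""
    lon = math.degrees(x / WEB_MERCATOR_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / WEB_MERCATOR_RADIUS_M)) - math.pi / 2)
    return lon, lat


# Round trip for an arbitrary point (values chosen for illustration only).
x, y = lonlat_to_web_mercator(36.8, -1.3)
lon, lat = web_mercator_to_lonlat(x, y)
```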

Table 4 The fields of our dataset.

Fig. 4 The spatial distribution of street views in our dataset (categorized into three types: ‘Paved,’ ‘Unpaved,’ and ‘Unknown’) across different regions.

Technical Validation

Results evaluation

Results and analyses of Scenario 1

Table 5 presents a comparison of the accuracy in road surface type classification of Mapillary street view data using five different models.

Table 5 Comparison of accuracy across different models.

As shown in Table 5, the accuracy, precision, and recall for identifying various road surface types from Mapillary street view images using different models all exceed 0.9, with F1 scores above 0.91. This indicates that the models trained on the developed Mapillary street view dataset perform well overall. The consistently high performance across multiple models demonstrates the usability of the developed dataset. The highest accuracy was achieved using the Swin Transformer model, with a value of 0.924, slightly outperforming the other four models (ResNet-50, VGG-16, Yolo v7, and ConvNeXt).

Results and analyses of Scenario 2

The results from Scenario 1 indicate that the Swin Transformer model achieved the highest accuracy. Therefore, in Scenario 2, the Swin Transformer was applied to classify all street view images from the six sub-regions. The classification accuracy for each sub-region is shown in Table 6. Figure 5 presents the classification results of road surface types for each sub-region.

Table 6 Evaluation results of applying the best model to the six tested regions (I, II, III, IV, V, VI).

Fig. 5 The spatial patterns of road surface classification for the six tested 10 × 10 km study areas: (a) I; (b) II; (c) III; (d) IV; (e) V; (f) VI. The roads that have not been covered with any Mapillary street view data are represented with thin gray lines.

As shown in Table 6, when applying the Swin Transformer model trained in Scenario 1 to the road networks of six sub-regions in Africa, the accuracy, precision, recall, and F1 scores generally exceed 0.95. This suggests that the deep learning models trained on the street view dataset developed in this study can be applied to other regions of Africa. Notably, the performance metrics for Sub-region VI even exceed 0.99, likely because this sub-region contains only unpaved roads (Fig. 5f), leading to fewer classification errors.

Limitations

This study has some limitations as follows:

  1) Model selection: Firstly, only five classic deep learning models were selected for this study because existing studies have shown that these models perform better than some traditional convolutional neural network models (e.g., AlexNet and GoogLeNet) on datasets like ImageNet42,43. Our experiments also confirmed that the accuracy, precision, and recall using these models all exceed 0.9. However, as artificial intelligence is continuously evolving and new models are emerging, future research could explore whether incorporating modules such as attention mechanisms or introducing other models can further improve the classification accuracy of road surface types.

  2) Road surface type categories: Secondly, this study classified road surface types into just three categories: “Paved,” “Unpaved,” and “Unknown.” In reality, road surface types may be more diverse, including asphalt, concrete, and dirt roads, among others. However, by analyzing the 200,000 street view images, we observed that paved roads in Africa are predominantly asphalt, while unpaved roads are mostly dirt roads, with very few other types of surfaces. Therefore, this dataset can be effectively used for road surface type classification of Mapillary street view images in most regions of Africa. In addition, the “Unknown” category in our dataset could assist users in identifying and filtering low-quality street view data. Nonetheless, road surface types may differ across countries and regions. Thus, future research could consider adding street view images of other road surface types to enhance the dataset’s applicability.

  3) Geographical regions: Thirdly, the street view data used in this study were all sourced from Africa. Since road surface types can vary significantly across regions (such as in Asia or Europe), the models trained on this dataset may not be directly applicable to other regions of the world. Nevertheless, the methodology developed in this study could be applied to create street view datasets for road surface type classification in other regions globally.

  4) Mapillary data: The Mapillary street view data has inherent limitations, including gaps in coverage, particularly in rural or remote areas, as well as variability in image quality due to factors such as resolution, lighting, and capture angle. Furthermore, the data tends to be biased toward urban areas where street view images are more frequently captured, which may result in certain road types being underrepresented47. Nevertheless, the crowdsourced nature of Mapillary allows for continuous updates, which could improve data quality and expand coverage in future studies.

Despite these limitations, our study offers several valuable contributions. First, a generalizable methodology for creating road surface type datasets from crowdsourced street view imagery has been proposed. Although demonstrated with Mapillary data, this approach can be adapted to other regions worldwide and extended to additional platforms such as Google Street View48 and OpenStreetCam21, thereby broadening its applicability. Second, a comprehensive dataset of 200,000 African street view images labeled by road surface type has been developed. These images serve as rich training samples for deep learning models, facilitating automated classification of road surfaces across diverse African environments. Third, the dataset can support assessment of Sustainable Development Goal 9.1.1, which measures the proportion of the rural population living within 2 km of all-season roads. National- and regional-scale road surface maps generated from this dataset (as shown in Fig. 5) can provide crucial indicators for monitoring progress toward this goal. Finally, the dataset can be employed to evaluate and enhance OpenStreetMap (OSM) road surface tags. By comparing “Paved” and “Unpaved” labels with existing OSM tags, inconsistencies can be identified. Furthermore, deep learning models trained on these labeled images can automatically infer and supplement missing or erroneous surface tags in OSM, improving overall data quality.