Background & Summary

Large volumes of sewage and wastewater are released into natural water bodies, resulting in water scarcity and environmental problems such as eutrophication, excessive metal contamination and plastic pollution1,2,3. Sewage outfalls (SOs), found extensively along river banks, are specific channels through which pollutants from various sources are released into water bodies4. Many managers recognize the importance of locating, blocking, and regulating SOs to protect natural water bodies5,6. High-resolution images are well suited to the analysis and interpretation of SOs. Currently, however, this interpretation must be performed by individuals with specialised environmental science knowledge. Such over-reliance on professionals has several disadvantages, including being time-consuming and labour-intensive, which has impeded the widespread identification of SOs in large-scale river basins7,8.

Deep learning has advanced visual technology, enabling computers to acquire the expertise of human image interpreters and become intelligent tools capable of detecting SOs in large-scale river basins9. A high-resolution image set of SOs that can be used for model training, validation, and testing is a necessary prerequisite for SOs monitoring to benefit from deep learning10,11. A high-quality SOs image dataset is therefore significant for engineers, scientists, and managers engaged in SOs identification. However, no satisfactory SOs dataset currently exists.

Several previous studies can serve as references for creating image datasets for SOs object detection, such as those by Xu et al.12 and Huang et al.10. Xu et al. used UAVs operating at low altitudes to capture high-resolution images of SOs. However, only about 600 photos were collected, which makes it challenging to meet the requirements of deep learning12. Huang et al. operated UAVs at a significant elevation and obtained around 7000 images of SOs10. Nevertheless, these images have several disadvantages that diminish SOs identification accuracy: (i) long-distance shooting renders the SOs in the field of view too small to be identified; and (ii) vertical photography easily overlooks outfalls that protrude only slightly from the bank. In recent years, China’s official administration has conducted a comprehensive survey of SOs to determine their exact spatial locations and capture images, accumulating a significant number of pictures. Nevertheless, these photos are unrefined and devoid of annotations, making them challenging to use directly for deep learning models. This study therefore thoroughly examines these materials and establishes a standard SOs dataset. Moreover, the YOLOv10 series, one of the state-of-the-art object detection model families, is used to evaluate the performance of the SOs dataset in this study13.

The main contribution of this study is the development, for the first time, of a high-quality dataset named images for sewage outfalls object detection (iSOOD). The construction of iSOOD is guided by the following criteria14:

(i) Diversity. Images are acquired in diverse geographical locations and lighting conditions, encompassing various kinds of SOs;

(ii) Accuracy. Our research team completed and repeatedly checked the annotation work to ensure accuracy;

(iii) Consistency. The annotation of the images follows the standard YOLOv10 format15;

(iv) Extensibility. Each image is matched with specific attribute information, such as its category.

The iSOOD dataset consists of 10481 images and 10481 records of specific attribute information. Our purpose is to encourage researchers to create advanced deep learning models using the iSOOD dataset and to collaborate with us. Our mission is to promote the global implementation of advanced SOs detection technologies to enhance the intelligent management capabilities of river basins.

Methods

Figure 1 illustrates the essential steps involved in the development of the iSOOD dataset.

Fig. 1

Schematic overview of iSOOD dataset creation and technical validation.

Data collection

In total, 21246 images and 24466 attribute records were acquired from original field investigations conducted in the Yangtze River basin and the Yellow River basin in China. The field investigations covered the same regions with both UAVs and handheld cameras, allowing hidden and frequently unnoticed SOs to be identified.

Image processing and attribute creation

After acquiring the original data, this study eliminated redundant and low-resolution images. Furthermore, annotation guidelines for the iSOOD dataset were established16. This study used the Autodistill tool package on the Roboflow platform for the labelling work17 (see the sketch below). This technique automatically applies pre-annotations to the SOs. The preliminary annotations allow our researchers to efficiently locate the SOs in the photos and correct any unsatisfactory annotations. To guarantee the quality of iSOOD, this study implemented a multi-level quality inspection approach. Each image must undergo review by at least two independent groups of researchers. Furthermore, potential labelling mistakes are addressed through regular reviews. When confronted with ambiguous or contentious annotations, the researchers engage in thorough discussion and ultimately reach a consensus. Finally, the first generation of the iSOOD dataset has been released, consisting of 10481 high-quality images together with corresponding attributes.
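The following is a minimal pre-annotation sketch using Autodistill. The study does not report which base model or prompt was used on Roboflow, so the GroundedSAM base model, the caption prompt, and the folder names here are illustrative assumptions:

```python
# Pre-annotation sketch with Autodistill; assumes the autodistill and
# autodistill-grounded-sam packages are installed. The base model, prompt,
# and folders are illustrative assumptions, not the study's exact setup.
from autodistill.detection import CaptionOntology
from autodistill_grounded_sam import GroundedSAM

# Map a natural-language prompt to the single dataset class "outfall".
ontology = CaptionOntology({"sewage outfall pipe on a river bank": "outfall"})
base_model = GroundedSAM(ontology=ontology)

# Write YOLO-format pre-annotations that researchers then review and correct.
base_model.label(
    input_folder="./raw_images",
    extension=".jpg",
    output_folder="./pre_annotated",
)
```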

The differences and changes in the numbers of images and attributes arose as follows. (i) The difference between the 21246 initial image files and the 24466 original attribute records arises because certain images contain numerous SOs. Notably, this study merged such attributes to achieve a one-to-one correspondence between images and attribute records, so the iSOOD dataset contains images with both single and multiple SOs. (ii) The change between the original and final numbers of SOs arises because some images were mistakenly recorded as SOs (such as drinking water pipes) or were blurred.

Technical validation

The iSOOD dataset of 10481 images was split into a training set (80%), a test set (10%), and a validation set (10%); a split sketch follows below. These sets were used to train the YOLOv10 series models, and the technical validation was reported based on the resulting performance.
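A minimal sketch of such an 80/10/10 split is shown below; the folder layout and the random seed are illustrative assumptions, not details reported by the study:

```python
# Minimal 80/10/10 split sketch; assumes images in isood/images and YOLO
# labels in isood/labels sharing base names. Folder layout and the random
# seed are illustrative assumptions.
import random
import shutil
from pathlib import Path

random.seed(0)  # assumed seed for a reproducible split
images = sorted(Path("isood/images").glob("*.jpg"))
random.shuffle(images)

n = len(images)
splits = {
    "train": images[: int(0.8 * n)],
    "val": images[int(0.8 * n) : int(0.9 * n)],
    "test": images[int(0.9 * n) :],
}
for name, files in splits.items():
    for img in files:
        label = Path("isood/labels") / f"{img.stem}.txt"
        for src, sub in ((img, "images"), (label, "labels")):
            dst_dir = Path("isood") / name / sub
            dst_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy(src, dst_dir / src.name)
```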

Data Records

The iSOOD is freely shared via the Zenodo platform18. The iSOOD dataset consists of an image dataset in YOLO format, accompanied by annotation files and attribute information in Excel format. Each row of the attribute table represents a single record of a sewage outfall at a specific location. The columns are as follows (a parsing sketch follows the list):

1. Image_name: corresponds to the image file name in the dataset (sequentially numbered starting from 1, such as 1.jpg, 2.jpg).

2. Outfall_code: a unique code for each outfall.

3. basin: river basin affiliation (1 = Yangtze River, 2 = Yellow River).

4. typ: type of sewage outfall (1 = Combined sewer, 2 = Rainwater, 3 = Industrial wastewater, 4 = Agricultural drainage, 5 = Livestock breeding, 6 = Aquaculture, 7 = Surface runoff, 8 = Wastewater treatment plant, 9 = Domestic wastewater (e.g., wastewater not collected by wastewater treatment plants), 10 = Other).
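The sketch below shows one way to load and decode this attribute table; the Excel file name is an illustrative assumption (take the actual name from the Zenodo record), while the column names and code mappings come from the list above:

```python
# Sketch for loading and decoding the attribute table with pandas; the Excel
# file name is an illustrative assumption.
import pandas as pd

BASIN = {1: "Yangtze River", 2: "Yellow River"}
TYP = {
    1: "Combined sewer", 2: "Rainwater", 3: "Industrial wastewater",
    4: "Agricultural drainage", 5: "Livestock breeding", 6: "Aquaculture",
    7: "Surface runoff", 8: "Wastewater treatment plant",
    9: "Domestic wastewater", 10: "Other",
}

attrs = pd.read_excel("iSOOD_attributes.xlsx")   # one row per outfall record
attrs["basin_name"] = attrs["basin"].map(BASIN)  # decode basin codes
attrs["typ_name"] = attrs["typ"].map(TYP)        # decode outfall types
print(attrs[["Image_name", "Outfall_code", "basin_name", "typ_name"]].head())
```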

Statistics and examples of the dataset

The iSOOD dataset contains a total of 10481 SOs images in YOLO format, collected from the Yangtze River (9285 SOs images) and the Yellow River (1196 SOs images) in China (Fig. 2). The iSOOD dataset covers ten types of SOs, as shown in Fig. 6. The images span a range of pixel resolutions (Fig. 3), and approximately 95.1% of them are high-resolution (Fig. 7). Figure 4 displays a heatmap of the annotation box centre distribution: 77.3% of the images depict SOs located near the centre of the picture, while the remaining 22.7% show SOs around the edges (Fig. 8). Large-sized SOs objects, with pixel dimensions greater than 96×96, are the most numerous, accounting for 80.0% of the iSOOD dataset; medium-sized objects account for 18.6%, and small-sized objects, with pixel dimensions less than 32×32, account for 1.4% (Fig. 5). The original sizes of the annotation boxes and SOs are shown in Fig. 9. A sketch of how such statistics can be reproduced from the YOLO labels follows.
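The sketch below reproduces the centre and size statistics from YOLO-format labels. It counts annotation boxes rather than images, and the labels folder and the fixed 1280×1280 image size are assumptions; in practice each image's true size should be used to convert normalised box sizes to pixels:

```python
# Statistics sketch over YOLO-format labels
# (each line: class x_center y_center width height, all normalised to [0, 1]).
from pathlib import Path

IMG_SIZE = 1280  # assumed image size; use each image's true size in practice
near_centre = total = small = medium = large = 0

for label_file in Path("isood/labels").glob("*.txt"):
    for line in label_file.read_text().splitlines():
        _, xc, yc, w, h = map(float, line.split())
        total += 1
        # Centre region as defined in Fig. 8: 0.25 <= x_center, y_center <= 0.75.
        if 0.25 <= xc <= 0.75 and 0.25 <= yc <= 0.75:
            near_centre += 1
        # Size classes as in Fig. 9, via the geometric-mean side length in pixels.
        side = ((w * IMG_SIZE) * (h * IMG_SIZE)) ** 0.5
        if side < 32:
            small += 1
        elif side <= 96:
            medium += 1
        else:
            large += 1

print(f"near centre: {near_centre / total:.1%}")
print(f"small: {small / total:.1%}, medium: {medium / total:.1%}, large: {large / total:.1%}")
```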

Fig. 2

Categories of sewage outfalls in the iSOOD dataset.

Fig. 3

Pixel distribution histogram of images.

Fig. 4

Heatmap of annotation box centre distribution.

Fig. 5

Heatmap of annotation box distribution.

Fig. 6

Examples of sewage outfall classification.

Fig. 7

Examples of sewage outfalls at different resolutions. High resolution refers to pixel dimensions above 1280×1280; low resolution refers to pixel dimensions below 1280×1280.

Fig. 8

Examples of annotation bounding box distribution across sewage outfalls in the pictures. Near the centre means 0.25 \(\le \) x_center \(\le \) 0.75 and 0.25 \(\le \) y_center \(\le \) 0.75; others means close to the edge.

Fig. 9

Examples of sewage outfalls of different sizes. Small size refers to pixel dimensions below 32×32; medium size refers to the range from 32×32 to 96×96; and large size is above 96×96.

Technical Validation

Environment settings

Given the potential application scenario of using UAVs for real-time detection of SOs, the deep learning model must offer rapid target recognition and compatibility with low-end onboard hardware. Therefore, the YOLOv10 series, one of the state-of-the-art object detection model families, was used to evaluate the performance of the iSOOD dataset. Compared with previous YOLO versions, the YOLOv10 series is faster and more accurate13. The iSOOD dataset, comprising 10481 images of SOs, was randomly split into a training set (80%), a test set (10%) and a validation set (10%)19. The YOLOv10 series was trained on a personal computer with an RTX 3090 24GB GPU, using the default hyper-parameters, a batch size of 16, and 100 training epochs (a training sketch follows). Before training, the iSOOD images were augmented by random translation, flipping, and scaling.
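A minimal training sketch is shown below, assuming the ultralytics package with YOLOv10 support; the dataset config file name and the nano variant are illustrative assumptions, since the study does not report which weights were used for each run:

```python
# Minimal YOLOv10 training sketch with the ultralytics package; the config
# file name and model variant are illustrative assumptions.
from ultralytics import YOLO

model = YOLO("yolov10n.pt")  # pretrained YOLOv10-nano; s/m/b/l/x variants also exist
model.train(
    data="isood.yaml",  # hypothetical config listing the splits and the outfall class
    epochs=100,         # as reported in this study
    batch=16,           # as reported in this study
)
metrics = model.val(split="test")  # evaluate on the held-out test split
```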

Evaluation metrics

This study evaluates the efficacy of combining the iSOOD dataset with the YOLOv10 series. We used pycocotools to extract the average precision (AP) and average recall (AR) metrics20 (see the sketch below). The AP metric evaluates a model’s capacity to accurately identify relevant objects by quantifying the proportion of true positive detections among all detections. The AR metric evaluates the model’s ability to find all relevant cases by measuring the proportion of true positive detections among all relevant ground truths. Both metrics compare the bounding boxes predicted by the model with the annotated bounding boxes of the SOs21. Higher AP and AR values indicate more satisfactory outcomes. The Intersection over Union (IoU) threshold is a critical parameter that significantly affects the evaluation results. AP@50:5:95 is averaged over IoU thresholds from 0.5 to 0.95 with a step size of 0.05, while AP50 and AP75 are calculated with the IoU threshold set to 0.50 and 0.75, respectively22. Furthermore, the study assessed detection performance for SOs objects of various sizes. AR1, AR10, and AR100 represent the average recall given at most 1, 10, and 100 detections per image, averaged over all IoU thresholds.
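The sketch below shows how these metrics are typically extracted with pycocotools, assuming the ground truth and the YOLOv10 detections have been exported to COCO JSON format; both file names are illustrative assumptions:

```python
# Evaluation sketch with pycocotools; both file names are illustrative.
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval

coco_gt = COCO("isood_test_annotations.json")         # ground-truth boxes
coco_dt = coco_gt.loadRes("yolov10_detections.json")  # model detections

evaluator = COCOeval(coco_gt, coco_dt, iouType="bbox")
evaluator.evaluate()
evaluator.accumulate()
# summarize() prints AP@50:5:95, AP50, AP75, AP by object size
# (small/medium/large), and AR1/AR10/AR100 -- the metrics in Tables 1 and 2.
evaluator.summarize()
```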

Performance evaluation

Table 1 shows the performance evaluation for the YOLOv10 series at different IoU thresholds. The training, validation, and testing performances demonstrated that there was no over-fitting. On the test set, the AP and AR metrics are 0.626–0.883 and 0.597–0.785, respectively, which are better than the results of previous studies10,12. These results indicate that the iSOOD dataset is suitable for developing deep learning models that effectively detect SOs in natural environments. Table 2 shows the performance evaluation for the YOLOv10 series for different sizes of SOs objects. On the test set, the AP and AR metrics for small-sized SOs objects range from 0.078 to 0.196 and from 0.236 to 0.336, respectively, indicating relatively low accuracy in identifying small-sized objects. This reflects a limitation of the data rather than of the models, because the feature information contained in small-sized SOs images is very sparse. To ensure the precision of SOs detection in practical applications, one effective method is to operate UAVs close to the river bank to capture high-resolution images of SOs23.

Table 1 Performance evaluation for YOLOv10 series at different IoU thresholds.
Table 2 Performance evaluation for YOLOv10 series at different sizes of sewage outfall objects.

Usage Notes

This study presents the first fine-grained dataset for SOs object detection in natural environments. The iSOOD includes 10481 images captured by UAVs and handheld cameras. Our researchers meticulously annotated the iSOOD to assign labels to SOs. The iSOOD has been publicly released after desensitization to promote interdisciplinary collaboration and accelerate advancements in intelligent watershed management. We expect the iSOOD dataset to inspire further research on SOs detection and the control of pollution migration paths, and to serve as a fundamental resource for applying advanced deep learning visual technology to environmental monitoring.

Implications of the iSOOD for intelligent watershed management

The importance of SOs inspection in improving the aquatic ecological environment has gradually gained the recognition of policymakers, and there is significant demand for iSOOD and related technology in watershed management. For example, the Chinese administration is conducting an in-depth investigation of SOs across the country: in the upper and middle reaches of the Yangtze River and Yellow River basins alone, China has allocated billions of dollars and deployed tens of thousands of knowledgeable employees. Nevertheless, the SOs investigated so far constitute only about 10% of the overall effort. Internationally, other countries can examine this “China model” of investigating SOs within river basins to safeguard water ecological safety. These extensive application scenarios mean that iSOOD and related intelligent technologies have great potential to replace manual labour in SOs detection, significantly reducing costs and enhancing efficiency.

The most important recommendation is to deploy artificial intelligence technologies related to iSOOD on UAV platforms for watershed management. More precisely: (i) UAVs are required that can operate at low altitudes and be easily navigated, along with compatible flight control algorithms. This study found that the precision of identifying small SOs is relatively low; to cope with this challenge, a reliable UAV platform is required that can acquire high-resolution images of SOs within its field of view using automated cruise and close-up flights. (ii) The YOLO series architecture is the primary focus of artificial intelligence applications for automatically identifying SOs. Object detection techniques can be classified into two-stage and one-stage methods. As a one-stage algorithm, the YOLO series offers the benefit of rapid processing, making it particularly well suited to real-time surveillance24,25. Nevertheless, compared with standard two-stage approaches, YOLO has the drawback of reduced detection accuracy26,27,28; hence, further research is imperative to enhance both the detection speed and the accuracy of algorithms built upon YOLO. (iii) The iSOOD dataset could continuously gather and accumulate images to enhance its performance on tasks associated with SOs detection worldwide.