Abstract
Hydraulic infrastructure has played a significant role in the development of societies and civilizations. Traditional archaeological methods struggle to identify ancient dams, because most have been damaged and have blended into the natural environment. This study is the first to focus on the intelligent identification of ancient dams surrounding the Liangzhu Ancient City in the Hangjiahu Plain, China, using historical satellite and aerial imagery from the 1940s to the 1970s together with deep learning techniques. After comparing Random Forest, Faster R-CNN, YOLOv5, YOLOv8 and YOLOv11 models, YOLOv11 was chosen. The model was optimized with the GIoU loss, the Convolutional Block Attention Module and an additional detection layer to improve small-target recognition and reduce misidentifications. The optimized YOLOv11 model achieved a recall of 68% and a precision of 65%, which significantly improves the speed and efficiency of ancient dam identification compared with traditional archaeological methods, providing a flexible and accurate tool for large-scale surveys of ancient water facilities.
Introduction
The implementation of water management systems, designed to control, regulate and utilize water resources, has been pivotal to the advancement of human societies and the progress of civilization. Across different periods and regions throughout human history, water management systems have consistently demonstrated their enduring importance, serving as critical infrastructure for ensuring agricultural productivity, economic stability, and societal growth. Examples include the Kosheish Dam of ancient Egypt, built about 4920 years ago to protect the city from floods and supply water to its citizens1,2,3,4; the two Minoan dams in the valley of Choiromandres (an ancient Greek hydraulic project), built around 3700–3900 years ago5,6,7; the Beijing-Hangzhou Grand Canal, whose construction began in 2510 BP8; the Dujiangyan Canal, built between 2275 and 2300 BP9; the Anfeng Pond, created between 2615 and 2621 BP10; and the Lingqu River Canal, constructed in 2238 BP11. These engineering marvels highlight the sophistication of ancient water management practices. The discovery and study of ancient hydraulic systems are not only critical for understanding how early societies managed water resources but also offer invaluable insights into the origins and evolution of human civilizations, underscoring the deep interconnection between water infrastructure and cultural heritage.
Due to factors such as natural erosion, sediment deposition, and urbanization, many ancient water management systems have been obscured or destroyed, rendering their location and structures difficult to discern over time. This limitation poses a significant challenge to understanding their construction, function, and broader historical significance. Therefore, the utilization of advanced technologies like machine learning and remote sensing is crucial for the detection and analysis of these ancient hydraulic systems. These approaches not only enhance our understanding of historical water management practices but also contribute significantly to the preservation of cultural heritage.
The history of water conservancy in China could be traced back to the Liangzhu culture period, which flourished in the Taihu Lake region ~5300 to 4300 years ago12. As documented in the extant literature, the peripheral hydraulic system of the Liangzhu Ancient City encompassed 11 dams. The dams have been classified into three categories based on their location and morphology: long levees positioned in front of mountains, high dams connecting valley mouths, and low dams linking isolated hills on the plains. The application of radiocarbon (C14) dating has facilitated the corroboration of the chronology of these dams, which were constructed ~4950 years before the present era13. This discovery represents a significant advancement in the history of water conservancy in China, extending the known timeline by 2500 years. In 2019, the Archaeological Ruins of Liangzhu City were inscribed on the UNESCO World Heritage List in recognition of their status as a large city site, their advanced rice cultivation practices and their status as the earliest known example of a water conservancy system14.
Nevertheless, the discovery of these ancient water management facilities remains a challenging endeavour. In the absence of written records, the exploration and verification of Liangzhu water conservancy could in the past rely only on traditional ground-based archaeological methods. Erosion and destruction over millennia have left the majority of ancient dams indistinguishable from their surrounding natural environment. After nearly 5000 years, these dams have a profile akin to that of a mountain, blending seamlessly with the adjacent terrain (Fig. 1). This poses challenges for on-site identification. Extensive geophysical surveys, using methods such as ground-penetrating radar (GPR), electrical resistivity tomography, magnetometry, and seismic refraction, have been employed to detect ancient dams in Liangzhu in past decades. However, these methods failed to reveal ancient dams, underscoring the limitations of traditional archaeological techniques in such contexts. Furthermore, the paucity of artefacts within these dams presents an additional challenge to their identification through traditional archaeological methods, as they lack the typical features used in archaeostratigraphic analyses, such as pottery shards and jades. Consequently, the constraints of traditional archaeological techniques have made the identification of ancient water facilities even more challenging. Compared with the ancient dams of Ancient Egypt, the Liangzhu dams are much larger in scale1 (the Kosheish dam was 450 metres long, whereas the Tangshan Long Levee is 5 km long), and they are much older than the ancient dams of Ancient Greece (the two Minoan dams in the valley of Choiromandres were built 3700–3900 years ago, whereas the Liangzhu dams were built ~4950 years BP)5. Given the large scale and great age of the Liangzhu dams, the limitations of traditional archaeological techniques are even more pronounced in the Liangzhu case.
a–d Different ancient dams in Liangzhu.
Prior to the year 2000, the initial high dam situated 11 kilometres to the north-west of the Liangzhu Ancient City was unearthed during the course of road construction. It was not until 2009 that another dam was discovered as a result of damage caused to the road. Subsequent exploratory surveys of nearby valleys corroborated the presence of a high dam system comprising six dams13. In 2011, archaeologists employed a 1969 CORONA satellite image to facilitate the visual interpretation of ancient dams, thereby rapidly identifying the low dam reservoir area and significantly enhancing search efficiency. Subsequent investigations revealed the interconnection of the low dam system with the Long Levee and high dam system, thereby confirming the basic framework of the Liangzhu peripheral hydraulic system (11 dams). Subsequently, an interdisciplinary team comprising scholars in remote sensing, hydrology, and archaeology conducted a comprehensive survey of the Liangzhu peripheral area. This involved the utilization of remote sensing imagery with visual interpretation as the primary tool, in conjunction with borehole survey methods. This approach led to the discovery of several dams. From 2019 onwards, the integration of hydrological analysis methods led to a notable enhancement in survey efficiency, resulting in the identification of another 121 ancient dams in the vicinity and beyond the Liangzhu Ancient City, totalling 132 dams by 2024 (Fig. 2).
Overview map showing the geographic location of the study area. Liangzhu Site (Group) geographic location schematic. White dotted lines mark the boundaries of administrative areas. The 132 ancient dams confirmed by archaeologists are labelled as yellow dots.
However, identifying dams by visual interpretation of remote sensing imagery requires careful attention to each valley, as most dam remnants appear small and show only slight colour differences from their surroundings (Fig. 3a–d). On historical remote sensing imagery these remnants often resemble features such as fields, mounds and houses (Fig. 3e–h), which makes them difficult to distinguish. Confirming ancient dams through field survey and manual interpretation of remote sensing imagery is therefore challenging: the approach remains highly empirical and inefficient, and is prone to omission or misjudgment. The Liangzhu team’s comprehensive examination of the remains of the hydraulic system, which took its multidisciplinary staff over 15 years of effort in a 100-square-kilometre area, could only approximately distinguish various settlement types, with a precision of only ~50%. It is therefore evident that implementing artificial intelligence and automated approaches to enhance the rapid and accurate detection of ancient dams in remote sensing images across vast areas holds significant promise for advancing the field of hydraulic archaeology.
a Arc-shaped dam; b Saddle-shaped dam; c Short rectangular dam; d Long rectangular dam; e Houses; f–h Fields and mounds.
In the past decade, machine learning has emerged as a valuable tool for processing extensive remote sensing datasets. Although it has not been a common method in hydraulic archaeology, it enables archaeologists to conduct comprehensive research and site prediction. For instance, Menze et al.15 identified ancient settlement mounds in the Near East using morphometric variables and a Random Forest model based on Shuttle Radar Topography Mission (SRTM) data15. De Laet et al. utilized the K-Nearest Neighbor (KNN) algorithm to automatically extract archaeological features from unexcavated sites in southwestern Turkey16. Chen et al.17 applied Principal Component Analysis/Linear Discriminant Analysis (PCA/LDA) to enhance the automatic identification of potential archaeological sites17. Lasaponara et al.18 proposed an unsupervised ISODATA classification method for identifying buried archaeological remains in Turkey18. Additionally, Soroush et al. employed Convolutional Neural Networks (CNN) to identify ancient underground irrigation canals in Iraq using historical satellite imagery19. Verschoof-van der Vaart et al.20 applied the Region with CNN feature (R-CNN) method to LiDAR data from the Netherlands to automatically identify hollow roads20. Hence, remote sensing imagery coupled with machine learning methods offers novel avenues for automatically identifying ancient dams.
The objective of this study is to develop a machine learning-based methodology for the rapid and accurate identification of ancient dams, thereby facilitating and promoting the development of hydraulic archaeology. Five widely used machine learning models were tested on the task of identifying ancient dams. This process led to the development of an optimized object detection model based on YOLOv11 integrated with the Convolutional Block Attention Module. The modified model provides water archaeologists with a straightforward and practical tool for the preliminary identification of ancient dams, improving the efficiency of site discovery, enabling archaeologists to reveal the spatial distribution pattern and framework of ancient dams holistically, and helping to elucidate the roles of ancient water conservancy systems in prehistoric civilizations.
Methods
Data
Unfortunately, changes in the natural environment (such as heavy rainfall and floods) for 5000 years, along with the expansion of urbanization initiated in the 1970s and various infrastructure construction activities, have resulted in an increasing number of ancient dams and other water conservancy facilities being submerged or damaged over the past decades. This is demonstrated in Fig. 4, which shows three Liangzhu Ancient Dams that were clearly visible in the 1972 CORONA image (Fig. 4a) but have since been obliterated by urban development, as seen in the contemporary remote sensing satellite imagery (Fig. 4b). This has made it increasingly difficult to confirm the location of archaeological sites. It is therefore imperative to develop precise and adaptable techniques for comprehensive, practical surveys of ancient dams, enabling prompt identification and localization of these structures and facilitating on-site archaeological verification.
a A 1972 image from the HEXAGON satellite; b a 2020 High-Resolution Satellite Image obtained from Tianditu, China. Red rectangles show the locations of ancient dams in Liangzhu.
Even the same dam, when captured at different times and by different sensors exhibits variations in its appearance across various historical remote sensing images due to changes in sun elevation and camera sensitivities (Fig. 5). Therefore, the integration of multi-temporal historical datasets presents distinct advantages for machine learning applications in archaeological prospection. The temporal depth provided by historical imagery enables baseline observations of landscape features before modern alterations, serving as ground-truth references for damaged or obliterated sites. Meanwhile, heterogeneity in image resolution, solar illumination angles, and seasonal characteristics across different acquisition periods enriches feature representation, effectively mitigating overfitting risks in deep learning models. Additionally, multi-source data fusion compensates for spatial coverage limitations of individual archives, enabling comprehensive sampling across the study area.
a A 1945 aerial image from an unspecified source, May 30 (early Summer), b a 1963 aerial image from an unspecified source, and c a 1972 image from the HEXAGON satellite, November (early winter or late Autumn). The red rectangles show the locations of ancient dams.
In this study, we employed a comprehensive dataset comprising archival aerial and satellite imagery collected from the 1940s to the 1970s. This dataset includes imagery from the U-2 reconnaissance aircraft and the CORONA and HEXAGON satellites, along with other aerial images from the same era (Table 1). The dataset spans ~1876 square kilometres, covering the area from 119°40’E to 120°21’E and from 30°09’N to 30°42’N. The imagery totals 32.6 GB of raw data.
A series of pre-processing steps was applied to these historical images, including registration, geometric correction and orthorectification, radiometric correction, image enhancement, cropping and mosaicking, using Global Mapper according to the methods described in Zhang et al., Galiatsatos and Philip et al.21,22,23. All aerial and satellite images were registered, geometrically corrected and orthorectified against a 1960s satellite image provided by the Zhejiang Provincial Institute of Cultural Relics and Archaeology, China, which had been aligned by the Zhejiang Institute of Geologic Survey. Depending on its size, each image was assigned 300–2400 ground control points (GCPs) referenced to the base image, and these were used to perform geometric correction and orthorectification. Further details of the pre-processing method can be found in Zhang et al., Galiatsatos and Philip et al.23. Ultimately, a comprehensive set of maps encompassing the entire research area was generated, providing a fundamental framework for the identification and annotation of potential dam sites (Fig. 6).
a Unknown source, acquired in 1945; b unknown source, acquired in 1946; c U2, acquired in 1964; d CORONA, acquired in 1969.
The locations of 132 ancient dams, verified through rigorous field investigations over the past 15 years, were meticulously annotated on all registered historical remote sensing imagery. Due to historical mission footprints and the varying coverage of the imaging campaigns, only part of the images fully covered dam-bearing areas. Consequently, the 132 confirmed dams cumulatively generated 673 samples, with some dams appearing in up to five different image sets as noted in Table 1.
Conventionally, such samples are divided into two groups: 80% (538 samples) allocated for training and validation, and the remaining 20% (135 samples) for testing. This division is standard practice in machine learning, ensuring the model is trained on a sufficiently large dataset to learn underlying patterns and features effectively. The 20% designated as the testing set is withheld during the training phase so that the model can be evaluated on new, previously unseen data. This evaluation assesses the model’s generalisability and helps prevent overfitting, which occurs when a model is overly tailored to the training data and subsequently performs poorly on new data. In this study, to ensure that all confirmed dams could be used for training and validation, we selected a stitched image covering the entire study area and containing all confirmed dams for testing. We divided it into tiles of ~640 × 640 pixels, taking care that tile boundaries did not cut through any dam. A total of 530 tiles were produced to form the test set, which includes 132 dam samples, approximately 20% of the entire sample set. The remaining images were labelled and used as the training and validation set.
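The conventional 80/20 division mentioned at the start of the paragraph above can be sketched as follows. This is a minimal illustration of the generic practice only, not the stitched-image test set that the study actually used; the function name `split_samples`, the fixed seed and the use of integer sample IDs are assumptions made for the example.

```python
import random


def split_samples(sample_ids, train_fraction=0.8, seed=0):
    """Shuffle sample IDs and split them into train/validation and test lists.

    Hypothetical helper: the study does not describe its code, so the name,
    the seed and the shuffling strategy are illustrative assumptions.
    """
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_train = round(len(ids) * train_fraction)
    return ids[:n_train], ids[n_train:]


# 673 annotated samples, as reported in the text.
train_val, test = split_samples(range(673))
print(len(train_val), len(test))  # 538 135
```

With 673 samples the rounding reproduces the 538/135 counts quoted in the text.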
Methodology
After the pre-processing procedure, the candidate models were compared to identify those appropriate for the detection task. The selected model was then optimized to improve its performance. The flowchart of the methodology used in this study is sketched in Fig. 7.
Flowchart of using machine learning model to predict ancient dams based on historical satellite and aerial images.
Model selection
In recent years, the most commonly utilized machine learning models in the field of archaeology have included logistic regression (LR), support vector machine (SVM), random forest, convolutional neural network (CNN), YOLO, and others. In this study, five machine learning algorithms, namely Random Forest, Faster R-CNN, YOLOv5, YOLOv8 and YOLOv11, were selected for comparison in order to assess their performance in the detection of ancient dams.
The Random Forest model, which is an ensemble of decision trees, is particularly effective for predictive modelling due to its high accuracy, robustness to outliers and scalability to large datasets. It offers versatility for both classification and regression, as well as the ability to assess the importance of features inherent to the model. However, its intrinsic complexity can impede the interpretability of the results at a granular level, and the phenomenon of overfitting can occur if the trees are not subjected to the appropriate pruning process. In summary, Random Forest offers a favourable balance between power and practicality, which contributes to its popularity24.
Faster R-CNN, a member of the R-CNN series, employs convolutional neural networks (CNNs) for object detection. R-CNN pioneered a two-stage approach that first generates region proposals and then extracts CNN features for support vector machine (SVM) classification. Its principal advantage is the enhanced accuracy it offers over traditional methods. However, R-CNN is slow, owing to the independent execution of CNN processing for each region and the inherent complexity of its pipeline. Faster R-CNN introduced the Region Proposal Network (RPN), which enabled end-to-end training and significantly improved both speed and accuracy25,26,27.
The YOLO (You Only Look Once) algorithm represents a pioneering approach to object detection, offering a significant departure from traditional methods and contributing to a revolution in the field. In contrast to the use of a sliding window or region proposal-based method, YOLO treats object detection as a single regression problem, which markedly accelerates the process while maintaining high accuracy. YOLOv5 represents an advancement within the YOLO family, with a particular focus on enhancing ease of use, speed, and accuracy. The software is built on PyTorch, which provides a streamlined architecture that allows researchers and developers to rapidly train and deploy object detection models. YOLOv5 is equipped with an array of pre-trained models, each tailored to a distinct equilibrium between speed and accuracy, thereby addressing a diverse spectrum of application requirements. The modular design of the algorithm facilitates experimentation with different components and the customization of the model to specific tasks28,29. YOLOv8 represents a significant advancement in the field. The YOLOv8 model, released in 2023, exhibits enhanced accuracy, speed, and robustness compared to previous iterations. The incorporation of advanced backbone networks facilitates more efficient and effective feature extraction, thereby enabling the model to detect objects with greater precision. Furthermore, YOLOv8 provides expanded capabilities, including support for object segmentation and classification, thereby offering a versatile tool for a wide range of computer vision tasks. The optimized hyperparameters and training procedures of YOLOv8 enable it to achieve state-of-the-art performance while maintaining real-time speeds, making it an attractive choice for both research and industrial applications30. 
YOLOv11, the latest iteration in the YOLO series of real-time object detection models, introduces several key advancements that enhance its performance, efficiency, and versatility, making it a significant contribution to the field of computer vision31.
The Random Forest, Faster R-CNN, YOLOv5, YOLOv8 and YOLOv11 models were trained on the training-validation dataset using five-fold cross-validation. This technique divides the dataset into five subsets (or “folds”) and iteratively uses one fold as the validation set and the remaining four as the training set. The process is repeated five times, with each fold serving as the validation set once. Finally, the performance metrics (e.g., accuracy, precision, etc.) from each iteration are averaged to provide an overall assessment of the model’s performance. This approach not only reduces the variance of the evaluation results but also makes full use of the data. It is particularly suitable for small datasets and provides a more reliable estimate of model performance than a simple train-test split.
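The five-fold procedure described above can be sketched in a few lines. This is a generic illustration under stated assumptions rather than the study's own code: `k_fold_indices`, `cross_validate` and the `train_and_score` callback are hypothetical names, and the callback stands in for whatever routine trains a model on the training indices and returns a metric on the validation indices.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, val_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        val = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, val


def cross_validate(n_samples, train_and_score, k=5):
    """Average a metric over k folds.

    train_and_score(train, val) is a hypothetical user-supplied callback
    that fits a model on `train` and returns a score measured on `val`.
    """
    scores = [train_and_score(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Fold sizes for the 538 training/validation samples reported in the text.
for fold_no, (train, val) in enumerate(k_fold_indices(538), start=1):
    print(fold_no, len(train), len(val))
```

Each sample appears in exactly one validation fold, so every annotated dam contributes to both training and evaluation across the five runs.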
All experiments were conducted on a computer cluster with the following hardware and software settings: an Intel Xeon Gold 5118 CPU @ 2.30 GHz, 200 GB of memory, and an NVIDIA GeForce GTX 1080Ti GPU for acceleration, with Anaconda used to manage the environment. The input image size was ~640 × 640 and the threshold was set to 0.5. For Random Forest, n_estimators = 100, min_samples_leaf = 1 and max_depth = 9. For Faster R-CNN, trained with Detectron2, IMS_PER_BATCH = 2, BATCH_SIZE_PER_IMAGE = 128, BASE_LR = 0.001 and MAX_ITER = 2000. For YOLOv5, batch_size = 16, imgsz = 640, epoch = 150, weights = yolov5m.pt and optimizer = SGD. For YOLOv8, batch_size = 16, imgsz = 640, epoch = 150, weights = yolov8m.pt and optimizer = SGD. For YOLOv11, batch = 16, imgsz = 640, epoch = 150, weights = yolov11m.pt and optimizer = SGD. For all three YOLO models, the anchor boxes were automatically adjusted to fit the samples, and training used the PyTorch framework.
The core of a random forest is the generation of multiple decision trees in parallel, with each tree using independent bootstrap sampling and random feature selection during training. Each decision tree is generated independently and in parallel, rather than by iteratively generating weak classifiers through gradual optimization of a loss function. Random forests therefore have no global loss function that is continuously updated or tracked during training. In addition, the training objective of each tree is to select the optimal splitting features and thresholds, and the prediction of the entire forest is obtained through voting or averaging, rather than by adjusting parameters along the gradient of a loss function. Hence, only the loss curves of Faster R-CNN, YOLOv5, YOLOv8 and YOLOv11 are shown (Fig. 8).
a Faster R-CNN; b YOLOv5; c YOLOv8 and d YOLOv11.
In this context, precision refers to the proportion of correctly identified dams among all detections, whereas recall represents the proportion of actual dams that the model correctly detects:
Precision = TP / (TP + FP) (1)
Recall = TP / (TP + FN) (2)
In Formulas (1) and (2), TP (true positives) denotes the number of authentic dams that have been correctly identified as positive cases, FP (false positives) the number of false dams that have been erroneously classified as positive cases, and FN (false negatives) the number of genuine dams that have been misclassified as negative cases. Some results of the five dam detection models are shown in Fig. 9.
a Random Forest; b Faster R-CNN; c YOLOv5; d YOLOv8; e YOLOv11.
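The precision and recall definitions given above can be computed directly from the counts of true positives, false positives and false negatives. The sketch below is illustrative only: the function name is an assumption, and the counts in the usage line are hypothetical round numbers chosen for easy arithmetic, not results from the study.

```python
def precision_recall(tp, fp, fn):
    """Return (precision, recall) from detection counts.

    tp: genuine dams correctly detected
    fp: non-dam features wrongly reported as dams
    fn: genuine dams the model missed
    A zero denominator yields 0.0 by convention in this sketch.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical counts (not taken from the paper): 65 hits, 35 false alarms,
# 35 missed dams.
print(precision_recall(65, 35, 35))  # (0.65, 0.65)
```

Raising precision means fewer false alarms for archaeologists to check in the field, while raising recall means fewer genuine dams overlooked.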
The results presented in Fig. 10 and Table 2 demonstrate that the Random Forest model performed worst, with a precision of 0.21 and a recall of 0.35, possibly because historical images offer few colour and texture features. The Faster R-CNN model also showed suboptimal precision and recall, with values of 0.37 and 0.35, respectively; one possible reason is that the dataset was not large enough. In contrast, the YOLO models performed better on both metrics, with YOLOv5 reaching 0.47 for precision and 0.66 for recall and YOLOv11 reaching 0.49 for precision and 0.65 for recall, both superior to YOLOv8. During parameter tuning, there was no significant improvement in the precision and recall of Random Forest and Faster R-CNN, whereas the YOLO models performed well. This study therefore proceeded to construct a model based on YOLOv11, chosen for its good performance and speed.
The precision-recall curve of Faster R-CNN, YOLOv5, YOLOv8 and YOLOv11.
Model optimization
Despite its superior performance compared with the other models, YOLOv11 encountered difficulties in identifying features at mountain-plain intersections, where a multitude of elements, including fields, mounds, and houses, were densely concentrated. In the historical remote sensing imagery, these features and the remnants of ancient dams are of a similar size and are very small. The lower resolution and monochromatic nature of historical imagery make it impossible to distinguish features by their texture and colour. This causes confusion when interpreting historical remote sensing imagery.
To optimize the performance of the model, we implemented an enhanced feature fusion network that incorporates the Convolutional Block Attention Module (CBAM) and uses GIoU as the loss function. Moreover, an additional prediction layer was integrated to improve the recognition of small targets, thereby mitigating the confounding effects of feature overlap and improving overall recognition accuracy. The overall architecture of the optimized YOLOv11-based ancient dam detection model is illustrated in Fig. 11.
The overall architecture of the optimized dam detection model based on YOLOv11.
The incorporation of GIoU (Generalized Intersection over Union) as the primary loss function within the YOLOv11 dam detection framework enhances the model’s capabilities in several key respects. As the comparison of IoU and GIoU in Fig. 12 shows, three distinct overlap configurations have identical IoU (Intersection over Union) values of 0.33 yet different GIoU values of 0.33, 0.24 and −0.1, respectively, from left to right32. Traditional IoU loss cannot provide meaningful gradients when the predicted and ground truth bounding boxes of a dam do not overlap; by addressing this limitation, GIoU ensures that the model can continue to learn and optimize even when the two boxes are completely disjoint. This not only accelerates the training of dam identification but also enables faster convergence, as GIoU provides a more stable and consistent loss gradient throughout training. Moreover, GIoU takes into account the area of the smallest box enclosing both the predicted and ground truth boxes, which helps refine bounding box localization and consequently improves the precision of dam predictions. This localization precision is of paramount importance in dam detection, where even minor discrepancies in bounding box coordinates can have a substantial impact on the overall outcome. Another advantage of GIoU in YOLOv11 is its more balanced range of loss values. In contrast to IoU, which can reach zero or one in extreme cases, GIoU keeps the loss values informative and useful for optimization, regardless of whether the boxes overlap heavily or not at all. This results in a more efficient optimization process and helps the dam detection model avoid becoming trapped in local minima32.
Three different overlap configurations with the same IoU = 0.33 but different GIoU = 0.33, 0.24 and −0.1, from left to right. In other words, GIoU can be more sensitive to localization accuracy and provides a more useful loss gradient during training32.
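The difference between IoU and GIoU for disjoint boxes can be made concrete with a small computation. The sketch below follows the standard GIoU definition (IoU minus the fraction of the smallest enclosing box not covered by the union); the function names, the `(x1, y1, x2, y2)` box format and the example coordinates are assumptions for illustration and do not reproduce the boxes in Fig. 12.

```python
def iou_and_giou(box_a, box_b):
    """Return (IoU, GIoU) for two axis-aligned boxes given as (x1, y1, x2, y2).

    Assumes both boxes have positive area.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)

    # Intersection rectangle (zero if the boxes do not overlap).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = area_a + area_b - inter
    iou = inter / union

    # Smallest box enclosing both inputs.
    enclose = (max(ax2, bx2) - min(ax1, bx1)) * (max(ay2, by2) - min(ay1, by1))
    giou = iou - (enclose - union) / enclose
    return iou, giou


def giou_loss(box_a, box_b):
    """GIoU loss, 1 - GIoU, ranging from 0 (perfect match) towards 2."""
    return 1.0 - iou_and_giou(box_a, box_b)[1]


truth = (0, 0, 1, 1)
near = (2, 0, 3, 1)  # disjoint, one unit away
far = (4, 0, 5, 1)   # disjoint, three units away
print(iou_and_giou(truth, near))
print(iou_and_giou(truth, far))
```

Both disjoint predictions have IoU = 0, so an IoU loss cannot tell them apart, whereas GIoU is about −0.33 for the nearer box and −0.6 for the farther one, giving the optimizer a direction in which to move the prediction.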
The Convolutional Block Attention Module (CBAM) is a lightweight yet effective attention mechanism designed to enhance the discriminative power of convolutional neural networks (CNNs) by simultaneously capturing channel-wise and spatial-wise contextual information. Unlike traditional attention modules that focus solely on either channel or spatial dimensions, CBAM adopts a sequential two-stage architecture: it first infers a channel attention map to emphasize “what” information is important, followed by a spatial attention map to highlight “where” the important information is located.
For the channel attention component, CBAM leverages both max-pooling and average-pooling operations over the spatial dimensions to aggregate global information, generating two distinct feature descriptors that capture complementary statistical properties of the input features. These descriptors are then fed into a shared multi-layer perceptron (MLP) and summed to produce a channel attention vector, which is applied to the input features via element-wise multiplication to recalibrate channel-wise importance. Subsequently, the spatial attention component takes the channel-refined features and performs max-pooling and average-pooling along the channel dimension to generate two spatial context descriptors. These descriptors are concatenated and processed through a 7 × 7 convolutional layer to produce a spatial attention map, which is applied to the features to further refine spatial relevance. By sequentially integrating channel and spatial attention, CBAM dynamically emphasizes critical features in both dimensions, enabling the network to better focus on target regions and suppress irrelevant background information (Fig. 13). This dual-dimensional attention mechanism has demonstrated significant improvements in various vision tasks, particularly in multi-object recognition, where precise localization and feature discrimination are essential for distinguishing overlapping or occluded targets33.
The CBAM enhances the discriminative power of convolutional neural networks (CNNs) by simultaneously capturing channel-wise and spatial-wise contextual information33.
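The two-stage attention described above (channel attention followed by spatial attention) can be traced step by step in plain Python. This is a didactic sketch on nested lists, not the trained module used in the study: the weights `w1`, `w2` and `kernel` are hypothetical constants rather than learned parameters, the shared MLP omits bias terms, and a real implementation would use a tensor library.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def shared_mlp(vec, w1, w2):
    """Two-layer MLP with ReLU, shared by the avg- and max-pooled descriptors."""
    hidden = [max(0.0, sum(w * v for w, v in zip(row, vec))) for row in w1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


def channel_attention(feat, w1, w2):
    """feat is C x H x W; returns one weight in (0, 1) per channel ('what')."""
    avg = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in feat]
    mx = [max(map(max, ch)) for ch in feat]
    a, m = shared_mlp(avg, w1, w2), shared_mlp(mx, w1, w2)
    return [sigmoid(x + y) for x, y in zip(a, m)]


def spatial_attention(feat, kernel):
    """Returns an H x W map in (0, 1) ('where'); kernel is 2 x k x k (k = 7)."""
    C, H, W = len(feat), len(feat[0]), len(feat[0][0])
    avg_map = [[sum(feat[c][i][j] for c in range(C)) / C for j in range(W)] for i in range(H)]
    max_map = [[max(feat[c][i][j] for c in range(C)) for j in range(W)] for i in range(H)]
    maps = [avg_map, max_map]  # concatenated along the channel axis
    k = len(kernel[0])
    pad = k // 2
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0.0
            for m in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:  # zero padding
                            s += kernel[m][di][dj] * maps[m][y][x]
            out[i][j] = sigmoid(s)
    return out


def cbam(feat, w1, w2, kernel):
    """Apply channel attention, then spatial attention, to a C x H x W feature."""
    ca = channel_attention(feat, w1, w2)
    refined = [[[ca[c] * v for v in row] for row in ch] for c, ch in enumerate(feat)]
    sa = spatial_attention(refined, kernel)
    return [[[sa[i][j] * v for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in refined]


# Toy input: 4 channels of 8 x 8, reduction to 2 hidden units (all hypothetical).
C, H, W, hidden = 4, 8, 8, 2
feat = [[[((c + 1) * (i + j)) % 5 / 5.0 for j in range(W)] for i in range(H)] for c in range(C)]
w1 = [[0.1] * C for _ in range(hidden)]
w2 = [[0.1] * hidden for _ in range(C)]
kernel = [[[0.01] * 7 for _ in range(7)] for _ in range(2)]
out = cbam(feat, w1, w2, kernel)
print(len(out), len(out[0]), len(out[0][0]))  # 4 8 8
```

The output has the same shape as the input, and because both attention maps lie strictly between 0 and 1, the module only rescales existing activations, which is why it can be inserted into an existing backbone without changing the surrounding layers.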
Adding a supplementary detection layer to the YOLOv11 architecture markedly improves its capacity for dam detection, equipping the model to handle the diverse shapes and intricate surroundings of ancient dams in challenging scenes (Fig. 14). First, the additional layer enables multi-scale prediction: the model can use activations from different depths of the network to detect dams at varying scales34,35,36. Smaller objects, which often produce stronger activations in earlier layers, can be detected alongside larger objects, which are better identified in deeper layers. This multi-scale approach improves the model’s accuracy in distinguishing dams of different sizes. Second, the additional detection layer improves robustness. In cluttered or dense environments where dams may be obscured, multiple prediction layers help the model identify and separate these structures, reducing the probability of missed detections. Third, the supplementary layer can speed up convergence during training: additional feedback signals from different levels of the network let the model learn to detect dams more efficiently, often requiring less data to reach optimal performance. Finally, the additional layer improves localization. By predicting bounding boxes at multiple scales, the model refines its estimates of dam positions and sizes. In sum, the supplementary detection layer enhances YOLOv11’s accuracy, robustness, convergence speed, flexibility and localization precision35.
a Backbone; b Neck and c Head.
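To make the multi-scale argument concrete, the sketch below compares prediction grids with and without an extra high-resolution head. The text does not state the strides involved, so the default strides of 8, 16 and 32 and the added stride of 4 are assumptions typical of YOLO-style detectors, and the 12-pixel object width is a hypothetical example.

```python
IMG_SIZE = 640  # training image size reported in the text

BASE_STRIDES = (8, 16, 32)  # assumed strides of the default detection heads
EXTRA_STRIDE = 4            # assumed stride of the additional detection layer


def grid_summary(img_size, strides):
    """Map each stride to the (rows, cols) of its prediction grid."""
    return {s: (img_size // s, img_size // s) for s in strides}


def total_cells(grids):
    """Total number of grid cells (prediction locations) over all heads."""
    return sum(rows * cols for rows, cols in grids.values())


def cells_spanned(object_px, stride):
    """How many grid cells an object of the given width covers."""
    return object_px / stride


base = grid_summary(IMG_SIZE, BASE_STRIDES)
extended = grid_summary(IMG_SIZE, (EXTRA_STRIDE,) + BASE_STRIDES)

for stride, (rows, cols) in extended.items():
    span = cells_spanned(12, stride)  # hypothetical 12-pixel-wide dam
    print(f"stride {stride:>2}: {rows}x{cols} grid, 12 px object spans {span:.2f} cells")

print("prediction locations without extra head:", total_cells(base))
print("prediction locations with extra head:", total_cells(extended))
```

Under these assumptions, the finer grid gives a small object several cells of support instead of roughly one, which is the usual motivation for adding a high-resolution head for small targets. The cost is a large increase in the number of prediction locations.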
Additionally, we attempted to incorporate negative samples, specifically collecting ground features such as farmland and houses that are easily confused with ancient dams, to form the training dataset together with positive samples (dam samples). The purpose of adding negative samples is to help the model more accurately learn the decision boundaries of the target category by introducing counterexample information, thereby enhancing the model’s ability to recognize positive samples. A total of 1800 negative samples were labelled in the experiment, with 600 samples each selected from fields, houses, and roads, all derived from the same historical remote sensing dataset used for labelling ancient dam samples.
This study used all positive and negative samples to train the model and evaluate its performance. The model parameters were set as follows: batch = 16, image size (imgsz) = 640, epochs = 150, weights = yolov11m.pt, optimizer = SGD. Anchor boxes were automatically adjusted to adapt to the samples, and training was conducted using the PyTorch framework.
Results
The performance of the optimized YOLOv11-based model was assessed using four accuracy metrics: (1) precision, (2) recall, (3) F2-score and (4) mAP50. The precision-recall curve and mAP50 are shown in Fig. 15.
a precision-recall curve and b mAP50.
In the testing set, the model correctly identified 90 dams, missed 42 dams (missed detections) and misidentified 48 other objects as dams (false detections) (Table 3).
The identification results were better in areas of higher elevation, particularly in locations farther from the valley mouths, where nearly all dams were correctly identified. In contrast, in areas of lower elevation, especially near residential zones on the plains, false detections were more common. The dense concentration of houses and fields in these areas, which share similar visual characteristics with dams, led to these objects being misidentified as dams, lowering accuracy. The missed detections, meanwhile, tended to be randomly distributed.
According to Formulas 1 and 2, the final precision and recall on the testing set were 65% and 68%, respectively. The F-score combines precision and recall into a single value that is high only when both are high. Because the primary purpose of the model is to quickly identify ancient dams in the research area, recall is a more important performance metric than precision. We therefore chose β = 2, which weights recall twice as heavily as precision. The resulting F2-score was 0.67, indicating a satisfactory level of performance. We also report mAP50, the mean average precision (AP) at an IoU threshold of 50%, where AP is the area under the precision-recall curve (Table 4).
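The reported figures can be checked from the detection counts given for the testing set (90 correct detections, 42 missed dams, 48 false detections). The sketch below uses the standard definitions of precision, recall and the F-beta score; the function names are illustrative, and the precision-recall points passed to `average_precision` are made-up toy values, not data from this study.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive and false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)


def f_beta(precision, recall, beta=2.0):
    """F-beta score; beta > 1 weights recall more heavily than precision."""
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def average_precision(recalls, precisions):
    """Area under a precision-recall curve (all-point interpolation).

    Points must be sorted by increasing recall.
    """
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision non-increasing from left to right (upper envelope)
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


# Counts reported for the testing set
precision, recall = precision_recall(tp=90, fp=48, fn=42)
f2_from_counts = f_beta(precision, recall, beta=2.0)
f2_from_rounded = f_beta(0.65, 0.68, beta=2.0)

print(f"precision = {precision:.3f}, recall = {recall:.3f}")
print(f"F2 from counts = {f2_from_counts:.3f}, F2 from rounded rates = {f2_from_rounded:.3f}")

# Toy precision-recall points, for illustration only
print(f"toy AP = {average_precision([0.5, 1.0], [1.0, 0.5]):.2f}")
```

Computed from the raw counts, F2 is approximately 0.676. Computed from the rounded rates of 65% and 68%, it is approximately 0.674. Both are close to the reported value of 0.67.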
Discussion
This study employed an enhanced YOLOv11 model to intelligently identify ancient dams in and around the Liangzhu Ancient City. While the accuracy and precision of the optimized model leave room for improvement, the method represents a significant advance in the rapid surveying and positioning of ancient water conservancy facilities over vast areas. It departs notably from traditional field archaeology and visual interpretation of remote sensing imagery, providing a vital tool for the rapid identification of ancient dams. The core value of our method lies in expanding dam detection well beyond what is possible with human-only efforts. The enhanced YOLOv11 model enables automated detection which, while not replacing field verification, considerably narrows the scope of survey targets.
Importantly, this method incorporates multidisciplinary expertise directly into the model, which allows for its application across diverse regions that share similar geomorphological characteristics. However, to maintain the integrity of our findings, on-site confirmation remains critical to ensure the accuracy of labels and to prevent the occurrence of false positives.
Furthermore, our approach significantly reduces the necessity for archaeologists to have extensive expertise in remote sensing or machine learning, thereby diminishing the time and labour demands traditionally associated with such deep technical knowledge. This is particularly beneficial in remote or topographically challenging areas where human access is limited, and in scenarios where rapid action is essential to prevent or assess damage to archaeological sites.
In regions where resources are constrained, our approach serves as a low-cost, high-efficiency solution that supports thorough site investigations and evaluations. By addressing the limitations of traditional archaeological methods and mitigating the shortage of skilled personnel, our model not only advances the field of hydraulic archaeology but also contributes to the preservation and protection of cultural heritage across the globe.
Meanwhile, the identified locations of ancient sites aid heritage management departments in precisely defining protection zones, which is crucial for preventing further damage caused by urbanization, natural disasters, or agricultural activities. This approach also provides a scientific foundation for delineating preservation boundaries, thereby contributing directly to the conservation of cultural heritage.
Despite these advances, several challenges persist, particularly regarding the imaging conditions under which the historical data were captured, the limited number of training samples, and data augmentation.
The precision of the machine learning model in identifying ancient dams from historical satellite images is constrained by the inherent limitations of the image data itself. The low resolution of these historical images, captured decades ago with less advanced imaging technology, makes it difficult to accurately discern small-scale features like ancient dams. The absence of colour information compounds the difficulty, as colour frequently provides vital contextual cues for differentiating man-made structures from their natural surroundings.
In addition, most of the CORONA and HEXAGON images and other historical images selected in the study were taken in late autumn and winter, which helps to preserve the shadows of elevated or incised features and thus enhances feature detectability. Conversely, the aerial images were captured in the summer, often resulting in overexposed images in which features are harder to see. These variations in lighting conditions and image acquisition timing critically influence the model’s performance and the interpretability of features. Sun elevation, camera sensitivity, and the location of the feature outside the centre of the image can also impact the visibility of features37,38.
In addition to the limitations of the image data, the precision of the model is also significantly hindered by the limited number of training samples. The restricted quantity of known ancient dams constrains the model’s capacity to discern the diverse range of patterns and characteristics that these ancient dams exhibit. Consequently, the model encounters difficulties in generalizing effectively to new, unseen examples, which could lead to a reduction in accuracy.
To address these challenges, we tested several data augmentation techniques intended to increase the robustness and diversity of the training dataset. Geometric augmentations applied 15° rotations and ±20% scaling to 30% of the samples, to help the model adapt to variations in the orientation and size of dam structures. Photometric augmentations adjusted the contrast by ±15% and injected Gaussian noise with a standard deviation of 0.05 to mimic the noise of historical imagery, simulating different imaging conditions and testing the model’s resilience to variations in image quality and visual clarity. However, while these methods increased the diversity of the training set, they also introduced ambiguities that confused the model. The deformations made it harder for the model to distinguish dams from similarly shaped natural or anthropogenic features, such as fields, houses, or other landscape elements. With augmentation, recall improved slightly (to 71%) while precision decreased notably (to 60%), and some ancient dams were detected as one of the negative-sample classes (fields, houses or roads). Given these outcomes, we judged the augmentation techniques unsuitable for this context, as they introduced new complexities without substantially improving the detection of ancient dams.
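The photometric part of this augmentation can be sketched as follows. This is a minimal illustration, not the study's code. It assumes single-band images with intensities normalized to [0, 1], so that a noise standard deviation of 0.05 is expressed on that scale, and it rescales contrast about the image mean; the function name and the synthetic tile are hypothetical. The geometric augmentations are left out because rotation and scaling must also transform the bounding-box labels.

```python
import numpy as np


def photometric_augment(image, rng, contrast_range=0.15, noise_std=0.05):
    """Randomly adjust contrast by up to +/-15% and add Gaussian noise.

    image: float array with values in [0, 1] (assumed normalization).
    """
    # Contrast factor drawn uniformly from [0.85, 1.15]
    factor = 1.0 + rng.uniform(-contrast_range, contrast_range)
    mean = image.mean()
    out = (image - mean) * factor + mean  # stretch or compress about the mean
    # Additive zero-mean Gaussian noise with standard deviation noise_std
    out = out + rng.normal(0.0, noise_std, size=image.shape)
    return np.clip(out, 0.0, 1.0)  # keep intensities in the valid range


# Synthetic grayscale tile standing in for a historical image patch
rng = np.random.default_rng(42)
tile = rng.uniform(0.2, 0.8, size=(64, 64))
augmented = photometric_augment(tile, rng)
print(augmented.shape, float(augmented.min()), float(augmented.max()))
```

On low-resolution panchromatic imagery, noise at this level may be comparable to the contrast between a dam and its surroundings. That is one plausible way such augmentation could blur the distinction between dams and look-alike features.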
Notwithstanding the aforementioned challenges, the utilization of machine learning to discern the presence of ancient dams in historical satellite imagery represents a promising avenue of research. The methodology is not confined to the particular case of dams, but has the potential to be adapted for other ancient sites situated in comparable environments. The model’s accuracy could be further enhanced with the acquisition of additional early dam samples, thereby providing the model with a more diverse and representative training set. This would enable the model to more accurately discern the defining characteristics of ancient dams, thereby facilitating more precise and reliable rapid detection.
It is notable that the integration of deep learning algorithms with geospatial constraints has significantly improved the detection accuracy of modern dams. For instance, research by Jing et al. and Li et al. utilized YOLOv3, YOLOv5l, YOLOv5x and U-Net models to detect modern dams, enhancing precision by incorporating hydrological data, Digital Elevation Models (DEM), land use data, and the spatial distribution of roads and rivers39,40. This underscores the critical role of geospatial elements in improving detection capabilities. However, the geographical area of our study has undergone substantial fluvial geomorphological changes since the Late Pleistocene, rendering current DEM, hydrological, and land use data, as well as road and river maps, inadequate for reflecting the contemporaneous geospatial characteristics of the dams. There is a pressing need to incorporate methods for reconstructing ancient environmental evolution, paleoclimate, and paleogeography to provide geospatial constraints for the identification of ancient dams using deep learning and historical remote sensing imagery.
Moreover, attempts in regions like Iran to detect ancient qanats using historical remote sensing imagery and deep learning have shown that integrating the spatial patterns of qanats into spatial analyses can enhance both the precision and accuracy of detection41. These efforts offer new perspectives and methodologies that could be adopted in our subsequent research. Integrating paleo-geospatial variables such as topography and hydrological connectivity, along with archaeological priors like known dam distribution patterns, into deep learning architectures could help identify ‘high-probability’ dam locations even in the absence of direct visual evidence.
Data availability
The data that support the findings of this study are available from Zhejiang Provincial Institute of Cultural Relics and Archaeology but legal restrictions apply to the availability of these data, which were used under license for the current study, and so are not publicly available. Data are however available from the authors upon reasonable request and with permission of Zhejiang Provincial Institute of Cultural Relics and Archaeology. The program is available at: https://github.com/yryb2343/Dam.
References
Biswas, A. K. History of Hydrology (American Elsevier, 1970).
Ter-Minassian, W. Dam technology: reflections on its transfer. Interdiscip. Sci. Rev. 16, 35–44 (1991).
Biswas, A. K. Hydrologic engineering prior to 600 BC. J. Hydraulics Div. 93, 115–136 (1967).
Saxena, K. R. & Sharma, V. Dams: Incidents and Accidents (CRC Press, 2004).
Zheng, X. Research on water history and water civilisation. Soc. Sci. Lit. Publ. House Politics Law Media Branch 01, 21–48 (2021).
Angelakis, A. N. Hydro-technologies in Minoan Era. Water Sci. Technol. Water Supply 17, 1106–1120 (2017).
Ahmed, A. T., El Gohary, F., Tzanakakis, V. A. & Angelakis, A. N. Egyptian and Greek water cultures and hydro-technologies in ancient times. Sustainability 12, 9760 (2020).
Wu, Z. & Li, Z. The Origin of the Beijing-Hangzhou Grand Canal (China Water Resources, 1997) https://www.cnki.com.cn/Article/CJFDTOTAL-SLZG199709031.htm.
Li, K. & Li, P. Dujiangyan Irrigation Project - The Shining Pearl of China’s Traditional Water Control Culture. 75–78 +11 (China Water Resources, 2004). https://qikan.cqvip.com/Qikan/Article/Detail?id=11198059.
Yin, D. Discovery of Han Dynasty Dam Engineering Site in Anfeng Ponds, Shouxian County, Anhui Province. Cultural Relics 61–62, https://www.cnki.com.cn/Article/CJFDTotal-WENW196001019.htm (1960).
Li, D. & Zhao, B. A Study on the Functional Changes of Lingqu Water Conservancy Engineering in Historical Times, 14–19 +147 (Three Gorges Forum, 2012), https://qikan.cqvip.com/Qikan/Article/Detail?id=42120989.
Liu, B. et al. Earliest hydraulic enterprise in China, 5100 years ago. Proc. Natl. Acad. Sci. 114, 13637–13642 (2017).
Wang, N. Survey and excavation of the water conservancy system in Liangzhu City and Surroundings. Res. Herit. Preservation 1, 102–110 (2016).
Liu, B. et al. Liangzhu: The Kingdom of Divine Kings, 4–21 (Chinese Cultural Heritage, 2017) https://www.cnki.com.cn/Article/CJFDTotal-CCRN201703002.htm.
Menze, B., Ur, J. & Sherratt, A. Detection of Ancient Settlement Mounds. Photogrammetric Eng. Remote Sens. 72, 321–327 (2006).
De Laet, V., Paulissen, E. & Waelkens, M. Methods for the extraction of archaeological features from very high-resolution Ikonos-2 remote sensing imagery, Hisar (southwest Turkey). J. Archaeological Sci. 34, 830–841 (2007).
Chen, L. et al. Refinement of a method for identifying probable archaeological sites from remotely sensed data. In Mapping Archaeological Landscapes from Space, 251–258 (Springer, 2012).
Lasaponara, R. et al. Towards an operative use of remote sensing for exploring the past using satellite data: the case study of Hierapolis (Turkey). Remote Sens. Environ. 174, 148–164 (2016).
Soroush, M. et al. Deep learning in archaeological remote sensing: automated qanat detection in the Kurdistan region of Iraq. Remote Sens. 12, 500 (2020).
Verschoof-van der Vaart, W. B. & Landauer, J. Using CarcassonNet to automatically detect and trace hollow roads in LiDAR data from the Netherlands. J. Cultural Herit. 47, 143–154 (2021).
Zhang, Y. et al. A convenient archaeological ruins identification method through elevation information extraction from CORONA stereo pairs. Herit. Sci. 12, 322 (2024).
Galiatsatos, N. Assessment of the CORONA Series of Satellite Imagery in Landscape Archaeology: A Case Study from the Orontes Valley, Syria (Durham University, 2004).
Philip, G., Donoghue, D., Beck, A. & Galiatsatos, N. CORONA satellite photography: an archaeological application from the Middle East. Antiquity 76, 109–118 (2002).
Liaw, A. & Wiener, M. Classification and regression by randomForest. R News 2, 18–22 (2002).
Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, 1440–1448 (IEEE, 2015).
Ren, S. et al. Faster R-CNN: Towards real-time object detection with region proposal networks. IEEE Trans. pattern Anal. Mach. Intell. 39, 1137–1149 (2016).
Ren, S., He, K., Girshick, R. & Sun, J. Faster r-cnn: towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems 28 (NIPS, 2015).
Redmon, J. & Farhadi, A. Yolov3: An incremental improvement. Preprint at https://arxiv.org/abs/1804.02767 (2018).
Kim, J. et al. Object detection and classification based on YOLO-V5 with improved maritime dataset. J. Mar. Sci. Eng. 10, 377 (2022).
Terven, J., Córdova-Esparza, D.-M. & Romero-González, J.-A. A comprehensive review of YOLO architectures in computer vision: from YOLOv1 to YOLOv8 and YOLO-NAS. Mach. Learn. Knowl. Extraction 5, 1680–1716 (2023).
Khanam, R. & Hussain, M. Yolov11: an overview of the key architectural enhancements. Preprint at https://arxiv.org/abs/2410.17725 (2024).
Rezatofighi, H. et al. Generalized intersection over union: a metric and a loss for bounding box regression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 658–666 (IEEE, 2019).
Woo, S. et al. CBAM: convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), 3–19 (ECCV, 2018).
Song, X., Zhang, T. & Yi, W. An improved YOLOv8 safety helmet wearing detection network. Sci. Rep. 14, 17550 (2024).
Li, X., Tong, J., Chen, Z., Bao, Y. & Ni, J. Small target detection based on improved YOLOv5. Comput. Syst. Appl. 31, 242–250 (2022).
Chen, R., Liu, Z., Ou, W. & Zhang, K. Small target detection algorithm based on improved YOLOv5. Electronics 13, 4158 (2024).
Hammer, E., FitzPatrick, M. & Ur, J. Succeeding CORONA: declassified HEXAGON intelligence imagery for archaeological and historical research. Antiquity 96, 679–695 (2022).
Hammer, E. & Ur, J. Near Eastern landscapes and declassified U2 aerial imagery. Adv. Archaeological Pract. 7, 107–126 (2019).
Jing, M. et al. Detecting unknown dams from high-resolution remote sensing images: a deep learning and spatial analysis approach. Int. J. Appl. Earth Observ. Geoinf. 104, 102576 (2021).
Li, M. et al. Combining deep learning and hydrological analysis for identifying check dam systems from remote sensing images and DEMs in the Yellow River Basin. Int. J. Environ. Res. Public Health 20, 4636 (2023).
Buławka, N., Orengo, H. A. & Berganzo-Besga, I. Deep learning-based detection of Qanat underground water distribution systems using HEXAGON spy satellite imagery. J. Archaeological Sci. 171, 106053 (2024).
Acknowledgements
We thank the Zhejiang Provincial Institute of Cultural Relics and Archaeology for providing survey data and archaeological advice for this research. This research was funded by the International Research Centre of Big Data for Sustainable Development Goals, grant number CBAS2022GSP02; and Research and Demonstration Application of Key Technologies for Investigation of Ancient Water Conservancy Remains Based on Remote Sensing and Geographic Information System (GIS), grant number 2024005.
Author information
Contributions
Conceptualization, Yiran Wang, Shaochun Dong, Yixin Zhang and Hongwei Yin; Investigation, Yiran Wang and Yixin Zhang; Methodology, Yiran Wang, Shaochun Dong, Yixin Zhang and Tao Zhang; Supervision, Shaochun Dong and Hongwei Yin; Visualization, Yiran Wang and Tao Zhang; Writing – original draft, Yiran Wang and Shaochun Dong; Writing – review & editing, Yiran Wang, Shaochun Dong, Yixin Zhang, Hongwei Yin and Tao Zhang.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, Y., Dong, S., Zhang, Y. et al. Machine learning-based identification of ancient water management facilities in Liangzhu, China. npj Herit. Sci. 13, 516 (2025). https://doi.org/10.1038/s40494-025-02083-1
DOI: https://doi.org/10.1038/s40494-025-02083-1

















