Introduction

Settlements result from the combined influence of nature and society, and their distribution follows patterns that reflect humanity’s perception of the environment. Understanding these patterns is crucial for exploring environmental selection strategies across different periods and interpreting human-environment relationships1. On a local scale, the distribution of settlement sites is closely tied to natural environmental factors. The location, form, and distribution of settlements are influenced or even controlled by climate conditions, topographical features, and water system patterns1. For instance, Holocene fluvial geomorphic evolution was the primary factor affecting prehistoric settlement patterns in the Lower Mississippi River Valley2. In Oman, the distribution of settlement sites and burial grounds was primarily determined by the availability of marine resources and oasis locations3. In Henan Province, settlement sites showed distinct differences in their dependence on slope aspect between the Longshan and Yangshao cultural periods4. Similarly, in the Wuding River Basin, mid-Holocene settlement site selection was predominantly influenced by geomorphic types and proximity to rivers5. These studies show that the prioritization of environmental factors in settlement site selection varies from region to region and across different periods within the same region6. Therefore, it is imperative to clarify the coupling process between settlement strategies and environmental evolution across different regions to comprehend the trajectory of human civilization, its developmental patterns, and its responses to environmental changes7.

Archaeological predictive modeling (APM) is a potent tool for explaining settlement patterns and their environmental genesis. This approach examines the correlations between archaeological site locations, non-site areas, and diverse environmental variables to identify statistical patterns in site distribution, thereby predicting the probability of potential site occurrence. It typically adopts a binary classification scheme that divides the study area into ‘site points’ (locations known to contain archaeological sites) and ‘random points’ (locations assumed to lack archaeological sites). Random points are generated to simulate the distribution of non-site areas and serve as control data that help elucidate the relationships between site points and natural environmental variables8. The most extensively utilized method is the classic statistical technique of logistic regression9, yet such models tend to handle complex relationships poorly and are susceptible to underfitting and low accuracy9. The advent of machine learning presents an opportunity to improve prediction accuracy and provide more nuanced archaeological interpretations. For instance, Wu et al. successfully implemented logistic regression models for site prediction in the Lingnan region10, while Wachtel et al. conducted comparative analyses between logistic regression and MaxEnt models11. In research focused on the Tibetan Plateau, Liu employed four distinct machine learning algorithms to model prehistoric agricultural dispersal routes, with random forests (RFs) yielding superior classification accuracy12. Similarly, Li et al. achieved high-precision cultural phase classification and site prediction through the integration of random forest models with GIS techniques in the same region13. Machine learning, which constructs models from sample data (training data) to generate predictions or decisions, is broadly applied in fields such as remote sensing and geology14. It holds significant potential for enhancing the performance of APM and uncovering more valuable archaeological insights13.

Zhejiang Province, situated in the lower Yangtze River basin in southeastern China, holds significant historical importance due to its advanced Neolithic culture and extensive distribution of archaeological sites. Nine major cultures have emerged in this area, each characterized by its unique features. The developmental trajectories of these cultures have been extensively investigated from both archaeological and geological perspectives15. In recent decades, preliminary studies have begun to delineate Holocene environmental changes and Neolithic human activities in Zhejiang Province16,17. However, the transitions in human behavioral processes between these cultures remain unclear, which obstructs a comprehensive understanding of the intricate interplay between cultural evolution and environmental changes in the region18. Most previous discussions have relied solely on GIS for visualization and spatial analysis, with the weights of geographic factors largely assessed empirically using tools such as the analytic hierarchy process. In contrast, APM based on the available data offers a more objective and comprehensive approach to evaluating the environmental conditions favored by settlements. Therefore, the main purposes of this study are: (i) to qualitatively evaluate the significance of environmental factor characteristics through the application of a multi-classification model; (ii) to examine the environmental strategies employed during different cultural phases of the Neolithic period in Zhejiang Province; and (iii) to explore the environmental characteristics and socio-economic transformations of the Neolithic era, with the aim of identifying the driving forces behind these environmental selection strategies.

Methods

Geographical and cultural settings

Zhejiang Province is located on the southern flank of the Yangtze River Delta, along the southeastern coastline of China. It is bounded by the East China Sea to the east, Fujian Province to the south, and Anhui and Jiangxi Provinces to the west, encompassing a land area of approximately 105,500 km² (Fig. 1). The topography of Zhejiang Province descends from southwest to northeast, characterized by mountains in the south, widespread hills in the central regions, and a low, flat alluvial plain in the northeast, with over 60% of the area blanketed by forests19. Zhejiang Province can be broadly categorized into seven distinct geomorphic regions: North Zhejiang Plain, West Zhejiang Hills, East Zhejiang Hills, Jinqu Basin, South Zhejiang Hills, Southeast Coastal Plain, and Coastal Islands20. The Zhejiang region falls under the subtropical monsoon climate, characterized by pronounced monsoon effects, mild and humid conditions, four distinct seasons, and abundant heat and precipitation that coincide seasonally21. Monsoon variability plays a pivotal role in regional climate change, significantly impacting human activities and causing various meteorological disasters. The average annual temperature ranges from 15 to 18 °C, with an average annual rainfall of 980–2000 mm and an average annual sunshine duration of 1700–2100 h, reflecting a diverse distribution of climatic resources21.

Fig. 1: Location map of the study area.

Within Su Bingqi’s seminal archaeological framework of “Six Major Cultural Spheres”, Zhejiang is situated in the “Southeast Region”, a cultural zone centered around the Taihu basin. Lying south of the Yangtze River basin, Zhejiang is replete with remains of human activities. Emerging from the Shangshan and Kuahuqiao cultures, the region developed two relatively independent Neolithic cultural systems in the Hangjiahu Plain and the Ningshao Plain. This well-preserved archaeological sequence offers exceptional opportunities for analyzing the spatiotemporal dynamics of cultural development. Along the northern shore of Hangzhou Bay in the Hangjiahu Plain, the Neolithic cultural succession proceeds chronologically through six distinct phases: the Majiabang culture (7.0–5.8 ka BP), Songze culture (6.0–5.2 ka BP), Liangzhu culture (5.3–4.3 ka BP), Qianshanyang culture (4.4–4.2 ka BP), Guangfulin culture (4.2–4.0 ka BP), and Maqiao culture (3.9–3.2 ka BP)22. On the southern shore of Hangzhou Bay in the Ningshao Plain, the principal Neolithic cultural stages include the Shangshan culture (11.0–9.0 ka BP)23, Kuahuqiao culture (8.2–7.0 ka BP)23, Hemudu culture (7.2–5.3 ka BP)24, and Liangzhu culture (5.3–4.3 ka BP). The archaeological significance of Zhejiang lies in the exceptional preservation and continuity of its prehistoric cultural remains, the clear evidence of regional interconnections, and the representative nature of its archaeological assemblages. These characteristics establish Zhejiang as an ideal region for studying the temporal and spatial dynamics of Neolithic sites.

Research data

The dataset used in this study consists primarily of 432 archaeological site points within the study area, which were labeled as “1” in the machine learning process. These archaeological site data were primarily sourced from the precise locations of known archaeological sites documented in the literature, the ‘Chinese Cultural Relics Atlas-Zhejiang Volume’25, and the unified search service platform of the Chinese Social Sciences Network. A comprehensive database of Neolithic sites in Zhejiang Province was established, encompassing site names, geographical coordinates, and other terrain data. Based on this compilation, a cultural chronological framework was developed for the archaeological sites within the study area. In addition, 400 random points were generated using the sampling tool in the ArcGIS data management module and labeled as “0” in the machine learning process to represent locations without known evidence of human activity. This approach is designed to contrast the intentional selection of settlement locations with random spatial distribution patterns. Because these points are generated randomly, some may overlap with actual settlement sites, which may introduce a certain degree of error. However, the error rate is expected to be relatively low, and the model predictions based on this sample remain sufficiently robust and effective. The method of using random points as virtual sites for comparison with environmentally selected human activity points has been extensively adopted11,26. To characterize the regional environment, we selected nine factors in four categories: topography, vegetation, water system, and soil. The specific details regarding the data are presented in Table 1.

Table 1 Data introduction, preprocessing, and sources

Machine learning models

Machine learning has seen extensive application in spatial prediction14. In this study, we developed and compared multiple binary classification models, including logistic regression, decision tree, random forest, and XGBoost algorithms. The hyperparameters were optimized using grid search and cross-validation to identify the optimal classifier. Additionally, we implemented a multi-class XGBoost model to analyze the temporal dynamics of environmental preferences across distinct cultural periods in the Zhejiang region. To elucidate the complex interactions between environmental variables and settlement decisions, we employed SHAP analysis, which revealed distinctive shifts in environmental selection criteria throughout the Neolithic period.

XGBoost (Extreme Gradient Boosting) is an efficient ensemble learning algorithm based on gradient boosting27. By combining multiple weak learners, XGBoost constructs a powerful predictive model. The core concept of gradient boosting involves iteratively training a series of base learners, with each base learner aiming to correct the errors made by its predecessor. This process is accomplished by fitting the residuals at each iteration. The model’s objective function comprises two components: a loss function, which quantifies the error between the predicted and true values, and a regularization term, which controls the complexity of the model to mitigate overfitting12. During the training process, the model improves its predictive performance by minimizing a specified loss function, such as mean squared error or log loss. The regularization term effectively controls model complexity by penalizing both the number of leaf nodes and the L2 norm of leaf weights in each decision tree. In each iteration, XGBoost learns a new weak learner by minimizing the objective function. The objective function can be expressed as:

$$L\left({\rm{\phi }}\right)=\mathop{\min }\limits_{f}\mathop{\sum }\limits_{i=1}^{n}L\left({y}_{i},\hat{{y}_{i}}\right)+\mathop{\sum }\limits_{k=1}^{K}\Omega \left({f}_{k}\right)$$

where \(L\left({\rm{\phi }}\right)\) represents the objective function, \(L\left({y}_{i},\hat{{y}_{i}}\right)\) denotes the loss function for the i-th sample, \({y}_{i}\) is the true label, and \(\hat{{y}_{i}}\) is the predicted value. K is the number of base learners, and \({f}_{k}\) represents the k-th base learner. \(\Omega \left({f}_{k}\right)\) is the regularization term, which is used to control the model’s complexity.
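As a hedged illustration of the regularization term, the XGBoost formulation commonly writes the penalty on leaf count and leaf weights as shown below. The symbols \(T_k\) (number of leaves in the k-th tree), \(w_{k,j}\) (weight of leaf j in that tree), and the penalty coefficients \(\gamma\) and \(\lambda\) are not defined in the text above and are introduced here as assumptions; the values used in this study are not reported.

$$\Omega \left({f}_{k}\right)=\gamma {T}_{k}+\frac{1}{2}\lambda \mathop{\sum }\limits_{j=1}^{{T}_{k}}{w}_{k,j}^{2}$$

Under this form, a larger \(\gamma\) discourages additional leaves, while a larger \(\lambda\) shrinks leaf weights toward zero.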

Logistic regression is a multivariate statistical analysis method that builds a mathematical model by analyzing the relationship between the dependent variable and independent variables28. The core idea of logistic regression is to fit a logistic function (e.g., the Sigmoid function) to predict the probability of an event, thereby achieving the goal of classification28. The basic formula for logistic regression is as follows:

$$P(Y=1|X)=\frac{1}{1+{e}^{-({\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\cdots +{\beta }_{n}{X}_{n})}}$$

where \(P(Y=1|X)\) represents the probability of the dependent variable Y = 1 given the independent variables X; e is the base of the natural logarithm (approximately 2.71828); β0 is the intercept term; and β1, β2, …, βn are the regression coefficients corresponding to the independent variables X1, X2, …, Xn.
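The logistic formula can be evaluated directly, as in the minimal Python sketch below. The function name `predict_probability` and the coefficient values are hypothetical and chosen only for illustration; they are not the coefficients fitted in this study.

```python
import math


def predict_probability(intercept, coefficients, features):
    """Return P(Y=1|X) for a logistic regression model."""
    # Linear predictor: beta_0 + beta_1 * X_1 + ... + beta_n * X_n
    z = intercept + sum(b * x for b, x in zip(coefficients, features))
    # Numerically stable form of the sigmoid 1 / (1 + e^(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


# Hypothetical coefficients for two standardized predictors
beta0 = 0.0
betas = [-1.2, 0.8]

p_zero = predict_probability(beta0, betas, [0.0, 0.0])     # linear predictor is 0
p_example = predict_probability(beta0, betas, [1.0, 0.5])  # linear predictor is -0.8
```

When the linear predictor is zero the probability is exactly 0.5, which is the usual decision threshold between the two classes.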

A decision tree is a commonly used machine learning algorithm that employs a tree-like structure to classify or predict data based on a series of decision rules29. In a decision tree, each internal node represents a feature or attribute, while each leaf node corresponds to a class or prediction value. During training, the model selects a feature as the root node and partitions the dataset into multiple subsets, recursively applying this process to each subset until all subsets contain data from a single class or meet certain stopping criteria29. The formula is as follows:

$${Gini}=1-\mathop{\sum }\limits_{k=1}^{K}p{\left({X}_{k}\right)}^{2}$$
$$Gini\,gain=parent\,node\,Gini\text{-}child\,node\,Gini$$

• K: Number of classes;

• Xk: Class k; k = 1, …, K;

• p(Xk): The classification probability of Xk.
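The Gini calculation above can be sketched in a few lines of Python. The function names are hypothetical, and weighting each child node by its share of the parent's samples is an assumption based on the usual CART convention; the text only states "parent node Gini minus child node Gini".

```python
from collections import Counter


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(parent, children):
    """Parent impurity minus the sample-weighted impurity of the child nodes."""
    n = len(parent)
    weighted_child = sum(len(child) / n * gini(child) for child in children)
    return gini(parent) - weighted_child


# Toy node with three site points (1) and three random points (0)
parent = [1, 1, 1, 0, 0, 0]
perfect_split = [[1, 1, 1], [0, 0, 0]]
useless_split = [[1, 0], [1, 0], [1, 0]]

parent_impurity = gini(parent)
gain_perfect = gini_gain(parent, perfect_split)
gain_useless = gini_gain(parent, useless_split)
```

A split that separates the classes completely recovers all of the parent's impurity, whereas a split that leaves every child with the same class mix has zero gain.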

Random Forest is a case of the Bagging method, where decision trees are used as base learners, and feature selection is randomized30. The algorithm enhances model diversity and robustness by independently constructing multiple decision trees on different bootstrap sample subsets and random feature subsets. During the prediction phase, Random Forest aggregates the predictions of all decision trees using a voting mechanism for classification tasks or averaging for regression tasks13. The formula is as follows:

Training dataset: \(D=\{\left({x}_{1},{y}_{1}\right),\left({x}_{2},{y}_{2}\right),\ldots ,\left({x}_{n},{y}_{n}\right)\}\)

Number of trees in the forest: T.

For each t = 1, …, T, draw a bootstrap sample \({D}_{t}\) from D and train a decision tree \({h}_{t}(x)\) on \({D}_{t}\) using a random subset of features for each split.

For a new sample x, aggregate the predictions \({h}_{t}(x)\) of all T trees via majority voting:

$$\hat{y}=\text{mode}\left({h}_{1}\left(x\right),{h}_{2}\left(x\right),\ldots ,{h}_{T}\left(x\right)\right)$$

\({h}_{t}(x)\) represents the prediction of the t-th decision tree for input x, and “mode” refers to the most frequently predicted class among all trees.
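The two random-forest ingredients that are easiest to show in isolation are bootstrap sampling and majority voting. The sketch below illustrates only those two steps with hypothetical helper names and toy data; it does not train any trees and is not the implementation used in this study.

```python
import random
from collections import Counter


def bootstrap_sample(dataset, rng):
    """Draw len(dataset) items with replacement, as done for each tree."""
    return [rng.choice(dataset) for _ in dataset]


def majority_vote(predictions):
    """Return the most frequent class label among the tree predictions.

    Ties are resolved in favor of the label encountered first.
    """
    return Counter(predictions).most_common(1)[0][0]


# Toy dataset of (feature, label) pairs and a fixed seed for repeatability
data = [(0.1, 0), (0.4, 0), (0.6, 1), (0.9, 1)]
rng = random.Random(42)
sample = bootstrap_sample(data, rng)

# Hypothetical predictions of five trees for one new sample
tree_predictions = [1, 0, 1, 1, 0]
forest_prediction = majority_vote(tree_predictions)
```

Because sampling is done with replacement, a bootstrap sample has the same size as the original dataset but usually repeats some points and omits others, which is what makes the individual trees differ.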

Hyperparameter optimization and model evaluation

Hyperparameter optimization is a systematic process aimed at identifying the optimal combinations of hyperparameters to enhance the performance and generalization ability of machine learning models30. Hyperparameters are defined prior to model training to govern the structure of the model and the learning process. In this study, the grid search method was employed for hyperparameter optimization. This approach systematically explores all possible combinations of predefined hyperparameter values to identify the optimal configuration. A limited range of candidate values was specified for each hyperparameter, and the grid search evaluated the model’s performance across all combinations. The optimal set of hyperparameters was then selected based on the configuration that achieved the highest performance metrics31.
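The exhaustive enumeration behind grid search can be sketched as follows. This is a minimal illustration under stated assumptions: `grid_search`, `toy_grid`, and `toy_score` are hypothetical names, and the toy scoring function stands in for what would in practice be a cross-validated performance metric.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid and keep the best-scoring one."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    n_evaluated = 0
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        n_evaluated += 1
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score, n_evaluated


# Hypothetical grid and a toy score that peaks at max_depth=6, learning_rate=0.1
toy_grid = {"max_depth": [3, 6, 10], "learning_rate": [0.01, 0.1, 0.2]}


def toy_score(params):
    return -abs(params["max_depth"] - 6) - abs(params["learning_rate"] - 0.1)


best_params, best_score, n_evaluated = grid_search(toy_grid, toy_score)
```

The number of evaluations is the product of the list lengths, so the cost grows quickly as hyperparameters or candidate values are added.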

In this study, model evaluation was conducted using both holdout validation and cross-validation. For holdout validation, 80% of the samples were designated as the training set and 20% as the test set. Additionally, 5-fold cross-validation was applied. In each iteration, we calculated the performance metrics of the model on the validation set: accuracy (the proportion of correctly predicted samples among all samples), precision (the proportion of samples predicted as positive that are actually positive), recall (the proportion of actual positive samples that are correctly predicted), and F1 score (the harmonic mean of precision and recall)31. The model’s performance was further assessed using the confusion matrix, the ROC curve, and the AUC. The confusion matrix compares the model’s predictions with the actual labels to evaluate the classification model’s performance. The AUC, which stands for “area under the ROC curve,” provides a comprehensive measure of the classification model’s performance32. An ideal model achieves an AUC value of 1, indicating perfect classification capability. AUC values between 0.85 and 0.95 suggest excellent classification performance, while values between 0.7 and 0.85 indicate moderate performance, with limited effectiveness in distinguishing between classes. AUC values ranging from 0.5 to 0.7 reflect poor performance, making the model nearly unusable. The ROC curve graphically depicts the true positive rate versus the false positive rate at various classification thresholds33. Typically, if the probability distributions of detection and false alarm rates are known, the ROC curve can also be generated by plotting the cumulative distribution function of detection probability (y-axis) against the cumulative distribution function of false alarm probability (x-axis)33.
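The evaluation metrics can be computed from confusion-matrix counts and predicted scores as in the sketch below. The function names and all numbers are hypothetical toy values, not results from this study, and the AUC is computed through its pairwise-ranking interpretation rather than by integrating a plotted curve.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def auc_from_scores(labels, scores):
    """AUC as the probability that a positive sample outscores a negative one.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy confusion matrix: 8 true positives, 2 false positives,
# 2 false negatives, 8 true negatives
metrics = classification_metrics(tp=8, fp=2, fn=2, tn=8)

# Toy scores: one positive sample (0.4) is ranked below one negative (0.5)
auc = auc_from_scores([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

In the toy ranking, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.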

We implemented SHAP (Shapley Additive Explanations) analysis to provide transparent insights into the XGBoost model’s decision-making process. SHAP is based on the Shapley value from game theory, which assigns an importance value to each feature for a specific prediction34. In SHAP, the contribution of each feature is calculated by comparing the model’s predictions with and without the inclusion of that feature34. The calculation of the Shapley value involves traversing all possible subsets of features to assess the average marginal contribution of features across these subsets. Specifically, for each feature i, its Shapley value is calculated as follows:

$${{\rm{\phi }}}_{i}\left(f\right)=\sum _{S\subseteq N\setminus \{i\}}\frac{\left|S\right|!\left(\left|N\right|-\left|S\right|-1\right)!}{\left|N\right|!}\left[f\left(S\cup \{i\}\right)-f\left(S\right)\right]$$

In this context, N denotes the set of all features, S represents a subset excluding feature i, and f(S) is the model’s prediction result given the feature subset S. The SHAP method applies this concept for interpreting machine learning models by calculating the SHAP value for each feature to explain the prediction result. SHAP serves as a robust framework for model interpretation, applicable across various machine learning models, including decision trees. Through the computation of SHAP values, systematic analysis of feature contributions, and comprehensive visualization of results, we obtained detailed insights into the model’s decision-making mechanisms. This methodological approach significantly enhanced both the interpretability and reliability of the model’s predictions.
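The Shapley formula can be evaluated by brute force for a small feature set, as in the sketch below. This is an illustration under stated assumptions: `shapley_values` and `toy_value` are hypothetical names, the value function is an invented toy rather than a trained model, and the enumeration is exponential in the number of features, so practical SHAP implementations rely on faster algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value_fn):
    """Exact Shapley values by enumerating every subset S of N without i."""
    n = len(features)
    phi = {}
    for i in features:
        others = [f for f in features if f != i]
        total = 0.0
        for size in range(n):  # |S| ranges from 0 to n - 1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution of feature i when added to subset S
                total += weight * (value_fn(s | {i}) - value_fn(s))
        phi[i] = total
    return phi


def toy_value(subset):
    """Hypothetical model output given only the features in `subset`."""
    value = 0.0
    if "elevation" in subset:
        value += 3.0
    if "slope" in subset:
        value += 1.0
    if "elevation" in subset and "slope" in subset:
        value += 2.0  # interaction term, shared equally between the two features
    return value


features = ["elevation", "slope", "aspect"]
phi = shapley_values(features, toy_value)
```

In this toy case the interaction of 2.0 is split evenly, giving 4.0 for elevation, 2.0 for slope, and 0.0 for aspect, and the values sum to the difference between the full-model output and the empty-set output.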

Results

Evaluation of binary model performance

After calculating the Pearson correlation coefficient matrix for the variables, we found no significant correlations between the features.
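A Pearson correlation matrix of this kind can be computed as sketched below. The function names and the toy columns are hypothetical; the actual variable values and the threshold used to judge significance are not given in the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Nested dict of pairwise Pearson coefficients for named columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names}
            for a in names}


# Toy values for three hypothetical predictors
toy_columns = {
    "elevation": [10.0, 20.0, 30.0, 40.0],
    "slope": [2.0, 4.0, 6.0, 8.0],     # perfectly correlated with elevation
    "aspect": [90.0, 10.0, 10.0, 90.0],  # uncorrelated with elevation
}
matrix = correlation_matrix(toy_columns)
```

Pairs with coefficients close to 1 or -1 would signal redundant predictors, which is the situation the check above is meant to rule out.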

The logistic regression model was employed to address the binary classification problem. The hyperparameters in the logistic regression model serve distinct roles. ‘C’ represents the inverse of the regularization strength, with smaller values indicating stronger regularization to mitigate overfitting. ‘Penalty’ defines the type of norm applied during regularization, and ‘solver’ specifies the optimization algorithm used to fit the model. The hyperparameters ‘C’, ‘penalty’, and ‘solver’ were tuned using grid search and cross-validation. The optimal combination of parameters was determined to be {‘C’: 0.01, ‘penalty’: ‘l2’, ‘solver’: ‘lbfgs’}. Applying these parameters to the model resulted in an accuracy of 90.68% on the test set and 90.50% on the training set. The confusion matrix (Fig. 2a) illustrates the model’s predictions for each category, with 65 predictions for category 0 (non-site points) and 81 predictions for category 1 (site points), out of a total of 181 test samples.

Fig. 2: Classification results of different models.

a Logistic regression prediction results, b Decision tree model prediction results, c Random forest model prediction results, d XGBoost model prediction results.

A decision tree model was also utilized to solve the binary classification problem. The decision tree model employs several hyperparameters to control tree structure and performance: ‘max_depth’ constrains tree depth to manage model complexity; ‘min_samples_split’ specifies the minimum sample threshold for node splitting; ‘min_samples_leaf’ defines the minimum samples required in leaf nodes; and ‘criterion’ determines the split quality measurement method. The hyperparameters ‘max_depth’, ‘min_samples_split’, ‘min_samples_leaf’, and ‘criterion’ were tuned using grid search and cross-validation. The following ranges of hyperparameter values were explored: ‘max_depth’: [None, 10, 30, 50], ‘min_samples_split’: [2, 5, 10], ‘min_samples_leaf’: [1, 2, 4], and ‘criterion’: [‘gini’, ‘entropy’]. The optimal parameter combination was identified as {‘criterion’: ‘gini’, ‘max_depth’: 10, ‘min_samples_leaf’: 4, ‘min_samples_split’: 2}. With these parameters, the model achieved an accuracy of 88.20% on the test set and 98.29% on the training set. The confusion matrix (Fig. 2b) shows 69 predictions for category 0 (non-site points) and 72 predictions for category 1 (site points), out of a total of 181 test samples.

A Random Forest model was implemented to address the binary classification problem. The Random Forest model utilizes several hyperparameters to optimize its ensemble learning architecture: ‘n_estimators’ specifies the total number of decision trees in the ensemble; ‘max_depth’ constrains individual tree complexity to mitigate overfitting; ‘max_features’ controls feature subset selection during node splitting (implemented with logarithmic scaling); ‘min_samples_split’ establishes the threshold for node division; and ‘min_samples_leaf’ determines the minimum sample size for terminal nodes. The hyperparameters ‘n_estimators’, ‘max_depth’, and ‘min_samples_split’ were tuned using grid search and cross-validation. The following ranges of hyperparameter values were tested: ‘n_estimators’: [100, 200, 300], ‘max_depth’: [None, 10, 20, 30], and ‘min_samples_split’: [2, 5, 10]. The optimal combination of parameters was determined to be {‘max_depth’: 10, ‘max_features’: ‘log2’, ‘min_samples_leaf’: 1, ‘min_samples_split’: 2, ‘n_estimators’: 200}. Using these parameters, the model achieved an accuracy of 93.17% on the test set and 99.38% on the training set. The confusion matrix (Fig. 2c) indicates 65 predictions for category 0 (non-site points) and 81 predictions for category 1 (site points), out of a total of 181 test samples.

The XGBoost model was employed to solve the binary classification problem. The XGBoost model incorporates multiple hyperparameters to optimize its gradient boosting framework: ‘n_estimators’ determines the number of sequential trees, balancing model capacity and overfitting risk; ‘max_depth’ regulates tree complexity by limiting depth; ‘learning_rate’ controls the weight update magnitude during training iterations; ‘subsample’ defines the proportion of training instances used for each tree construction; and ‘colsample_bytree’ specifies the feature subset ratio for tree development, enhancing model generalization through feature diversity. The hyperparameters ‘n_estimators’, ‘max_depth’, ‘learning_rate’, ‘subsample’, and ‘colsample_bytree’ were tuned using grid search and cross-validation. The following ranges of hyperparameter values were explored: ‘n_estimators’: [100, 200, 300], ‘max_depth’: [3, 6, 10], ‘learning_rate’: [0.01, 0.1, 0.2], ‘subsample’: [0.5, 0.8, 1.0], and ‘colsample_bytree’: [0.8, 1.0]. The optimal parameter combination was identified as {‘colsample_bytree’: 0.8, ‘learning_rate’: 0.2, ‘max_depth’: 3, ‘n_estimators’: 200, ‘subsample’: 1.0}. With these parameters, the model achieved an accuracy of 94.41% on the test set and 100% on the training set. The confusion matrix (Fig. 2d) shows 70 predictions for category 0 (non-site points) and 82 predictions for category 1 (site points), out of a total of 181 test samples.

The objective function of XGBoost includes a regularization term, which helps control model complexity and prevent overfitting. This characteristic allows XGBoost to better balance model fitting and generalization during training, whereas other models, such as Random Forest and Decision Trees, exhibit relatively weaker control in this regard. The comparative analysis of the four machine learning models demonstrates the superior performance of XGBoost in addressing this binary classification problem (Table 2). Among all models evaluated, XGBoost achieved the highest accuracy (94.41%) on the test set, surpassing the Random Forest (93.17%), Logistic Regression (90.68%), and Decision Tree (88.20%) models. Furthermore, XGBoost exhibited exceptional performance across all evaluation metrics, with consistent scores of 94.41% for precision, recall, and F1-score, indicating balanced and robust predictive capabilities. The model’s effectiveness is further validated by its exceptional AUC value of 0.99, the highest among all tested models, suggesting superior discriminative ability. These results collectively demonstrate that XGBoost provides the most reliable and accurate predictions for this classification task, likely due to its advanced gradient boosting framework and effective handling of complex feature interactions.

Table 2 Model classification performance results using different methods

Multi-classification model test

Following a comprehensive evaluation of all performance metrics, the XGBoost model was ultimately selected as the predictive model for subsequent multi-class classification tasks. Considering the dataset size constraints, the nine cultural periods in Zhejiang Province were consolidated into seven distinct categories. Specifically, the classes of the dependent variable comprised seven cultural categories, namely Shangshan, Kuahuqiao, Hemudu, Majiabang, Songze, Liangzhu, and QSY-MQ (Qianshanyang-Maqiao), together with non-site locations (n = 400). The detailed quantities are presented in Table 3.

Table 3 Prehistoric cultural changes and the number of sites in Zhejiang region

To better address multi-class classification problems, we optimized the parameters and ultimately identified the optimal parameter combination as (learning_rate = 0.1, max_depth = 3, n_estimators = 100, seed = 42, eval_metric = mlogloss). The model’s results reveal its classification effectiveness through the performance of the ROC curve (Fig. 3), which deviates significantly from the y = x diagonal. The calculated AUC value of 0.93 further underscores the model’s strong classification capability, as AUC values between 0.85 and 0.95 are indicative of robust predictive performance. In summary, this study successfully utilized machine learning to model the spatial distribution of human activities during the Neolithic era in Zhejiang, resulting in a model with high predictive capability and rationality.
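The `mlogloss` evaluation metric named above is the multi-class logarithmic loss. The sketch below shows the standard definition with a hypothetical function name and toy probabilities; it is not output from the model in this study, and the eight-class baseline simply assumes seven cultural categories plus the non-site class.

```python
import math


def multiclass_log_loss(true_labels, predicted_probs, eps=1e-15):
    """Mean negative log-probability assigned to the true class of each sample."""
    total = 0.0
    for label, probs in zip(true_labels, predicted_probs):
        # Clip the probability to avoid taking log(0)
        p = min(max(probs[label], eps), 1.0)
        total -= math.log(p)
    return total / len(true_labels)


# Toy three-class example: both samples give probability 0.5 to the true class
loss_toy = multiclass_log_loss(
    [0, 2],
    [[0.5, 0.25, 0.25], [0.25, 0.25, 0.5]],
)

# Baseline for an uninformative model over eight classes
uniform = [1.0 / 8] * 8
loss_uniform = multiclass_log_loss([3], [uniform])
```

A model that spreads probability evenly across eight classes scores ln 8, roughly 2.08, so values well below that indicate the classifier is concentrating probability on the correct class.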

Fig. 3: Multi-category prediction results.

The importance of variables within the model was evaluated based on their contribution to reducing the loss function in each boosting iteration. As shown in Fig. 3, the top five variables ranked by importance are, in descending order: Elevation, Surface Relief, Slope, outside buffer 5000 m, and buffer 1000 m. To enhance model interpretability, we conducted a feature importance ranking analysis for the binary classification model (archaeological sites versus non-sites). Notably, all implemented models consistently identified three primary environmental factors (elevation, surface relief, and slope) as the most significant predictors of site locations. These results highlight the critical role of topographic and environmental features in determining site locations.

Model interpretation

In recent years, Lundberg et al. proposed a unified method to explain the output of machine learning, known as the SHAP method34. In terms of global interpretability, SHAP can provide the importance ranking of features and their influence patterns across the entire dataset. In terms of individual interpretability, it can offer insights into the influence of features, along with their positive or negative contributions, for a single sample.

The SHAP method offers a comprehensive framework for interpreting the prediction results of the XGBoost model from multiple perspectives. Using SHAP analysis, the relative importance of input variables was quantified by calculating the contribution of each feature to the model’s predictions. Figure 4 illustrates the feature importance rankings based on the mean absolute SHAP values computed across the complete dataset. The top five input features, in descending order of importance, are Elevation, Surface Relief, Vegetation, Slope, and Curvature. Notably, these results align closely with the feature importance rankings identified by the XGBoost model. Among all input features, Elevation emerged as the most critical predictor, followed by Surface Relief, Vegetation Types, and Slope, which also demonstrated significant contributions. Curvature, NDVI, Soil Types, and Aspect were found to have moderate importance, collectively accounting for less than 25% of the total feature importance. The “Outside Buffer 5000 m” variable showed a certain degree of influence on the model’s predictions, while the remaining variables had relatively minimal impact.
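A ranking by mean absolute SHAP value, together with each feature's share of the total, can be derived from a matrix of per-sample SHAP values as sketched below. The function name and the numbers are hypothetical toy values, not the SHAP values obtained in this study.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Rank features by mean |SHAP| and report each feature's share of the total."""
    n_samples = len(shap_rows)
    importance = {
        name: sum(abs(row[j]) for row in shap_rows) / n_samples
        for j, name in enumerate(feature_names)
    }
    total = sum(importance.values())
    ranked = sorted(importance.items(), key=lambda item: item[1], reverse=True)
    # Each entry: (feature name, mean |SHAP|, share of total importance)
    return [(name, value, value / total if total else 0.0)
            for name, value in ranked]


# Toy SHAP values: one row per sample, one column per feature
toy_features = ["Elevation", "Slope", "Aspect"]
toy_shap = [
    [0.4, -0.2, 0.1],
    [-0.6, 0.2, -0.1],
]
ranking = mean_abs_shap(toy_shap, toy_features)
```

Taking absolute values before averaging keeps positive and negative contributions from cancelling, so a feature that pushes predictions strongly in both directions still ranks as important.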

Fig. 4: SHAP feature importance ranking for environmental variables in prehistoric settlement classification.

Discussion

The feature importance ranking derived from the XGBoost model reveals significant associations between geographical parameters and settlement distribution patterns. Through SHAP analysis, we generated variable contribution hierarchies that offer new perspectives on the evolution of human-environment selection strategies throughout the archaeological sequence. Figure 5 presents a summary plot of the SHAP values for each feature, illustrating the distribution of SHAP values for each input feature and their corresponding influence patterns. In this graph, the x-axis represents the SHAP values, while the y-axis lists the input features sorted by their importance. The dots represent individual samples in the dataset, and the color of each dot indicates the value of the corresponding feature for that sample, with the gradient from blue to red signifying low to high feature values. A positive SHAP value means that the feature pushes the model output toward the class in question, whereas a negative SHAP value pushes it away from that class. Thus, the visualization of SHAP values not only identifies the importance of features but also elucidates the influence of each feature on the selection of settlement locations. These trends also align with those observed in Fig. 4. The SHAP methodology thus provides a global interpretation of the effects of input variables on the selection of settlement contexts.

Fig. 5: SHAP feature contribution plot.

a Shangshan culture, b Kuahuqiao culture, c Hemudu culture, d Majiabang culture, e Songze culture, f Liangzhu culture, g Qianshanyang-Maqiao culture, h Non-Heritage Site.

Shangshan period (11.0–9.0 ka BP): During this cultural period, a hierarchical pattern of environmental variables influencing settlement distribution is revealed, with vegetation types emerging as the primary determinant, followed by curvature, elevation, NDVI, and surface relief (Fig. 5a). The model demonstrates significant positive correlations between settlement locations and regions characterized by reduced curvature, elevated topography, and minimal surface relief. Lower NDVI values also have a stronger positive impact on the model output. These findings suggest that the Shangshan inhabitants weighed a combination of environmental factors, including vegetation type, terrain curvature, elevation, NDVI, and surface relief, when selecting settlement sites. Settlements were predominantly located in river valleys and basins characterized by flat to gently sloping terrain and slightly higher elevations with minimal surface relief. This pattern aligns with Fuller et al.’s observations regarding the environmental prerequisites for early rice domestication, in which they posit that alluvial valleys with gentle terrain presented ideal conditions for the initial development of rice agriculture35. This settlement strategy is also consistent with archaeological evidence, which indicates that the Shangshan culture relied heavily on gathering and hunting as primary subsistence strategies, alongside the early development of primitive rice agriculture at the Shangshan site36.

Kuahuqiao period (8.2–7.0 ka BP): During this cultural period, settlement distribution was primarily influenced by five environmental variables: elevation, surface relief, slope, soil types, and NDVI (Fig. 5b). Topographic factors, including elevation, surface relief, and slope, are characterized by smaller values being positioned in the positive region of the X-axis, indicating a positive contribution to the model output. And greater NDVI values are positively associated with settlement suitability. The proxy values for soil types are relatively low among all soil type proxy values, with most being of the anthropogenic soil type. Additionally, smaller aspect values and higher curvature also contribute positively to the model output. Settlement patterns during the Kuahuqiao period reflect a clear preference for specific environmental conditions: gentle topography characterized by lower elevations, minimal surface relief, and modest slopes. Archaeological investigations reveal that the Kuahuqiao site is situated on elevated terrain along the Hangzhou Bay coast, surrounded by a landscape of shallow seas and wetland environments37. The environmental signature of these settlements suggests landscapes with substantial vegetation cover and limited anthropogenic soil modification. This finding is consistent with paleovegetation reconstruction studies, which indicate that during the Kuahuqiao period, the Hangzhou Bay region was dominated by subtropical evergreen broad-leaved forests interspersed with wetland vegetation, providing abundant resources for hunting-gathering activities and early agricultural practices38. Notably, despite the general preference for gentler terrain, the elevated curvature values indicate persistent settlement proximity to mountainous regions, suggesting a strategic balance between accessibility and resource availability.

Hemudu period (7.2–5.3 ka BP): During this cultural period, the top five variables influencing settlement selection are, in descending order of importance, elevation, curvature, surface relief, aspect, and NDVI (Fig. 5c). Lower elevation and surface relief values are located in the positive region of the X-axis, indicating their positive contribution to the model output. Similarly, moderate to high values of curvature and Aspect significantly influence settlement suitability. Higher NDVI values also positively influence the model, with vegetation types referring to smaller numbers. Notably, settlements within a 1000 m buffer zone exhibit a positive effect on the model. In summary, during the Hemudu culture period, settlement locations were characterized by lower elevations with minimal topographic relief, proximity to mountainous areas offering favorable vegetation conditions and abundant light, and closeness to water sources. The conditions at the site supported a mixed subsistence strategy combining rice agriculture, hunting, and gathering. This aligns with archaeological findings, as the Tianluoshan site profile is located near the southern foothills of Cuiping Mountain in the northern part of the plain39. During periods of marine transgression, the preference for mountainous areas with higher curvature was likely an adaptive strategy to mitigate flood risks. This is evidenced by the alternating layers of paddy fields, peat, and marine deposits discovered at sites such as Tianluoshan and Jingtoushan39,40.

Majiabang period (7.0–5.8 ka BP): During this cultural period, settlement patterns were primarily influenced by five variables: elevation, surface relief, NDVI, distance from water sources (outside buffer 5000 m), and slope (Fig. 5d). Topographic factors, such as elevation and smaller values of surface relief and slope, exert a pronounced positive influence on the model output. Similarly, higher NDVI values, indicative of greater vegetation cover, significantly contribute to settlement suitability. In contrast, the outside buffer 5000 m variable demonstrates a negative effect on the model, suggesting a preference for locations closer to water sources. The settlement pattern analysis indicates that Majiabang communities exhibited a clear preference for environments characterized by lower elevations, minimal surface relief, and abundant vegetation cover—characteristics typically associated with agricultural suitability. This finding is consistent with the broader archaeological settlement distribution pattern, characterized by sites situated in low-elevation topography that exhibited heightened vulnerability to marine transgression-regression cycles41. Proximity to water sources played a critical role in settlement selection, with most settlements located within 5 km of rivers and only a few extending beyond this buffer zone. This pattern reflects the importance of water accessibility for agricultural production and daily subsistence needs during this cultural period. The settlement strategies of the Majiabang period highlight a strong reliance on environmental factors that supported early agricultural practices, emphasizing the interplay between topography, vegetation, and water resources in shaping settlement patterns.

Songze period (6.0–5.2 ka BP): During this cultural period, the top five variables influencing settlement patterns are elevation, slope, surface relief, curvature, and aspect, all of which are topographic and geomorphic factors (Fig. 5e). Points with lower values of elevation, slope, surface relief, and curvature are positioned in the positive region of the X-axis, indicating a positive influence on the model output. Compared to the preceding cultural period, these variables exhibit a significant downward trend in importance. Points with higher slope orientation are also located in the positive region of the X-axis, showing a positive correlation with the model. In summary, during the Songze culture period, settlements were preferentially located in areas with lower elevations and gentler terrain, along with higher NDVI values and extensive vegetation cover. Research indicates that the environment of the eastern coastal plains of China around 5000 years ago was relatively similar to the modern environment. The flat terrain and abundant water resources provided favorable conditions for human habitation and the development of agriculture42. Artifacts used by ancient humans, along with carbonized rice husks and plant seeds unearthed from sites dating to this period, also indirectly reflect this43. The relatively low importance of the distance to the water source suggests that this natural factor had minimal impact on settlement distribution. This pattern may reflect an evolution in resource management, as larger settlement sizes during this period suggest that ancient humans had developed effective strategies for managing water resources, thereby demonstrating increased environmental adaptability.

Liangzhu period (5.3–4.3 ka BP): During this cultural period, settlement patterns were primarily influenced by five variables: elevation, surface relief, curvature, NDVI, and outside buffer 5000 m (Fig. 5f). Among these, the three dominant topographic factors demonstrate that lower values of elevation, surface relief, and curvature positively correlate with settlement locations. Additionally, decreased NDVI values also show positive associations with settlement distribution, while proximity to water sources exhibits a complex pattern - locations within 5000 m of water sources demonstrate stronger positive effects compared to those beyond this threshold. Notably, the distribution of slope direction during this period was more balanced, reflecting a reduced influence of slope orientation and gradient on settlement location. These findings suggest that Liangzhu communities primarily favored plain areas with flat terrain and lower elevations, though there was a clear trend toward diversification in settlement choices. The expansion of human activities during this period is evident in the increased number of settlements located beyond 5000 m from water sources, highlighting improved adaptability to diverse environmental conditions and a reduced dependence on immediate proximity to water. Archaeological discoveries have also confirmed the existence of a large-scale dam system surrounding the Liangzhu Ancient City, where mountain bodies were connected with dams44. The reduced dependence on vegetation cover and slope-related factors further highlights the enhanced environmental adaptability and agricultural productivity of Liangzhu settlements in their surrounding landscape. During the Songze-Liangzhu cultural period, rice cultivation underwent substantial advancement, with the rice paddy fields at the Maoshan site exemplifying the sophisticated irrigation techniques and advanced hydrological management systems characteristic of this era45.

Qianshanyang-Maqiao period (4.4–3.2 ka BP): During this cultural period, settlement patterns were primarily influenced by five variables: elevation, surface relief, slope, soil types, and aspect (Fig. 5g). Geomorphic factors, including elevation, surface relief, slope, curvature, and aspect, demonstrate positive correlations with the model output. During this period, the proportion of areas with higher values for soil type indicators increased, while artificial soils decreased. The vegetation types were predominantly subtropical coniferous forests and cultivated vegetation. Compared to earlier cultural phases, settlement patterns during this period reveal a distinct preference for mountainous and hilly terrain with higher elevations and more pronounced surface undulations. Research indicates that large-scale, persistent, and frequent flooding rendered most settlements of the Liangzhu culture period unsuitable for human habitation, with some earlier settlements even being directly submerged. The selection of areas with steeper slopes and higher curvature indicates a decline in agricultural development at these settlements and a transformation in their subsistence economic patterns. This shift in settlement strategy may reflect adaptive responses to resource limitations or competitive pressures in lowland areas, prompting the exploitation of previously underutilized upland environments. This aligns with archaeological evidence, which shows that agricultural fields were predominantly situated in the low-lying peripheral areas of the sites. The recurring flood events disrupted rice cultivation and significantly compromised the role of rice agriculture as the dominant mode of subsistence economy16,41. Overall, these findings indicate a substantial evolution in settlement strategies during the Qianshanyang-Maqiao period, demonstrating an increased capability to navigate diverse topographic settings and adapt to more challenging terrain.

The current model was developed using modern environmental data, which introduces certain limitations. Factors such as vegetation, river systems, and sea level have undergone significant changes over millennia. This limitation is a common challenge in studies involving ancient simulations, as accurately reconstructing past surface environmental conditions remains highly complex. In contrast, the geomorphic features of Zhejiang Province have remained relatively stable at a macro scale throughout the Holocene. Despite sea-level fluctuations, sedimentation, and erosion, the major geomorphic units (e.g., plains, hills, and mountains) have experienced minimal changes. This relative stability justifies the use of modern geomorphic data as a reliable proxy for past conditions. Key factors such as elevation, slope, and topographic relief have shown minimal variation during the Holocene, allowing modern data to approximate Neolithic geomorphic conditions.

During the prehistoric period, human productivity was limited, and settlement patterns were heavily influenced by natural environmental conditions. The distribution and development of archaeological sites in the Hangjiahu region and its adjacent Ningshao area were primarily determined by macro-scale natural processes, particularly marine-terrestrial transitions and geomorphic evolution. The region’s low-lying topography and complex river networks made it especially susceptible to extreme climatic events46. The Holocene climatic phases in Zhejiang Province generally corresponded to those observed across China, although environmental transformations exhibited distinct regional characteristics46. During the Early Holocene, the climate transitioned from cool and dry conditions to mild and humid47. The Middle Holocene was characterized by warm and humid conditions, but after approximately 6.0 ka BP, significant fluctuations occurred, accompanied by a gradual decline in temperatures. By the Late Holocene, climate approached modern conditions, gradually establishing an environment similar to that of the present day47. Within this climatic context, combining the modeled feature importance with the results of the SHAP analysis, we discover that geomorphic environmental factors were of utmost feature importance, mainly including elevation, surface relief, and slope.

During the Neolithic period, human activities were predominantly concentrated in areas below 50 m above sea level, with no archaeological sites identified at elevations above 500 m across all cultural stages (Table 4). Notably, the plains at an elevation of 0–10 m contained the highest density of sites, totaling 339 and accounting for 61.4% of all identified sites. In comparison, sites located within the 10–50 m elevation range constituted 18% of the total. During the Shangshan Culture period, the average elevation of archaeological sites was approximately 95 m. This decreased to an average of 53 m during the Kuahuqiao Culture period and further declined significantly to 32 m during the Hemudu Culture period. By the Majiabang to Liangzhu Culture periods, the average elevation of sites stabilized, ranging between 11 and 13 m. This trend demonstrates a clear shift in human settlements from higher to lower altitudes over time. However, during the Maqiao Culture period, sites were once again distributed in areas of relatively higher elevations. Early archaeological sites were influenced by neotectonic movements and the continuous rise in sea levels. Rapid landward shifts of the shoreline inundated river valley lowlands with seawater, leaving hills and granite outcrops as critical venues for early human habitation. Consequently, most sites were located in intermontane basins with slightly elevated ground on both sides of rivers, predominantly on terraces and farther from the sea (Fig. 6a). Notably, Gu et al.’s research discovered rice remains and reed phytolith fossils in pottery shards found in the hinterland of Hangjiahu Plain, indicating potential human activity in the region during that period48. This timeframe aligns closely with the rice cultivation phase associated with the Shangshan Culture48. During the Kuahuqiao period (Fig. 6b), rising sea levels led to the formation of extensive inland lakes and estuaries, which provided abundant freshwater and created favorable conditions for human habitation on coastal plains. In response to environmental changes and survival needs, ancient populations migrated to pre-mountainous areas and opted for settlement in plains. From the Majiabang culture onward (Fig. 6d), settlement choices increasingly focused on low-altitude plains. Majiabang culture sites were more widely distributed, spanning from Tianmu Mountain in the west to Kunshan in the east, and from Ningzhen in the north to Jiaxing in the south. This distribution was broader than that of the Kuahuqiao culture, with an increased number of sites as well16. During the Songze period, the Hangjiahu Plain underwent rapid formation through sediment deposition from the ancient Taihu Bay and tidal flats, resulting in an expansion of habitable plains suitable for human settlement (Fig. 6f)49. This geological transformation coincided with the peak of the Great Warming Period, which fostered rice cultivation. The Late Neolithic period was marked by climatic fluctuations and frequent flooding caused by strong tidal upwelling50. In response, human activity expanded from river and lake edges and plains with dense water networks to higher-altitude areas such as hills (Fig. 6i).

Fig. 6: Relationship between spatial distribution and elevation of ancient sites in Zhejiang Province.

a Shangshan culture, b Kuahuqiao culture, c Hemudu culture, d Majiabang culture, e Songze culture, f Liangzhu culture, g Qianshanyang culture, h Guangfulin culture, i Maqiao culture.

Table 4 Statistical analysis of the relationship between ancient site distribution and DEM
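The kind of elevation-band tabulation summarized in Table 4 (site counts and percentages per band, plus a mean elevation per culture) can be sketched as follows. This is an illustrative example only: the band breakpoints are inferred from the ranges mentioned in the text (0–10 m, 10–50 m, above 500 m) and may not match the table's actual classes, and the site elevations are made-up toy values.

```python
from bisect import bisect_right
from statistics import mean

# Hypothetical breakpoints in metres; bands are [0, 10), [10, 50), [50, 500), >= 500.
BAND_EDGES = [10, 50, 500]
BAND_LABELS = ["0-10 m", "10-50 m", "50-500 m", ">500 m"]


def band_shares(elevations):
    """Return {band label: (site count, percentage of all sites)}."""
    counts = [0] * len(BAND_LABELS)
    for z in elevations:
        counts[bisect_right(BAND_EDGES, z)] += 1
    n = len(elevations)
    return {
        label: (c, 100.0 * c / n if n else 0.0)
        for label, c in zip(BAND_LABELS, counts)
    }


# Toy site elevations in metres (not real data).
toy_elevations = [3, 7, 9, 12, 45, 95, 4, 8]

shares = band_shares(toy_elevations)
for label, (count, pct) in shares.items():
    print(f"{label}: {count} sites ({pct:.1f}%)")
print(f"mean elevation: {mean(toy_elevations):.1f} m")
```

Running the same function separately on the sites of each cultural period would give the per-period averages that the text compares.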

Within the study area, surface relief ranged from a minimum of 0 m to a maximum of 167 m. During the Neolithic period, settlement sites were predominantly concentrated within a surface relief range of 0–50 m. In the mid to late stages, there was a notable increase in the number of sites located within the 0–10 m range. For instance, 90.6% of the archaeological sites associated with the Songze culture were situated within this range, compared to only 40% of sites from the early Hemudu culture period. The average surface relief was relatively high in the early Neolithic period, reaching 37.94 m during the Hemudu culture period. However, it generally decreased in the mid to late periods, with average values dropping below 15 m. Settlement sites tended to cluster in valleys and foothills with low topographic relief and abundant tributaries (Fig. 7). This siting facilitated access to water and minimized risks from natural hazards such as mudslides, landslides, and floods. In flat terrain, water sources like rivers and lakes were more plentiful and readily accessible, and the flat landscape also facilitated the distribution and collection of animal resources. Conversely, in areas with high surface relief, steep terrain and scarce water sources posed greater challenges for inhabitants in accessing natural resources. The data suggest that in the mid Neolithic period, as cultural scales expanded and agriculture developed, humans increasingly preferred settling in flat areas. These environments offered optimal conditions for accessing and utilizing natural resources. In addition to ensuring basic material security, this preference also promoted cultural exchanges and dissemination among settlements50. It is worth noting that the maximum surface relief during the Maqiao culture period reached 174 m, which may have been influenced by large-scale flooding events caused by environmental changes in the Late Neolithic period.
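For readers unfamiliar with the variable, the sketch below shows one common way to derive surface relief from a DEM: the difference between the highest and lowest elevation inside a moving window. This definition and the 3 × 3 window are assumptions made for illustration; the window size and software used in the study are not stated in this section, and the toy grid is hypothetical.

```python
def surface_relief(dem, radius=1):
    """Local relief per cell: max minus min elevation in a (2*radius+1)^2 window.

    dem is a list of equal-length rows of elevations in metres.
    Windows are clipped at the grid edges.
    """
    rows, cols = len(dem), len(dem[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                dem[i][j]
                for i in range(max(0, r - radius), min(rows, r + radius + 1))
                for j in range(max(0, c - radius), min(cols, c + radius + 1))
            ]
            out[r][c] = max(window) - min(window)
    return out


# Toy 3 x 3 DEM in metres: a plain in the west rising to a hill in the south-east.
toy_dem = [
    [5, 6, 7],
    [5, 8, 30],
    [4, 9, 60],
]

relief = surface_relief(toy_dem)
for row in relief:
    print(row)
```

In this toy grid the centre cell sees both the 4 m lowland and the 60 m hill, so its relief is far higher than that of the north-west corner, which only sees the plain.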

Fig. 7: Relationship between spatial distribution and surface relief of ancient sites in Zhejiang Province.

a Shangshan culture, b Kuahuqiao culture, c Hemudu culture, d Majiabang culture, e Songze culture, f Liangzhu culture, g Qianshanyang culture, h Guangfulin culture, i Maqiao culture.

In the study area, slope values ranged from a minimum of 0° to a maximum of 17.25°. The distribution of sites was most concentrated in regions with low slopes (Table 5). In the 3°–5° slope zone, no archaeological sites from the Kuahuqiao, Guangfulin, and Maqiao cultural periods have been identified, while a small number of sites were distributed across the other six periods. Areas with slopes exceeding 15° contained no sites from the Kuahuqiao, Liangzhu, Qianshanyang, or Guangfulin cultural periods, though a few were associated with the remaining periods (Table 5). As slope values increased, the number of settlements in each historical period clearly decreased, with the majority of sites concentrated in the 0–5° slope zone (Fig. 8). This indicates that during the Neolithic period, humans preferred areas with gentler slopes for settlement. Such terrain facilitated navigation, material transport, and the construction of stable settlements and dwellings. For agricultural societies, flat or gently sloping land was particularly advantageous, providing ideal conditions for planting and cultivating crops. Conversely, areas with steep slopes were prone to soil erosion and, due to Zhejiang Province’s location in the East Asian monsoon zone with concentrated precipitation and prolonged rainy seasons, exhibited poor water retention. These conditions made steep slopes more vulnerable to natural disasters such as mudslides and landslides during the rainy season50. Additionally, the increased costs and risks associated with building settlements in steep areas further diminished their suitability for habitation.

Fig. 8: Relationship between spatial distribution and slope of ancient sites in Zhejiang Province.

a Shangshan culture, b Kuahuqiao culture, c Hemudu culture, d Majiabang culture, e Songze culture, f Liangzhu culture, g Qianshanyang culture, h Guangfulin culture, i Maqiao culture.

Table 5 Statistical analysis of the relationship between ancient site distribution and slope

Changes in water sources and ancient river channels also influenced human settlement choices. Previous studies of settlement sites in various watersheds worldwide consistently indicate that ancient populations preferred locations near lakes, rivers, seas, and springs5. The distance from oceans, rivers, and other water bodies is one of the most direct indicators of this5,50. The number of sites for all cultures peaks within the 5000 m buffer zone, which, together with the significant feature importance the model assigns to this buffer zone, highlights that this distance was a crucial threshold. This distance likely represented an optimal balance, allowing ancient peoples to maximize access to river resources while minimizing potential risks. In the Early Neolithic period, sites were predominantly located in valley basins within 1–5 km of rivers (Fig. 9). Early human populations reliant on foraging economies prioritized proximity to mountains and water sources for sustenance. During the Middle Neolithic, strong East Asian monsoon activity increased the likelihood of river diversions and flooding, leading to a shift toward settlements located farther from rivers. For example, 20.41% of Majiabang sites were situated more than 5 km from water sources, illustrating a significant proportion of distant settlements. Similarly, during the Liangzhu culture period, 17.18% of settlements were located beyond the 5000 m threshold. During the Maqiao culture period, the distribution range of sites expanded once again, and the proportion of settlements located within 1 km of rivers also showed a notable recovery. However, the Qianshanyang, Guangfulin, and Maqiao cultures exhibited very few sites within 500 m of rivers, possibly due to environmental changes during the Liangzhu culture period that rendered riverine areas unsuitable for habitation41. Concurrently, the distribution of sites beyond 5 km from rivers increased, indicating an expansion of human activity into more diverse landscapes (Fig. 9).
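The buffer-zone statistics discussed above rest on two steps: measuring each site's distance to the nearest river and assigning that distance to a ring. The sketch below illustrates both under simplifying assumptions: coordinates are planar and in metres (as in a projected coordinate system), a river is a single polyline, and the ring breakpoints (500 m, 1000 m, 5000 m) are taken from the distances mentioned in the text and may not match the exact classes of Fig. 9. All names and the toy geometry are hypothetical.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:  # degenerate segment: a and b coincide
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_river(p, river):
    """Distance from p to a polyline given as a list of at least two vertices."""
    return min(point_segment_distance(p, a, b) for a, b in zip(river, river[1:]))


RING_EDGES = [500, 1000, 5000]  # hypothetical ring breakpoints in metres
RING_LABELS = ["<500 m", "500-1000 m", "1000-5000 m", ">5000 m"]


def ring_label(distance):
    """Assign a distance to its buffer ring."""
    for edge, label in zip(RING_EDGES, RING_LABELS):
        if distance < edge:
            return label
    return RING_LABELS[-1]


# Toy river running west to east and three toy sites (not real data).
toy_river = [(0.0, 0.0), (10000.0, 0.0)]
toy_sites = [(1000.0, 300.0), (5000.0, 2500.0), (12000.0, 6000.0)]

labels = [ring_label(distance_to_river(s, toy_river)) for s in toy_sites]
share_beyond = labels.count(">5000 m") / len(labels)
print(labels, f"share beyond 5000 m: {share_beyond:.1%}")
```

Percentages such as the share of sites beyond the 5000 m threshold follow directly from counting ring labels per cultural period.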

Fig. 9: Statistics on the spatial and temporal distribution of ancient sites in Zhejiang Province in relation to the buffer zone of the water system.

a Shangshan culture, b Kuahuqiao culture, c Hemudu culture, d Majiabang culture, e Songze culture, f Liangzhu culture, g Qianshanyang culture, h Guangfulin culture, i Maqiao culture.

Fluctuations in sea level directly impacted the living environments and production strategies of ancient humans. Rising or falling sea levels altered the availability of land suitable for habitation and agriculture, significantly affecting the distribution and quality of water resources. These changes in natural conditions prompted ancient humans to adapt to new environments and adjust their production methods with flexibility. During the Late Pleistocene, the coastal delta regions were primarily characterized by terrestrial deposits51. Although sea levels began to rise, they had not yet submerged the coastal deltas52. The warm, relatively dry climate and the widely exposed higher-elevation floodplain environment were conducive to the formation and distribution of early human settlements. By the early Holocene, sea levels continued to rise, and coastal delta regions transitioned into areas of marine-terrestrial transitional sedimentation52. Early settlement sites were significantly influenced by the continual rise in sea levels, with shorelines rapidly advancing inland over large areas51. River valley lowlands were submerged, leaving only a few hills and granitic outcrops as critical locations for early human habitation. Consequently, most sites were situated in mountainous basins alongside rivers, on slightly elevated terrain and at a greater distance from the advancing sea (Fig. 6a). The transition from the Shangshan Culture to the Kuahuqiao Culture may represent a response to the shift in climate from humid to arid conditions around 8.5 ka BP53, as most Kuahuqiao sites are located at lower elevations (Fig. 6b) closer to rivers compared to Shangshan sites (Fig. 10a). However, these early settlers still faced challenges from rapid sea level rises (Fig. 10a). Around 7.5 ka BP, rising sea levels caused flooding in the Qiantang River region, leading to the disappearance of the Kuahuqiao culture52.

Fig. 10: Distribution of Neolithic culture sites and coastline16 in Zhejiang Province.

a 8000 BP coastline, b 7000 BP coastline, c 6000 BP coastline, d 5000 BP coastline, e 4000 BP coastline.

Around 7.0 ka BP, the abundance of fresh water at estuaries and the fertile soil in delta regions attracted early human groups to migrate from the western mountainous areas to foothills and delta plains, where they engaged in agriculture and fisheries37. After 7.0 ka BP, the rate of sea level rise in the Zhejiang region slowed41. A large amount of sediment from the Yangtze River began to pile up, forming a delta in the Hangjiahu Plain41. The development of marshlands in this region, supported by stable sea levels, created favorable conditions for human habitation and agricultural production. Archaeological evidence shows that Majiabang culture sites were established on these elevated terrains (Fig. 10b). Concurrent with the sea’s recession from the previously eroded northern edges of the Zhejiang and Fujian hills, the Hemudu culture emerged in the Ningshao Plain.

After 6.0 ka BP, the sea levels had essentially stabilized, and the Yangtze River Delta region had largely formed52,53. This led to the emergence of a broader terrestrial landscape, facilitating the rapid expansion of ancient human activities in the region and the development of the Songze and Liangzhu cultures (Fig. 10c). Compared to the Majiabang sites, Songze culture sites exhibit a distinct outward expansion trend. By 6.0 ka BP, the trumpet-shaped morphology of Hangzhou Bay had formed, and the Qiantang River’s tidal bore began to develop. Sediment accumulation along both sides of the river elevated the land, with the western side being higher than the eastern, while the shoreline continued to shift46.

Around 5.6 ka BP, the freshwater transition of the ancient Taihu Bay, along with ongoing land formation in the Hangjiahu Plain, supplied critical freshwater and terrestrial resources that supported the expansion of the Songze-Liangzhu cultures in this region49. Around 5.0 ka BP, the estuary of the Qiantang River separated from that of the Yangtze River, forming the main body of Hangzhou Bay (Fig. 10d)49. The Liangzhu culture flourished in the Hangjiahu Plain of Zhejiang Province, where the number of archaeological sites increased significantly. Compared to the preceding Songze culture, the Liangzhu culture exhibited a clear tendency to expand into low-lying areas, with a higher proportion of settlements distributed in the water-netted plains (Fig. 10d). During the Liangzhu culture period, complex water management systems were constructed around the Liangzhu Ancient City, located in the western Tianmu Mountain foothills. The system includes dams such as Ganggongling and Shizishan. These facilities effectively intercepted floodwaters from the western and northern regions, preventing their incursion into the Hangjiahu Plain. This infrastructure provided a safe and stable hydrological environment, facilitating the sustainability of human settlements in the region. However, from the end of the Liangzhu culture to the Qianshanyang culture, the region experienced significant climatic fluctuations and environmental changes, including a global extreme cold event around 4.2 ka BP54. Through the analysis of δ¹³C from Shennong and Jiulong Caves near the Yangtze River Delta, combined with other paleoclimate and archaeological data, Zhang et al. identified an unusually humid climate between 4.3 and 4.0 ka BP. This period was characterized by extensive rainfall in the middle and lower reaches of the Yangtze River, which triggered significant flooding events in the downstream delta region53. The impact of these environmental changes was particularly severe in the low-relief Hangjiahu Plain. Sustained and widespread flooding rendered most Liangzhu-period settlements uninhabitable, with some sites being completely inundated, as evidenced in the stratigraphic sequences at Yujiashan and Maoshan55. These conditions severely disrupted agricultural production and destabilized human-environment relationships56.

The δ¹³C records from Shennong and Jiulong Caves indicate the occurrence of a severe drought around 4.0 ka BP, with the Jiulong record suggesting that the driest conditions of the entire Holocene occurred approximately 3.8 ka BP53. Concurrently, the number of archaeological sites declined sharply during the Guangfulin-Qianshanyang cultural period (Fig. 10e), implying that this cultural decline may have been related to the drought. In addition, the summer monsoon weakened significantly around 4.0 ka BP, accompanied by rapid and frequent fluctuations in temperature, leading to frequent flooding caused by strong tidal surges57. The range of human activities expanded from the edges of rivers and lakes and the densely populated plains to higher-altitude areas such as hilly and mountainous regions, resulting in an increase in the average elevation of late-period sites.

Explaining variations in environmental selection across different cultural periods requires not only the objective perspective provided by modeling results, but also archaeological interpretation that considers livelihood changes and the interplay of social and geographic factors influencing settlement choices. During the period between 11.0 and 9.0 ka BP, coinciding with the early Holocene and a phase of rapid climatic warming, the Shangshan culture emerged. This era was characterized by a warm climate with abundant precipitation, which provided favorable conditions for early human subsistence. The inhabitants of this period primarily relied on hunting, fishing, and gathering, with archaeological evidence indicating the cultivation of small quantities of rice36. Notably, the rice from this period exhibits clear morphological signs of domestication, marking an important milestone in early agricultural development36. Simultaneously, tools associated with rice harvesting and processing, such as stone grinding wheels, became more prevalent during the Shangshan cultural phase. However, these tools remained relatively primitive in design and function58. Due to the limited productivity levels characteristic of the early Neolithic, settlements were typically situated along rivers near valleys with higher elevations and dense forest cover, which provided access to abundant wild fruits and vegetables (Fig. 6). This subsistence strategy, rooted in a hunter-gatherer economy supplemented by early agricultural practices, resulted in a dispersed population distribution and high settlement mobility59.

The Middle Neolithic period coincided with the Middle Holocene Warm Period, a time characterized by increased mean annual precipitation and a weakening of the Southern Oscillation and El Niño effects60. These climatic changes resulted in a reduction in catastrophic weather events, which in turn influenced settlement patterns. Settlements during this period were increasingly concentrated near coastal regions, with a notably sporadic yet widespread distribution around the Taihu Lake region (Fig. 6). Between 8.2 and 5.3 ka BP, cultures such as Kuahuqiao, Hemudu, Majiabang, and Songze primarily relied on hunting, fishing, and the collection of snails and mussels for subsistence. However, this period also witnessed significant advancements in rice cultivation and the domestication of animals, particularly pigs and dogs, alongside the utilization of a diverse range of aquatic plant resources61. Li Yingfu’s research on prehistoric archaeological sites in the lower reaches of the Yangtze River provides valuable insights into this transitional period. Based on an analysis of changes in the quantity, material, type, and form of hunting and agricultural tools, combined with data from plant and animal remains, he suggests that the transition from the Kuahuqiao culture to the Hemudu culture was characterized by a predominantly hunter-gatherer economy supplemented by small-scale rice agriculture62. From the Majiabang to the Songze cultures, there was a gradual shift toward an agricultural economy, with rice cultivation expanding and the hunter-gatherer economy declining in prominence. By the Songze culture period, rice had emerged as a primary food resource for human communities42. Demographic expansion likely served as a crucial driver for establishing rice-based subsistence systems in the lower Yangtze River region during this period42.
This transformation indicates that rice agriculture had developed sufficient capacity to sustain the population’s subsistence needs. This economic evolution reflects broader societal and technological developments during the Middle Neolithic period62. Despite the growing importance of agriculture, domesticated mammals, primarily pigs and dogs, accounted for less than 30% of the total mammal population, with wild boars still constituting a significant proportion. Fishing and hunting remained the primary means of obtaining animal resources, while domesticated animal husbandry developed slowly and remained small-scale63. Under the influence of the Holocene warming period, the favorable living environment led to an expansion in the number of settlements, an increase in population, and a heightened demand for agricultural development. As a result, settlements progressively shifted toward low-altitude plains and marshy areas, where environmental conditions were more suitable for large-scale rice cultivation. The expansion of rice farming into coastal plains was driven by the demand for extensive arable land, facilitated by the favorable climatic and hydrological conditions of the period.

In the late Middle Holocene, crop cultivation (particularly rice cultivation) became highly developed, driving significant population growth63. This period saw the flourishing of the Liangzhu culture, marked by advancements in agricultural practices and socio-economic transformations. As settlements expanded and populations increased, the availability of wild food resources declined, prompting a fundamental shift toward an agricultural economy. Rice production technology became increasingly sophisticated, enabling large-scale and highly productive cultivation, which established rice as the primary staple food for the population. Archaeological evidence, including fruit remains such as gourds, melons, and acorns, further highlights the diversification of agricultural practices during this period64. Livestock farming also experienced rapid development, becoming the predominant source of animal resources. The proportion of domesticated animals increased dramatically, exceeding 50% of the total mammal population. However, the development of livestock farming was spatially uneven, with higher proportions of domesticated animals observed in regions closer to the political and cultural centers of the Liangzhu regime65. The concurrent development of various economic components, including rice agriculture, livestock breeding, and residual hunting-gathering practices, shaped settlement patterns during the Liangzhu culture period. Settlements were preferentially located in areas of lower elevation and on slopes conducive to agricultural cultivation. This period also witnessed a decline in vegetation cover, reflecting the intensification of human activities. The overall spatial distribution of settlements became more extensive, and the adaptability of human populations to environmental conditions improved significantly.

From the end of the Liangzhu culture to the Qianshanyang culture, the region experienced widespread flooding, severe climate fluctuations, and a shift toward arid conditions around 4.0 ka BP52. After 4.0 ka BP, a tidal creek system developed across the Hangjiahu Plain, characterized by an intricate network of tidal channels. The region became increasingly vulnerable to marine intrusions and storm surges, leading to a reduction in habitable land area61. Concurrently, the water area within the Taihu Lake Basin expanded, leading to intensified swamp and wetland conditions15. This period also witnessed a reduction in evergreen and deciduous broad-leaved forests and a marked transition to a drier and colder climate55,66. In response to these environmental pressures, the subsistence strategies of the region’s inhabitants underwent significant adaptation. During the Guangfulin, Qianshanyang, and Maqiao cultural periods, both farming and livestock rearing experienced a notable decline. By the Maqiao cultural period, the proportion of domesticated animals had fallen to 30% or less, reflecting a significant reduction in livestock rearing67. Fishing became the primary source of animal resources, and the main economic form shifted to hunting and gathering activities, with a reduction in rice farming67. These economic and environmental changes also influenced settlement patterns. Populations relocated from low-lying areas along rivers, lakes, and plains with dense water networks to higher elevations and regions with greater surface relief, such as mounds and hillslopes. This shift reflects a strategic adaptation to the changing environmental conditions and the need to mitigate the impacts of flooding and wetland expansion.

In summary, the degree of influence exerted by environmental factors varies significantly across different cultural stages, reflecting the dynamic changes in human adaptation strategies and patterns of environmental utilization. During periods of marine transgression, early Neolithic settlements were primarily located on elevated terrain near rivers, supporting a subsistence economy based on hunting and gathering. As the coastline receded, mid-Neolithic communities expanded into alluvial plains, where favorable conditions facilitated the development of rice agriculture to sustain growing populations. By the late Neolithic, climatic cooling and frequent flooding led to a shift in settlement patterns toward higher elevations, accompanied by a decline in rice cultivation.