Introduction

Groundwater, one of the most widely distributed freshwater resources globally, plays an irreplaceable role in ensuring safe drinking water, supporting regional economic development, and advancing ecological sustainability. In arid and semi-arid regions where surface water is scarce, groundwater frequently serves as the sole water source for domestic and economic use1,2,3,4. However, the combined impacts of industrialization, agriculture, and other anthropogenic activities have compromised groundwater quality, presenting serious risks to water safety and public health5,6. Among these threats, arsenic (As) and fluoride (F) pollution in groundwater has become increasingly prominent and is now recognized as a widespread global environmental problem7. Current estimates suggest that approximately 150 million people are directly affected by As in drinking water globally, while up to 260 million are exposed to high-F environments8,9. With the continued identification of contaminated areas and the availability of improved monitoring data, the affected population continues to grow. Several countries, including India10, Pakistan11, China12, and Mexico13, are currently facing severe As and F groundwater contamination, which poses a significant health threat to local populations. In China, elevated As and F in groundwater are concentrated in arid and semi-arid regions such as the Guanzhong Basin6,14, Datong Basin15, Tumochuan Plain16, and Weining Plain5. The enrichment of As and F in groundwater is controlled by both the natural environment and human activities. Regional geology (lithology, mineralogy), climate, and groundwater residence time are the primary controls on As and F migration and enrichment in natural groundwater17,18,19.
Despite extensive research on the factors controlling groundwater arsenic and fluoride, significant gaps persist in understanding their co-occurrence mechanisms and anthropogenic drivers, particularly within human-altered, large-scale irrigation systems. Therefore, clarifying the enrichment mechanisms of As and F contamination in groundwater is of great significance for ensuring the safety of the water environment and sustainable development.

To identify pollutant sources and elucidate enrichment mechanisms, integrated approaches combining hydrochemical analysis (e.g., Gibbs diagrams, ion end-member diagrams, and PHREEQC simulations), multivariate statistics, and isotope tracing technology have been widely applied3,14,19,20,21,22,23. However, traditional multivariate statistical methods such as principal component analysis (PCA) and cluster analysis (CA) can only provide preliminary insights into differences between reference vectors, demonstrating certain limitations24. In the investigation and analysis of arsenic and fluoride contamination in groundwater, the abundance of samples and complexity of data structure require methods capable of analyzing multidimensional groundwater quality data while providing visual results to support decision-making25. The Self-Organizing Map (SOM), an efficient unsupervised machine learning neural model, is widely applied for data visualization and attribute exploration26. By organizing data in a grid topology, SOM can visually represent relationships between groundwater samples and identify distinct water quality patterns, which aids in tracing pollutant sources and the dominant geochemical processes27,28. When integrated with K-means to further cluster SOM neurons, it can be effectively utilized for evaluating and classifying complex hydrogeochemical data. Therefore, groundwater samples with similar hydrochemical characteristics can be effectively classified by combining the SOM with the K‑means method, and the correlation between arsenic and fluoride in groundwater can be discerned.

Moreover, the impact of As and F pollution in groundwater on human health has also attracted widespread attention. Long-term consumption of As-contaminated groundwater may lead to skin lesions and damage to the respiratory and nervous systems, and may also induce cancers (e.g., kidney and liver cancer)16,29. Insufficient F intake increases the risk of dental caries and osteoporosis, whereas excessive intake may lead to dental and skeletal fluorosis30. The World Health Organization (WHO) guidelines for drinking-water quality, established to protect human health, set a limit of 0.01 mg/L for arsenic and recommend a fluoride concentration of 0.5–1.5 mg/L in drinking water. Health risk assessment models are therefore commonly used to evaluate the potential impacts of groundwater contaminants on human health14,31,32,33. Nonetheless, these traditional deterministic models are sensitive to sample limitations and individual variability, leading to randomness and uncertainty in key parameters, which may result in deviations in assessment results34. Monte Carlo simulation offers distinct advantages: (1) it inherently handles multi-source random variables without requiring discretization to resolve probability distributions, with computational accuracy independent of system state dimensionality; (2) through the implementation of reduction factors, it enables dynamic adaptation to scenario changes, thereby significantly enhancing the efficiency of risk prediction. Therefore, in the present study, a predictive model that combines Monte Carlo simulations with the traditional deterministic model is used to quantify the potential impacts of groundwater pollutants on human health via different exposure pathways (drinking water ingestion and dermal contact)35,36,37. In addition, a health risk distribution map was produced using the deterministic model and GIS technology to visualize regional differences in health risk.
The coupled model can provide a robust and accurate prediction of population-level risks, and the results provide an important scientific basis for ensuring drinking water safety and protecting human health.

The Jiaokou Irrigation District, one of the ten major irrigation districts in the Guanzhong Basin, has been operational for over 60 years since its completion in 1960 and is an important grain production area in the Guanzhong Plain2,38. Groundwater is the primary drinking water source in this district; its quality directly affects the health of local residents and socio-economic development. Monitoring indicates the presence of As and F contamination in the groundwater in the study area, with potential implications for both groundwater quality and public health. Although previous studies have indicated severe F contamination in the groundwater of the district39,40,41, systematic studies on As-F co-contamination mechanisms and associated health risks are lacking.

Therefore, based on Self-organizing map (SOM), K-means clustering, isotope tracing, and human health risk assessment models coupled with Monte Carlo simulations, this study aims to: (1) analyze the spatial distribution of As and F in groundwater in the Jiaokou Irrigation District; (2) elucidate the correlations between As and F and their co-enrichment mechanisms; and (3) assess exposure risks of As and F contamination in groundwater to adults and children. The findings will help decision-makers implement targeted measures to mitigate As and F contamination in groundwater and ensure the safety of drinking water for local communities.

Materials and methods

Study area

The Jiaokou Irrigation District, which extends from 34°30′7″N to 34°52′37″N in latitude and from 109°12′40″E to 110°10′1″E in longitude, is situated in the eastern part of the Guanzhong Basin, Shaanxi Province, China. It is a large, multi-level, dam-free, electric-pumping irrigation area that uses the Wei River as its primary water source. The irrigation area is bounded to the east and west by the Luo and Shichuan rivers, and to the north and south by the Wei River and Pucheng Luyang Lake, respectively. Its total area is 1024.36 km2, extending approximately 48.6 km east–west and 31.9 km north-south [Fig. 1(a)]42.

Fig. 1

Maps of the study area showing: (a) geographical location and distribution of groundwater sampling sites, (b) land-use map, (c) stratigraphic profile. Panels (a) and (b): ArcGIS 10.8.1 (http://www.esri.com), Panel (c): CorelDRAW X8 18.1.0.661 (https://www.coreldraw.com).

Over more than 60 years of irrigation development, the district has developed a dense network of ditches and canals. Land use is dominated by cultivated land (84.6%), followed by orchards (8.3%) [Fig. 1(b)]. The area has a warm-temperate semi-arid monsoon climate, with an average annual temperature of 14.12 ℃, annual sunshine duration of 1842 h, and average annual precipitation and evaporation of 527.2 mm and 1220.9 mm, respectively38,42.

Topographically, the area is primarily composed of the Wei River floodplain, first terrace, second terrace, and loess tableland, forming a “dustpan” shape sloping from the northwest, north, and northeast towards the central and southern parts of the irrigation area. Controlled by the Qinling zonal tectonic system, the main aquifers are Quaternary terrestrial sediments that are widely distributed but unevenly thick, including lacustrine (Ql), alluvial (Qal), and aeolian (Qeol) deposits. Typical loess deposits occur in the north due to aeolian processes [Fig. 1(c)]2,42,43. The aquifer in the study area is continuous and forms a complete system, with groundwater flowing from northwest to southeast. Recharge occurs primarily from precipitation, irrigation infiltration, and canal leakage. Influenced by the Kouzhen-Guanchi fault zone, groundwater occurs mainly as aeolian loess pore-fissure water in the northern loess plateau and loose-rock pore water in the alluvial layer31.

Sample collection and analysis

A field survey was conducted in the Jiaokou Irrigation District during the summer (August) of 2023. Sampling was designed to cover the entire irrigation district, considering factors such as geomorphology, groundwater flow dynamics, and human activities. A total of 51 water samples were collected, consisting of 48 from phreatic wells (including both domestic and irrigation wells, with depths ranging from 5 to 50 m) and 3 from drainage canals [Fig. 1(a)]. In this irrigation district, the primary function of the drainage canals is agricultural drainage, with their water mainly sourced from the discharge of shallow groundwater. Consequently, the hydrochemical characteristics of these canal waters largely reflect the properties of the local groundwater, and the drainage canal samples were therefore analyzed together with the phreatic well samples in this study. Sampling locations were recorded using a handheld GPS device. Prior to collection, all wells were purged by continuous pumping for 10–15 min to remove stagnant water. A portable water-quality meter (AZ86031) was used for in-situ measurements of pH, electrical conductivity (EC), and temperature until the parameters stabilized, ensuring the collected samples were representative of the aquifer and actual usage conditions. Samples were then collected in pre-cleaned 1.5 L polyethylene bottles, which had been rinsed three times with distilled water followed by at least three rinses with the sample water. The water samples intended for arsenic determination were acidified with 1% hydrochloric acid. The bottles were completely filled to minimize air bubbles, immediately sealed with caps, and labeled. All samples were stored at 4 °C and transported to the laboratory within one week for analysis.

All major cations, anions, and trace elements (As, F) were determined by Shaanxi Geology and Mineral 2nd Engineering Survey Institute Inspection Detection Co., Ltd. K+ and Na+ were determined by flame atomic absorption spectrometry (detection limits: 0.132 mg/L and 0.354 mg/L, respectively). Ca2+, Mg2+, SO42−, CO32−, and HCO3− were measured by titration, while Cl− was determined by silver titration, all with a detection limit of 1.0 mg/L. NO3− was analyzed by spectrophotometry (detection limit: 0.2 mg/L). Arsenic was measured using atomic fluorescence spectrometry (AFS-922; detection limit: 0.001 mg/L), and F by ion chromatography (detection limit: 0.2 mg/L). Total dissolved solids (TDS) were determined using the gravimetric method. Stable hydrogen and oxygen isotopes (δD and δ18O) were analyzed with a gas isotope ratio mass spectrometer (Thermo Scientific 253 Plus) coupled with an elemental analyzer (Thermo Fisher Flash 2000HT) via high-temperature pyrolysis and reduction. The analytical accuracy was 1.0‰ for δD and 0.2‰ for δ18O.

Analytical quality was assessed using the charge-balance error (CBE) [Eq. (1)]3,44. The CBE values for all samples in this study were within ±5% (range: −2.91% to 2.99%), indicating reliable results. CBE was calculated as follows:

$$CBE(\%)=\frac{\sum N_c - \sum N_a}{\sum N_c + \sum N_a} \times 100$$
(1)

where Nc and Na are the milligram equivalent concentrations of cations and anions (meq/L), respectively.
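As a minimal sketch of Eq. (1), the charge-balance check can be written as follows. The ion concentrations are hypothetical values chosen for illustration and are not taken from the study's data set; the function name is likewise an assumption.

```python
def charge_balance_error(cations_meq, anions_meq):
    """Return the charge-balance error (%) from ion concentrations in meq/L (Eq. 1)."""
    total_cations = sum(cations_meq)
    total_anions = sum(anions_meq)
    return (total_cations - total_anions) / (total_cations + total_anions) * 100.0


# Hypothetical sample in meq/L (illustrative only, not measured data).
cations = [12.0, 0.2, 4.1, 6.3]  # Na+, K+, Ca2+, Mg2+
anions = [9.5, 6.0, 6.2, 0.4]    # SO4 2-, HCO3-, Cl-, NO3-

cbe = charge_balance_error(cations, anions)
is_reliable = abs(cbe) <= 5.0    # acceptance criterion of +/-5% used in the text
print(f"CBE = {cbe:.2f}%, reliable: {is_reliable}")
```

A sample whose absolute CBE exceeds 5% would normally be re-analyzed or excluded before further interpretation.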

Self-organizing map and K-means clustering algorithms

The Self-organizing map (SOM), initially proposed by the Finnish scholar Kohonen45, is an efficient unsupervised machine learning neural model. Through unsupervised training, SOM can be widely applied to complex data tasks such as classification, estimation, and prediction46. By mapping high-dimensional data onto a two-dimensional hexagonal neuron grid, SOM achieves dimensionality reduction while preserving the topological structure of the input data, thereby enhancing data visualization and analytical capabilities24,26.

K-means is a classic clustering method that partitions the data space into k clusters through an iterative process, and can be used to further classify the neurons after SOM training47. The optimal number of clusters is determined based on the minimum Davies-Bouldin index (DBI), where a smaller DBI value indicates greater distinctions between different categories and better clustering performance47,48. The combination of SOM and K-means helps reduce data complexity and enables superior visual analysis.
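The role of the DBI in selecting a partition can be illustrated with a short sketch. It computes the index for two candidate labelings of a toy data set and keeps the one with the smaller value. The data, labels, and function names are hypothetical; in the study the index is evaluated on K-means partitions of the trained SOM neurons rather than on raw points.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of equal-length coordinate tuples."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(points, labels):
    """Mean over clusters of the worst ratio (S_i + S_j) / distance(c_i, c_j)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    keys = sorted(clusters)
    cents = {k: centroid(clusters[k]) for k in keys}
    # S_k: average Euclidean distance from members of cluster k to its centroid.
    scatter = {
        k: sum(math.dist(p, cents[k]) for p in clusters[k]) / len(clusters[k])
        for k in keys
    }
    total = 0.0
    for i in keys:
        total += max(
            (scatter[i] + scatter[j]) / math.dist(cents[i], cents[j])
            for j in keys if j != i
        )
    return total / len(keys)


# Toy data: two compact groups far apart (hypothetical values).
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
candidates = {"good": [0, 0, 1, 1], "bad": [0, 1, 0, 1]}
scores = {name: davies_bouldin(points, labels) for name, labels in candidates.items()}
best = min(scores, key=scores.get)  # smallest DBI = best-separated partition
print(scores, best)
```

The same comparison extends to different values of k: each candidate k yields one partition, and the k with the minimum DBI is retained.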

In this study, the SOM model was implemented in Matlab R2024a to classify groundwater samples. The optimal number of output neurons was estimated with the formula \(5\sqrt{n}\), where n is the number of samples49; for n = 51 this gives approximately 36 neurons. Based on this, a neural network was trained using a 5 × 7 map size (35 neurons), and the accuracy of the SOM was evaluated using the quantization error (QE) and topographic error (TE)27,28. The key steps for clustering in the SOM neural network are as follows:

  1. Input matrix and data standardization, where \(x\) is the input variable.

$$X={({x_1},{x_2},{x_3},...,{x_n})^T}$$
(2)

  2. Competitive learning based on the Euclidean distance.

$${D_j}(x)=\sqrt {\sum\nolimits_{{i=1}}^{n} {{{({x_i} - {w_{ij}})}^2}} }$$
(3)

where Dj(x) is the distance between the input vector x and node neuron j; n is the dimensionality of the input variable; and wij is the connection weight between input component i and node neuron j.

  3. Calculating the spatial location of the winning neuron.

$${T_{j,I(x)}}=\exp \left( - \frac{{S_{{j,I(x)}}^{2}}}{{2{\sigma ^2}}}\right)$$
(4)

where Tj,I(x) denotes the updated weight coefficient; Sj,I(x) is the distance between node neuron j and I(x); I(x) represents the winning node; and σ is the neighborhood size.

  4. Weight adjustment.

$${w_{ij}}(t+1)={w_{ij}}(t)+{\beta _i}{h_i}(t)({x_i} - {w_{ij}})$$
(5)

where t is the number of training iterations; βi is the learning rate, which decreases exponentially with the number of training iterations; and hi(t) is the neighborhood function [cf. Eq. (4)].

  5. K-means clustering.

$$DBI=\frac{1}{N}\sum\limits_{i=1}^{N} \max_{j \ne i} \frac{\overline{S}_i+\overline{S}_j}{\|w_i - w_j\|_2}$$
(6)

where N is the number of clusters; \(\overline{S}_i\) is the average Euclidean distance from samples in the i-th cluster to their centroid; and \(\|w_i - w_j\|_2\) is the Euclidean distance between the centroids of cluster i and cluster j.
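The training steps above can be sketched in a few lines of Python. This is an illustrative stand-in, not the Matlab implementation used in the study: it assumes a rectangular rather than hexagonal grid, a Gaussian neighborhood, exponential decay of both the learning rate and the neighborhood size, and synthetic two-variable data. All function names and hyperparameter values are hypothetical.

```python
import math
import random


def standardize(data):
    """Z-score each column of the input matrix (step 1)."""
    cols = list(zip(*data))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in data]


def best_matching_unit(x, weights):
    """Index of the neuron with the smallest Euclidean distance to x (Eq. 3)."""
    return min(range(len(weights)), key=lambda j: math.dist(x, weights[j]))


def train_som(data, rows, cols, epochs=100, beta0=0.5, sigma0=2.0, seed=0):
    """Train a rows x cols map; returns one weight vector per neuron."""
    rng = random.Random(seed)
    dim = len(data[0])
    coords = [(r, c) for r in range(rows) for c in range(cols)]
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in coords]
    for t in range(epochs):
        decay = math.exp(-3.0 * t / epochs)
        beta = beta0 * decay      # learning rate, decreasing exponentially
        sigma = sigma0 * decay    # neighborhood size, shrinking over time
        for x in rng.sample(data, len(data)):
            win = best_matching_unit(x, weights)
            for j, w in enumerate(weights):
                s2 = ((coords[j][0] - coords[win][0]) ** 2
                      + (coords[j][1] - coords[win][1]) ** 2)
                h = math.exp(-s2 / (2.0 * sigma ** 2))   # Eq. 4
                for i in range(dim):
                    w[i] += beta * h * (x[i] - w[i])     # Eq. 5
    return weights


def quantization_error(data, weights):
    """Mean distance between each sample and its best-matching neuron."""
    return sum(math.dist(x, weights[best_matching_unit(x, weights)])
               for x in data) / len(data)


n_samples = 51
target_neurons = 5 * math.sqrt(n_samples)   # about 35.7, hence a 5 x 7 grid
toy_rng = random.Random(1)
toy = [[toy_rng.gauss(0, 1) + (3 if k % 2 else 0), toy_rng.gauss(0, 1)]
       for k in range(n_samples)]           # synthetic data, two loose groups
z = standardize(toy)
som = train_som(z, rows=5, cols=7)
qe = quantization_error(z, som)
baseline = sum(math.dist(x, [0.0, 0.0]) for x in z) / len(z)
print(f"target neurons ~ {target_neurons:.1f}, QE = {qe:.3f}, baseline = {baseline:.3f}")
```

The baseline is the mean distance of the standardized samples to their overall centroid, that is, the quantization error of a one-neuron map; a trained map should fall well below it. The resulting neuron weight vectors are what K-means would subsequently partition.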

Saturation index (SI)

The saturation index (SI) is a primary method for determining and investigating the saturation state of minerals in groundwater and is widely applied in hydrogeochemical research27. It is defined as:

$$SI=\lg \frac{{IAP}}{K}$$
(7)

Where IAP is the ion activity product of the relevant mineral; K is the equilibrium constant of the mineral dissolution reaction. In this study, the PHREEQC software was employed to calculate the saturation indices of various minerals in the groundwater samples. Minerals with SI between − 0.2 and 0.2 are in quasi-equilibrium42.
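A minimal sketch of Eq. (7) and the ±0.2 quasi-equilibrium band is given below. The IAP and K values are hypothetical placeholders; in practice both come from a speciation code such as PHREEQC and its thermodynamic database, and the class labels are naming assumptions.

```python
import math


def saturation_index(iap, k_eq):
    """SI = log10(IAP / K), Eq. (7)."""
    return math.log10(iap / k_eq)


def saturation_state(si, tolerance=0.2):
    """Classify SI using the quasi-equilibrium band of +/- tolerance."""
    if si > tolerance:
        return "oversaturated"
    if si < -tolerance:
        return "undersaturated"
    return "quasi-equilibrium"


# Hypothetical mineral: log K = -10.6 and log IAP = -11.0 (illustrative values).
si = saturation_index(10 ** -11.0, 10 ** -10.6)
state = saturation_state(si)
print(f"SI = {si:.2f} -> {state}")
```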

Health risk assessment

Humans are exposed to harmful substances from groundwater primarily via ingestion (i.e., drinking) and dermal contact. Prolonged excessive intake of As and F can cause various adverse health effects29. This study applied the USEPA health risk assessment model to evaluate potential health risks posed by As and F to both children and adults1,5. The chronic daily absorbed dose via the two pathways was calculated as follows:

$$CDI=\frac{{Cw \times IR \times EF \times ED}}{{BW \times AT}}$$
(8)
$$CDD=\frac{{DA \times SA \times EV \times EF \times ED}}{{BW \times AT}}$$
(9)
$$DA=Kp \times Cw \times T \times 0.001$$
(10)
$$SA=239 \times {H^{0.417}} \times B{W^{0.517}}$$
(11)

where CDI and CDD are the chronic daily absorbed doses for oral intake and skin contact, respectively (mg/kg·d); DA is the absorbed dose via skin contact (mg/cm2); Cw is pollutant concentration in groundwater (mg/L); and SA is the exposed skin surface area (cm2).
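Equations (8)–(11) can be sketched as follows. The exposure parameters (IR, EF, ED, BW, AT, Kp, T, EV, H) are hypothetical adult values chosen only to make the example run; the values actually used in the study are those listed in Table S1.

```python
def skin_area_cm2(height_cm, body_weight_kg):
    """Exposed skin surface area SA (cm2), Eq. (11)."""
    return 239 * height_cm ** 0.417 * body_weight_kg ** 0.517


def dermal_absorbed_dose(kp, cw, t):
    """DA (mg/cm2) per contact event, Eq. (10); 0.001 converts L to cm3."""
    return kp * cw * t * 0.001


def chronic_daily_intake(cw, ir, ef, ed, bw, at):
    """CDI via drinking water (mg/kg/d), Eq. (8)."""
    return cw * ir * ef * ed / (bw * at)


def chronic_dermal_dose(da, sa, ev, ef, ed, bw, at):
    """CDD via dermal contact (mg/kg/d), Eq. (9)."""
    return da * sa * ev * ef * ed / (bw * at)


# Hypothetical adult exposure parameters (assumptions, not Table S1 values).
cw = 0.01               # As concentration, mg/L (equal to the 0.01 mg/L limit)
ir, bw = 2.0, 60.0      # ingestion rate (L/d), body weight (kg)
ef, ed = 365.0, 30.0    # exposure frequency (d/a), exposure duration (a)
at = ef * ed            # averaging time (d)
kp, t, ev = 0.001, 0.4, 1.0   # permeability (cm/h), contact time (h), events per day
height = 165.0          # cm

sa = skin_area_cm2(height, bw)
cdi = chronic_daily_intake(cw, ir, ef, ed, bw, at)
cdd = chronic_dermal_dose(dermal_absorbed_dose(kp, cw, t), sa, ev, ef, ed, bw, at)
print(f"SA = {sa:.0f} cm2, CDI = {cdi:.2e}, CDD = {cdd:.2e} mg/kg/d")
```

Under these assumed parameters the dermal dose is orders of magnitude smaller than the ingested dose, which is the usual pattern for dissolved inorganic contaminants.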

The total non-carcinogenic risk is the sum of the non-carcinogenic risk hazard coefficients of the two pathways of exposure to groundwater contamination, calculated as follows:

$$HQ_{\text{oral}}=\frac{{CDI}}{{RfD}}$$
(12)
$$HQ_{\text{dermal}}=\frac{{CDD}}{{RfD}}$$
(13)
$$HQ_i=HQ_{\text{oral}}+HQ_{\text{dermal}}$$
(14)
$$THQ=\sum\limits_{{i=1}}^{n} {HQ_i}$$
(15)

where HQ is the non-carcinogenic hazard factor; RfD is the reference dose for both oral intake and skin contact routes (mg/kg·d).

Additionally, As may pose a carcinogenic risk to humans. The carcinogenic risk through oral ingestion and dermal contact is calculated as follows:

$$CR_{\text{oral}}=CDI \times SF$$
(16)
$$CR_{\text{dermal}}=CDD \times SF$$
(17)
$$TCR=CR_{\text{oral}}+CR_{\text{dermal}}$$
(18)

where CR represents the carcinogenic risk and SF is the slope factor for carcinogenic pollutants (mg/kg·d)−1.

Based on the Guidelines for Health Risk Assessment of Groundwater Pollution in China, the RfD values of As and F are 3.0 × 10−4 and 0.04 mg/kg·d, respectively, and the SF value for As is 1.5 (mg/kg·d)−1. Other parameters are shown in Table S1 (in the Supplementary Materials). HQ values greater than 1 indicate a high non-carcinogenic risk, and CR values greater than 1 × 10−4 indicate that the carcinogenic risk is unacceptable.
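The risk characterization in Eqs. (12)–(18) reduces to a few arithmetic steps, sketched below with the RfD and SF values quoted above. The dose values are hypothetical inputs for illustration and do not correspond to any sample in the study.

```python
RFD = {"As": 3.0e-4, "F": 0.04}   # reference doses, mg/kg/d (from the text)
SF_AS = 1.5                        # slope factor for As, (mg/kg/d)^-1 (from the text)


def hazard_quotient(cdi, cdd, rfd):
    """HQ_i = HQ_oral + HQ_dermal, Eqs. (12)-(14)."""
    return cdi / rfd + cdd / rfd


def total_carcinogenic_risk(cdi, cdd, sf=SF_AS):
    """TCR = CR_oral + CR_dermal, Eqs. (16)-(18)."""
    return cdi * sf + cdd * sf


# Hypothetical chronic daily doses in mg/kg/d (illustrative only).
doses = {"As": (3.3e-4, 1.1e-6), "F": (0.05, 1.0e-4)}

hq = {el: hazard_quotient(cdi, cdd, RFD[el]) for el, (cdi, cdd) in doses.items()}
thq = sum(hq.values())                              # Eq. (15)
tcr = total_carcinogenic_risk(*doses["As"])

non_carcinogenic_concern = thq > 1.0                # HQ threshold from the text
carcinogenic_unacceptable = tcr > 1.0e-4            # CR threshold from the text
print(hq, thq, tcr)
```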

Monte Carlo model

Deterministic health risk models assume constant parameter values; however, substantial variability in population-related parameters during health risk characterization can introduce significant uncertainty in the assessment results34. To quantify this uncertainty, a Monte Carlo simulation with 10,000 iterations was performed using Crystal Ball software, with the random seed set to 2026. The relevant data are summarized in Table S2. The probability distributions of risk values were obtained after multiple iterations of the simulation, enabling a more robust assessment of As and F health risks in groundwater35,36,50. Additionally, to identify the impact of parameters on the results of health risk assessment, a sensitivity analysis was conducted based on the Spearman rank correlation coefficient36. The larger the absolute value of the correlation coefficient, the greater the impact of the variable on the health risk14.
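The probabilistic workflow can be sketched with the Python standard library. This is an illustrative stand-in for the Crystal Ball simulation: the input distributions and their parameters are hypothetical (the study's are in Table S2), only the oral pathway for F is modeled, EF × ED / AT is taken as 1, and reusing the seed value 2026 does not reproduce Crystal Ball's random stream.

```python
import math
import random


def ranks(values):
    """Rank positions 1..n; ties are not averaged (adequate for continuous draws)."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    out = [0] * len(values)
    for position, index in enumerate(order, start=1):
        out[index] = position
    return out


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def spearman(a, b):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))


N_ITER = 10_000            # number of iterations, as in the text
RFD_F = 0.04               # mg/kg/d, RfD for F from the text
rng = random.Random(2026)  # seed value from the text

draws = {"Cw": [], "IR": [], "BW": []}
hq_oral = []
for _ in range(N_ITER):
    cw = rng.lognormvariate(math.log(1.4), 0.5)  # hypothetical F concentration, mg/L
    ir = rng.triangular(1.0, 3.0, 2.0)           # hypothetical intake, L/d (low, high, mode)
    bw = max(rng.gauss(60.0, 8.0), 30.0)         # hypothetical body weight, kg
    draws["Cw"].append(cw)
    draws["IR"].append(ir)
    draws["BW"].append(bw)
    hq_oral.append(cw * ir / bw / RFD_F)         # simplified oral hazard quotient

hq_sorted = sorted(hq_oral)
median_hq = hq_sorted[N_ITER // 2]
p95_hq = hq_sorted[int(0.95 * N_ITER) - 1]
prob_exceed = sum(h > 1.0 for h in hq_oral) / N_ITER
sensitivity = {name: spearman(values, hq_oral) for name, values in draws.items()}
print(median_hq, p95_hq, prob_exceed, sensitivity)
```

The sign of each Spearman coefficient indicates the direction of influence (positive for concentration and intake, negative for body weight), and its magnitude ranks the parameters by their contribution to the spread of the risk estimate.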

The research framework of this study is illustrated in Fig. 2.

Fig. 2

The framework of research routes in the present study.

Results and discussion

Groundwater hydrochemical characteristics

The statistical results of groundwater hydrochemical indicators in the Jiaokou Irrigation District are presented in Table S3 (51 samples in total, JK1-JK51). Groundwater pH values ranged from 7.80 to 8.63, with a mean of 8.19, indicating weak alkalinity. Based on TDS, groundwater can be classified as freshwater (TDS < 1000 mg/L), brackish water (1000 mg/L ≤ TDS ≤ 3000 mg/L), or saline water (TDS > 3000 mg/L)42. TDS values in the study area ranged from 988 to 11,802 mg/L, with an average of 3185.88 mg/L. Brackish and saline waters accounted for 54.90% and 43.14% of the samples, respectively, which together indicate significant groundwater salinization in the study area.
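The TDS-based classification can be expressed as a small helper, shown here as a sketch. The function name is an assumption; the thresholds and the three test values (minimum, mean, and maximum TDS) are those reported above.

```python
def salinity_class(tds_mg_l):
    """Classify water by TDS (mg/L) using the thresholds given in the text."""
    if tds_mg_l < 1000:
        return "freshwater"
    if tds_mg_l <= 3000:
        return "brackish water"
    return "saline water"


# Minimum, mean, and maximum TDS reported for the study area.
classes = {tds: salinity_class(tds) for tds in (988, 3185.88, 11802)}
print(classes)
```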

The average concentrations of major ions follow the order SO42− > HCO3− > Cl− > NO3− > CO32− > F− for anions and Na+ > Mg2+ > Ca2+ > K+ for cations. Box plots of the concentration distribution for various groundwater indicators are shown in Fig. S1 (in the Supplementary Materials). The average As concentration in groundwater is 0.0047 mg/L, with a maximum of 0.017 mg/L. As concentrations in 11.8% of samples exceed the threshold (0.01 mg/L). For F, the maximum concentration is 4.63 mg/L, with an average of 1.58 mg/L. F concentrations are below 0.5 mg/L in only 1.96% of samples, whereas they exceed 1.5 mg/L in 43.75% of samples. According to the Chinese Standards for Irrigation Water Quality (GB 5084-2021), the recommended limits for As and F in irrigation water for dryland crops are 0.1 and 3.0 mg/L, respectively. The maximum F concentration (4.63 mg/L) exceeds this limit, indicating that groundwater at the affected sites is unsuitable for agricultural irrigation and may threaten local food security and human health.

The coefficient of variation (CV) is a statistical measure used to assess spatial heterogeneity in ion concentrations. According to CV thresholds, variability can be categorized as weak (< 10%, uniform distribution), moderate (10–100%, significant spatial variation), or strong (> 100%, highly discrete distribution)42. In this study, the CV values for As and F in groundwater were 80.56% and 53.01%, respectively, both falling within the moderate variability range. This indicates that groundwater in the Jiaokou Irrigation District is likely influenced by anthropogenic agricultural activities, leading to local enrichment of As and F and resulting in their spatially heterogeneous distribution51.
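A short sketch of the CV calculation and its classification follows. Whether the study used the sample or the population standard deviation is not stated, so the use of the sample standard deviation here is an assumption, as are the function names and the toy concentrations.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) = sample standard deviation / mean x 100."""
    return statistics.stdev(values) / statistics.mean(values) * 100.0


def variability_class(cv_percent):
    """Weak (< 10%), moderate (10-100%), or strong (> 100%) variability."""
    if cv_percent < 10:
        return "weak"
    if cv_percent <= 100:
        return "moderate"
    return "strong"


toy_cv = coefficient_of_variation([1.0, 2.0, 3.0])   # hypothetical concentrations
reported = {"As": variability_class(80.56), "F": variability_class(53.01)}
print(toy_cv, reported)
```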

Cluster analysis based on SOM

Statistical results in Table S3 reveal significant differences in anions and cations within the groundwater. Based on this, SOM and K-means methods were employed to perform unsupervised machine learning classification on 13 variables from 51 groundwater samples27. The quantization error (QE) and topographic error (TE) were 0.368 and 0.187, respectively, indicating that the model preserved the topological structure of the data and effectively illustrated the disparities among the samples. As shown in Fig. 3(a) (red and blue shades represent high and low values, respectively), correlations among variables can be identified by comparing color gradients across the component planes. Similar colors suggest a positive correlation, with higher similarity indicating a stronger relationship26. Based on the minimum DBI, the groundwater samples are categorized into three distinct groups [Fig. 3(b)]. Cluster 1 comprises 12 samples characterized by elevated concentrations of TDS, Na+, Mg2+, Cl−, and SO42− (Table S4), and is classified as the Cl·SO4-Na type [Fig. 3(c)]. Cluster 2 includes 14 samples showing higher levels of CO32−, F, As, and pH (Table S4), belonging to the HCO3-Na type [Fig. 3(c)]. Cluster 3 consists of 25 samples with notable concentrations of K+, Ca2+, HCO3−, and NO3− (Table S4), categorized as the HCO3-Ca type [Fig. 3(c)].

Fig. 3

Clustering results of groundwater samples: U-matrix and component planes of the 13 groundwater variables in the SOM (a), distribution map of the three groundwater clusters (b), and the Piper diagram (c).

Isotopic characteristics

Stable isotopes of hydrogen (δD) and oxygen (δ18O) were used to identify groundwater recharge sources51. δD values ranged from −71.92‰ to −55.98‰ (mean: −63.58‰), and δ18O values ranged from −9.37‰ to −7.80‰ (mean: −8.59‰) (Table S5). The CV values for δD and δ18O were −0.07 and −0.05, respectively, indicating minimal variability in the hydrogen and oxygen isotopic compositions.

The δ18O vs. δD relationship (Fig. 4) yielded a fitting line of hydrogen and oxygen that was nearly parallel to the local meteoric water line (LMWL: δD = 7.49 δ18O + 6.13), with most groundwater samples plotting below the LMWL. This pattern suggested that groundwater was primarily recharged by atmospheric precipitation and that evaporation contributed to the enrichment of δD and δ18O in groundwater20.
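Whether a sample plots below the LMWL can be checked by comparing its measured δD with the value the line predicts for its δ18O. The sketch below applies this to the mean isotopic composition reported above; the function names are assumptions.

```python
def lmwl_delta_d(delta_18o, slope=7.49, intercept=6.13):
    """δD (‰) predicted by the local meteoric water line given in the text."""
    return slope * delta_18o + intercept


def offset_from_lmwl(delta_18o, delta_d):
    """Measured δD minus LMWL δD; a negative value means the sample plots below the line."""
    return delta_d - lmwl_delta_d(delta_18o)


# Mean groundwater composition reported in the text (‰).
mean_d18o, mean_dd = -8.59, -63.58
offset = offset_from_lmwl(mean_d18o, mean_dd)
print(f"LMWL δD = {lmwl_delta_d(mean_d18o):.2f}‰, offset = {offset:.2f}‰")
```

For the mean composition the LMWL predicts about −58.21‰, so the mean groundwater δD lies roughly 5.4‰ below the line, in line with the evaporative signature described above.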

Fig. 4

The relationship of δ18O vs. δD of groundwater.

Spatial distribution of As and F in groundwater

The spatial distributions of As and F are shown in Fig. S2. Elevated As concentrations occurred mainly in the northern and southeastern parts of Linwei District and the central part of Pucheng County, with low concentrations distributed in other areas. The area where As exceeded regulatory limits was 36.99 km2, accounting for 3.61% of the study area. Low-F groundwater (< 0.5 mg/L) occurred in a small area in Yanliang District in the western part of the irrigation area (1.70 km2). High-F groundwater (> 1.5 mg/L) mainly occurred in Linwei District, Pucheng County, and Dali County, covering 557.53 km2 (54.43% of the total area), indicating that more than half of the groundwater was contaminated with fluoride. Overall, the spatial distributions of As and F were broadly consistent, suggesting that they had similar geochemical occurrence mechanisms. Furthermore, elevated concentrations of arsenic and fluoride are predominantly distributed within cultivated land [Figs. 5(a) and (b)].

Fig. 5

Spatial variation of As and F values under different land use types (a) and (b), Map of As/F exceedance points over salinity contours (c), Relationship plot of TDS and (NO3+Cl)/HCO3(d). Panels (a), (b), and (c) were prepared using Surfer 23 (http://www.GoldenSoftware.com/products/surfer.com).

Causes of As and F enrichment in groundwater

Correlation analysis based on SOM

The SOM clustering results [Fig. 3(a)] indicate strong correlations among As, F, CO32−, and pH in the groundwater of the study area, while TDS exhibited a strong positive correlation with Na+, Mg2+, Cl−, and SO42−. As and F showed a moderate correlation (r = 0.63) and were both moderately correlated with pH and CO32−, indicating that they may be influenced by similar environmental factors26. The Pearson correlation analysis also confirms these results (Fig. S3).

Moreover, among the 51 groundwater samples analyzed, six exceeded the permissible limit for As and 24 exceeded it for F. Five samples exceeded the limits for both As and F simultaneously [Fig. 6(a)], accounting for 20% of the 25 samples that exceeded at least one limit. This further supports the co-occurrence of As and F in groundwater and their possible geochemical association.

Fig. 6

Relationship between As and F concentrations in groundwater of the study area (a), Relationship between As and F concentrations with pH value (b1) and (b2), Saturation indices of minerals (c).

Acid-base conditions

Groundwater in the Jiaokou Irrigation District is weakly alkaline, with an average pH of 8.19. As shown in Figs. 6 (b1) and (b2), pH exhibits a positive correlation with As and F concentrations; both increase gradually with rising pH. In groundwater with pH values between 7 and 9, As predominantly exists as arsenate or arsenite, which can be adsorbed by positively charged minerals and clay minerals, such as hematite, goethite, kaolinite, and illite, within the aquifer media16,52. Higher pH reduces the positive surface charge of these minerals, weakening As adsorption and facilitating its enrichment into groundwater53.

Fluoride is relatively more active in alkaline groundwater. Elevated pH reduces Ca2+ activity, inhibiting fluorite (CaF2) precipitation and facilitating the enrichment of F in groundwater. Furthermore, because OH− and F− have similar ionic radii, weakly alkaline conditions promote the substitution of OH− for F− in fluorine-bearing minerals such as biotite, fluorapatite, and amphibole [Eqs. (19)–(21)], further contributing to fluoride enrichment54,55. Thus, weak alkalinity favors the co-enrichment of As and F in groundwater.

$${\text{KM}}{{\text{g}}_{\text{3}}}{\text{[AlS}}{{\text{i}}_{\text{3}}}{{\text{O}}_{{\text{10}}}}{\text{]}}{{\text{F}}_{\text{2}}}{\text{ + 2O}}{{\text{H}}^{\text{-}}}{\text{ }} \to {\text{ KM}}{{\text{g}}_{\text{3}}}{\text{[AlS}}{{\text{i}}_{\text{3}}}{{\text{O}}_{{\text{10}}}}{\text{][OH}}{{\text{]}}_{\text{2}}}{\text{ + 2}}{{\text{F}}^{\text{-}}}$$
(19)
$${\text{C}}{{\text{a}}_{\text{5}}}{\text{ (P}}{{\text{O}}_{\text{4}}}{{\text{)}}_{\text{3}}}{\text{F + O}}{{\text{H}}^{\text{-}}} \to {\text{ C}}{{\text{a}}_{\text{5}}}{{\text{(P}}{{\text{O}}_{\text{4}}}{\text{)}}_{\text{3}}}{\text{OH + }}{{\text{F}}^{\text{-}}}$$
(20)
$${\text{NaC}}{{\text{a}}_{\text{2}}}{{\text{[Mg, Fe, Al]}}_{\text{5}}}{{\text{[Al, Si]}}_{\text{8}}}{{\text{O}}_{{\text{22}}}}{{\text{F}}_{\text{2}}}{\text{ + 2O}}{{\text{H}}^-} \to {\text{ NaC}}{{\text{a}}_{\text{2}}}{{\text{[Mg,Fe,Al]}}_{\text{5}}}{{\text{[Al,Si]}}_{\text{8}}}{{\text{O}}_{{\text{22}}}}{{\text{[OH]}}_{\text{2}}}{\text{ + 2}}{{\text{F}}^{\text{-}}}{\text{ }}$$
(21)

Water-rock interaction

The Gibbs diagram distinguishes three hydrochemical evolution domains: evaporation-concentration, water-rock interaction, and atmospheric precipitation56. As shown in Fig. S4, samples with co-enrichment of As and F are mainly distributed in the evaporation-concentration and water-rock interaction fields, indicating that both processes contribute to the enrichment of As and F. High-F groundwater is more strongly influenced by evaporation, whereas high-As groundwater is mainly influenced by water-rock interactions. Additionally, anthropogenic activities may also exert a certain promoting effect on the enrichment of As and F.

Saturation index (SI) analysis, by elucidating the interactions between groundwater and minerals, serves as an effective method for identifying the dominant geochemical environment27. Saturation indices (SI) for minerals including anhydrite, calcite, dolomite, gypsum, fluorite, halite, hematite, and goethite were calculated using PHREEQC to analyze the influence of water-rock interaction on groundwater chemical components. As shown in Fig. 6(c), calcite, dolomite, hematite, and goethite are in a saturated state (SI > 0.2), while other minerals are in an unsaturated state (SI < 0.2). Calcite and dolomite precipitation reduces the Ca2+ concentration in groundwater, thereby promoting the enrichment of F [Eqs. (22)–(23)]54. Under alkaline conditions, As adsorbed on hematite and goethite undergoes desorption, promoting the enrichment of As in the groundwater. In addition, goethite (FeOOH) dissolves under specific reducing conditions, and the As in the mineral is released into groundwater in its free form [Eq. (24)]16.

$${\text{Ca}}{{\text{F}}_{\text{2}}}{\text{ + 2NaHC}}{{\text{O}}_{\text{3}}}{\text{ }} \to {\text{ 2}}{{\text{F}}^{\text{-}}}{\text{ + 2N}}{{\text{a}}^{\text{+}}}{\text{ + CaC}}{{\text{O}}_{\text{3}}}{\text{ + }}{{\text{H}}_{\text{2}}}{\text{O + C}}{{\text{O}}_{\text{2}}}{\text{ }}$$
(22)
$${\text{Ca}}{{\text{F}}_{\text{2}}}{\text{ + }}{{\text{H}}_{\text{2}}}{\text{O + C}}{{\text{O}}_{\text{2}}}{\text{ }} \to {\text{ CaC}}{{\text{O}}_{\text{3}}}{\text{ + 2}}{{\text{F}}^{\text{-}}}{\text{ + 2}}{{\text{H}}^{\text{+}}}$$
(23)
$${\text{8(FeOOH)--As}}{{\text{O}}_{\text{4}}}^{{{\text{3-}}}}{\text{+C}}{{\text{H}}_{\text{3}}}{\text{COOH+14}}{{\text{H}}_{\text{2}}}{\text{C}}{{\text{O}}_{\text{3}}}{\text{ }} \to {\text{8F}}{{\text{e}}^{{\text{2+}}}}{\text{+16HC}}{{\text{O}}_{\text{3}}}^{{\text{-}}}{\text{+12}}{{\text{H}}_{\text{2}}}{\text{O+As}}{{\text{O}}_{\text{4}}}^{{{\text{3-}}}}$$
(24)
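The saturation index used above is SI = log10(IAP/Ksp), where IAP is the ion activity product and Ksp the solubility product. The sketch below evaluates it for fluorite under stated assumptions: the activities are hypothetical inputs (PHREEQC derives them from a full speciation calculation, which is not reproduced here), and the log Ksp value of −10.6 at 25 °C is an assumed constant that should be checked against the thermodynamic database actually used.

```python
import math

# Assumed solubility constant: log10(Ksp) of fluorite at 25 degC.
# Check this against the thermodynamic database actually used.
LOG_KSP_FLUORITE = -10.6


def saturation_index(log_iap, log_ksp):
    """SI = log10(IAP / Ksp) = log10(IAP) - log10(Ksp)."""
    return log_iap - log_ksp


def fluorite_log_iap(a_ca, a_f):
    """log10 of the ion activity product for CaF2 = Ca2+ + 2 F-."""
    return math.log10(a_ca) + 2.0 * math.log10(a_f)


def classify(si, threshold=0.2):
    """Apply the threshold quoted in the text: SI > 0.2 is saturated."""
    return "saturated" if si > threshold else "unsaturated"


# Hypothetical activities (dimensionless), not measured values.
si = saturation_index(fluorite_log_iap(a_ca=1e-3, a_f=1e-4), LOG_KSP_FLUORITE)
print(f"SI(fluorite) = {si:.2f} -> {classify(si)}")
```

Lowering the Ca2+ activity lowers the fluorite IAP and therefore its SI, which is the link between carbonate precipitation and continued F release described in the text.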

To further identify the types of rock weathering sources in the study area, end-member plots of N(HCO3−/Na+), N(Mg2+/Na+), and N(Ca2+/Na+) were prepared57. Fig. S5 shows that groundwater samples from the study area plot between the silicate and evaporite weathering fields, away from the carbonate field. This suggests that As and F enrichment is driven by the weathering of silicate and evaporite minerals. The dissolution of silicates (e.g., plagioclase, albite) in the aquifers under the action of CO2 increases HCO3− concentrations [Eqs. (25)–(26)], thereby making the groundwater alkaline and promoting As and F enrichment16,42,58. The dissolution of feldspar-group minerals also produces fine-grained materials, slowing groundwater movement and prolonging water-rock interaction.

$${{\text{Na}}_{\text{4}}}{\text{CaAl}}_{\text{6}}{{\text{Si}}_{\text{14}}}{{\text{O}}_{\text{40}}}{\text{ + 2}}{{\text{H}}_{\text{2}}}{\text{O}}{\text{ + 2C}}{{\text{O}}_{\text{2}}} \to {{\text{Na}}}{\text{Al}}_{\text{3}}{{\text{Si}}_{\text{3}}}{{\text{O}}_{\text{10}}}{\text{OH}}{\text{ + 3NaAlS}}{{\text{i}}_{\text{3}}}{{\text{O}}_{\text{8}}}{\text{ + 2Si}}{{\text{O}}_{\text{2}}}{{\text{+ Ca}}^{{\text{2+}}}}{\text{+ 2HC}}{{\text{O}}_{\text{3}}}^{{\text{-}}} $$
(25)
$${\text{2NaAlS}}{{\text{i}}_{\text{3}}}{{\text{O}}_{\text{8}}}{\text{ + 2C}}{{\text{O}}_{\text{2}}}{\text{ + 11}}{{\text{H}}_{\text{2}}}{\text{O }} \to {\text{ A}}{{\text{l}}_{\text{2}}}{\text{(S}}{{\text{i}}_{\text{2}}}{{\text{O}}_{\text{5}}}{\text{)(OH}}{{\text{)}}_{\text{4}}}{\text{ + 4}}{{\text{H}}_{\text{4}}}{\text{Si}}{{\text{O}}_{\text{4}}}{\text{ + 2N}}{{\text{a}}^{\text{+}}}{\text{ + 2 HC}}{{\text{O}}_{\text{3}}}^{{\text{-}}}$$
(26)

Evaporation

The relationship of Cl to δD and δ18O was analyzed to further distinguish hydrogeochemical processes20. An increase in δD and δ18O with rising Cl indicates that evaporation is dominant; no change with increasing Cl indicates that evaporite dissolution is dominant; and a slight increase at low Cl concentrations indicates the mixed effect of both processes20. As shown in Figs. S6a and b, As- and F-rich groundwater is influenced by both evaporation and evaporite dissolution. Recharge sources for groundwater in the study area include precipitation infiltration and agricultural irrigation (canal seepage, infiltration)2. Long-term irrigation has raised groundwater levels over an area of approximately 350 km2, reducing water-table depths to less than the critical evaporation depth (5 m)42. Furthermore, groundwater samples with As > 0.01 mg/L and F > 1.5 mg/L are invariably associated with a high soluble salt content of approximately 2000 mg/L [Fig. 5(c)]. This coincidence points to a possible geochemical link between the co-enrichment of As and F and high salinity conditions. Under such conditions, intense evaporation promotes salinity buildup and the enrichment of As and F in groundwater.

Anthropogenic activities

The relationship between TDS and (NO3+Cl)/HCO3 was used to determine the potential anthropogenic influences on groundwater quality. As shown in Fig. 5(d), the strong positive correlation suggests substantial anthropogenic influence on groundwater quality59, particularly on high-F concentrations. Furthermore, Figs. 5(a) and (b) show that high-As and high-F groundwater is primarily located in cultivated land zones.
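The ratio used in this diagnostic can be computed as in the sketch below. The source does not state the units of the ratio, so an equivalent (meq/L) basis is assumed here. The function names and sample values are hypothetical and serve only to illustrate the calculation.

```python
# Molar masses in g/mol; all three anions are monovalent,
# so mmol/L equals meq/L.
MOLAR_MASS = {"NO3": 62.005, "Cl": 35.453, "HCO3": 61.017}


def to_meq(mg_per_l, ion):
    """Convert a concentration in mg/L to meq/L for a monovalent ion."""
    return mg_per_l / MOLAR_MASS[ion]


def anthropogenic_ratio(no3, cl, hco3):
    """(NO3- + Cl-) / HCO3- on an equivalent basis; inputs in mg/L."""
    return (to_meq(no3, "NO3") + to_meq(cl, "Cl")) / to_meq(hco3, "HCO3")


# Hypothetical samples: (TDS, NO3, Cl, HCO3), all in mg/L.
samples = [(800, 10, 80, 450), (2000, 40, 400, 600), (3500, 90, 900, 650)]
for tds, no3, cl, hco3 in samples:
    print(tds, round(anthropogenic_ratio(no3, cl, hco3), 2))
```

A ratio that rises together with TDS, as in these made-up samples, is the pattern the text interprets as a sign of anthropogenic input.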

To improve crop yields, large amounts of fertilizers and pesticides are applied in the irrigation district42. Common agricultural amendments, such as potash and NPK (nitrogen-phosphorus-potassium) fertilizers, contain substantial amounts of fluorides. Additionally, phosphate fertilizers, including superphosphate, contribute not only fluorides but also relatively high concentrations of arsenic. Pesticides including fluoroacetamide and sodium fluoroacetate are rich in F60,61. Some insecticides used in agriculture may also be a potential source of As in groundwater62. Long-term use of pesticides and chemical fertilizers results in the accumulation of As and F in soils. These contaminants then enter groundwater through leaching and runoff, leading to increased concentrations in the aquifer. Therefore, the high concentrations of arsenic and fluoride in groundwater are driven by both natural factors and anthropogenic activities.

Climate conditions

The Jiaokou Irrigation District experiences a warm‑temperate semi‑arid monsoon climate, in which mean annual evaporation (1220.9 mm) is approximately 2.3 times mean annual precipitation (527.2 mm)38,42. Groundwater samples exceeding the arsenic and fluoride thresholds were predominantly located in the central part of the study area, where the water table was shallow (depth < 5 m, i.e., within the limiting evaporation depth). Intense evaporative concentration in these zones promoted the enrichment of arsenic and fluoride in groundwater. Furthermore, the arid climate necessitated a high dependence on irrigation for agriculture2. Large‑scale groundwater extraction and return flow created an enhanced pumping‑irrigation‑recharge cycle. This process not only reintroduced evaporatively concentrated water with elevated arsenic and fluoride into the aquifer but may also have introduced exogenous arsenic and fluoride derived from fertilizers and pesticides, thereby further exacerbating arsenic and fluoride contamination in groundwater.

Conceptual model of As and F enrichment mechanisms

Synthesizing the results above, a conceptual model was developed to depict the enrichment mechanisms of arsenic and fluoride in the study area’s groundwater (Fig. 7). Groundwater in the study area is overall weakly alkaline, with alkalinity further enhanced by HCO3− generated from the dissolution of silicate rocks (e.g., plagioclase)42,58. These conditions favor the enrichment of As and F. The alkaline environment promotes the accumulation of As by reducing its adsorption on mineral surfaces. In particular, the As adsorbed on the surfaces of hematite and goethite undergoes desorption under alkaline conditions16. Furthermore, fluorite (CaF2) precipitation is inhibited, and the F in fluorine-bearing minerals (e.g., fluorapatite, biotite, amphibole) is easily replaced by OH−, leading to the enrichment of F54. Thus, the interaction between groundwater and arsenic- or fluoride-bearing minerals is a key mechanism for both As and F enrichment. Additionally, long-term fertilizer and pesticide application has led to As and F accumulation in soils, from which these contaminants enter groundwater through leaching and infiltration. Collectively, these processes form a dual “natural-anthropogenic” enrichment pattern for As and F in the region.

Fig. 7

Conceptual model of the enrichment mechanisms of As and F in groundwater.

Health risk assessment

Human health risk assessment

Non-carcinogenic risk

Box plots of the health risks posed to adults and children by arsenic and fluoride contamination in groundwater of the Jiaokou Irrigation District are shown in Fig. 8. For adults, the non-carcinogenic risk values of arsenic ranged from 0.08 to 1.38 (average: 0.38), while for children, the values ranged from 0.16 to 2.65 (average: 0.72) (Table S6). Although the average non-carcinogenic risk values for both adults and children were below the safety threshold (HQ = 1), the threshold was exceeded in 7.84% and 17.65% of samples, respectively, indicating that arsenic pollution in groundwater can potentially pose hazards to human health. The average non-carcinogenic risk via oral ingestion is approximately 400 times higher than that via dermal contact, indicating that oral ingestion is the dominant exposure pathway. Furthermore, a significantly higher non-carcinogenic risk from arsenic was observed in children compared to adults.
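The ingestion hazard quotient discussed here is commonly computed as a chronic daily intake divided by a reference dose. The sketch below illustrates that general form only. The exposure parameters and the reference dose are hypothetical or assumed values and are not the ones used in the study.

```python
def oral_hq(conc, ir, ef, ed, bw, rfd):
    """Hazard quotient for ingestion.

    conc: mg/L, ir: L/day, ef: days/year, ed: years, bw: kg,
    rfd: mg/(kg*day). Averaging time for non-carcinogens is ED * 365 days.
    """
    at = ed * 365.0
    cdi = conc * ir * ef * ed / (bw * at)   # chronic daily intake
    return cdi / rfd


# Hypothetical exposure parameters; NOT the values used in the study.
ADULT = dict(ir=2.0, ef=365.0, ed=30.0, bw=70.0)
CHILD = dict(ir=1.0, ef=365.0, ed=6.0, bw=20.0)
RFD_AS = 3e-4   # assumed oral reference dose for As, mg/(kg*day)

c_as = 0.01     # mg/L, the As threshold quoted in the text
hq_adult = oral_hq(c_as, rfd=RFD_AS, **ADULT)
hq_child = oral_hq(c_as, rfd=RFD_AS, **CHILD)
print(f"adult HQ = {hq_adult:.2f}, child HQ = {hq_child:.2f}")
```

Because the exposure duration cancels in the non-carcinogenic case, the result depends on intake per unit body weight, which is larger for children under these assumed parameters. This is one mechanism consistent with the higher child risk reported above.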

Fig. 8

Box plots showing (a) non-carcinogenic health risks from As and F exposure for adults and children; (b) carcinogenic risk of As for adults and children.

The results of non-carcinogenic risk assessment for F in groundwater are shown in Table S6. For adults, the non-carcinogenic risk values ranged from 0.23 to 2.82 (average: 0.96), while for children, these values were higher, ranging from 0.44 to 5.41 (average: 1.85). The proportions of samples exceeding the safety threshold were 41.18% for adults and 80.39% for children, indicating that fluoride pollution poses a more widespread and severe health threat to children. This is consistent with the findings of Zhang et al.41. Similar to arsenic, oral ingestion was the predominant pathway. Comparatively, the non-carcinogenic risk of fluoride for both adults and children was substantially higher than that of arsenic.

For combined As-F exposure, the average non-carcinogenic risk for adults was 1.34 (maximum: 3.96), with 64.71% of water samples exceeding the safety threshold (HQ = 1). For children, the average non-carcinogenic risk was 2.57 (maximum: 7.60), with 96.08% of the water samples above the safety threshold. These findings indicate that more than 60% of groundwater in the study area poses potential health threats to both adults and children, with children facing markedly higher risks from As-F co-contamination. In summary, As and F co-contamination constitutes a significant non-carcinogenic risk to both adults and children.
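The combined figures can be cross-checked against the single-contaminant values, assuming the combined risk is the additive hazard index HI = HQ(As) + HQ(F). The sketch below uses only the means and maxima quoted in the text. Means add exactly under this assumption, whereas the sum of the two maxima is only an upper bound, because the highest As and F values need not occur in the same sample.

```python
# Mean and maximum HQ values quoted in the text (Table S6).
reported = {
    "adults":   {"As": (0.38, 1.38), "F": (0.96, 2.82), "total": (1.34, 3.96)},
    "children": {"As": (0.72, 2.65), "F": (1.85, 5.41), "total": (2.57, 7.60)},
}

for group, hq in reported.items():
    mean_sum = round(hq["As"][0] + hq["F"][0], 2)
    max_sum = round(hq["As"][1] + hq["F"][1], 2)
    print(group, "mean:", mean_sum, "vs", hq["total"][0],
          "| max bound:", max_sum, "vs", hq["total"][1])
```

The summed means reproduce the reported combined averages for both groups, and the reported combined maxima stay below the sum of the individual maxima, as expected.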

Carcinogenic risk of arsenic

Carcinogenic risk assessment for As in the groundwater of the study area is presented in Table S7. In contrast to non-carcinogenic risk patterns, adults exhibited approximately threefold higher carcinogenic risk for arsenic than children, with oral ingestion as the dominant pathway. For children, carcinogenic risks via both oral ingestion and dermal contact were below the acceptable threshold (1 × 10−4). For adults, the carcinogenic risk values from dermal contact (3.55 × 10−8 to 6.04 × 10−7) were below the threshold, but the carcinogenic risk values from oral ingestion ranged from 1.44 × 10−5 to 2.45 × 10−4, with 17.65% of samples exceeding the acceptable level. This indicates that drinking arsenic-contaminated groundwater may pose a carcinogenic risk to adults.
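One reason adults can show a higher carcinogenic risk than children is that intake is averaged over a lifetime, so a longer exposure duration increases the result. The sketch below illustrates this with the common form CR = CDI × SF. The exposure parameters, the 70-year averaging lifetime, and the slope factor are hypothetical or assumed values, not the ones used in the study.

```python
def oral_cr(conc, ir, ef, ed, bw, sf, lifetime_years=70.0):
    """Lifetime cancer risk for ingestion: CR = CDI * SF.

    Unlike the non-carcinogenic case, intake is averaged over a lifetime,
    so a longer exposure duration (ed) raises the risk.
    """
    at = lifetime_years * 365.0
    cdi = conc * ir * ef * ed / (bw * at)
    return cdi * sf


SF_AS = 1.5  # assumed oral slope factor for As, (mg/(kg*day))^-1

# Hypothetical exposure parameters; NOT the values used in the study.
cr_adult = oral_cr(0.01, ir=2.0, ef=365.0, ed=30.0, bw=70.0, sf=SF_AS)
cr_child = oral_cr(0.01, ir=1.0, ef=365.0, ed=6.0, bw=20.0, sf=SF_AS)
print(f"adult CR = {cr_adult:.2e}, child CR = {cr_child:.2e}")
print("exceeds 1e-4:", cr_adult > 1e-4, cr_child > 1e-4)
```

Under these assumed parameters the adult value exceeds the 1 × 10−4 threshold while the child value does not, which mirrors the direction of the reported pattern without reproducing the study's numbers.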

Spatial distribution of human health risks

Spatial patterns of health risks were mapped using Surfer (Fig. S7). For adults, non-carcinogenic risks from arsenic were primarily concentrated in the northern and southeastern parts of Linwei District and the central part of Pucheng County (Fig. S7a). These zones occupied an area of 14.47 km2, accounting for 1.41% of the study area. In addition to the aforementioned areas, carcinogenic risks extended to northwest Linwei District and portions of Yanliang District (Fig. S7e). These zones occupied a total area of 138.89 km2, accounting for 13.56% of the study area. For children, except Lintong District and Dali County, other counties and districts had varying degrees of non-carcinogenic risks, covering an area of 187.41 km2 (18.30% of the study area), with high-risk zones mainly distributed in Linwei District and Pucheng County (Fig. S7b). No carcinogenic risk for arsenic to children was detected in the study area (Fig. S7f).

Fluoride-related non-carcinogenic risks to adults occurred in all counties and districts except Lintong District (affected area: 446.87 km², accounting for 43.62% of the study area) (Fig. S7c), while for children, almost the entire study area had non-carcinogenic risks (affected area: 970.07 km², accounting for 94.70% of the study area) (Fig. S7d). The affected area of As-F non-carcinogenic risk is substantially larger for children than for adults, with high-risk areas mainly concentrated in Linwei District and Pucheng County.

Uncertainty assessment based on Monte Carlo simulation

Monte Carlo simulations (10,000 iterations; random seed = 2026) yielded health risk estimates at the 5% and 95% confidence levels for arsenic and fluoride pollution in groundwater of the study area (Fig. 9). The results show that both non-carcinogenic and carcinogenic risks posed by arsenic and fluoride pollution in groundwater to adults and children follow a log-normal distribution. For arsenic, probabilities of non-carcinogenic risk exceeding the safety threshold were 2.56% (adults) and 13.99% (children), with mean values of 0.3119 and 0.5976, respectively [Figs. 9(a) and (b)]. For carcinogenic risks, the probabilities of exceeding the safety threshold were 11.96% (adults) and 0.00% (children), with average values of 5.5410 × 10−5 and 2.1231 × 10−5, respectively [Figs. 9(e) and (f)].

Fig. 9

Simulation plot of health risk uncertainty for adults and children.

For fluoride, probabilities of non-carcinogenic risk exceeding the safety threshold were 24.21% (adults) and 68.90% (children), with mean values of 0.7874 and 1.5099, respectively [Figs. 9(c) and (d)]. These results confirm that fluoride poses a higher non-carcinogenic risk to humans than arsenic. Both adults and children in the study area face health risks from drinking groundwater contaminated with arsenic and fluoride, with children experiencing higher non-carcinogenic risks.

The Monte Carlo predictions are consistent with the aforementioned health risk assessment results. In addition, compared with traditional methods, uncertainty assessment using Monte Carlo simulation can present predicted results under different probability distributions, identify and reduce uncertainties in risk assessment, and provide a scientific basis for decision-making aimed at reducing health risks.
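The general procedure can be sketched as follows, using only the Python standard library. The iteration count and seed are those quoted in the text. The input distributions, their parameters, and the reference dose are hypothetical placeholders, so the printed numbers will not match the study's results.

```python
import math
import random

random.seed(2026)          # seed quoted in the text
N = 10_000                 # iterations quoted in the text
RFD_F = 0.06               # assumed oral reference dose for F, mg/(kg*day)


def draw_hq():
    """One Monte Carlo draw of an ingestion HQ for fluoride.

    All distributions below are hypothetical placeholders; daily
    exposure all year is assumed, so exposure frequency cancels.
    """
    conc = random.lognormvariate(math.log(1.2), 0.6)        # mg/L
    ir = random.triangular(1.0, 3.0, 2.0)                   # L/day
    bw = max(random.normalvariate(65.0, 10.0), 30.0)        # kg
    return conc * ir / (bw * RFD_F)


hq = sorted(draw_hq() for _ in range(N))
p5, p95 = hq[int(0.05 * N)], hq[int(0.95 * N)]
exceed = sum(h > 1.0 for h in hq) / N
print(f"mean = {sum(hq) / N:.3f}, P5 = {p5:.3f}, P95 = {p95:.3f}")
print(f"P(HQ > 1) = {exceed:.2%}")
```

The exceedance probability is simply the fraction of simulated draws above the threshold, which is how a single deterministic HQ becomes a probability statement.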

Sensitivity analysis

Sensitivity reflects the relationship between variance and risk, where the sign (positive/negative) indicates the direction of correlation. A greater absolute sensitivity value denotes a stronger influence on the risk assessment63. The results of the sensitivity analysis for different population groups are presented in Fig. 10. Except for BW, all other parameters showed positive correlations with health risks in both adults and children. The total non-carcinogenic risks for adults and children were primarily influenced by F in groundwater, with sensitivity values of 62.2% and 59.6% (corresponding to correlation coefficients of 0.75 and 0.73), respectively. As, EF, IR, and BW had relatively smaller impacts. However, EF and IR were more sensitive to non-carcinogenic risk in children [Figs. 10(a) and (b)]. As shown in Figs. 10(c) and (d), arsenic in groundwater exhibited higher sensitivity for carcinogenic risk in adults. Compared with adults, children’s IR showed greater sensitivity to carcinogenic risk.

Fig. 10

Sensitivity analysis of (a) HQTotal for adults; (b) HQTotal for children; (c) CRAs for adults; (d) CRAs for children.
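The source does not state how the sensitivity percentages were derived. One common convention in Monte Carlo risk tools, assumed here, is to compute the Spearman rank correlation between each input and the simulated risk and then normalize the squared correlations so that they sum to 100%. The sketch below implements that convention with hypothetical input distributions.

```python
import random

random.seed(2026)


def ranks(values):
    """Rank positions (0 = smallest); ties are not expected for floats."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order):
        r[idx] = rank
    return r


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical input distributions, for illustration only.
n = 5000
inputs = {
    "F": [random.lognormvariate(0.2, 0.6) for _ in range(n)],
    "IR": [random.uniform(1.0, 3.0) for _ in range(n)],
    "BW": [random.uniform(50.0, 80.0) for _ in range(n)],
}
risk = [f * ir / bw for f, ir, bw in zip(inputs["F"], inputs["IR"], inputs["BW"])]

rho = {name: spearman(vals, risk) for name, vals in inputs.items()}
total = sum(r * r for r in rho.values())
for name, r in rho.items():
    sign = "+" if r > 0 else "-"
    print(f"{name}: rho = {r:+.2f}, contribution = {sign}{r * r / total:.1%}")
```

In this toy setup body weight receives a negative sign because it appears in the denominator of the intake term, matching the direction reported for BW in the text.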

Removal methods for arsenic and fluoride

To address arsenic and fluoride contamination in irrigation district groundwater, reliable and low‑cost treatment technologies suitable for local conditions, such as coagulation/filtration and bio‑composite adsorption techniques, can be adopted. Coagulation/filtration is a widely used water treatment technique that has been implemented in multiple countries for the removal of arsenic and fluoride from water53,64. Regarding bio-composite adsorption, biosorbents derived from orange and apple peels have demonstrated promising effectiveness in adsorbing both arsenic and fluoride65. Compared to conventional adsorbents such as activated carbon and alumina, these biosorbents offer advantages not only in technical performance but also in economic feasibility66. Promoting such economically viable technologies is of significant importance for ensuring residents’ drinking water safety. Moreover, drinking water safety interventions should prioritize children in high-risk areas.

Conclusions

This study investigated As and F contamination in the groundwater of the Jiaokou Irrigation District, Shaanxi Province, China. The spatial distribution and enrichment mechanisms of these contaminants were systematically analyzed using a self-organizing map (SOM), K-means clustering, and isotope tracing (δD and δ18O). A health risk assessment model incorporating Monte Carlo simulation was developed to quantify the associated risks for adults and children. The key findings are summarized below.

The groundwater was characterized as weakly alkaline brackish water. SOM cluster analysis classified it into three hydrochemical types: Cl·SO4-Na, HCO3-Na, and HCO3-Ca. Co-contamination of As and F was prevalent, with exceedance rates for As, F, and both contaminants reaching 11.80%, 43.75%, and 9.80% of the samples, respectively. Spatially, elevated concentrations were primarily concentrated in Linwei District and Pucheng County, covering 3.61% and 54.43% of the total area, respectively. Integrated analysis confirmed the correlation between As and F, both of which were also linked to pH and CO32−.

Groundwater was primarily recharged by atmospheric precipitation and irrigation infiltration. During infiltration, water–rock interactions in a weakly alkaline environment facilitated As and F enrichment. This environment reduced As adsorption onto minerals, inhibited fluorite precipitation, and promoted the replacement of F with OH− in fluoride-bearing minerals. The dissolution of silicate minerals (e.g., plagioclase, albite) further enhanced groundwater alkalinity and prolonged water-rock contact. Evaporation and the intensive use of fertilizers and pesticides also contributed to the enrichment of arsenic and fluoride.

The health risk assessment revealed that children in the study area were susceptible to non-carcinogenic risks from both As and F, whereas adults faced a higher carcinogenic risk from As. Oral ingestion was the primary exposure pathway for both risk types. F presented a substantially higher non-carcinogenic risk than As. Spatially, high-risk areas were primarily clustered in Linwei District and Pucheng County, which was highly consistent with the pollution distribution. Furthermore, sensitivity analysis identified F concentration as the dominant factor for non-carcinogenic risk, and As concentration as the key variable for its carcinogenic risk.

This work provides a comprehensive understanding of the distribution and enrichment mechanisms of arsenic–fluoride co-contamination in the groundwater of the Jiaokou Irrigation District, thereby informing future strategies for groundwater protection and public health risk mitigation in arid-semiarid irrigation regions. However, this study has limitations in the investigation of other heavy metals/trace metals within the research area. Future studies should incorporate XRD analysis to clarify the role of these elements in the enrichment processes of arsenic and fluoride.