Background & Summary

Groundwater is critical for meeting human and ecosystem water needs, being the largest available freshwater resource1,2. It provides drinking water to billions of people3, supplies nearly half of global irrigation demand4,5, and sustains ecosystems, including rivers, lakes, wetlands, and other terrestrial habitats6,7. The importance of groundwater is expected to increase due to rising water demand and climate change impacts8,9,10. This is particularly true for the Southern Hemisphere given the evolving climate impact on surface water sources and the ongoing expansion of the agricultural footprint11,12,13. For instance, Brazil faces declines in groundwater levels and streamflow leakage due to the highly uneven distribution of water resources and intensified water uses14,15. However, limitations in the spatiotemporal coverage of groundwater monitoring networks continue to constrain our understanding of the spatiotemporal dynamics of groundwater systems16,17, which is essential for effective water resource management18,19.

The potential applications of a groundwater dataset are wide-ranging, including studies on groundwater recharge20,21, seawater intrusion22,23, land subsidence24,25, groundwater dependent ecosystems26,27, vegetation drought resilience28, streamflow depletion15,29 and water resource planning30,31. Hence, a user-friendly and reliable dataset may serve as a valuable resource for researchers studying groundwater management, improving understanding and modeling32,33. In addition, it enables researchers to integrate groundwater data into their studies34,35, driving advancements in the understanding of surface water and groundwater resources interactions36,37. Furthermore, such a database may help policymakers develop evidence-based policies and optimize resource allocation and conservation38,39.

Here, we present the Groundwater Well Database for Brazil (GWDBrazil) compiled and standardized from Geological Survey of Brazil projects. This initiative seeks to contribute to a more holistic view of groundwater management and access, in line with previous open groundwater datasets40,41,42. This paper addresses the current limitations of groundwater datasets in Brazil, which typically require manual well-by-well handling and lack a harmonized format43,44. These limitations make the data time-consuming and challenging to work with, particularly for researchers outside of the groundwater field, for policymakers, and for local communities. Our harmonized dataset, validated by the Geological Survey of Brazil, underwent quality assurance and quality control procedures to ensure accuracy, adhering to the principles of transparency and data integrity. The dataset is available in both tabular (csv) and geospatial (shapefile) formats for 351,256 wells, including about 450 continuously monitored wells (2010–2024, also in netCDF format). Ultimately, we hope that this accessible database fosters collaboration across the fields of groundwater, surface water hydrology, and water policy and management33,45.

Methods

The development of our groundwater well database includes five initial steps: data collection, data standardization, quality control, data validation, and data export. These steps are summarized in Fig. 1 and explained in detail in this section. After finalizing the initial groundwater well database, we added further variables from remote sensing products and other databases that may be useful for researchers and managers, particularly for comprehensive studies of surface water and groundwater resources.

Fig. 1

Overview of the process to develop the groundwater well data. Data were collected from two projects of the Geological Survey of Brazil: Groundwater Information System (SIAGAS)43 and Integrated Groundwater Monitoring Network Project (RIMAS)44. The data were harmonized (step 2) and then quality control was applied (step 3) to ensure that the records correspond to actual groundwater wells. When possible, groundwater well data were cross-verified and marked with a quality flag (step 4). In the end, the data were made available in tabular (csv) and geospatial formats (shapefile, and netCDF for monitoring wells). All of the steps were reviewed by members of the Geological Survey of Brazil. The inset map (top left) highlights Brazil within South America in green.

Data sources

The groundwater well database presented here was built in collaboration with the Geological Survey of Brazil (SGB – in Portuguese: Serviço Geológico do Brasil) through the Groundwater Information System (SIAGAS – in Portuguese: Sistema de Informações de Águas Subterrâneas)43 and the Integrated Groundwater Monitoring Network Project (RIMAS – in Portuguese: Rede Integrada de Monitoramento das Águas Subterrâneas)44. The Geological Survey of Brazil granted us explicit permission to publish their groundwater well records and had multiple opportunities to review this database.

SIAGAS is a repository primarily consisting of water wells, with data typically provided by several different drilling companies or state agencies, which may result in some inconsistencies and a non-harmonized database43. The Geological Survey of Brazil is responsible for storing the data, but the accuracy of the information remains the responsibility of the data providers. This database contains five data categories: (i) general data, such as well location and purpose; (ii) drilling data, such as well depth; (iii) geological data, such as the pumped aquifer; (iv) pumping test data; and (v) chemical analysis data.

RIMAS is a continuous groundwater level monitoring project focused on Brazil’s main sedimentary aquifers44. The Geological Survey of Brazil is responsible for both data generation and quality control. Although groundwater quality monitoring is not the primary focus of this project, some wells may include hydrogeochemical analyses, upon request by local authorities. In addition to the information already present in SIAGAS, RIMAS includes two additional data categories: (vi) groundwater level monitoring; and (vii) hydrogeochemical monitoring.

Both databases are updated daily and have been used in several scientific studies14,15,46. However, since these databases require manual well-by-well handling and do not present a fully harmonized dataset, their use can be challenging for users outside of the scientific field. Even for researchers, a complete, updated, quality controlled, and harmonized dataset in a centrally located archive can save precious time and avoid duplicating the efforts of other researchers in cleaning the data.

Data collection and standardization

The SIAGAS data collection was completed in March 2024, containing 371,438 records at that time (see Fig. 1). The primary SIAGAS raw data was divided into six raw spreadsheets (in csv format), which can be found in the Zenodo repository47. We directly communicated with the members of the Geological Survey of Brazil, who provided insights into the data collection methods, definitions, and notations used. With their support, we aggregated the data into a single spreadsheet using a unique identifier for each well. Table 1 presents the variables extracted from SIAGAS. To enhance the accessibility of the database by the international community, we adopted the Latin-ASCII format, and all dates were transformed into the American format (month/day/year). Empty variables, as well as inconsistent values and dates, were assigned the value “NA”. Additional information related to the original SIAGAS header (in Portuguese), and the specific raw spreadsheet each variable is derived from can be found in the supplementary material (see Table S1).
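The text standardization described above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names are hypothetical, Unicode decomposition is used as an approximation of Latin-ASCII transliteration, and the raw dates are assumed to be day/month/year strings.

```python
import unicodedata
from datetime import datetime


def to_ascii(text):
    """Strip diacritics, e.g. 'São Paulo' -> 'Sao Paulo' (approximates Latin-ASCII)."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def to_us_date(value, source_format="%d/%m/%Y"):
    """Convert a date string to month/day/year; empty or invalid dates become 'NA'.

    The day/month/year source format is an assumption about the raw files.
    """
    if value is None or not value.strip():
        return "NA"
    try:
        parsed = datetime.strptime(value.strip(), source_format)
    except ValueError:
        return "NA"  # inconsistent dates (e.g. 31 February) are treated as missing
    return parsed.strftime("%m/%d/%Y")
```

For example, `to_us_date("25/03/2024")` yields the American-format string `03/25/2024`, while an impossible date such as `31/02/2020` is mapped to `NA`.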

Table 1 Overview of the variables extracted from the Groundwater Information System (SIAGAS) project.

In the general data category, we selected variables such as well location, type (e.g., water well, monitoring well), status (e.g., active or abandoned), and primary water use (e.g., domestic, agriculture or industry). In collaboration with the Geological Survey of Brazil, we standardized this information by merging synonyms and translating terms into those commonly used in international literature. Additionally, we ensured that the well locations are presented in the SIRGAS 2000 datum using decimal degrees. Further details on this standardization process can be found in the supplementary material (see Table S2). Moreover, in the drilling data category, we selected two variables: the drilling date and well depth. Information such as the well screened interval was not included in this database, as it was not explicitly available in the downloaded data. However, this information may be available for some wells on the SIAGAS website43.

In the pumping data category, we collected key information such as the date of the pumping test, the static water level, and the well capacity (see Table 1). Many records lack pumping test data, possibly due to state-owned companies not providing this information, or because these records were added through field campaigns where general well information was identified, but detailed pumping test data could not be collected. Similarly, most records do not contain data about the aquifer. For most of the records with aquifer data (~96%), only a single overlying aquifer is indicated. In cases where multiple overlying aquifers are present, we present them as “multiple aquifers” in variable ‘Aquifer_broad’ in Table 1. For these records, we classified the aquifer as confined if at least one of the overlying aquifers is confined. If none of the overlying aquifers is confined but at least one is semi-confined, the record was classified as semi-confined. Otherwise, we classified the aquifer as unconfined. Information about wells with multiple overlying aquifers can be found in the supplementary material (see Table S3).
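The precedence rule for wells with multiple overlying aquifers (confined over semi-confined over unconfined) can be written as a small function. This is a minimal sketch: the function name and the lowercase status labels are hypothetical, and returning "NA" for a record with no confinement information is an assumption consistent with how empty variables are handled elsewhere in the database.

```python
def classify_confinement(statuses):
    """Collapse the confinement status of several overlying aquifers into one label."""
    normalized = {s.strip().lower() for s in statuses if s and s.strip()}
    if not normalized:
        return "NA"  # assumption: no aquifer information available
    if "confined" in normalized:
        return "confined"  # at least one overlying aquifer is confined
    if "semi-confined" in normalized:
        return "semi-confined"  # none confined, at least one semi-confined
    return "unconfined"
```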

Due to the challenges in consolidating and validating information on lithology and water quality, we only indicate whether a record contains such data. A record was marked as containing water quality or lithological data if it includes consistent information. For example, a record is said to contain water quality information if it clearly specifies the sampling date and includes at least one physical or chemical parameter (e.g., pH, electric conductivity, water color, temperature, water turbidity, total suspended solids, settleable solids). Recent local efforts have been made to standardize lithological data in SIAGAS48; however, this process requires local aquifer knowledge, and validation of lithology data is beyond the scope of this study. We recommend that users who require detailed water quality or lithological data access the raw spreadsheet in the Zenodo repository47 or refer to the SIAGAS website43.

The RIMAS data collection was completed in July 2024, including 481 records at that time (see Fig. 1). We identified nine potentially erroneous or corrupt records that, despite being included in the RIMAS record list, contained no data in any category. This may be due to differences in the data download period or the possibility that some information was unavailable at the time. We verified the compatibility of information between the RIMAS and SIAGAS projects. Where there were discrepancies, we prioritized the RIMAS data, as it is maintained under the responsibility of the Geological Survey of Brazil.

In addition to the variables already included in Table 1, we provide the initial and final year of the data timeseries and the amount of daily information each series contains in RIMAS (see Table 2). We identified 19 records that lacked a groundwater data timeseries among the 472 RIMAS records; however, 10 of these 19 records included hydrogeochemical data. Most of these records without a groundwater data timeseries correspond to wells drilled in 2023, which may explain why their groundwater data timeseries are not yet available. Given the variability in hydrogeochemical parameters analyzed for each well, we only indicate whether hydrogeochemical data is present in each record. The original data downloaded from RIMAS can be found in the Zenodo repository47.

Table 2 Overview of the variables extracted from the Integrated Groundwater Monitoring Network Project (RIMAS).

All steps were validated by members of the Geological Survey of Brazil. Additionally, we visually checked a random sample of 100 records to ensure that the information in Tables 1 and 2 is consistent with the data in the downloaded raw spreadsheets and on the SIAGAS and RIMAS websites.

Quality control

This step ensures that the records in the database refer to water or monitoring wells in Brazil, rather than to surface water features such as springs or wetlands. As a comprehensive definition, we consider a groundwater well to be a man-made excavation used for extracting or monitoring water in an aquifer41. To achieve this, we applied four steps commonly used to create well databases29,31: (i) removing records that do not correspond to well construction, (ii) removing duplicate records, (iii) removing records outside the Brazilian territory, and (iv) removing records with unclear construction dates (see Fig. 2). Following the principles of transparency and data integrity, all records removed during this process are included in the supplementary material.

Fig. 2

Overview of the quality control step. The data was structured to exclude any records that did not correspond to water or monitoring wells (Step 1). Then, all wells located within 10 meters of each other were checked to eliminate potential duplicates (Step 2). Next, all wells outside Brazilian territory were removed (Step 3), and wells lacking any date information (drilling date, pumping test date, or the date they were entered into the database) were also excluded (Step 4). After these steps, 351,256 records were identified as likely unique wells, including approximately 450 monitoring wells.

To remove records that do not correspond to well construction (i), we applied a pre-filter during the previous standardization step in the ‘Types_of_wells’ variable (see Table 1). These records were typically springs and lakes (see Table S2), resulting in the removal of 2,696 records. Additionally, 19,388 records had no information (NA values) in this variable; among these, 6,959 records lacked any information on the drilling date, pumping test date, or aquifer that could confirm that they were wells. Therefore, we also excluded these records from our database, bringing the total removed in this step to 9,655 records (2.60% of the total, see Figure S1a). The records removed in this step can be found in the supplementary material (Table S4).

We then checked the remaining 361,783 records for duplicates (ii). If two records were less than 10 meters apart, we compared their information. This threshold accounts for location errors (e.g., GPS errors), which are typically within a few meters. We identified 14,837 pairs of records less than 10 meters apart. If the records had the same information in the drilling and pumping data categories, we considered them duplicates. We found 129 pairs of duplicate records, none of which were duplicated more than twice. For 9,625 pairs of records, we could not fully compare drilling and pumping data due to missing values. Among these, 8,507 pairs were in the same location and were treated as duplicates as a precaution. This resulted in 2,933 groups of duplicates, meaning a record could be duplicated multiple times. This process led to the exclusion of 4,522 records. For the remaining 1,118 pairs of records less than 10 meters apart, but not in the same location, we visually identified 64 pairs as duplicates based on other matching information, such as well depth. The remaining 1,054 pairs were not excluded because there was insufficient evidence to classify them as duplicates. In some cases, wells may have been placed in the same location due to the lack of detailed positional data (e.g., wells located at the center of an industrial park). In the end, preserving the record that was first added into the SIAGAS database, we removed 4,711 duplicate records (1.27% of the total, see Figure S1b). The records removed in this step can be found in the supplementary material (Table S5).
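The duplicate screening can be illustrated with the sketch below, which finds pairs of records closer than 10 meters and then compares their drilling and pumping fields. It is not the authors' implementation: the haversine distance, the function names, the record layout (dicts with hypothetical `id`, `lat`, `lon` keys), and the three-way outcome are assumptions, while the compared field names follow the variables listed for Table 1. The brute-force pairing is quadratic and only suitable for small samples; a spatial index would be needed for hundreds of thousands of records.

```python
import math
from itertools import combinations

EARTH_RADIUS_M = 6_371_000.0
COMPARE_FIELDS = ("Drilling_reported", "Well_depth", "Pumping_test_date",
                  "Static_water_level", "Dynamic_water_level", "Well_capacity")


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def candidate_pairs(wells, threshold_m=10.0):
    """Return id pairs of records located less than threshold_m apart."""
    pairs = []
    for a, b in combinations(wells, 2):
        if haversine_m(a["lat"], a["lon"], b["lat"], b["lon"]) < threshold_m:
            pairs.append((a["id"], b["id"]))
    return pairs


def compare_records(a, b):
    """Classify a candidate pair as 'duplicate', 'different', or 'unknown'."""
    if any(a.get(f) is None or b.get(f) is None for f in COMPARE_FIELDS):
        return "unknown"  # missing values prevent a full comparison
    same = all(a[f] == b[f] for f in COMPARE_FIELDS)
    return "duplicate" if same else "different"


wells = [
    {"id": "A", "lat": -22.0, "lon": -47.0},
    {"id": "B", "lat": -22.00005, "lon": -47.0},  # roughly 5.6 m south of A
    {"id": "C", "lat": -22.001, "lon": -47.0},    # roughly 111 m south of A
]
close_pairs = candidate_pairs(wells)
```

In this toy sample only records A and B fall under the 10-meter threshold; pairs classified as 'unknown' correspond to the cases that the text resolves by checking for identical coordinates or by visual inspection.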

Next, we checked if the remaining 357,072 records were outside Brazilian territory (iii). We found only two records in unrealistic locations (see Figure S1c), likely due to location errors. We then checked whether the remaining 357,070 records included any information that could indicate when they were drilled (iv). A total of 276,145 records contained either the drilling date or the pumping test date. However, the SIAGAS project also consists of wells that were entered into the system through field work. Considering this, we found 351,256 records with information about when they were entered into the database, indicating that these wells were drilled before that date. Since information from these wells can be relevant to policymakers and some scientific applications, we excluded only 5,814 records (1.56% of the total, see Figure S1d) that lacked any date information (see Table S6).
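As a quick consistency check, the record counts reported for the four quality control steps can be chained together. The snippet below only re-adds the numbers quoted in the text; the variable names are illustrative.

```python
initial_records = 371_438  # SIAGAS records collected in March 2024

# Records removed at each quality control step, as reported in the text.
removed = {
    "not_wells": 9_655,      # (i) springs, lakes, unconfirmed records
    "duplicates": 4_711,     # (ii) duplicate records
    "outside_brazil": 2,     # (iii) unrealistic locations
    "no_date": 5_814,        # (iv) no date information
}

remaining = initial_records
for step, count in removed.items():
    remaining -= count
    print(f"after {step}: {remaining} records")

retained_share = round(100 * remaining / initial_records, 2)
print(f"retained: {remaining} records ({retained_share}% of the total)")
```

The intermediate totals printed by the loop match the 361,783, 357,072, and 357,070 records quoted between steps, and the final count matches the 351,256 wells retained in the database.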

In the end, the quality control steps indicated that 351,256 records (94.57% of the total) are likely unique wells, including 453 monitoring wells. Recent estimates indicate the presence of over 2.5 million tubular wells in Brazil, with most (~88%) operating without official authorization49,50 (i.e., without a license or registration for pumping). Among these millions of wells, we believe that the Geological Survey of Brazil projects provide enough information to identify 351,256 wells. This suggests that a substantial contribution of groundwater to water supply in Brazil is unaccounted for in official statistics. In fact, among the wells in this database with a stated purpose (~58%), nearly half are dedicated to domestic use or public water supplies (Fig. 3b), even though agricultural wells likely contribute to high-volume groundwater extraction (Fig. 3c). This unaccounted for water supply may lead to misinterpretations about the country’s groundwater usage and dependence, despite reports of localized aquifer overexploitation, induced contamination, and reduced baseflow (groundwater discharge) to local rivers51,52,53.

Fig. 3

Overview of the water purpose of the groundwater wells in the database. (a) Each point on the map represents the recorded location of a well. Blue dots represent wells constructed for domestic or public water supply (DOM), green dots for agriculture/livestock (AG) and red dots for industry (IND). Mixed colors represent wells with more than one use (e.g., DOM & AG; DOM, AG & IND). Wells without an indication of purpose are not shown. The dark black lines delineate the geographic regions of Brazil; the lighter black lines represent the Brazilian states and the federal district. (b) Fraction of wells per purpose in Brazil, with the majority primarily dedicated to domestic or public water supply. (c) Well capacity by purpose, indicating that wells for agriculture have the highest average capacity. Box plots indicate 25th and 75th percentiles, bars represent 10th and 90th percentiles, solid horizontal lines indicate the median, and x marks the mean values.

Data validation

Data validation of the 351,256 wells included cross-verification checks for 14 of the variables in Table 1: ‘Latitude’; ‘Longitude’; ‘ID_city_IBGE’; ‘City’; ‘State’; ‘Surface_elevation’; ‘Year_reported’; ‘Last_updated_reported’; ‘Drilling_reported’; ‘Well_depth’; ‘Pumping_test_date’; ‘Static_water_level’; ‘Dynamic_water_level’ and ‘Well_capacity’. We used a flagging system to indicate data quality: -1 for inconsistent data, and 0 for raw data (i.e., data where no inconsistencies were identified, or data quality could not be assessed). This system safeguards the integrity of the database, and allows users to apply the appropriate level of rigor depending on their application.

We verified the well’s city and state information by cross-referencing with data from the Brazilian Institute of Geography and Statistics54, using the well’s latitude and longitude. We found discrepancies in the city information for 905 wells (see Figure S2a) and flagged them as inconsistent. Figure 4a shows that cities with the highest number of wells per unit area are concentrated in Northeast Brazil, particularly in metropolitan regions (see Figure S3), consistent with previous local studies55,56. This distribution might be explained by the region’s vulnerability to recurring drought events57. Similarly, 20 wells showed discrepancies in their state data (see Figure S2b) and were also flagged as inconsistent. Most of these wells with inconsistent city and state data are located near the borders of two cities or states, which may explain their inconsistencies. Outside the Northeast, the highest concentrations of wells by state are found in São Paulo (see Figure S4), a state with a long history of groundwater use49,52.

Fig. 4

Heat map showing the well density across Brazilian cities (a), number of wells constructed between 1970 and 2024 (b), well depth distribution (c) and its percentile graph (d), and well water level distribution across Brazil (e) and its percentile graph (f). In (a) and (b), wells with inconsistent city data and drilling date inconsistencies were not included, respectively. In (c) and (e), wells with inconsistent well depths and water levels are not shown, respectively. Only 99% of the data is shown in (d) and (f) for better visualization.

Next, we verified the latitude and longitude variables by using a 100-meter radius buffer around each well and calculating the percentage of land use and land cover based on the MapBiomas Collection 8 project for 202258,59. We selected a 100-meter radius to provide a representative value based on the 30 × 30-meter resolution of the land use and land cover dataset. We flagged the location of a well as inconsistent if more than 50% of the buffer area was classified as ‘water’ or ‘not observed’. A total of 1,661 wells were classified as having inconsistent locations, with most situated in water bodies or very close to the Brazilian coast (see Figure S5). Some regions in Brazil, such as the Pantanal ecoregion, can be flooded for months, and wells near rivers may be submerged for extended periods60. However, as a conservative approach, we still classified these wells’ location (latitude and longitude) as inconsistent.
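The location check reduces to a share computation over the land cover pixels that fall inside each buffer. The sketch below assumes the pixels have already been extracted and labelled with class names; the function name and the string labels are hypothetical (MapBiomas itself uses integer class codes), and wells without any extracted pixels are left unflagged by assumption.

```python
def location_flag(pixel_classes, suspect=("water", "not observed"), max_share=0.5):
    """Return -1 if more than max_share of the buffer pixels belong to suspect classes."""
    if not pixel_classes:
        return 0  # assumption: quality cannot be assessed without land cover data
    suspect_count = sum(1 for c in pixel_classes if c in suspect)
    share = suspect_count / len(pixel_classes)
    return -1 if share > max_share else 0
```

A buffer with six 'water' pixels out of ten is flagged, whereas a buffer split exactly in half is not, because the rule requires more than 50%.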

Given the importance of surface elevation for various scientific applications, we compared the database values with the ANADEM product, a digital terrain model for South America based on the Copernicus DEM GLO-30, which includes vegetation bias correction and has a resolution of 1 arc-second61 (~30 m). If a database value fell outside the ANADEM tolerance range, we flagged it as inconsistent. As a conservative measure, we set the tolerance range as the model bias (1.50 m) plus two times the root mean squared deviation61 (6.99 m). We found 26,379 wells that fell outside the established tolerance range (see Figure S6a). We selected ANADEM due to its high resolution and greater accuracy in Brazil compared to other digital elevation models61. We also performed a relaxed comparison using the Multi-Error-Removed Improved-Terrain (MERIT) model62, which has a resolution of 3 arc-seconds (~90 m), and obtained similar results (see Figure S6b). For wells with surface elevation values within the ANADEM tolerance range, we obtained a median of 475 meters above sea level (see Figure S7).
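With the values quoted above, the tolerance works out to 1.50 + 2 × 6.99 = 15.48 m. A minimal sketch of the resulting check follows; it assumes the tolerance is applied symmetrically around the ANADEM elevation at the well location, which the text does not state explicitly, and the function and constant names are illustrative.

```python
ANADEM_BIAS_M = 1.50   # model bias reported for ANADEM
ANADEM_RMSD_M = 6.99   # root mean squared deviation reported for ANADEM
TOLERANCE_M = ANADEM_BIAS_M + 2 * ANADEM_RMSD_M  # about 15.48 m


def elevation_flag(reported_m, anadem_m, tolerance_m=TOLERANCE_M):
    """Return -1 if the reported elevation deviates from ANADEM by more than the tolerance."""
    if reported_m is None or anadem_m is None:
        return 0  # raw data: quality could not be assessed
    return -1 if abs(reported_m - anadem_m) > tolerance_m else 0
```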

In addition, we checked the consistency of the date variables by comparing them with the date the files were downloaded. The year in which the well was registered in the SIAGAS system was considered inconsistent if it was later than the download date. We did not find any wells with this inconsistency. Similarly, the last year in which the well was updated in the SIAGAS system was considered inconsistent if it was earlier than the ‘Year_reported’ or later than the download date, but again, we did not find inconsistencies. The well drilling date was considered inconsistent if it was later than either the ‘Year_reported’ or the download date. We found 1,305 wells with this inconsistency (see Figure S8a). The pumping test date was considered inconsistent if it was later than the ‘Year_reported’ or the download date, or earlier than the well drilling date. We found 10,077 wells with this inconsistency (see Figure S8b), most of which had a pumping test date only a few days earlier than the well drilling date. This may be due to minor data consistency issues, but as a conservative approach, we still classified them as inconsistent. Figure 4b shows the drilling dates of wells with consistent and non-empty dates from 1970 to 2024, indicating a strong increase in the number of wells in Brazil around the year 2000. The Geological Survey of Brazil began its activities in the 1970s, so data from periods before this date might not be accurately documented. However, since we lacked the means to verify the consistency of these earlier data, they were classified as raw data.
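The drilling and pumping test date rules can be expressed as two small functions. This is a sketch under stated assumptions: the function names are hypothetical, dates are compared with the registration year by their calendar year, and the download day used in the example (31 March 2024) is illustrative, since the text only gives the month.

```python
from datetime import date

DOWNLOAD_DATE = date(2024, 3, 31)  # hypothetical day within the reported month


def drilling_flag(drilling, year_reported, downloaded=DOWNLOAD_DATE):
    """Return -1 if the drilling date is later than the registration year or the download date."""
    if drilling is None:
        return 0
    if drilling > downloaded:
        return -1
    if year_reported is not None and drilling.year > year_reported:
        return -1
    return 0


def pumping_test_flag(test, drilling, year_reported, downloaded=DOWNLOAD_DATE):
    """Return -1 if the test date is too late or precedes the drilling date."""
    if test is None:
        return 0
    if test > downloaded:
        return -1
    if year_reported is not None and test.year > year_reported:
        return -1
    if drilling is not None and test < drilling:
        return -1
    return 0
```

A pumping test dated three days before the drilling date is flagged by the last rule, which mirrors the most common inconsistency described above.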

Moreover, we checked the well depth variable and identified 288 wells with potentially unrealistic depths (see Figure S9a), such as 9,999 meters, which were flagged as inconsistent. Figure 4c shows the well depth data across Brazil, excluding inconsistent values, and indicates that more than 50% of the wells have depths of less than 80 meters (see Fig. 4d). Similarly, we checked the well water level variable and found 3,433 wells with inconsistent well water levels (see Figure S9b), such as values exceeding the well depth. Figure 4e shows the well water level data across Brazil, excluding inconsistent values, and indicates that more than 50% of the wells have a water level shallower than 12 meters (see Fig. 4f), in line with previous studies15,46.

Likewise, we checked the dynamic water level variable and found 121 wells (see Figure S9c) with inconsistent values, including dynamic water level during the pumping test shallower than the static water level or deeper than the well depth. The median dynamic water level, excluding inconsistent values, is around 40 meters (see Figure S10a). Moreover, we checked the well capacity variable and flagged 520 wells with zero values or unrealistically high capacities exceeding 1,000 m3/h, substantially higher than values typically reported for major Brazilian aquifers63,64. Furthermore, we identified 3,681 wells with well capacity data but with missing or inconsistent static or dynamic water level. We marked all these 4,182 wells as having inconsistent well capacity, noting that some wells had more than one inconsistency (see Figure S9d). The median well capacity, excluding inconsistent values, is 6 m3/h, a value commonly used by the Brazilian National Water and Sanitation Agency to calculate groundwater availability65.
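The water level and capacity rules described above can be combined as follows. This is a minimal sketch, not the authors' code: the function names are hypothetical, and levels and depths are assumed to be positive values in meters below the surface, so that a larger number means a deeper level.

```python
def static_level_flag(static_m, depth_m):
    """Return -1 if the static water level is deeper than the well itself."""
    if static_m is None:
        return 0
    if depth_m is not None and static_m > depth_m:
        return -1
    return 0


def dynamic_level_flag(dynamic_m, static_m, depth_m):
    """Return -1 if the pumping level is shallower than the static level or deeper than the well."""
    if dynamic_m is None:
        return 0
    if static_m is not None and dynamic_m < static_m:
        return -1
    if depth_m is not None and dynamic_m > depth_m:
        return -1
    return 0


def capacity_flag(capacity_m3h, static_m, dynamic_m, depth_m, max_capacity_m3h=1000.0):
    """Return -1 for zero or excessive capacity, or when the supporting levels are unusable."""
    if capacity_m3h is None:
        return 0
    if capacity_m3h == 0 or capacity_m3h > max_capacity_m3h:
        return -1
    if static_m is None or dynamic_m is None:
        return -1  # capacity reported without both water levels
    if static_level_flag(static_m, depth_m) == -1:
        return -1
    if dynamic_level_flag(dynamic_m, static_m, depth_m) == -1:
        return -1
    return 0
```

Because a single well can trip several of these conditions at once, counting flagged wells rather than flagged conditions avoids double counting, as noted in the text.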

To summarize, a total of 309,564 wells had no inconsistencies, which account for over 88% of the total dataset (see Table S7). Additionally, we could not cross-reference the qualitative variables, as the database is compiled from information provided by various well drillers, government agencies, and field records. However, regarding the type of well, the majority (~96%) can be classified as drilled wells (see Figure S11). For the status of the wells, most (~70%) are classified as active (see Figure S12). In terms of water use, as indicated in Fig. 3, almost half of the wells with a specified water use (~46%) are dedicated to domestic or public water supply. Concerning aquifer data, about one-third of the wells have some information about the aquifer (~31%; see Figure S13a). Among the wells with information on the aquifer confinement status, the majority is located in unconfined aquifers (~71%; see Figure S13b). Regarding lithological data, about one-third of the wells (~36%) include some lithological information (see Figure S14a). Similarly, about one-third of the wells (~34%) contain information on water quality (see Figure S14b).

Among the RIMAS continuous monitoring wells, 453 wells have at least one recorded daily observation, with a median monitoring duration of 3,074 days (Fig. 5a, Table S8). While numerous methods exist for detecting errors and outliers, most require local hydrogeological expertise and manual inspection. Consequently, large-scale studies often bypass this step or rely on subjective assessments, which may introduce biases in trend analyses and groundwater model calibration. To minimize this bias, we employed the HydroSight toolbox66, as it does not require adjustments for monitoring frequency, in conjunction with the best available hydrogeological information to systematically flag data.

Fig. 5

Heat map showing groundwater level monitoring in RIMAS project across Brazil with flagged potential outliers or errors (a). Evolution of the number of monitoring wells installed by the RIMAS project (b). Brazilian region (S: South, N: North, CW: Central-West, NE: Northeast, SE: Southeast) with monitoring wells (c). The number of wells per region is indicated next to each region in subplot (a). Below the color legend, the total number of raw daily groundwater level observations, potential outlier observations, and observations with errors are indicated.

Each daily observation was assigned two flags: an error flag and an outlier flag. The error flag was set to -1 if the data met any of the following criteria: (i) duplicate values, (ii) absolute daily water level changes exceeding 10 meters, (iii) constant head values persisting for more than 30 days, or (iv) physically implausible readings (e.g., values exceeding well depth or negative values). Less than 0.5% of the dataset exhibited such errors (see Fig. 5). The outlier flag was set to -1 for data identified as outliers by the HydroSight toolbox double exponential smoothing time-series model. Less than 1% of the dataset was flagged as a potential outlier (see Fig. 5). The threshold for identifying outliers, expressed as the number of noise standard deviations (η), ranged from 3 to 6, depending on hydrogeological conditions. Data without detected errors or outliers, or cases where data quality could not be assessed, were assigned a flag of 0.
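The four error criteria can be sketched for a single well as shown below. This illustrates the error flag only, not the HydroSight outlier model, and it rests on several assumptions: the function name is hypothetical, the series is a date-sorted list of (date, level) tuples with levels in meters below the surface, "duplicate values" is read as repeated dates, the daily-change rule is applied only between consecutive days, and a constant run is flagged when its first and last observations are more than 30 days apart.

```python
from datetime import date, timedelta


def error_flags(series, well_depth_m=None, max_daily_change_m=10.0, max_constant_days=30):
    """Return one flag per observation: -1 for a detected error, 0 otherwise."""
    flags = [0] * len(series)
    seen_days = set()
    for i, (day, level) in enumerate(series):
        if day in seen_days:  # (i) repeated date
            flags[i] = -1
        seen_days.add(day)
        if level < 0 or (well_depth_m is not None and level > well_depth_m):
            flags[i] = -1  # (iv) physically implausible reading
        if i > 0:
            prev_day, prev_level = series[i - 1]
            if (day - prev_day).days == 1 and abs(level - prev_level) > max_daily_change_m:
                flags[i] = -1  # (ii) daily change larger than the limit
    start = 0
    for i in range(1, len(series) + 1):  # (iii) long runs of a constant value
        if i == len(series) or series[i][1] != series[start][1]:
            if (series[i - 1][0] - series[start][0]).days > max_constant_days:
                for j in range(start, i):
                    flags[j] = -1
            start = i
    return flags


first_day = date(2020, 1, 1)
levels = [12.0] * 35 + [12.5, 25.0, 24.8, -1.0]
example = [(first_day + timedelta(days=k), v) for k, v in enumerate(levels)]
example_flags = error_flags(example, well_depth_m=50.0)
```

In the example, the 35-day constant run, the 12.5 m jump, and the negative reading are flagged, while the two ordinary observations in between keep a flag of 0.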

While the HydroSight toolbox provides an automated and reproducible approach, it has limitations. It may misclassify outliers, with false negatives occurring during periods of low variance and false positives during periods of high variance. A detailed discussion of these limitations is available in the supplementary material (see Figure S15). We recommend that users exclude data flagged as errors and critically evaluate data flagged as outliers based on their specific applications.

Data export

In addition to the traditional variables available in SIAGAS and RIMAS, we incorporated 10 additional key variables from remote sensing products and other databases into this dataset (see Table 3). These variables may be particularly valuable for researchers and water resource managers, especially for integrated studies of surface water and groundwater interactions. When data from a specific product was unavailable at a well’s location, an ‘NA’ value was assigned.

Table 3 Overview of the variables added to this database for integrated studies of surface water and groundwater interactions, categorized into five main groups: Climate, Topography, Land Use and Land Cover (LULC), Aquifer, and Surface Waters.

In the climate category, we included long-term mean annual precipitation (Figure S16a), long-term mean annual potential evapotranspiration (Figure S16b), and the ratio of long-term mean annual precipitation to potential evapotranspiration, also known as the aridity index, from 1981 to 2024. These data were obtained from the Brazilian Daily Weather Gridded Data67, interpolated from ground observations at a 0.1° × 0.1° (~10 km) grid resolution. A concentration of wells is observed in Brazil’s semi-arid region (aridity index < 0.5; see Fig. 6a).
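The aridity index defined above is a simple ratio, sketched here for one well location. The function name and the example values are illustrative only, and returning None for missing or non-positive potential evapotranspiration is an assumption that corresponds to the 'NA' value used in the database.

```python
def aridity_index(mean_annual_p_mm, mean_annual_pet_mm):
    """Ratio of long-term mean annual precipitation to potential evapotranspiration."""
    if mean_annual_p_mm is None or mean_annual_pet_mm is None or mean_annual_pet_mm <= 0:
        return None  # corresponds to 'NA' when the gridded data are unavailable
    return mean_annual_p_mm / mean_annual_pet_mm


# Illustrative values: 600 mm of precipitation against 1,500 mm of potential evapotranspiration.
example_index = aridity_index(600.0, 1500.0)
is_semi_arid = example_index < 0.5
```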

Fig. 6

Probability density distribution of the long-term mean annual aridity index67 (a) and of ground elevation per well61 (b). Predominant land use and land cover (LULC) around each well58 (c). Probability density distribution of permeability68 (d) and of the distance between each well and the closest stream per well71 (e). Spatial distribution of the long-term mean annual aridity index (f), and of ground elevation (g), of main LULC types (h), of permeability (i), of streams (j), and wells included in this database across Brazil (k). Aridity index classes follow the classification from the Food and Agriculture Organization of the United Nations82. Elevation classes are based on the Brazilian Institute of Geography and Statistics83. The LULC classification is from the MapBiomas Collection 8 project58. Permeability is represented on a logarithmic scale for better visualization.

The topography category includes ground elevation data from the ANADEM product, which has a resolution of 1 arc-second61 (~30 m), and from the MERIT model, which has a resolution of 3 arc-seconds62 (~90 m) (see Figure S17). A concentration of wells is found near sea level (ground elevation < 100 m; see Fig. 6b).

In the land use and land cover (LULC) category, we identified the predominant land use and land cover within a 100-meter radius of each well using data from the MapBiomas Collection 8 project for 202258,59, which has a 30 × 30-meter resolution. Most wells are in areas classified as agricultural (see Fig. 6c), including pasture, croplands, forest plantations, and mixed-use mosaics (see Figure S5 for details on the major land use and land cover classes used in MapBiomas Collection 8).
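The idea of a predominant class within a buffer can be illustrated with a small grid. This is a simplified sketch, not the processing chain used for the database: it assumes projected coordinates in meters, counts a pixel when its center falls inside the 100 m radius, and uses made-up class labels instead of MapBiomas codes.

```python
from collections import Counter


def predominant_class(grid, well_x, well_y, pixel_size=30.0, radius=100.0):
    """Most frequent class among pixels whose centers lie within `radius` of the well.

    grid[row][col] holds a class label; the center of pixel (row, col) is assumed
    to be at ((col + 0.5) * pixel_size, (row + 0.5) * pixel_size), in meters.
    Returns None when no pixel center falls inside the buffer.
    """
    counts = Counter()
    for row, line in enumerate(grid):
        for col, label in enumerate(line):
            cx = (col + 0.5) * pixel_size
            cy = (row + 0.5) * pixel_size
            if (cx - well_x) ** 2 + (cy - well_y) ** 2 <= radius ** 2:
                counts[label] += 1
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# 7 x 7 pixels of 30 m: three columns of pasture, four columns of forest
grid = [["pasture"] * 3 + ["forest"] * 4 for _ in range(7)]
print(predominant_class(grid, well_x=105.0, well_y=105.0))
```

With real rasters, a geospatial library would typically handle the buffering and the projection; the counting step stays the same.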

For the aquifer category, we incorporated permeability and porosity data from the GLobal HYdrogeology MaPS (GLHYMPS)68,69, which provides global lithologies for approximately 1.8 million polygons with an average size of 100 km² (see Figure S18). The wells are not concentrated at any particular permeability value (see Fig. 6d). Additionally, we included the aquifer outcrop zone in which each well is located, based on data from the Brazilian National Water and Sanitation Agency70 (see Figure S19).

Finally, in the surface waters category, we introduced the distance from each well to the nearest second-order or greater stream using the HydroATLAS database71. This information is useful for various applications related to integrated surface water and groundwater studies72,73. A concentration of wells is observed within 1 km of the nearest river (see Fig. 6e).
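A nearest-stream distance can be approximated as follows. The sketch assumes that each stream is available as a list of (latitude, longitude) vertices and measures the great-circle distance to the closest vertex; this slightly overestimates the distance to the stream line itself, which a GIS library would compute directly. Function names and coordinates are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers on a sphere of mean Earth radius."""
    radius_km = 6371.0088
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_stream_distance_km(well, stream_vertices):
    """Distance from a well (lat, lon) to the closest stream vertex."""
    lat, lon = well
    return min(haversine_km(lat, lon, v_lat, v_lon) for v_lat, v_lon in stream_vertices)


# Hypothetical well and three stream vertices
well = (-12.50, -45.00)
vertices = [(-12.50, -45.02), (-12.60, -45.10), (-12.40, -44.80)]
print(round(nearest_stream_distance_km(well, vertices), 2), "km")
```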

Data Records

The Groundwater Well Database for Brazil (GWDBrazil) includes SIAGAS well data (available in CSV and Shapefile formats) and RIMAS well data (also available in NetCDF format). All products generated in this study are available in the Zenodo repository47.

  • SIAGAS_data.csv – SIAGAS dataset (see Table 1) without the flagging system.

  • SIAGAS_data_flagged – SIAGAS dataset (see Table 1) with the flagging system.

  • RIMAS_ID – RIMAS dataset with the flagging system (see ‘Section: Quality Control’).

  • Additional_data.csv – Includes 10 additional key variables from remote sensing products and other databases (see Table 3).

Note that in the SIAGAS well data, we provide as many variables as possible (see Table 1), even when some contain a high proportion of missing values (NAs). The percentage of NA values for each variable in the Zenodo repository version of the database, i.e., after excluding all inconsistent data identified during our validation process, is presented in the supplementary material (see Table S7). To further support users and facilitate data exploration, we also present the number of wells and the percentage of NA values per variable for each Brazilian state and the Federal District, and by the decade in which the well was entered into the SIAGAS system (see Tables S9–S35). This allows users to quickly assess data availability for their region and intended study period. For a complete description of all available spreadsheets, refer to the supplementary material (see ‘S1. Description of Supplementary Files’ and ‘S2. Supplementary Tables’).
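Users who want to repeat this kind of availability check on a subset of the data can compute the NA share per variable directly from the CSV. The sketch below uses only the Python standard library; the column names in the sample are invented for illustration and do not correspond to the actual file headers.

```python
import csv
import io


def na_percentages(rows, na_tokens=("NA", "")):
    """Percentage of missing values per column for an iterable of dict rows."""
    rows = list(rows)
    if not rows:
        return {}
    result = {}
    for col in rows[0]:
        missing = sum(
            1 for r in rows if r[col] is None or r[col].strip() in na_tokens
        )
        result[col] = 100.0 * missing / len(rows)
    return result


# Small in-memory sample with hypothetical column names
sample = io.StringIO(
    "well_id,depth_m,static_level_m\n"
    "1,80,12.5\n"
    "2,NA,NA\n"
    "3,120,NA\n"
    "4,60,8.1\n"
)
print(na_percentages(csv.DictReader(sample)))
```

For the real file, the `io.StringIO` object would be replaced by an open file handle on the downloaded CSV.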

Technical Validation

During the construction of this well database, a series of steps were taken to ensure data harmonization (see ‘Section: Data collection and standardization’), followed by additional steps to verify that the records in our database are indeed wells (see ‘Section: Quality control’). For quantitative variables, cross-checks were performed, and a flag system was implemented to indicate potential inconsistencies (see ‘Section: Data validation’). Additionally, for daily monitoring data, a flagging system was introduced to identify potential errors and outliers (see ‘Section: Data validation’).

In this section, we qualitatively compare the classification of public water supply by city, as provided by the Brazilian National Water and Sanitation Agency56, with the wells classified in this database as domestic or public water supply wells (see Fig. 7). We found that cities classified as relying exclusively on surface water for public supply still contain domestic wells. However, these cities exhibit the lowest median number of domestic wells per area compared to cities with mixed water sources or those that rely exclusively on groundwater (see Fig. 7d–f).

Fig. 7

Domestic wells per area in cities that rely exclusively on surface water (a), both surface water and groundwater (b), and exclusively on groundwater (c), based on the classification from the Brazilian National Water and Sanitation Agency56. Boxplots of domestic wells per area unit for cities in each category (d–f), with boxes indicating the 25th and 75th percentiles, bars representing the 10th and 90th percentiles, solid horizontal lines denoting the median, and x marks indicating the mean value.

Next, we analyze trends in monitoring wells with long-term records free of flagged errors or outliers and compare them with regional and local studies. To illustrate the use of the flagged monitoring data, we selected 77 of the 453 monitoring wells in Brazil: those with more than seven consecutive years of data and gaps shorter than 15 consecutive days, after excluding flagged data. We applied the nonparametric Theil–Sen slope estimator to assess long-term trends in these wells74,75. The statistical significance of trends was evaluated using the Mann–Kendall test at a 0.05 significance level76.
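Both statistics are straightforward to reproduce. The sketch below is a plain standard-library implementation of the textbook Theil–Sen estimator and the Mann–Kendall test with tie correction and a two-sided normal approximation. It is not the code used in this study; in particular, it assumes independent observations, and any adjustment for serial correlation in daily series is outside its scope. The sample values are invented.

```python
import math
from collections import Counter
from itertools import combinations
from statistics import median


def theil_sen_slope(x, y):
    """Median of the slopes between all pairs of points."""
    slopes = [
        (y[j] - y[i]) / (x[j] - x[i])
        for i, j in combinations(range(len(x)), 2)
        if x[j] != x[i]
    ]
    return median(slopes)


def mann_kendall(y):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test."""
    n = len(y)
    # S counts concordant minus discordant pairs
    s = sum((y[j] > y[i]) - (y[j] < y[i]) for i, j in combinations(range(n), 2))
    # Variance of S, reduced by a correction term for each group of tied values
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    for t in Counter(y).values():
        var_s -= t * (t - 1) * (2 * t + 5) / 18.0
    if s == 0 or var_s <= 0:
        return s, 0.0, 1.0
    # Continuity correction: shrink |S| by one before standardizing
    z = (s - 1) / math.sqrt(var_s) if s > 0 else (s + 1) / math.sqrt(var_s)
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Hypothetical annual mean depths to water (m) over eight years
years = list(range(2015, 2023))
depth = [10.0, 10.5, 11.2, 11.9, 12.1, 13.0, 13.4, 14.1]
slope = theil_sen_slope(years, depth)
s, z, p = mann_kendall(depth)
print(f"slope = {slope:.2f} m/yr, S = {s}, p < 0.05: {p < 0.05}")
```

Because groundwater levels are expressed here as depth below ground level, a positive slope corresponds to a declining water table. Both functions compare all pairs of points, so long daily series are usually aggregated or subsampled before use.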

The results indicate that 49 wells exhibit a declining trend, 26 show a rising trend, and two display no significant trend (see Fig. 8a). Wells with long-term declining trends are primarily concentrated in specific regions of the country. As an example, we analyzed the Urucuia-Areado Aquifer, which includes six monitoring wells. Of these, five exhibit significant declining trends (see Fig. 8d), aligning with previous studies that indicate a drop in the water table within this aquifer14,77.

Fig. 8

Trend analysis results for RIMAS monitoring wells with more than seven consecutive years of daily observation across Brazil and their distribution within Brazilian Köppen-Geiger climate classifications84 (Af: Tropical rainforest; Am: Tropical monsoon; Aw: Tropical savanna; BSh: Arid steppe, hot; Cwa: Temperate, dry winter, hot summer; Cwb: Temperate, dry winter, warm summer; Cfa: Temperate, no dry season, hot summer; Cfb: Temperate, no dry season, warm summer) (a). Location of the Urucuia-Areado Aquifer in Brazil70 (b). Detailed view of six RIMAS monitoring wells (1: id2900020674; 2: id2900020672; 3: id2900021798; 4: id2900024872; 5: id2900024877; 6: id2900024879) within the Urucuia-Areado Aquifer (c). Groundwater levels are displayed as depth below ground level (d). Note that the groundwater level axes do not start at 0 m, and a break is introduced between 35 m and 75 m to enhance data visualization.

Uncertainty and future needs

This study is the first to publicly release a nationwide, standardized dataset describing both water and monitoring wells in Brazil. Because SIAGAS data primarily consist of water wells, inconsistencies in drilling reports may exist and might not have been fully identified during the quality control and validation steps. However, given the high cost of maintaining a long-term monitoring network78,79, many large-scale studies integrate water well data with monitoring well data to enhance spatial coverage15,29,46. Therefore, we believe that this integration is valuable for advancing sustainable water management in Brazil.

Additionally, many well records are not routinely updated, meaning that unreported modifications to wells may not be reflected in the data. Despite this, our database includes both the date each well was entered into SIAGAS and the most recent update date (see Table 1) to provide users with relevant temporal information.

The Geological Survey of Brazil continuously receives data on new and existing wells, meaning our database represents the most up-to-date information available at the time of collection. However, further investments are needed to expand and maintain groundwater monitoring efforts, and strategies should be implemented to regularize undocumented wells, thereby increasing the amount of available data in future updates52,79.

The results presented in Figs. 4 and 6 highlight biases and gaps in Brazilian groundwater data, which could inform the strategic placement of monitoring wells in priority areas to build a more representative and robust observational network for integrated surface and groundwater management35. Furthermore, this database serves as a foundation for integrating other well datasets, including those from universities, private companies, and Brazilian states, which already contribute data to the Geological Survey of Brazil. Such integration would facilitate the development of a comprehensive and easily accessible national well database, benefiting both the scientific community and decision-makers.