Background & Summary

Perfluorooctane sulfonic acid (PFOS) and perfluorooctanoic acid (PFOA) have attracted considerable attention due to their extensive use, environmental persistence, and potential adverse effects on human health and wildlife1,2. Their unique surfactant properties, derived from their fluorinated carbon chains3, have led to their widespread application in consumer and industrial products since the 1950s, including textiles, food packaging, and aqueous film-forming foams4,5. China, a major producer of these compounds, produced about 100 tonnes of PFOS annually, of which approximately 30–40 tonnes/year were used in metal plating and 25–35 tonnes/year in aqueous film-forming foams6.

The exceptional chemical stability and resistance to degradation of PFOS and PFOA have resulted in widespread contamination across environmental matrices7. These compounds bioaccumulate and biomagnify in food webs, reaching high concentrations in apex predators and humans8,9. The ecotoxicological evidence is substantial: field studies have associated PFOS exposure with adverse hepatic, immune, and reproductive effects in top predators10, while in aquatic ecosystems, per- and polyfluoroalkyl substances (PFAS) have been shown to impair the survival, growth, and reproduction of invertebrates11 and to adversely affect the development, metabolism, and endocrine systems of vertebrates such as fish12. Human exposure occurs primarily through ingestion of contaminated food and drinking water, with inhalation of dust and dermal contact also contributing13,14, and has been epidemiologically linked to a range of adverse health outcomes, including immune dysfunction, metabolic disruption, and certain cancers15,16. The public health concern is exacerbated by their long biological half-lives in humans (5.4 years for PFOS; 3.8 years for PFOA)17, underscoring their significance as a persistent threat to both environmental and public health.

Collectively, concerns regarding their persistence, bioaccumulation, and toxicity led to their listing as persistent organic pollutants (POPs) under the Stockholm Convention, with PFOS listed for global restriction (Annex B) in 200918 and PFOA for global elimination (Annex A) in 201919. More recently, regulatory strategy has shifted towards a class-based approach; notably, the European Union has proposed a universal restriction covering approximately 10,000 PFAS as a single class to manage their collective risk20,21. In alignment with these international trends22, China has included both compound groups in its “List of New Pollutants for Priority Control (2023 Edition),” which imposes strict prohibitions and restrictions on their production, use, and trade. Despite these escalating control measures, environmental and human exposure continues, driven by historical use, ongoing emissions from exempted applications, and the degradation of precursor compounds23, which forms PFOS and PFOA in the environment and thus serves as an indirect source of these pollutants24. While recent reviews, such as that by Li et al.25, have advanced the understanding of this issue from a macroscopic perspective, there is a recognized need for up-to-date, high-resolution geospatial datasets on PFOS and PFOA occurrence. This gap, particularly the lack of reusable point-level data suitable for modeling, hinders the identification of at-risk areas and the development of evidence-based management strategies.

To address this gap, this study compiles and presents an extensive dataset on the spatiotemporal distribution of PFOS and PFOA in China. The dataset comprises 2653 occurrence records extracted from literature published between 2021 and early 2025, covering a wide range of environmental media, including soil, air, water bodies, and animal and human tissues. As a data descriptor, this work is intended to complement existing review articles by providing high-resolution, point-level records suitable for more granular analysis and modeling. Our focus on PFOS and PFOA, identified by Li et al.25 as the most dominant PFAS pollutants in China, allows for rigorous extraction and validation of these key substances. By providing this detailed, multi-media, and georeferenced overview of PFOS and PFOA contamination, the dataset serves as a resource for identifying hotspots, informing targeted environmental management, and supporting future research and policy development related to PFAS pollution in China.

Methods

Data collection

Our literature screening procedure, summarized in Fig. 1, drew on two major scientific databases: Web of Science (WOS) (https://webofscience.com/) for English-language literature and China National Knowledge Infrastructure (CNKI) (https://kns.cnki.net/) for Chinese-language literature. The search covered the period from 2021 to March 7, 2025, with no language restrictions. For WOS, we used the search terms (“perfluorooctane sulfonic acid” OR “PFOS” OR “perfluorooctane acid” OR “PFOA”) AND CHINA. For CNKI, we used (全氟辛烷磺酸 + PFOS + 全氟辛酸 + PFOA), where 全氟辛烷磺酸 and 全氟辛酸 are the Chinese names for perfluorooctane sulfonic acid and perfluorooctanoic acid, respectively, and “+” denotes OR in CNKI search syntax. Publications (journal articles or degree theses) were considered eligible if any of the specified terms appeared in any section of the text.

Fig. 1

Schematic overview of the literature search methodology and results.

A total of 3412 publications were initially retrieved for screening, 2418 from CNKI and 994 from WOS. First, duplicates were removed by title, leaving 2279 publications. Next, review articles and publications that only described pollutant properties were excluded, retaining those that measured PFOS and PFOA concentrations; this step yielded 599 publications. Finally, full-text reading excluded publications without concentration records and those reusing data from other literature, leaving 100 Chinese-language and 187 English-language publications that met the extraction criteria. Since the COVID-19 pandemic, the number of publications on PFOS and PFOA has increased continuously, indicating growing attention to these priority-controlled pollutants (Fig. 2).

Fig. 2

Publication trend and data extraction summary for literature on PFOS and PFOA in China (2004–2025). The blue bars represent the total number of publications retrieved from Web of Science and CNKI databases prior to screening, illustrating the long-term research trend. The orange bars indicate the number of publications from 2021–2025 that met our inclusion criteria after screening. The red line shows the number of individual data records extracted from these screened publications for inclusion in our final dataset.

For each eligible publication, we extracted key information, including (i) pollutant name, (ii) sampling time, (iii) geographical location of sampling, (iv) detection data, including minimum, maximum, arithmetic mean, median, and standard deviation (where available), (v) detection methods, and (vi) reporting units. To avoid redundancy, records with identical sampling times were further compared with previously extracted data, and duplicates were excluded if the sampling location, time, and target were all identical. Because many studies reported results for multiple environmental and biological matrices, a single publication could contribute several records; each record represents a unique measurement of PFOS or PFOA in a specific matrix at a specific location and time. Notably, several studies focused on known hotspot areas, where concentrations exceeded typical background levels. In total, 2653 records of PFOS and PFOA detection were extracted, 2026 from WOS and 627 from CNKI.
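The duplicate rule above can be expressed as a comparison of record keys. The following is a minimal Python sketch, not the workflow actually used for this dataset: it assumes records are held as dictionaries keyed by the column names listed under Data Records (for example, rows read with pandas and converted with `DataFrame.to_dict("records")`), maps "target" onto a hypothetical key of `ECs_type` plus `Ecs_pathway`, and rounds coordinates to four decimal places (roughly 11 m) so that trivially different float representations compare equal.

```python
from typing import Dict, Iterable, List, Tuple

Record = Dict[str, object]

# Hypothetical duplicate key: pollutant, sampling time, coordinates, and matrix.
# Mapping "target" onto ECs_type + Ecs_pathway is an assumption for illustration.
DEDUP_KEY: Tuple[str, ...] = ("ECs_type", "sam_T", "Lon", "Lat", "Ecs_pathway")


def _record_key(rec: Record, key: Tuple[str, ...], ndigits: int) -> tuple:
    """Build a hashable key, rounding float values (coordinates) so tiny float noise still matches."""
    parts = []
    for col in key:
        value = rec.get(col)
        if isinstance(value, float):
            value = round(value, ndigits)
        parts.append(value)
    return tuple(parts)


def drop_duplicate_records(records: Iterable[Record],
                           key: Tuple[str, ...] = DEDUP_KEY,
                           ndigits: int = 4) -> List[Record]:
    """Keep the first occurrence of each key and drop later records with the same key."""
    seen = set()
    unique: List[Record] = []
    for rec in records:
        k = _record_key(rec, key, ndigits)
        if k in seen:
            continue  # same pollutant, time, place, and matrix already kept
        seen.add(k)
        unique.append(rec)
    return unique
```

Because only the first occurrence is kept, input order decides which of two conflicting concentration values survives; a manual check may be preferable when apparent duplicates report different values.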

Geo-positioning

Spatial coordinates and geographical reference information were systematically extracted for each documented occurrence of PFOS and PFOA from the corresponding literature. Because many publications provided only administrative units as the spatial reference for compound detection, and the same sampling locations were often reported in multiple publications, location records were consolidated to reduce redundancy and minimize geo-positioning errors26.

To assign latitude and longitude to each sampling site, we utilized a range of geospatial tools, including Google Maps, QGIS with OpenStreetMap data, and Baidu Maps26,27. When coordinates were directly provided in the literature, these were verified and standardized to the WGS1984 datum. For studies presenting only maps of sampling points without explicit coordinates, we approximated locations by visually interpreting the maps using GIS software, refining the coordinates based on geographical features described in the original article27. In cases where only administrative regions were mentioned, we assigned coordinates to the centroid of the specified area, unless additional descriptive information allowed for more precise and localized positioning.
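Assigning the centroid of an administrative unit can be illustrated with the standard area-weighted (shoelace) centroid of its boundary polygon. The sketch below is an illustration under stated assumptions rather than the procedure used here: it treats longitude and latitude as planar coordinates and assumes a single closed outer ring without holes, whereas GIS tools such as QGIS compute centroids directly from boundary layers.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude) in decimal degrees


def polygon_centroid(ring: List[Point]) -> Point:
    """Area-weighted centroid of a simple (non-self-intersecting) polygon ring.

    Uses the shoelace formula on planar lon/lat coordinates, an approximation that is
    adequate for county- or city-sized units but degrades for very large regions.
    """
    if len(ring) < 3:
        raise ValueError("a polygon needs at least three vertices")
    if ring[0] != ring[-1]:
        ring = ring + [ring[0]]  # close the ring
    area2 = 0.0  # twice the signed area
    cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(ring[:-1], ring[1:]):
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0.0:
        raise ValueError("degenerate polygon with zero area")
    # Centroid = sum / (6 * A) with A = area2 / 2, i.e. sum / (3 * area2)
    return cx / (3.0 * area2), cy / (3.0 * area2)
```

For concave or multi-part units (for example, counties with offshore islands), the area-weighted centroid can fall outside the unit itself; a point-on-surface method is an alternative where that matters.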

To facilitate data filtering by spatial resolution, all locations were categorized into four levels of geographic precision (provincial, city, county, and township or finer; recorded in the “loc_level” field), enabling dataset users to select records that meet their spatial precision requirements26. The spatial distribution of PFOS and PFOA occurrences was visualized using ArcGIS 10.8 (Fig. 3). The 2015 administrative boundary map of China used in this study was obtained from the Resource and Environmental Science Data Center (http://www.resdc.cn/).
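Filtering by spatial precision reduces to a threshold on the level code. This minimal sketch assumes the `loc_level` coding given under Data Records (1 = provincial, 2 = city, 3 = county, 4 = township) and records stored as dictionaries; the function name is hypothetical.

```python
from typing import Dict, Iterable, List

Record = Dict[str, object]

# loc_level codes as described under Data Records (higher = finer spatial resolution)
LOC_LEVELS = {1: "provincial", 2: "city", 3: "county", 4: "township"}


def filter_by_precision(records: Iterable[Record], min_level: int) -> List[Record]:
    """Return records whose loc_level is at least min_level; records without a level are dropped."""
    if min_level not in LOC_LEVELS:
        raise ValueError(f"min_level must be one of {sorted(LOC_LEVELS)}")
    kept: List[Record] = []
    for rec in records:
        level = rec.get("loc_level")
        # Skip missing values, including NaN (the only value not equal to itself)
        if level is None or level != level:
            continue
        if int(level) >= min_level:
            kept.append(rec)
    return kept
```

For example, `filter_by_precision(records, 3)` keeps only county- and township-level records, which may suit fine-scale spatial modeling, while province-level records remain usable for regional aggregation.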

Fig. 3

Number of PFOS & PFOA occurrence records in different media by provincial-level division of China. (a) water. (b) animal. (c) soil. (d) food. (e) human. (f) sediment. (g) air. (h) plant.

Data Records

The dataset is provided as a single Microsoft Excel file (.xlsx) titled “Distribution of PFOA and PFOS in Multiple Environmental Media in China”28. This file contains discrete sampling records, with each entry (row) representing the analysis of a specific medium at a particular location and time. The dataset is structured with the following 22 parameters (columns), each designed to facilitate specific types of analysis:

1. ECs_type: Type of pollutant. A categorical variable specifying the pollutant (“PFOS” or “PFOA”), used for substance-specific filtering and analysis.

2. sam_T: Sampling time, serving as the primary temporal variable for trend analysis.

3. Lon: Longitude of the sampling location (WGS1984 datum).

4. Lat: Latitude of the sampling location (WGS1984 datum).

5. loc_level: Spatial resolution of the location (1 = provincial, 2 = city, 3 = county, 4 = township), allowing for multi-scale spatial aggregation and analysis.

6. loc_l1: Provincial level information (name of province, autonomous region, municipality, or special administrative region of China).

7. loc_l2: City level information (name of prefecture-level city, autonomous prefecture, or district under a municipality).

8. loc_l3: County level information (name of county, county-level city, autonomous banner, or district).

9. loc_l4: Township or more detailed location.

10. ECs_conc_min: Minimum measured concentration.

11. ECs_conc_mean: Mean measured concentration.

12. ECs_conc_max: Maximum measured concentration.

13. Ecs_conc_median: Median measured concentration.

14. Ecs_conc_SD: Standard deviation of measured concentration.

15. Ecs_medium: Ecological category of pollutant occurrence.

16. Ecs_pathway: The environmental matrix (e.g., “Water”, “Soil”), which is fundamental for matrix-specific spatial and statistical analyses.

17. Detailed_pathway: A more specific description of the environmental compartment (e.g., “River water”), allowing for finer-grained sub-group analyses.

18. Ecs_unit: The unit of measurement (e.g., “ng/L”, “ng/g”), ensuring correct data interpretation and conversion.

19. ECs_methods: Analytical instrumentation and methodology, e.g., LC-MS/MS (Liquid Chromatography-Tandem Mass Spectrometry), HPLC-MS/MS (High Performance Liquid Chromatography-Tandem Mass Spectrometry), UPLC-MS/MS (Ultra Performance Liquid Chromatography-Tandem Mass Spectrometry), etc.

20. pub_t: Publication time of the literature, allowing for meta-analysis based on publication date.

21. pub_DOI: The Digital Object Identifier of the source literature, ensuring full data traceability.

22. pub_cite: The full citation of the source literature, allowing users to consult the original study for methodological and other detailed information.
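Because `Ecs_unit` varies between media (e.g., ng/L for water, ng/g for soil), concentrations should be brought to a common unit within each medium before they are pooled. The sketch below assumes a hypothetical lookup table of unit strings; only the arithmetic conversion factors are fixed, and the set of strings actually present in the file should be checked first. It does not convert between dry-weight and wet-weight bases, or between volume- and mass-based units, which would require sample-specific information.

```python
from typing import Dict, Optional, Tuple

# Hypothetical unit table: factors convert each unit into one canonical unit per dimension.
TO_CANONICAL: Dict[str, Tuple[str, float]] = {
    # liquids (water, blood, serum) -> ng/L
    "ng/L": ("ng/L", 1.0),
    "pg/mL": ("ng/L", 1.0),      # 1 pg/mL = 1000 pg/L = 1 ng/L
    "ng/mL": ("ng/L", 1000.0),   # 1 ng/mL = 1000 ng/L
    "ug/L": ("ng/L", 1000.0),
    # solids and tissues -> ng/g (no dry/wet weight conversion is attempted)
    "ng/g": ("ng/g", 1.0),
    "ug/kg": ("ng/g", 1.0),      # 1 ug/kg = 1000 ng / 1000 g
    "ng/kg": ("ng/g", 0.001),
    "pg/g": ("ng/g", 0.001),
    # air -> pg/m3
    "pg/m3": ("pg/m3", 1.0),
    "ng/m3": ("pg/m3", 1000.0),
}


def normalize_unit_label(unit: str) -> str:
    """Map common typographic variants (micro signs, superscript 3) onto the table keys."""
    return (unit.strip()
            .replace("\u00b5", "u")   # MICRO SIGN
            .replace("\u03bc", "u")   # GREEK SMALL LETTER MU
            .replace("\u00b3", "3"))  # SUPERSCRIPT THREE


def harmonize(value: Optional[float], unit: str) -> Tuple[Optional[float], str]:
    """Convert a concentration to its canonical unit; unknown units raise KeyError."""
    canonical, factor = TO_CANONICAL[normalize_unit_label(unit)]
    if value is None:
        return None, canonical
    return value * factor, canonical
```

Choosing ng/L as the canonical liquid unit is arbitrary; human blood data, for example, are often reported in ng/mL, and the table can be adjusted accordingly.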

Technical Validation

A total of 2653 records of PFOS and PFOA detections were extracted from publications released between 2021 and 2025. Extraction of all records was conducted in the preliminary phase by a dedicated review team. After data entry, a second reviewer independently checked the entries, focusing particularly on publications with identical sampling times and geographical locations, and removed near-duplicate records to ensure accuracy and avoid duplication.

During the georeferencing phase, spatial coordinates were verified through a secondary examination by an independent researcher. This process included cross-checking the assigned coordinates and confirming their consistency with the descriptions in the original literature. To further ensure the reliability of the dataset, all members of the research team adhered to uniform inclusion criteria throughout the data collection and validation process.

When extracting geographical information, special care was taken to maximize precision. The original articles and any supplementary materials were carefully reviewed to accurately georeference sampling locations. When location descriptions in the literature were incomplete, ambiguous, or used unofficial names (such as a river in a mountain village), the surrounding text was analyzed in detail to make the best possible determination of the sampling site. For records lacking precise coordinates or detailed maps of sampling points, the spatial resolution was classified according to the available information. To make spatial precision transparent, the “loc_level” field was added to the dataset, allowing users to filter records by geographic resolution.

The spatial and quantitative distributions of PFOS and PFOA occurrence records from 2021–2025 are presented in Figs. 3–5. Figure 3 provides a spatial overview of the records, categorized by key environmental matrices. Figure 4 presents the number of records aggregated by province for both PFOS and PFOA; to improve readability, record counts for the human category in Fig. 4 were multiplied by five before plotting. Figure 5 further details the distribution of these records across environmental media types. All spatial visualizations were generated using ArcGIS 10.8 and Python.

Fig. 4

Provincial-level distribution of PFOS & PFOA occurrence records. (Human occurrence record counts multiplied by 5 for visualization).

Fig. 5

Heatmap of the number of PFOS and PFOA occurrence records by province and environmental media type. Each cell in the matrix represents the total number of records for a specific province (rows) within a specific environmental medium (columns). The color intensity corresponds to the number of records, as indicated by the color bar.
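A province-by-medium count matrix of the kind shown in Fig. 5 can be built with a simple cross-tabulation. In this sketch, using `loc_l1` for rows and `Ecs_pathway` for columns is an assumption (the figures may instead group by `Ecs_medium`), and the optional `col_scale` argument mimics the display-only fivefold scaling of human records in Fig. 4 under a hypothetical "Human" label.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

Record = Dict[str, object]


def count_matrix(records: Iterable[Record],
                 row_key: str = "loc_l1",
                 col_key: str = "Ecs_pathway",
                 col_scale: Optional[Dict[str, float]] = None
                 ) -> Tuple[List[str], List[str], List[List[float]]]:
    """Cross-tabulate record counts (rows = provinces, columns = media).

    col_scale optionally multiplies selected columns for display only, e.g. {"Human": 5}.
    """
    counts = Counter(
        (rec[row_key], rec[col_key])
        for rec in records
        if rec.get(row_key) and rec.get(col_key)  # skip records missing either label
    )
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    scale = col_scale or {}
    matrix = [[counts.get((r, c), 0) * scale.get(c, 1) for c in cols] for r in rows]
    return rows, cols, matrix
```

The resulting nested list can be passed directly to a plotting routine such as matplotlib's `imshow`, with `rows` and `cols` as tick labels.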

The distribution pattern in our records is broadly consistent with the review by Elgarahy et al.29, which found that PFAS are detected at higher concentrations and frequencies in industrially concentrated areas worldwide. We compared our dataset with the pre-2021 dataset established by Dong et al.30. Tables 1 and 2 present descriptive statistics for PFOS and PFOA concentrations across diverse environmental matrices, including water, soil, sediment, and biological specimens, after three-sigma outlier removal. To ensure a valid comparison for biological specimens, the data presented are restricted to human blood samples from the current dataset and historical human serum data from Dong et al. Our study (2021–2024) recorded substantially higher mean PFOS concentrations in water (1646.949 ng/L vs. 27.74 ng/L), air (24.972 pg/m³ vs. 4.52 pg/m³), and sediment (95.597 ng/g vs. 5.08 ng/g) than those reported by Dong et al. (pre-2021), whereas mean concentrations in human samples were markedly lower (7.212 ng/mL vs. 29.29 ng/mL). PFOA, by contrast, showed an apparent shift in mean concentrations from water (397.755 ng/L vs. 1677.95 ng/L) and air (10.492 pg/m³ vs. 30.28 pg/m³) towards soil (327.139 ng/g vs. 8.47 ng/g).

Table 1 Statistical distribution of PFOS concentrations across environmental matrices.
Table 2 Statistical distribution of PFOA concentrations across environmental matrices.
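The exact form of the three-sigma outlier removal behind Tables 1 and 2 is not specified here. The following sketch assumes a single pass over the values of one pollutant–matrix group, using the sample standard deviation; both choices are assumptions, and the function names are hypothetical.

```python
import statistics
from typing import Dict, Iterable, List, Optional


def three_sigma_filter(values: Iterable[Optional[float]]) -> List[float]:
    """Single-pass three-sigma rule: drop values more than 3 sample SDs from the mean."""
    vals = [float(v) for v in values if v is not None and v == v]  # drop missing and NaN
    if len(vals) < 2:
        return vals  # SD undefined; nothing can be flagged
    mu = statistics.mean(vals)
    sd = statistics.stdev(vals)  # sample SD (n - 1 denominator), an assumption
    return [v for v in vals if abs(v - mu) <= 3 * sd]


def describe(values: Iterable[Optional[float]]) -> Dict[str, float]:
    """Descriptive statistics of the values retained by three_sigma_filter."""
    kept = three_sigma_filter(values)
    if not kept:
        raise ValueError("no values to describe")
    return {
        "n": len(kept),
        "mean": statistics.mean(kept),
        "median": statistics.median(kept),
        "min": min(kept),
        "max": max(kept),
    }
```

An iterative variant, which recomputes the mean and SD until no further values are removed, would trim heavy tails more aggressively. With the sample standard deviation, no single value can lie more than (n − 1)/√n standard deviations from the mean, so for groups of ten or fewer values this rule never removes anything.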

A key finding from our analysis is the apparent decreasing trend in human blood concentrations of both PFOS and PFOA in the recent 2021–2024 period compared with historical data. This finding provides a high-resolution snapshot of the most recent dynamics within a complex, non-linear historical context. Temporal trends for PFAS in China vary with the compound and timeframe analyzed. For instance, while the long-term (1999–2023) regression analysis by Li et al.25 suggested an increasing trend for PFOA and a stable trend for PFOS, other comprehensive reviews covering different periods, such as that by Kou et al.31, had already identified a significant decreasing trend for PFOS in Chinese adults over 2008–2018. Our finding of a recent, concurrent decline in both compounds is further corroborated by the latest targeted cohort studies in China; notably, a decade-long study of pregnant women in Shanghai by Ruonan Li et al.32 also found significant decreases in the median concentrations of both PFOS and PFOA from 2012 to 2023. These varied findings show that different analytical approaches can reveal different facets of a complex trend. Our findings therefore suggest that the 2021–2024 period may represent an inflection point in exposure trends, a hypothesis that aligns with China’s recent, intensified control measures33,34,35 and warrants continued monitoring and validation.

Usage Notes

The data from this investigation show that perfluorooctanoic acid (PFOA) and perfluorooctane sulfonic acid (PFOS) are widely distributed across multiple environmental compartments, with notable concentrations in water bodies, soil, and biota1,7. Understanding the distribution patterns of these persistent organic pollutants is crucial for developing effective regulatory frameworks and remediation strategies. The dataset presented here contributes to a more comprehensive understanding of the occurrence of PFOA and PFOS in environmental samples across different regions. It can be used to study spatial and temporal trends of PFOA and PFOS pollution at multiple scales and to support environmental risk assessment, helping to identify potential exposure pathways8,36. The dataset’s structure facilitates diverse analyses by environmental scientists, toxicologists, regulatory agencies, and other stakeholders, supporting a wide range of research and policy applications.

It should be noted that analytical methods for detecting PFOA and PFOS have evolved considerably over time, which may affect reported concentration values37. Early studies often employed liquid chromatography-mass spectrometry (LC-MS) techniques with relatively high detection limits. Recent investigations primarily use high-performance liquid chromatography-tandem mass spectrometry (HPLC-MS/MS), capable of detection at parts-per-trillion levels38,39. With continued technological advancement, ultra-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS) has become the preferred analytical technique for PFAS monitoring40. Compared to traditional HPLC, UPLC-MS/MS offers notable advantages, including shorter analysis time, higher chromatographic resolution, improved sensitivity, and reduced solvent consumption41,42. This technological progress has enabled detection limits at sub-parts-per-trillion levels and allowed for simultaneous quantification of multiple PFAS compounds in complex environmental matrices43.

Advances have also been made in sample preparation, with pressurized liquid extraction (PLE) and solid-phase extraction (SPE) now widely adopted to enhance recovery from complex environmental samples44. These methodological improvements have increased the accuracy and reliability of PFOS and PFOA quantification, particularly in challenging matrices such as soil and sediment.

Users of this dataset are encouraged to consider the evolving nature of analytical methodologies when interpreting historical data and comparing results across studies. Furthermore, it is critical for users to note that the scope of this dataset is intentionally focused on PFOS and PFOA. This strategic focus ensures maximum data coverage and reliability for these two dominant and regulated pollutants. Consequently, while this dataset represents the contamination status of PFOS and PFOA, it should not be used to infer trends or environmental risks for the entire class of PFAS compounds, including numerous emerging alternatives.
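One practical way to account for methodological change is to stratify analyses by instrument family. The sketch below groups free-text `ECs_methods` entries with regular expressions; the family labels and patterns are illustrative assumptions rather than categories defined in the dataset, and spelled-out method names (e.g., "Ultra Performance Liquid Chromatography") fall into "other" unless patterns are added.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# Checked in order, most specific first, so "UPLC-MS/MS" is not caught by the generic LC rules.
METHOD_FAMILIES = [
    ("UPLC/UHPLC", re.compile(r"UH?PLC", re.IGNORECASE)),
    ("HPLC", re.compile(r"HPLC", re.IGNORECASE)),
    ("LC-MS/MS", re.compile(r"LC-?MS/?MS", re.IGNORECASE)),
    ("LC-MS", re.compile(r"LC-?MS", re.IGNORECASE)),
]


def method_family(method: object) -> str:
    """Assign a free-text ECs_methods entry to a coarse instrument family."""
    if not isinstance(method, str) or not method.strip():
        return "other"  # missing, NaN, or empty entries
    for label, pattern in METHOD_FAMILIES:
        if pattern.search(method):
            return label
    return "other"


def count_by_family(records: Iterable[Dict[str, object]]) -> Counter:
    """Count records per instrument family, e.g. to check how balanced the strata are."""
    return Counter(method_family(rec.get("ECs_methods")) for rec in records)
```

Concentration summaries can then be computed separately within each family, which makes it easier to see whether apparent temporal differences coincide with a change in instrumentation.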

Beyond its utility for fundamental trend analysis, this dataset is designed to serve as a foundational resource for advanced studies that translate scientific data into actionable insights. Its high-resolution, georeferenced data can be used to evaluate the effectiveness of national control measures, such as China’s “List of New Pollutants for Priority Control,” and to help regulatory agencies identify priority regions for intervention. It also provides the contamination data needed to model the costs of various remediation technologies45, conduct cost-benefit analyses of different management strategies, and assess the socioeconomic impacts of pollution hotspots.