Introduction

Air pollution poses a serious health problem in upper northern Thailand, particularly in Chiang Mai Province1. Between December and April, Chiang Mai’s air pollution ranks among the worst in the world due to elevated levels of ambient particulate matter (PM) during the burning and haze seasons2. Prolonged exposure to PM has been linked to various health issues, including lung cancer (LC), neurological disorders, respiratory diseases, and cardiovascular conditions3. Moreover, indoor radon (222Rn) is another critical risk factor affecting the general population’s health4,5. Previous studies have confirmed that indoor radon activity concentrations during the burning season in Chiang Mai are significantly higher than during the non-burning season6,7.

Radon and its progeny are classified as Group 1 human carcinogens by the International Agency for Research on Cancer (IARC)8. They are the second leading cause of LC after cigarette smoking and the primary cause of LC in non-smokers9. Radon is a colorless, odorless, and invisible gas with a relatively long half-life of 3.82 days. It originates from the decay of radium (226Ra) and emits high linear energy transfer (LET) radiation, such as α particles9,10. When α particles enter the human body, they can cause deoxyribonucleic acid (DNA) damage by inducing the overproduction of reactive oxygen species (ROS), potentially leading to cancer and other diseases4,10. Radon is the main source of natural background radiation to which the public is exposed. It is found in natural gas (air), water, soil, rocks, fuels and building materials8−10. Previous studies indicate that Chiang Mai has higher radon activity concentrations than other provinces in Thailand, making it critical to study the long-term health effects of indoor radon exposure in this high air pollution area6,11.

Indoor radon exposure is linked to both cancer and non-cancer diseases12. Cancers include LC, childhood leukemia, stomach cancer, kidney cancer, liver cancer, melanoma, non-Hodgkin’s lymphoma, multiple myeloma, and brain tumors. Non-cancer diseases include chronic obstructive pulmonary diseases (COPD), congenital malformations, neurodegenerative diseases, urinary tract diseases, chronic bronchitis, and cardiovascular and respiratory diseases12,13,14. Our previous findings suggest that short telomere length, proteomic profiles (PARP1, WT1, TRERF1, and NPLOC4), and serum biomarkers (CEA and Cyfra21-1) may serve as potential LC biomarkers in individuals with high residential radon exposure15,16,17. Similarly, long-term high-level radon exposure alters the genome-wide DNA methylation profile, with TIMP2, EMP2, CPT1B, AMD1, and SLC43A2 identified as potential inducers of LC18. However, there are no studies on specific biomarkers associated with diseases other than LC caused by indoor radon exposure in high-radon areas.

Metabolomics is the comprehensive study of metabolites within cells, tissues, biofluids or organisms. Recently, it has emerged as a powerful approach for identifying potential biomarkers that can aid in the early detection and diagnosis of disease. As a non-invasive diagnostic technique, metabolomics offers high sensitivity and specificity. This approach relies primarily on advanced analytical techniques such as mass spectrometry (MS) and nuclear magnetic resonance (NMR) spectroscopy19. However, to our knowledge, no studies have identified potential disease biomarkers associated with high residential radon exposure through metabolomic approaches.

This study utilized ultra-high-performance liquid chromatography coupled with high-resolution mass spectrometry (UHPLC-HRMS) to investigate serum metabolic biomarkers in individuals from areas with low- and high-radon exposure in high PM regions of Chiang Mai. All participants were non-smokers. We analyzed the differential metabolites between the low- and high-residential radon groups using multivariate statistical analysis, including principal component analysis (PCA) and partial least squares discriminant analysis (PLS-DA) models, as well as receiver operating characteristic (ROC) curve analysis. Additionally, we identified enriched metabolic pathways based on the Kyoto Encyclopedia of Genes and Genomes (KEGG) database.

Materials and methods

Study area

Kong Khaek is a subdistrict of the Mae Chaem district, located in the southern part of Chiang Mai province. The area is one of the most highly polluted areas in Chiang Mai and is characterized by forested landscapes and granite highland mountains. It consists of 12 villages with a total population of 6572 people (2304 dwellings) as of 2023. Indoor radon activity concentrations were measured using a passive radon-thoron discriminative monitor (RADUET) between September 2022 and March 2023. The indoor radon activity concentrations ranged from 18.5 to 119 Bq/m3, with an average value of 40.8 ± 22.6 Bq/m3. The average indoor radon activity concentration in the area is higher than both the national and global averages7. Based on these measurements, we categorized the participants into three groups according to the locations of their dwellings: low (< 30 Bq/m3), moderate (30–50 Bq/m3) and high (> 50 Bq/m3).
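The three-level grouping above can be written as a small rule. The following Python sketch is illustrative only: the function name `radon_group` is hypothetical, and treating exactly 30 and 50 Bq/m3 as "moderate" is our reading of the stated ranges rather than something the text spells out.

```python
def radon_group(concentration_bq_m3: float) -> str:
    """Assign an exposure group from an indoor radon activity concentration (Bq/m3)."""
    if concentration_bq_m3 < 0:
        raise ValueError("radon activity concentration cannot be negative")
    if concentration_bq_m3 < 30:
        return "low"
    if concentration_bq_m3 <= 50:  # assumption: both boundary values count as moderate
        return "moderate"
    return "high"


# Minimum, mean, and maximum concentrations reported for the study area
for value in (18.5, 40.8, 119.0):
    print(value, radon_group(value))
```

Under this rule the reported area mean of 40.8 Bq/m3 falls in the moderate band, while the reported extremes fall in the low and high bands.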

Participants and sample collection

The transitional study was conducted on a selected group of Kong Khaek residents who had lived in the area for at least 15 years. A total of 85 participants, including 45 from the low-residential radon exposure group and 40 from the high-residential radon exposure group, were enrolled in this study. The inclusion criteria were as follows: (I) age between 18 and 80 years; (II) no prior radiation or chemotherapy; (III) non-smokers. The exclusion criteria were as follows: (I) pregnancy; (II) a history of other cancers. Finally, 15 participants from each group (matched by gender and age) were recruited for further analysis (Table 1).

Table 1 Characteristics of the study populations.

This study was granted approval by the Human Research Ethics Committee at the Faculty of Medicine, Chiang Mai University (research ID: 8613 and approved on 5 July 2022). All participants were informed about the purpose of the study and provided written consent before completing a questionnaire and providing blood samples following the measurement of indoor radon activity concentrations. The questionnaire used in this study was specifically developed by our research team to address key topics such as smoking history, alcohol consumption, exposure to air pollution, dietary habits, family history of cancer and occupational history. Fasting peripheral venous blood (10 mL) was collected early in the morning by a nurse and the serum samples were isolated through centrifugation and stored at −80 °C for further analysis.

Sample preparation

Serum samples were thawed and diluted with 100 mg/L 4-chloro-L-phenylalanine (Sigma-Aldrich, St. Louis, MO, USA) dissolved in 70% methanol. The mixture was vortexed at 37 °C for 30 s and then centrifuged at 4 °C, 13,000 × g for 10 min. Subsequently, the supernatant was transferred to a clean tube and dried under vacuum. The dried samples were redissolved by adding 10% (v/v) acetonitrile and filtered through a 0.22 μm hydrophilized polytetrafluoroethylene (HPTFE) membrane for UHPLC-HRMS analysis.

Ultra-high-performance liquid chromatography coupled with high-resolution mass spectrometry (UHPLC-HRMS)

The UHPLC-HRMS analysis was performed using a UHPLC system (Vanquish; Thermo Fisher Scientific, Inc.; Waltham, MA, USA) coupled with a Q Exactive™ HF−X Quadrupole−Orbitrap mass spectrometer system (Thermo Fisher Scientific, Inc.; Waltham, MA, USA). A Hypersil GOLD™ Vanquish C18 column (1.9 μm, 2.1 mm × 100 mm, Thermo Fisher Scientific, Inc.; Waltham, MA, USA) was used to separate the serum samples. The column temperature was set at 30 °C, with a flow rate of 0.3 mL/min and an injection volume of 2 µL. Mobile phase A consisted of 0.1% (v/v) formic acid in water, while mobile phase B was acetonitrile containing 0.1% (v/v) formic acid. The gradient elution was as follows: 0–1 min, 2% B; 1–18 min, 2–100% B; 18–20 min, 100% B; and 20–25 min, 100–2% B. The quality control (QC) samples were prepared by combining equal volumes of the serum samples to monitor stability and repeatability throughout the analysis.
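For readers who want the gradient program in a reusable form, the sketch below encodes the breakpoints listed above and returns the percentage of mobile phase B at any time in the run. It assumes linear ramps between breakpoints, which the text implies but does not state; the names `GRADIENT` and `percent_b` are hypothetical.

```python
# (time in min, % mobile phase B) breakpoints taken from the gradient described above
GRADIENT = [(0.0, 2.0), (1.0, 2.0), (18.0, 100.0), (20.0, 100.0), (25.0, 2.0)]


def percent_b(t_min: float) -> float:
    """Percentage of mobile phase B at time t_min, assuming linear ramps."""
    if not GRADIENT[0][0] <= t_min <= GRADIENT[-1][0]:
        raise ValueError("time is outside the 0-25 min run")
    # Walk consecutive breakpoint pairs and interpolate within the matching segment
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole run")


for t in (0.5, 9.5, 19.0, 22.5):
    print(f"{t:>5} min -> {percent_b(t):.1f}% B")
```

Mobile phase A is then simply 100 minus the returned value at each time point.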

MS detection was performed on the Q Exactive™ HF−X Orbitrap mass spectrometer equipped with a heated electrospray ionization (HESI) source (Waltham, MA, USA). Positive ions were detected using full-scan MS1/data-dependent MS2 (dd-MS2) mode with the following specific parameters: full-scan MS1 resolution at 120,000; dd-MS2 resolution at 30,000; mass range from 70 to 1000 m/z; auxiliary gas heater temperature at 320 °C; capillary temperature at 320 °C; maximum injection time at 100 ms; automatic gain control target at 3 × 10⁶; stepped N(CE) at 10, 20 and 40 eV; sheath gas at 45 arbitrary units (AU); auxiliary gas at 10 AU; sweep gas at 5 AU; and spray voltage at either 3.5 kV (positive) or 2.5 kV (negative)20.

Data processing and statistical analysis

The raw data were processed using Compound Discoverer 3.3 software (Thermo Fisher Scientific, Inc.). Normalization was performed using QC samples to identify differential metabolites. All acquired MS data were searched against the Human Metabolome Database (HMDB) for metabolite annotation, utilizing mzVault, mzCloud and ChemSpider.

All data are presented as mean ± standard deviation (SD). Statistical analysis of the UHPLC-HRMS data was conducted using MetaboAnalyst 6.0. Differences between groups were analyzed using Student’s t-test. P values less than 0.05 were considered statistically significant. Differences in serum metabolic profiles between groups were analyzed using PCA and PLS-DA models. The quality of the PCA and PLS-DA models was evaluated using Q2 (predictive parameter) and R2 (explanatory parameter) values. The PLS-DA model was validated using a permutation test with 200 cycles performed in SIMCA 18.0 (Umetrics, Sweden). Differential metabolites between groups were selected based on the following criteria: fold change (FC) > 1 or < 0.5, variable importance in the projection (VIP) > 1, and a P value < 0.05. ROC analysis and area under the curve (AUC) calculations were performed using MetaboAnalyst 6.0 to identify disease biomarkers associated with high residential radon exposure. A biomarker was considered to have high diagnostic value if the AUC was ≥ 0.85. Metabolic pathways and metabolite set enrichment analyses were performed using the KEGG database21.
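The selection criteria can be expressed as a simple filter. The Python sketch below is not the MetaboAnalyst workflow used in the study; it only illustrates how the three thresholds combine. It assumes that FC is computed as the mean abundance in the high-radon group divided by that in the low-radon group, and that VIP scores and P values have already been obtained elsewhere. All names and all numbers in the example table are hypothetical.

```python
from statistics import mean

# Thresholds as stated in the text: FC > 1 or < 0.5, VIP > 1, P < 0.05
FC_UP, FC_DOWN, VIP_MIN, P_MAX = 1.0, 0.5, 1.0, 0.05


def fold_change(high_group, low_group):
    """Ratio of mean abundance, high-radon group over low-radon group (assumed direction)."""
    return mean(high_group) / mean(low_group)


def classify(fc, vip, p_value):
    """Return 'up' or 'down' if all three criteria are met, otherwise None."""
    if vip <= VIP_MIN or p_value >= P_MAX:
        return None
    if fc > FC_UP:
        return "up"
    if fc < FC_DOWN:
        return "down"
    return None


# Hypothetical abundances, VIP scores, and P values (not study data)
metabolites = {
    "metabolite_A": ([8.0, 9.0, 10.0], [3.0, 4.0, 5.0], 1.8, 0.001),
    "metabolite_B": ([1.0, 2.0, 3.0], [6.0, 8.0, 10.0], 1.4, 0.010),
    "metabolite_C": ([4.0, 5.0, 6.0], [5.0, 6.0, 7.0], 1.2, 0.030),
    "metabolite_D": ([8.0, 9.0, 10.0], [3.0, 4.0, 5.0], 0.6, 0.001),
}

selected = {}
for name, (high, low, vip, p) in metabolites.items():
    direction = classify(fold_change(high, low), vip, p)
    if direction is not None:
        selected[name] = direction

print(selected)
```

In this toy table, metabolite_C is dropped because its FC lies between 0.5 and 1, and metabolite_D is dropped because its VIP does not exceed 1.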

Results

Characteristics of the participants

The study included 30 healthy participants, comprising 16 males and 14 females, with a mean age of 61.2 ± 11.9 years. Table 1 shows the characteristics of the study population from the Kong Khaek subdistrict, including age, gender, education and alcohol consumption. The participants were divided into two groups: 15 in the low-residential radon exposure group and 15 in the high-residential radon exposure group. All participants were non-smokers and had lived in the study areas for at least 15 years. There were no significant differences in age, gender, education, or alcohol consumption between the two groups.

Metabolomic profiling of serum samples from the low- and high-residential radon exposure groups

All serum samples were analyzed using the UHPLC-HRMS technique, identifying a total of 449 differential metabolites between the two groups. The metabolites were distinguishable using the PLS-DA model (Q2 = 0.87, R2 = 0.92) but not with the PCA model (Fig. 1). This indicates that the PLS-DA model is more effective for screening differential metabolites between the low- and high-residential radon exposure groups. To validate the PLS-DA model, a permutation test with 200 cycles was conducted (Fig. 1c). The intercepts of R2 and Q2 were 0.43 and −0.58, respectively, confirming the model’s good predictive ability and reliability without evidence of overfitting.
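As we understand the conventional permutation plot, the reported intercepts are obtained by regressing the R2 or Q2 of each model (original and permuted) on the correlation between the permuted and original class labels, then reading the fitted line at a correlation of zero. The Python sketch below shows only that final least-squares step. It does not refit any PLS-DA model, the function name `regression_intercept` is hypothetical, and the correlation and Q2 values are invented for illustration.

```python
from statistics import mean


def regression_intercept(x, y):
    """Ordinary least-squares intercept of y regressed on x."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx


# Hypothetical values: label correlation of each model with the original labels
# (1.0 is the unpermuted model) and the corresponding Q2
label_correlation = [0.1, 0.2, 0.4, 1.0]
q2_values = [-0.46, -0.32, -0.04, 0.80]

q2_intercept = regression_intercept(label_correlation, q2_values)
print(f"Q2 intercept: {q2_intercept:.2f}")
```

A Q2 intercept below zero, as reported above, is commonly read as a sign that models built on scrambled labels have no predictive ability, which argues against overfitting.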

Fig. 1 Multivariate statistical analysis. Score plots for the PCA (a) and PLS-DA (b) models comparing the low- and high-residential radon exposure groups. The validation performance of the PLS-DA model (c).

Potential biomarker identification

Using the PLS-DA model to investigate differences in serum metabolomic profiles between the low- and high-residential radon exposure groups, we identified the metabolites distinguishing the two groups. A total of 222 metabolites were identified between the low- and high-residential radon exposure groups, of which 67 were upregulated and 155 were downregulated (Fig. 2a, b). After applying the selection criteria (VIP > 1, FC > 1 or < 0.5 and P < 0.05), a total of 92 differential metabolites were identified, with 49 upregulated and 43 downregulated in the high-residential radon exposure group compared to the low-residential radon exposure group (Supplemental Table S1). Furthermore, Fig. 2c highlights the top 20 most dysregulated serum metabolites in the high-residential radon exposure group compared with those in the low-residential radon exposure group.

Fig. 2 (a) Heatmap of differential metabolites between the low- and high-residential radon exposure groups. (b) Volcano plot of metabolites identified between the low- and high-residential radon exposure groups. (c) The top 20 important metabolites identified in the variable importance in projection (VIP) plot from the PLS-DA model.

Validation of biomarker metabolites

To validate the metabolic markers as potential disease biomarkers in the high-residential radon exposure group compared to the low-residential radon exposure group, ROC analysis was performed, and the AUC values were used to assess the diagnostic value of these metabolites. Twelve differential metabolites with high diagnostic power and predictive accuracy (AUC ≥ 0.85) were selected, of which 5 were upregulated and 7 were downregulated (AUC = 0.85−0.97, 95% CI: 0.76−0.99): 2-methylnaphthalene, triethyl phosphate, 2,3-octadiene-5,7-diyn-1-ol, 12a-hydroxy-3-oxo-4,6-choladien-24-oic acid, vanillin, cyprodenate, D-sphingosine, N-undecanoylglycine, benzylideneacetone, meperidine, 3-methylhistidine and 3-methyloxindole. These findings suggest that these metabolites are potential disease biomarkers for the high-residential radon exposure group (Table 2; Fig. 3).
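For a single metabolite, the AUC equals the probability that a randomly chosen sample from one group has a higher value than a randomly chosen sample from the other group, with ties counted as one half. The Python sketch below computes it by direct pairwise comparison. It is a plain illustration rather than the MetaboAnalyst implementation, the function names are hypothetical, and the abundance values are invented.

```python
def roc_auc(group_a, group_b):
    """Probability that a value from group_a exceeds one from group_b (ties count 0.5)."""
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one value")
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


def discriminative_auc(group_a, group_b):
    """Direction-independent AUC, so down-regulated markers also score above 0.5."""
    auc = roc_auc(group_a, group_b)
    return max(auc, 1.0 - auc)


# Hypothetical abundances of a metabolite that is lower in the high-radon group
high_radon = [1.0, 2.0, 3.0, 6.0]
low_radon = [4.0, 5.0, 7.0, 8.0]

print(roc_auc(high_radon, low_radon))             # raw AUC for "higher in high-radon"
print(discriminative_auc(high_radon, low_radon))  # compared against the 0.85 cutoff
```

Taking the larger of the AUC and its complement is an assumption made here so that a single cutoff of 0.85 can be applied to both upregulated and downregulated metabolites.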

Fig. 3 Relative abundance of 12 significantly altered metabolites in the high-residential radon exposure group compared with the low-residential radon exposure group (P < 0.05).

Table 2 Differential metabolites in the serum were identified when comparing the high-residential radon exposure group to the low-residential radon exposure group.

Biomarker metabolic pathway analysis

Metabolic pathways associated with the potential metabolic markers were analyzed using KEGG pathway enrichment analysis, with significantly enriched pathways selected based on P < 0.05. A total of 12 potential biomarker metabolites were identified within key metabolic pathways as potential disease biomarkers for high-residential radon exposure. Figure 4a and Table 3 present the top 25 metabolic pathways identified in the high-residential radon exposure group compared to the low-residential radon exposure group. Notably, D-sphingosine emerged as the key compound across all identified metabolic pathways. KEGG pathway enrichment analysis further identified diseases associated with the high-residential radon exposure group based on blood sample data (Fig. 4b; Table 4). Four major diseases were significantly associated (P < 0.05) with the high-residential radon exposure group compared to the low-residential radon exposure group. Decreased levels of D-sphingosine (AUC = 0.95, specificity = 89%, and sensitivity = 91%) were linked to the development of LC, while reduced levels of 3-methylhistidine (AUC = 0.86, specificity = 76%, and sensitivity = 80%) were associated with kidney disease, early preeclampsia, and Alzheimer’s disease (Fig. 3g, k; Table 2). These findings suggest that D-sphingosine and 3-methylhistidine could serve as potential disease biomarkers associated with long-term exposure to high levels of indoor radon.
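Sensitivity and specificity such as those quoted above depend on the abundance cutoff used to call a sample positive. The Python sketch below shows the calculation for one cutoff, assuming that values below the cutoff are called positive because both metabolites are decreased in the high-radon group. The function name, the cutoff, and the abundance values are hypothetical, and the text does not state how the cutoff was chosen in the study.

```python
def sensitivity_specificity(exposed, reference, cutoff, lower_is_positive=True):
    """Sensitivity and specificity of a single-metabolite cutoff rule.

    exposed:   values from the group the test should flag (here, high radon)
    reference: values from the comparison group (here, low radon)
    """
    if not exposed or not reference:
        raise ValueError("both groups must contain at least one value")
    if lower_is_positive:
        true_pos = sum(v < cutoff for v in exposed)
        true_neg = sum(v >= cutoff for v in reference)
    else:
        true_pos = sum(v > cutoff for v in exposed)
        true_neg = sum(v <= cutoff for v in reference)
    return true_pos / len(exposed), true_neg / len(reference)


# Hypothetical abundances of a metabolite that is decreased in the high-radon group
high_radon = [1.0, 2.0, 3.0, 6.0]
low_radon = [4.0, 5.0, 7.0, 8.0]

sens, spec = sensitivity_specificity(high_radon, low_radon, cutoff=3.5)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

Moving the cutoff trades one quantity against the other, which is why a reported pair of values is best read together with the AUC.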

Fig. 4 Metabolite set enrichment analysis: The top 25 metabolic pathways distinguishing the high-residential radon exposure group from the low-residential radon exposure group (a) and diseases associated with metabolites in the high-residential radon exposure group compared with the low-residential radon exposure group (b).

Table 3 The top 25 metabolic pathways distinguishing the high-residential radon exposure group from the low-residential radon exposure group.
Table 4 Diseases associated with metabolites in the high-residential radon exposure group compared to the low-residential radon exposure group.

Discussion

Long-term, large-scale epidemiological studies have shown that individuals living in areas with high levels of natural radiation over extended periods may have an increased risk of cancer and other diseases4,12−14. Understanding the health risks associated with chronic low-dose exposure to indoor radon in human populations is therefore crucial. However, few studies have specifically investigated the health effects in high-radon areas, particularly with respect to disease-associated biomarkers. To our knowledge, this study presents the first serum metabolomics profile of low- and high-residential radon exposure groups in Chiang Mai Province using UHPLC-HRMS techniques. In the PLS-DA model, we observed clearer separation of metabolites between the low- and high-residential radon exposure groups than in the PCA model (Fig. 1). Using the PLS-DA model, we identified 92 significantly different metabolites (VIP > 1, FC > 1 or < 0.5, P < 0.05, Supplemental Table S1), and selected 12 potential biomarkers associated with elevated radon exposure using ROC analysis (AUC ≥ 0.85, Fig. 3; Table 2). Notably, KEGG pathway analysis highlighted D-sphingosine as a key component of sphingolipid metabolism (Fig. 4a; Table 3), and its level was significantly reduced (P = 2.8 × 10⁻¹¹) in the high-residential radon exposure group compared to the low-residential radon exposure group (Fig. 3g; Tables 2 and 3). Further analysis of disease signatures identified D-sphingosine and 3-methylhistidine as contributing to a high risk of diseases in high-radon areas, including LC, kidney diseases, early preeclampsia, and Alzheimer’s disease (Fig. 4b; Table 4).

Membrane lipids such as sphingolipids, including sphingosine, sphingosine-1-phosphate, and ceramide, play critical roles in carcinogenesis. Sphingosine, derived from ceramide, is subsequently phosphorylated into sphingosine-1-phosphate by the action of sphingosine kinases22. Sphingosine is a key regulator of cellular functions, including tumor growth, cell division, metastasis, immune activity, and apoptosis22,23. Reduced sphingosine levels have been suggested to correspond to an increased risk of LC24−26. A review of the literature indicates that lower sphingosine levels are associated with disruptions in the sphingosine kinase pathway and sphingosine-1-phosphate signaling24,25. Furthermore, studies have reported significantly lower sphingosine levels in LC patients compared to healthy individuals26. Our previous research demonstrated that D-sphingosine levels are significantly lower (P < 0.05) in the serum of LC patients compared to individuals from both low- and high-residential radon exposure groups, consistent with these findings20. In this study, D-sphingosine levels were also significantly lower (P = 2.8 × 10⁻¹¹) in the high-residential radon exposure group compared to the low-residential radon exposure group, suggesting a higher risk of LC in high-radon areas (Fig. 3g). This increased risk may be associated with alterations in sphingolipid metabolism, potentially linked to tumor development or progression22,23. Specifically, altered sphingosine levels may mediate this association by linking indoor radon exposure to LC development (Fig. 4b; Table 4). We propose that changes in sphingosine levels may contribute to oxidative stress, DNA damage, and subsequent carcinogenic processes, offering a mechanistic explanation for the observed risk27,28. However, further studies are necessary to elucidate the mechanisms underlying these alterations.

3-Methylhistidine is an amino acid residue formed through the post-translational methylation of histidine, primarily found in tissues such as skeletal muscle, particularly in actin and myosin29. It is widely recognized as a biomarker for muscle metabolism and the breakdown of muscle proteins30,31. In this study, 3-methylhistidine levels were significantly lower (P = 8.4 × 10⁻¹⁰) in the high-residential radon exposure group compared to the low-residential radon exposure group, indicating that alterations in 3-methylhistidine levels may serve as an indicator of the likelihood of developing kidney diseases, early preeclampsia, and Alzheimer’s disease (Figs. 3k and 4b; Table 4).

Altered levels of 3-methylhistidine have been associated with kidney diseases, including chronic kidney disease (CKD) and nephropathy31,32. Although detailed data on 3-methylhistidine levels between low- and high-residential radon exposure groups in relation to kidney diseases are limited, our findings suggest that these alterations may result from changes in skeletal muscle metabolism, likely caused by damage from prolonged indoor radon exposure29−32. Consequently, altered 3-methylhistidine levels may serve as a more effective marker for screening kidney disease in high-residential radon exposure groups than in low-residential radon exposure groups (Fig. 3k).

Preeclampsia is a hypertensive pregnancy disorder characterized by high blood pressure and organ dysfunction, particularly in the liver and kidneys33. Early preeclampsia, which occurs before 34 weeks of gestation, poses significant risks for both the mother and fetus33,34. Despite its critical impact, no reliable screening tests currently exist to identify pregnancies at risk for preeclampsia. Altered levels of 3-methylhistidine in the high-residential radon exposure group compared to the low-residential radon exposure group (Fig. 3k) suggest that 3-methylhistidine levels could serve as a potential indicator of preeclampsia, given their association with muscle metabolism and protein breakdown, potentially resulting from damage caused by prolonged indoor radon exposure33,34.

Furthermore, chronic exposure to indoor radon can have adverse effects on neurodegenerative diseases, including Alzheimer’s disease12,35. Alzheimer’s disease is a neurodegenerative disorder characterized by the progressive loss of cognitive functions, including memory, thinking, and reasoning abilities35. Radon and its decay products can cause potential damage through free radical generation, particularly in the high lipid content of the brain and nervous system35,36. Indeed, these decay products have been detected in the amygdala and hippocampus of Alzheimer’s disease patients37. Alterations in amino acid levels, including 3-methylhistidine, have been associated with Alzheimer’s disease38. Fonteh et al. reported significant differences in 3-methylhistidine levels between Alzheimer’s disease patients and healthy controls39. In this study, we found a significant difference (P = 8.4 × 10⁻¹⁰) in 3-methylhistidine levels between the low- and high-residential radon exposure groups (Fig. 3k), suggesting that the variation in 3-methylhistidine levels could serve as an indicator of Alzheimer’s disease in individuals residing in high-radon areas. However, few studies have investigated the role of 3-methylhistidine in Alzheimer’s disease, especially among individuals with high residential radon exposure. Further investigations are needed to better understand its contribution to these conditions.

The strength of our study lies in being the first to explore biomarkers linked to diseases in high-residential radon exposure populations, focusing on individuals with at least 15 years of residency. By employing metabolomics, a non-invasive and highly sensitive diagnostic tool, the study emphasizes its critical role in early disease detection. Another strength is the focus on a non-smoking population residing in areas where radon measurements were conducted over a six-month period, providing a direct investigation of the health effects linked to indoor radon exposure. Furthermore, the metabolomic findings reveal significant differences between the low- and high-residential radon exposure groups, suggesting that D-sphingosine and 3-methylhistidine may serve as potential biomarkers for related diseases. However, this study has several limitations. First, the relatively small sample size limits the generalizability of the results, and a larger sample would be beneficial to validate these findings. Second, all participants resided in areas with high PM levels, which could have a synergistic effect on the results and potentially confound the specific impact of indoor radon exposure. Third, the underlying mechanisms of diseases related to the D-sphingosine and 3-methylhistidine pathways in the high-residential radon exposure group remain unclear. Fourth, a significant limitation of metabolomics lies in its inability to establish direct causal links between identified metabolites and disease outcomes, as it primarily reflects alterations in metabolic profiles rather than uncovering the underlying biological mechanisms. Further research is necessary to confirm these findings and to investigate the associated mechanisms in more detail.

Conclusion

In summary, this study highlights the potential of metabolomic analysis to identify metabolic markers that differentiate between low- and high-residential radon exposure groups and to explore biomarkers associated with diseases in areas with elevated radon levels. Our findings suggest that D-sphingosine and 3-methylhistidine are promising serum metabolites for screening high-risk individuals with long-term radon exposure. Moreover, this research provides a foundation for the identification of novel biomarkers in future studies focusing on high-radon exposure areas.