Abstract
Current diagnosis of nasopharyngeal carcinoma (NPC) mainly relies on detection of plasma Epstein-Barr virus DNA or nasal endoscopy. However, trace metals and other elements may have critical roles in the pathophysiology of NPC. In this pilot study, blood plasma samples from 93 NPC patients and 30 healthy controls were prepared by an alkali dilution method, and their elemental contents were quantified by inductively coupled plasma-mass spectrometry. We then built six machine learning (ML) models based on the element concentrations in blood plasma and evaluated their predictive performance by the area under the receiver operating characteristic curve (AUC). SHapley Additive exPlanations was employed to interpret the prediction results and explain the contribution of each variable to the model. Compared to the healthy control group, patients with NPC were characterised by increased tin (p < 0.01) and reduced nickel and iron (p < 0.05), phosphorus (p < 0.01), and magnesium, manganese, cobalt, zinc, strontium, molybdenum, antimony, barium, thallium and lead (p < 0.001) concentrations in the plasma. Among the ML models, the bagging model demonstrated the most promising performance in discriminating NPC patients, with an AUC of 0.999 in the testing sets. We further recruited 15 patients with esophageal squamous cell carcinoma (ESCC) and 15 non-cancer patients and used them as blind testing samples. The model identified NPC patients among those samples with an AUC of 0.857 and a specificity of 0.800. In summary, the present pilot study highlights the use of metallomics analysis combined with machine learning in NPC identification, especially in early-stage cancer prediction.
Introduction
Nasopharyngeal carcinoma (NPC) is an epithelial tumour arising from the nasopharyngeal mucosal lining. In 2022, there were 120,416 new cases of NPC globally, and more than 80% of these were found in Asia, showing that the global geographical distribution of the disease is unbalanced. Moreover, in Southeast Asia the age-standardized NPC incidence rate for males (7.2 per 100,000) is about three times that for females (2.3 per 100,000)1,2. Although NPC accounted for only 0.6% of all cancers diagnosed globally in 2022, it was the most frequent malignant tumour among the head and neck cancers2,3. The current diagnostic method for NPC relies on full evaluation with primary biopsy and measurement of pre-treatment blood plasma Epstein-Barr virus (EBV) DNA levels. Due to the asymptomatic nature of NPC, more than 70% of patients are diagnosed with locally advanced disease. Approximately 30–40% of these patients eventually develop distant metastasis post-treatment, with some facing a 5-year survival rate of around 60% to 85%4,5,6,7. Early diagnosis of NPC is currently challenging. Many risk factors, including host genetics, living habits and environmental exposure, may also contribute to the development of NPC8. Therefore, novel biomarkers need to be identified to help increase the early diagnosis rate and improve prognosis outcomes.
Metallomics is the study of the composition, distribution and regulation of trace elements and metal-containing compounds in biological systems9. Elemental distribution, metabolism and homeostasis in the body may be influenced by the onset and progression of cancer. Deficiency of certain elements, especially trace elements, or changes in the functions of metalloproteins may raise the risk of cancer9. Conversely, high levels of environmental exposure to heavy metals may initiate carcinogenesis and cancer progression10. By studying metallomes, the relationship between essential and trace elements and cancer can be analysed in order to elucidate the mechanism of cancer and, perhaps, to aid diagnosis. Multiple studies have investigated changes in the elemental content of biological samples with the occurrence and development of NPC11,12,13,14. Man et al. found that lower levels of Sr and higher levels of As, Fe and V were present in the scalp hair of NPC patients as compared to healthy people11. Other studies found higher concentrations of As, Cd, Cr and Ni in cancerous tissue as well as in the blood of NPC patients than in healthy controls12,13. Ge et al. found that the median concentrations of serum Co, Ni, Cr, Mn and Cd in NPC patients were higher than in the control group, while median serum As and Mo concentrations were lower in NPC patients14. It has also been reported that elevated levels of Cd and low levels of Mn in serum were positively associated with NPC risk. Although there are numerous studies on the correlation between certain trace elements or metals and NPC, comprehensive and comparative evaluations of the plasma metallomics profiles of NPC patients are still lacking.
With the rise of artificial intelligence, machine learning (ML) and deep learning (DL) models have been widely used in oncology for cancer diagnosis and prognosis15. Supervised learning is a type of ML in which the algorithm is trained using labelled input and output. The algorithm can extract the key features and learn from complex, large and high-dimensional training data; thus, it can subsequently analyse new data to make predictions or determine outcomes16,17. The most common approach in NPC diagnosis is to extract the features of the endoscopic images or pathological slides of NPC tumours and construct a diagnostic model to predict NPC based on tissue features; examples of such models are support vector machine (SVM), k-nearest neighbour (kNN), random forest (RF), artificial neural network (ANN) and deep neural network (DNN)18,19. However, endoscopic images and pathological slides require invasive procedures; liquid biopsy analysis (e.g., blood or urine analysis) is more cost-effective and less invasive than endoscopic or pathological evaluation for initial screening, and more attractive to both patients and clinicians. To date, numerous studies have utilised metal distribution in different liquid biopsies together with ML models to diagnose lung, thyroid and bladder cancers20,21,22. However, little is known about the use of metallomics in discriminating NPC. Moreover, the relationship between specific cancers and metals is ambiguous due to the lack of consensus among different ML studies. In addition, most predictions of ML models are neither transparent nor interpretable, which makes it hard to implement ML in clinical practice. To solve this “black box” problem, SHapley Additive exPlanations (SHAP) has been utilised to explain the prediction and interpret the relationship between the clinical or elemental features and diagnosis results23,24. 
SHAP offers a method for interpreting ML model predictions by assigning each feature a value that reflects its contribution to the prediction, based on the Shapley value from cooperative game theory25.
While metallomics has been used with other cancers, little use has been made of metallomics in discriminating NPC. In this pilot study, we aimed to analyse and compare differences in the elemental profiles of blood plasma of healthy controls and NPC patients. We built several ML models that used the metallome data to predict NPC and to classify early-stage NPC. The accuracy of the various models was assessed and compared. SHAP analysis was further used to evaluate the potential connection between elements and the occurrence of NPC.
Method
Study population
A total of 168 subjects participated in this study: 30 healthy individuals (control group), 108 patients with nasopharyngeal carcinoma (NPC), 15 patients with esophageal squamous cell carcinoma (ESCC), and 15 hospital patients without any form of cancer. The whole blood samples of the healthy people were obtained from the Hong Kong Red Cross. The whole blood samples of NPC and ESCC patients were collected at Queen Mary Hospital in Hong Kong from 2006 to 2015 and 2000 to 2012, respectively. The blood plasma samples of non-cancer patients were collected at five Hong Kong hospitals (Hong Kong Queen Elizabeth Hospital, Tuen Mun Hospital, Pamela Youde Nethersole Eastern Hospital and Princess Margaret Hospital) from 2010 to 2017. The study protocol was approved by the Hospital Institutional Review Board (REC-22-23-0133). Written informed consent was obtained from all participants. The study was conducted in accordance with the ethical principles of the Declaration of Helsinki. The clinical characteristics of the subjects are listed in Supplementary Table S1.
Sample preparation and instrumentation
Blood plasma samples were collected following centrifugation at 1,500 × g for 10 min at 4 °C, in accordance with the established protocol26. The alkali dilution was performed using a modified method, as reported in other studies27,28,29,30. An aliquot (50 µL) of plasma sample was diluted in a ratio of 1:50 with an alkaline diluent consisting of 1% (v/v) 1-butanol, 0.1% (w/v) EDTA, 0.05% (v/v) Triton X-100 and 1% (v/v) NH4OH. The sample solution was further diluted with the alkaline diluent if the metal concentration exceeded the calibration range. The blanks and certified reference material (CRM) were prepared in the same manner as the samples.
The concentrations of 24 elements (Mg, Al, P, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, As, Se, Sr, Mo, Rh, Ag, Cd, Sn, Sb, Ir, Ba, Tl and Pb) in plasma were determined by an Agilent 8900 ICP-MS Triple Quad system equipped with an SPS 4 Autosampler (Agilent Technologies, USA). The operating conditions are shown in Supplementary Table S2. For external calibration, multi-element standard solutions were prepared by diluting a 1000 mg/L stock solution with the alkaline diluent. Standard curves were established using nine concentration points for P (0, 0.5, 1, 5, 10, 50, 100, 200, 500 µg/L) and for the rest of the elements (0, 0.05, 0.1, 0.5, 1, 5, 10, 20, 50 µg/L). The linear correlation coefficient was higher than 0.999 for each element. The internal standard (IS) solution was prepared by diluting 10 mg/L Rh and Ir standard solutions with the alkaline diluent to a concentration of 50 µg/L. The IS solution was added online at a flow rate of 30 µL/min by the peristaltic pump. The ICP-MS used liquefied argon gas (Hong Kong Oxygen & Acetylene, Hong Kong) of 99.999% purity for daily operation.
Statistical analysis
Metal intensities were acquired by the MassHunter Workstation Software for ICP-MS (version C.01.05 G7201C, Build 588.15 Patch 2, Agilent Technologies, Inc. 2020, Tokyo, Japan). The metal concentrations in the blood plasma samples and the recoveries of the CRM were calculated using Microsoft Excel. The recoveries of the CRM are tabulated in Supplementary Table S3. The method limit of detection (LOD) was calculated as three times the standard deviation of the metal measurements in seven calibration blanks, divided by the slope of the calibration curve (Supplementary Table S3). For elements with calculated concentrations below the method LOD, a value of 1/2 LOD was assigned for further statistical calculation. MATLAB R2024b software was used to perform the Spearman correlation analysis, which aimed to determine the correlation between any two metals. Additionally, the Mann-Whitney U test was used to study the association between metals and the risk of NPC. Statistical significance was defined as a two-tailed p-value < 0.05. In the comparison of blood plasma elemental levels across different sex and age groups, the Bonferroni correction method was used, and the significance threshold was set at p-value < 0.0167.
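The LOD rule, the 1/2 LOD substitution and the Bonferroni threshold described above can be illustrated with a minimal sketch. This is not the authors' Excel or MATLAB workflow. The blank readings and the slope are hypothetical numbers, the blank readings are assumed to be in signal units, the sample standard deviation is assumed, and the use of three comparisons is an inference from the reported threshold (0.05 / 3 ≈ 0.0167).

```python
import statistics

def method_lod(blank_readings, slope):
    """LOD = 3 x standard deviation of the blank readings / calibration slope.
    Assumes the sample (n - 1) standard deviation."""
    return 3 * statistics.stdev(blank_readings) / slope

def substitute_below_lod(concentrations, lod):
    """Replace every concentration below the LOD with LOD / 2."""
    return [c if c >= lod else lod / 2 for c in concentrations]

def bonferroni_threshold(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons

# Hypothetical example: seven calibration-blank readings (counts) and a
# calibration slope in counts per (ug/L). None of these values come from the study.
blanks = [10.0, 12.0, 11.0, 9.0, 10.0, 11.0, 14.0]
slope = 200.0
lod = method_lod(blanks, slope)

# Hypothetical sample concentrations (ug/L); two of them fall below the LOD.
cleaned = substitute_below_lod([0.001, 0.5, 0.02], lod)

# Assumed: three pairwise comparisons, which reproduces the reported 0.0167.
threshold = bonferroni_threshold(0.05, 3)
```

Substituting LOD / 2 keeps left-censored measurements in the data set without treating them as zero, which matters for rank-based tests such as the Mann-Whitney U test.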
Machine learning models and interpretation
ML analysis was performed in MATLAB R2024b software. For each patient sample, elements with a detection rate ≥ 90% and patient age at diagnosis were input as continuous variables, while sex was input as a categorical variable. Six supervised learning algorithms were used to develop models for NPC prediction: adaptive boosting (AdaBoost), random undersampling boosting (RUSBoost), bootstrap aggregation (bagging), discriminant analysis (DA), SVM and kNN. The data of the 30 healthy controls and 93 NPC cases were split into a training data set consisting of 82 subjects (67%) and a test data set consisting of 41 subjects (33%). To further assess the generalizability of our model, we constructed an external validation set comprising 45 individuals: 15 with NPC, 15 with ESCC, and 15 non-cancer patients. The algorithms were trained on the training set using ten-fold cross-validation (10-CV). The performance of the trained models was evaluated on the test data set and the external validation set using the following metrics: accuracy, precision, sensitivity, specificity, and F1 score. The area under the receiver operating characteristic curve (AUC) was used to measure the trade-off between sensitivity and specificity and to evaluate the accuracy of a binary classification model. AUC and other metric values ≥ 0.7 were considered clinically useful31. To provide insights into feature importance, SHAP was employed in MATLAB to quantify the contribution of each type of input data to the model’s prediction, i.e., the contribution of each trace metal to the NPC outcomes.
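As an illustration of the evaluation metrics listed above, the following sketch computes them from a confusion matrix and obtains the AUC from its rank interpretation (the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case). It is a minimal sketch in Python rather than the MATLAB code used in the study, and the labels, scores and the 0.5 decision threshold are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity, specificity and F1 for labels 1 (NPC) / 0 (control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test set: four NPC cases (1) and four controls (0) with model scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Unlike the other metrics, the AUC does not depend on the chosen decision threshold, which is why it is useful for comparing models whose score scales differ.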
Results
Metallomics profiling of plasma
The elemental profiles of the healthy control and NPC groups appear in Fig. 1A–V. The median concentration of plasma Sn (p = 0.002) in the NPC group was significantly higher than that in the healthy control group (Fig. 1R), while the median concentrations of plasma Mg (p < 0.001), P (p = 0.003), Mn (p < 0.001), Fe (p = 0.015), Zn (p < 0.001), Co (p < 0.001), Ni (p = 0.018), Sr (p < 0.001), Mo (p < 0.001), Sb (p < 0.001), Ba (p < 0.001), Tl (p < 0.001) and Pb (p < 0.001) were significantly lower in the NPC group (Fig. 1A–C, E, I–L, O and S–V). Eight elements, namely Al (Fig. 1F), V (Fig. 1G), Cr (Fig. 1H), Cu (Fig. 1D), As (Fig. 1M), Se (Fig. 1N), Ag (Fig. 1P) and Cd (Fig. 1Q), showed no differences between the two cohorts (p ≥ 0.05). We further compared the elemental profile of the healthy control group with those of the NPC group at early stage (Stages I and II) and advanced stage (Stages III and IV) (Fig. 2). Distinct decreases in Mg (p < 0.001), P (p = 0.002), Fe (p = 0.001), Zn (p < 0.001) and Mo (p = 0.001), and a significant difference in Sn (p = 0.002), were found in the plasma elemental profile of advanced-stage NPC, but not in that of early-stage NPC, compared to the healthy control (Fig. 2A, D, O and R). Moreover, there were no statistical differences between the elemental profiles of NPC patients at the early and advanced stages except for Fe (Fig. 2C), with a p-value of < 0.001.
Comparison of plasma elemental profiles in the healthy control and NPC groups. The elemental levels of Mg (A), P (B), Fe (C), Cu (D), Zn (E), Al (F), V (G), Cr (H), Mn (I), Co (J), Ni (K), Sr (L), As (M), Se (N), Mo (O), Ag (P), Cd (Q), Sn (R), Sb (S), Ba (T), Tl (U) and Pb (V) were compared between healthy controls (n = 30) and NPC patients (n = 93). Asterisks indicate statistical significance: *p < 0.05; **p ≤ 0.01; ***p ≤ 0.001.
Comparison of plasma elemental profiles among healthy controls, early-stage and advanced-stage cancer patients. The elemental levels of Mg (A), P (B), Fe (C), Cu (D), Zn (E), Al (F), V (G), Cr (H), Mn (I), Co (J), Ni (K), Sr (L), As (M), Se (N), Mo (O), Ag (P), Cd (Q), Sn (R), Sb (S), Ba (T), Tl (U) and Pb (V) were compared between healthy controls (n = 30), early-stage NPC patients (n = 18) and advanced-stage NPC patients (n = 75). Asterisk indicates statistical significance after Bonferroni correction: *p < 0.0167.
Taking into account that an individual’s metallomics profile can be influenced by age and sex, we further investigated the blood plasma elemental concentrations in different sex and age groups; the results are shown in Supplementary Fig. S1 and S2, respectively. Within the healthy control group and within the NPC group, no significant differences in blood plasma elemental concentrations were observed between the sex or age groups. However, the median concentrations of plasma Mn, Sr, Mo, Sb and Ba (Supplementary Fig. S1I, S1L, S1O, S1S and S1T) differed significantly between the healthy control and NPC groups in both males and females (p < 0.0167). Interestingly, males in the NPC group had significantly lower plasma Mg (p = 0.004) and Ni (p < 0.001) and higher plasma Cu (p < 0.001) than males in the healthy control group (Supplementary Fig. S1A, S1K and S1E), but no differences in these elements were found in the female population. In contrast, significant reductions in plasma Tl (p = 0.004) and Pb (p < 0.001) were found in the female NPC group compared to the female healthy control group (Supplementary Fig. S1U and S1V). The corresponding differences were not significant in the male population. Regarding patient age, 50 was chosen as the cut-off for comparison because the incidence of NPC typically peaks between 50 and 60 years in endemic areas32. Differences in plasma Mn, Co, Sr, Mo, Sb, Ba, Tl and Pb (Supplementary Fig. S2I-J, S2L, S2O and S2S-V) between healthy controls and NPC patients were observed in both age groups (p < 0.0167). In particular, the levels of plasma Mg (p < 0.001), P (p = 0.003) and Zn (p = 0.003) were significantly reduced, and the level of Sn (p = 0.006) was significantly altered, only in NPC patients in the below-50 age group (Supplementary Fig. S2A-2B, S2E and S2R). In contrast, the level of plasma Ni (p = 0.015) was decreased only in NPC patients aged 50 or above (Supplementary Fig. S2K).
Inter-correlation of trace metals in plasma
Metallomic alterations in different cancer cases can be reflected in correlations between elements, as these trace metals can interact jointly in biological processes9. Positive correlation between elements may result from their involvement in normal or mutated biological mechanisms or from exposure to environmental contaminants21. In contrast, negative correlation may imply feedback regulation of the elements. While the correlation of elements does not directly imply causality between element profiles and cancer, the observed correlations can serve as factors worth investigating in future research. In the present study, the results of the correlation analysis of the elements in the plasma of the healthy control and NPC groups are shown in Fig. 3 and Supplementary Tables S4A and S4B. Some correlation patterns were similar in the two groups: Ag did not have any significant correlation with other elements; Fe did not have a significant correlation with Mg, Al, P, V and Cr; and positive correlations outnumbered negative correlations in both groups. Some disparities were also found. In the healthy control group, more significant positive correlations were found between the trace elements. The correlation coefficient (r) of many metal pairs exceeded 0.70, for instance, Mg-P (r = 0.90, p < 0.001), Mn-Zn (r = 0.86, p < 0.001), Cu-Zn (r = 0.84, p < 0.001), Cu-P (r = 0.81, p < 0.001) and Zn-P (r = 0.79, p < 0.001). This result suggests that there was a strong correlation between those elements. Some toxic metals (As and Sn) had no significant correlation with nearly all trace elements, while Cd, Sb and Pb were correlated with certain trace elements, such as Mn, Zn, Cu and Fe.
Correlation analysis of elements in the healthy control and NPC group. Asterisk indicates statistical significance: *p < 0.05; **p ≤ 0.01; ***p ≤ 0.001.
In contrast to the healthy control group, more significantly correlated element pairs were found in the NPC group. In particular, more positive correlations were found between the trace elements (V, Cr, Mn and Co) and toxic metals, for instance, Cu-Ni (r = 0.99, p < 0.001), Ba-Sb (r = 0.96, p < 0.001), V-Tl (r = 0.82, p < 0.001), Cr-Tl (r = 0.80, p < 0.001) and Cr-V (r = 0.75, p < 0.001). Many heavy metals (As, Sb, Ba and Tl) were significantly and positively correlated with each other, which suggests that the concentrations of these elements tend to increase or decrease together.
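For readers unfamiliar with the Spearman coefficient reported above, the sketch below shows how it can be computed: each variable is converted to ranks (tied values share their average rank) and the Pearson correlation of the ranks is taken. This is a plain-Python illustration rather than the MATLAB routine used in the study, it omits the p-value calculation, and the Mg and P values are hypothetical.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical plasma concentrations for five subjects (not study data).
mg = [18.0, 20.5, 19.2, 22.1, 21.0]
p = [110.0, 130.0, 118.0, 150.0, 128.0]
rho = spearman(mg, p)
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed distributions that trace-element concentrations often show.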
Prediction performance of machine learning models
Given the observed differences in trace element and metal concentrations between sex and age groups, and the different correlations of elements across the control and cancer groups, there is a compelling need for generalized techniques, such as ML, that can reveal features and make accurate predictions while accounting for these demographic variations. In the present study, we utilised ML to predict the occurrence of NPC using the 14 elements whose detection rates were at least 90%, namely, Mg, Al, P, V, Cr, Ni, Cu, Zn, Fe, Sr, As, Se, Sb and Tl (Supplementary Table S3). As shown in Fig. 4 and Supplementary Table S5, the average AUC was 0.90–1.00; accuracy, 0.84–0.99; sensitivity, 0.79–1.00; specificity, 0.80–1.00; F1 score, 0.88–1.00; and precision, 0.94–1.00. Among the models, the bagging model had the highest average AUC (0.999), with the best accuracy (0.992) and precision (1.00). Moreover, its high sensitivity (0.989) and specificity (1.00) indicate that this model can accurately differentiate between the healthy control and NPC groups. Thus, bagging was chosen as the model for the prediction and classification of NPC.
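The idea behind bootstrap aggregation (bagging) can be sketched as follows: many base learners are each trained on a bootstrap sample (drawn with replacement) of the training set, and their predictions are combined by majority vote. This is a simplified illustration, not the MATLAB model used in the study. It assumes one-split decision stumps as base learners, redraws any bootstrap sample that misses a class, and uses hypothetical two-element (Sb, Mg) toy data.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Return (feature, threshold, label_below, label_above) with the fewest training errors."""
    majority = Counter(y).most_common(1)[0][0]
    best_errors = sum(1 for label in y if label != majority)
    best = (0, float("inf"), majority, majority)  # fallback: constant prediction
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # candidate split halfway between neighbouring values
            for below, above in ((0, 1), (1, 0)):
                errors = sum(
                    1 for row, label in zip(X, y)
                    if (below if row[f] <= thr else above) != label
                )
                if errors < best_errors:
                    best_errors, best = errors, (f, thr, below, above)
    return best

def predict_stump(stump, row):
    f, thr, below, above = stump
    return below if row[f] <= thr else above

def fit_bagging(X, y, n_estimators=25, seed=0):
    """Train n_estimators stumps, each on its own bootstrap sample of (X, y)."""
    assert len(set(y)) == 2, "this sketch expects exactly two classes"
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    while len(ensemble) < n_estimators:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        if len({y[i] for i in idx}) < 2:
            continue  # redraw when a class is missing (a choice made for this sketch)
        ensemble.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return ensemble

def predict_bagging(ensemble, row):
    """Majority vote over the base learners."""
    votes = Counter(predict_stump(s, row) for s in ensemble)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: [Sb, Mg] concentrations; 0 = healthy control, 1 = NPC.
X = [[5.0, 20.0], [5.5, 21.0], [6.0, 19.0], [5.2, 22.0], [5.8, 20.5],
     [1.0, 17.0], [1.5, 18.0], [0.8, 16.5], [1.2, 17.5], [1.4, 16.0]]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

ensemble = fit_bagging(X, y)
pred_npc = predict_bagging(ensemble, [0.9, 17.0])
pred_control = predict_bagging(ensemble, [5.6, 20.0])
```

Averaging over resampled training sets reduces the variance of unstable base learners, which is one plausible reason an ensemble of this kind can perform well on a small cohort.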
(A) Performance of the six ML models in the test sets. (B) Comparison of the AUC of the six ML models in the training, test and external validation sets.
Next, an external validation set that included ESCC and non-cancer patients was created to test whether the ML model could distinguish NPC patients from patients with another, similar type of cancer whose elemental profiles might be similar. In this set, 15 new blood plasma samples from NPC patients (seven female and eight male) were included for validation. Among them, three patients were diagnosed with early-stage disease (Stages I and II), while the remaining twelve had advanced-stage disease (Stages III and IV). The elemental profiles of plasma from ESCC patients, non-cancer individuals, and NPC patients are presented in Supplementary Table S6. These samples were excluded from the training and testing phases until the external validation trial was conducted. Notably, the bagging model achieved a mean AUC of 0.857 and a sensitivity of 1.00, demonstrating its ability to accurately distinguish NPC patients (Fig. 4). The specificity reached 0.800, indicating that the model can identify NPC based on distinct elemental patterns, even in the presence of similar sample types such as ESCC.
Early diagnosis of NPC using the bagging model
Cancer is challenging to diagnose in its early stages due to its asymptomatic nature. Furthermore, tumor tissue has the potential to mimic the conditions of benign tissue33. As shown in Fig. 2, the elemental profiles of patients with early-stage cancer are more similar to the profiles of healthy controls than to those of patients with advanced-stage cancer. Thus, it is difficult to differentiate early-stage cancer patients from the general population solely on the basis of the elemental profile. Given the good performance of the bagging model in discriminating between the healthy control and NPC groups, we tested its ability to recognize early-stage NPC patients. Forty-eight subjects were included in this model, of whom 30 were healthy controls and the other 18 were NPC patients in Stage I or II. The data were split into a training data set of 32 individuals and a test data set of 16 individuals (i.e., approximately a 2:1 ratio). Remarkably, excellent performance was achieved in differentiating early-stage NPC patients from healthy controls, with all the evaluation metrics close to 1.00, as shown in Table 1. These results indicate that the bagging model can recognize early-stage NPC patients.
Interpretation of the bagging meta-model and contribution of variables
When elements are used as predictors of NPC, including early-stage NPC, it is important to understand how they affect the model performance. For this purpose, we used SHAP analysis of the bagging meta-model to evaluate the contributions of the various input features. The absolute SHAP value indicates the influence of each element on the final prediction, and comparing SHAP values reveals the relative importance of each element. The SHAP values also indicate the extent to which a model relies on the interplay between different features, in this case the elements.
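To make the notion of a Shapley value concrete, the sketch below computes exact Shapley values for a single prediction by enumerating every coalition of features. It is an illustration only and not the MATLAB SHAP routine used in the study: practical SHAP implementations typically approximate these values and average over background data, whereas this sketch assumes one fixed baseline sample, and the scoring function, feature values and weights are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x.
    Features outside a coalition are set to their baseline value (an assumption)."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical linear scoring function of three element concentrations.
def toy_score(z):
    return 3.0 * z[0] - 2.0 * z[1] + 0.5 * z[2]

x = [1.0, 2.0, 4.0]         # hypothetical sample
baseline = [0.0, 0.0, 0.0]  # hypothetical reference sample
phi = shapley_values(toy_score, x, baseline)
```

For a linear function each value reduces to the weight times the feature's deviation from the baseline, and the values always sum to the difference between the prediction for the sample and the prediction for the baseline, which is the additivity property that SHAP relies on. Ranking features by the mean absolute value across samples yields an importance ordering of the kind described in this section.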
The absolute SHAP value of each metal is shown in Fig. 5. The top five contributing metals in NPC prediction and early-stage classification were ranked as Sb > Mg > Ni > Tl > Al and Sb > Sr > Tl > Zn > Ni, respectively. All these elements except Al differed significantly in the NPC group compared to the healthy control group (Fig. 1). Among the 14 elements, Sb was ranked first in both NPC prediction and early-stage classification, with an absolute SHAP value 6.4 times higher than that of the second-ranked element in both cases. This indicates that Sb had an important impact on the model. The SHAP analysis results indicate that Sb, Ni and Tl are the key metals used by the model to identify NPC patients.
Average impact of each variable on the bagging model and output magnitude in (A) NPC prediction and (B) Early-stage classification.
Discussion
We performed comprehensive metallomics profiling on the blood plasma of NPC patients and identified 14 elements with notable alterations in blood plasma concentrations. This provides insight into how elemental profiles change with cancer stage, age and sex, and into potential elemental biomarkers. In particular, we found that these biomarkers may be able to identify people with early-stage NPC. However, many potential confounders can influence the measured metal concentrations, including but not limited to dietary habits, environmental exposures and health status. According to SHAP analysis, Sb, Tl and Ni showed significant contributions to the prediction of NPC. Sb is categorised as a toxic element that is carcinogenic to humans at excessive exposure levels34. Although studies on Sb and its correlation with NPC are lacking, several epidemiological studies have suggested that occupational or environmental exposure to Sb might increase the risk of lung cancer35. Tl is a heavy metal that may accumulate in the environment at high concentrations as a pollutant from industrial activities; however, no human or animal studies have shown that it is carcinogenic. Further research is needed to reveal the association between Tl and NPC. Some ecological studies have found higher levels of Ni in rice and drinking water in regions with high NPC incidence compared with low-incidence areas36. The studies further hypothesized that long-term exposure to Ni together with other trace elements caused potential genetic damage, leading to modifications in nasopharyngeal epithelial cells, facilitating infection by EBV and contributing to the development of NPC37. Taken together, these findings suggest that the relations of Sb, Tl and Ni with NPC are noteworthy, and that these elements are important in model discrimination.
In this study, multiple elements showed significant variations in metallomics analysis, suggesting their roles in and associations with NPC. A study in Malaysia found that Sn was present in certain occupational environments and in cigarette smoke, and that these factors were highly associated with the occurrence of NPC in the Malaysian Chinese population38. It has also been reported that exposure to excess Sn may affect the homeostasis of some essential elements, particularly by reducing Fe and Cu39. Indeed, depletion of essential elements, such as Mg, Fe, Zn and Mo, was observed in the NPC group compared to the healthy control group. Mg primarily functions as an intracellular ion and serves as a metallo-coenzyme in more than 300 reactions involving the transfer of phosphate40. Yu et al. found evidence of reduced serum Mg levels in sporadic NPC cases and in NPC cases with NIPAL1-containing variants, where NIPAL1 is an Mg transport gene involved in Mg channelling41. Specifically, lower levels of circulating Mg possibly imply the presence of gene variants and an inability to activate T-cell responses to EBV infection. These findings suggest that the depletion of Mg is associated with NPC. Under normal conditions, given adequate supplies, Zn (together with Cu and Mn) binds with proteins, which then bind with superoxide dismutase, giving the enzyme antioxidant properties (i.e., enabling it to prevent free radical formation and helping protect cells from damage)42,43. Since these elements contribute to natural defences against oxidation, insufficient levels can compromise antioxidant activity, leading to increased oxidative stress, which is highly associated with cancer. Fe primarily functions to transport oxygen in haemoglobin. 
Consistent with our findings, a study found that serum Fe levels were negatively correlated with the primary tumour burden of NPC and that lower levels of serum Fe and Fe-related metabolites were observed in NPC patients compared to healthy controls, suggesting that circulating Fe may be taken up by cancer cells for NPC initiation, progression and metastasis. Mo is a cofactor of xanthine oxidase, which can help prevent the formation of N-nitro compounds, thereby reducing their carcinogenicity. Reduced Mo may hinder the detoxification of carcinogens and is associated with NPC. Thus, the significant Fe, Zn, Cu and Mo alterations in advanced-stage NPC suggest that these metals may be involved in tumour progression. Altogether, the statistical differences observed in metallomics analysis, coupled with the existing literature that supports the role and association of multiple elements with NPC, illuminate the potential of metallomics data as reliable biomarkers and predictors in machine learning.
Imaging and molecular biotechnological examination of specific biomarkers are routine diagnostic methods that greatly contribute to cancer detection and confirmation. EBV DNA is currently the most valuable biomarker for NPC because NPC is closely associated with EBV infection. However, in several studies, about 15% of NPC patients had undetectable EBV DNA copies during initial screening. Because of this reliance on, yet unreliability of, EBV DNA as a screening biomarker, more than 50% of EBV DNA-negative patients were later diagnosed with locoregionally advanced stage III to IVA NPC44,45. Similarly, 34.4% of NPC patients (32 out of 93) in our study had no detectable EBV DNA copies in plasma; 34.4% of them (11 out of 32) were in stages I and II, while the rest were in stages III to IV (Supplementary Table S1). Unlike the plasma elements investigated here, most circulating EBV DNA consists of nucleic acids released from the tumour into the circulation, so its concentration may vary with tumour size. This evidence implies that, if EBV DNA detection is used alone in population screening, a significant portion of individuals with NPC will be missed because of low or undetectable levels in early-stage disease. Failing to find NPC in the early stages means that diagnosis and treatment are delayed, prognosis is worse, and the survival rate is inevitably lower44.
Homeostasis of essential or trace elements is carefully controlled for normal biological functioning in the human body. Any alteration in elemental concentrations may therefore reflect disease or a change in health. In recent years, numerous studies have reported associations between various elements in different biological samples and cancers, such as prostate, lung and breast cancers9,46,47. These studies provide convincing results demonstrating that metallome data can be a biomarker applicable to cancer diagnosis, regardless of the cancer type or tumour burden. In this pilot study, alterations of several elements in NPC plasma in both early-stage and advanced-stage disease were discernible compared to healthy control plasma samples. Six different ML models showed that plasma elemental content was able to classify NPC patients (AUC of 0.90–1.00). The bagging model performed particularly well in identifying patients with early-stage disease based on the selected elements (AUC = 0.998). It was also found that the results for early-stage cancer prediction were slightly better than those for NPC classification. A possible reason is that, owing to the limited sample size, the dataset used for early-stage cancer prediction was smaller than that used for control versus overall cancer classification. Consequently, the model might perform better on simpler data by balancing bias and variance. These results provide evidence that the elements Mg, Al, P, V, Cr, Ni, Cu, Zn, Fe, Sr, As, Se, Sb and Tl could be used as candidates for plasma biomarker panels for the early diagnosis of NPC, facilitating prompt diagnosis in high-risk populations. By integrating trace elements and metals into the modelling process, clinicians can better stratify patients for further testing and intervention using robust and reliable diagnostic tools tailored to the unique biochemical profiles of diverse patient populations. 
However, a limitation of the study is that the healthy control samples and the NPC samples were collected from different hospitals. Further research with a larger sample size from a single hospital is therefore warranted to validate the effectiveness of the elemental biomarkers proposed for NPC diagnosis, and to verify the repeatability of the machine learning model in cancer prediction.
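The sample-size concern can be illustrated with a minimal sketch of how the uncertainty of an AUC estimate depends on the number of samples. The code does not reproduce the models or data of this study. It draws synthetic Gaussian classifier scores (an assumption), computes AUC as the probability that a randomly chosen positive sample outscores a randomly chosen negative one, and uses a percentile bootstrap to estimate a confidence interval. All function names and parameter values are hypothetical.

```python
import random


def auc(pos, neg):
    """AUC as P(positive score > negative score); ties count as 0.5."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(pos, neg, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_boot):
        # Resample each class with replacement, keeping class sizes fixed.
        boot_pos = [rng.choice(pos) for _ in pos]
        boot_neg = [rng.choice(neg) for _ in neg]
        stats.append(auc(boot_pos, boot_neg))
    stats.sort()
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = n_boot - 1 - lo_idx
    return stats[lo_idx], stats[hi_idx]


def simulate_scores(n_pos, n_neg, seed):
    """Synthetic scores (assumption): positives are shifted by one SD."""
    rng = random.Random(seed)
    pos = [rng.gauss(1.0, 1.0) for _ in range(n_pos)]
    neg = [rng.gauss(0.0, 1.0) for _ in range(n_neg)]
    return pos, neg


small_pos, small_neg = simulate_scores(15, 15, seed=1)
large_pos, large_neg = simulate_scores(100, 100, seed=1)

small_ci = bootstrap_ci(small_pos, small_neg)
large_ci = bootstrap_ci(large_pos, large_neg)
small_width = small_ci[1] - small_ci[0]
large_width = large_ci[1] - large_ci[0]

print(f"n=15 per group:  AUC={auc(small_pos, small_neg):.3f}, CI width={small_width:.3f}")
print(f"n=100 per group: AUC={auc(large_pos, large_neg):.3f}, CI width={large_width:.3f}")
```

In this kind of simulation the interval is typically much wider for the smaller groups, which is one reason a high AUC from a small cohort is best confirmed on a larger, independently collected sample.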
Compared with traditional molecular techniques such as real-time quantitative PCR, immunohistochemistry and Western blot, element extraction does not require stringent storage conditions or the use of antibodies. This helps to minimise the loss of target analytes and shorten the analysis time, making cancer screening faster, more economical and more efficient, as well as more accurate and sensitive. Because element extraction from blood plasma is simple, non-invasive and low in cost, the determination of metals can be incorporated more easily into routine physical examination and is likely to be readily accepted by the general public. With the implementation of ML modelling, complex clinical data and biomarker information can be interpreted and analysed to predict the risk of NPC efficiently and effectively. In addition to providing evidence of elemental differences between healthy control and NPC plasma, this pilot study showed that metallomics profiling combined with a bagging model was able to identify early-stage NPC patients with undetectable EBV DNA copies. This is noteworthy, as the inability to identify such patients is a severe failure of the present biomarker system. It is believed that complementary application of this metallomics approach with plasma EBV DNA detection for initial screening can improve the accuracy and specificity of early detection of NPC, allowing early medical intervention or treatment to prevent the cancer from progressing and leading to better treatment outcomes.
Conclusion
The early diagnosis of NPC is challenging due to the asymptomatic nature of the cancer. The common method of screening, EBV DNA detection, misses a significant number of early-stage cases. In this pilot study, we evaluated metallomics profiling as a diagnostic tool for NPC, comparing the levels of 14 elements in healthy individuals and NPC patients. In the NPC group, the blood plasma concentration of Sn increased, whereas the blood plasma concentrations of Mg, P, Mn, Co, Ni, Zn, Fe, Sr, Mo, Sb, Ba, Tl and Pb decreased compared to the healthy control group. Six ML models were evaluated, of which the bagging model outperformed the others with AUC and accuracy of 0.999 and 0.992, respectively. Application of SHAP revealed that Sb, Tl and Ni contributed the most to cancer prediction. These findings demonstrate that metallomics profiling complemented with the ML bagging model is a potentially valuable candidate for a clinical method of screening for or diagnosing NPC, even at its earliest stages. Early diagnosis enables early treatment and can improve survival through timely medical support.
Data availability
The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.
References
Ferlay, J. et al. F. Global Cancer Observatory: Cancer Today (International Agency for Research on Cancer, 2022). https://gco.iarc.fr/today
Bray, F. et al. Global cancer statistics 2022: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: Cancer J. Clin. 74, 229–263. https://doi.org/10.3322/caac.21834 (2024).
Jicman Stan, D. et al. Nasopharyngeal carcinoma: A new synthesis of literature data (Review). Exp. Ther. Med. https://doi.org/10.3892/etm.2021.11059 (2022).
Lee, H. M., Okuda, K. S., Gonzalez, F. E. & Patel, V. Current perspectives on nasopharyngeal carcinoma. Adv. Exp. Med. Biol. 1164, 11–34. https://doi.org/10.1007/978-3-030-22254-3_2 (2019).
Juarez-Vignon Whaley, J. J. et al. Early stage and locally advanced nasopharyngeal carcinoma treatment from present to future: Where are we and where are we going? Curr. Treat. Options Oncol. 24, 845–866. https://doi.org/10.1007/s11864-023-01083-2 (2023).
Tang, X. R. et al. Development and validation of a gene expression-based signature to predict distant metastasis in locoregionally advanced nasopharyngeal carcinoma: A retrospective, multicentre, cohort study. Lancet Oncol. 19, 382–393. https://doi.org/10.1016/S1470-2045(18)30080-9 (2018).
Au, K. H. et al. Treatment outcomes of nasopharyngeal carcinoma in modern era after intensity modulated radiotherapy (IMRT) in Hong kong: A report of 3328 patients (HKNPCSG 1301 study). Oral Oncol. 77, 16–21. https://doi.org/10.1016/j.oraloncology.2017.12.004 (2018).
Chen, Y. P. et al. Nasopharyngeal carcinoma. Lancet 394, 64–80. https://doi.org/10.1016/S0140-6736(19)30956-0 (2019).
Zhang, Y., He, J., Jin, J. & Ren, C. Recent advances in the application of metallomics in diagnosis and prognosis of human cancer. Metallomics 14, mfac037. https://doi.org/10.1093/mtomcs/mfac037 (2022).
Callejon-Leblic, B., Arias-Borrego, A., Pereira-Vega, A., Gomez-Ariza, J. L. & Garcia-Barrera, T. The metallome of lung cancer and its potential use as biomarker. Int. J. Mol. Sci. 20, 778. https://doi.org/10.3390/ijms20030778 (2019).
Man, C. K., Zheng, Y. H. & Mak, P. K. Trace element profiles in the hair of nasopharyngeal carcinoma (NPC) patients. J. Radioanal Nucl. Chem. 212, 151–160. https://doi.org/10.1007/BF02162347 (1996).
Khlifi, R. et al. Arsenic, cadmium, chromium and nickel in cancerous and healthy tissues from patients with head and neck cancer. Sci. Total Environ. 452–453, 58–67. https://doi.org/10.1016/j.scitotenv.2013.02.050 (2013).
Khlifi, R. et al. Risk of laryngeal and nasopharyngeal cancer associated with arsenic and cadmium in the Tunisian population. Environ. Sci. Pollut Res. Int. 21, 2032–2042. https://doi.org/10.1007/s11356-013-2105-z (2014).
Ge, X. Y. et al. Associations between serum trace elements and the risk of nasopharyngeal carcinoma: A multi-center case-control study in Guangdong Province, Southern China. Front. Nutr. 10, 1142861. https://doi.org/10.3389/fnut.2023.1142861 (2023).
Iqbal, M. J. et al. Clinical applications of artificial intelligence and machine learning in cancer diagnosis: Looking into the future. Cancer Cell. Int. 21, 270. https://doi.org/10.1186/s12935-021-01981-1 (2021).
Shravya, C., Pravalika, K. & Subhani, S. Prediction of breast cancer using supervised machine learning techniques. Int. J. Innov. Technol. Explor. Eng. 8, 1106–1110. https://doi.org/10.31661/jbpe.v0i0.2109-1403 (2019).
Yue, W., Wang, Z., Chen, H., Payne, A. & Liu, X. Machine learning with applications in breast cancer diagnosis and prognosis. Designs 2, 13–29. https://doi.org/10.3390/designs2020013 (2018).
Mohammed, M. A. et al. Trainable model for segmenting and identifying nasopharyngeal carcinoma. Comput. Electr. Eng. 71, 372–387. https://doi.org/10.1016/j.compeleceng.2018.07.044 (2018).
Chuang, W. Y. et al. Successful identification of nasopharyngeal carcinoma in nasopharyngeal biopsies using deep learning. Cancers 12, 507. https://doi.org/10.3390/cancers12020507 (2020).
Tan, C., Chen, H. & Xia, C. Early prediction of lung cancer based on the combination of trace element analysis in urine and an adaboost algorithm. J. Pharm. Biomed. Anal. 49, 746–752. https://doi.org/10.1016/j.jpba.2008.12.010 (2009).
Chen, Z. et al. Machine learning-aided metallomic profiling in serum and urine of thyroid cancer patients and its environmental implications. Sci. Total Environ. 895, 165100. https://doi.org/10.1016/j.scitotenv.2023.165100 (2023).
Wang, W. et al. Identification of two-dimensional copper signatures in human blood for bladder cancer with machine learning. Chem. Sci. 13, 1648–1656. https://doi.org/10.1039/d1sc06156a (2022).
Alabi, R. O., Elmusrati, M., Leivo, I., Almangush, A. & Makitie, A. A. Machine learning explainability in nasopharyngeal cancer survival using LIME and SHAP. Sci. Rep. 13, 8984. https://doi.org/10.1038/s41598-023-35795-0 (2023).
Chen, X. et al. An interpretable machine learning prognostic system for locoregionally advanced nasopharyngeal carcinoma based on tumor burden features. Oral Oncol. 118, 105335. https://doi.org/10.1016/j.oraloncology.2021.105335 (2021).
Lundberg, S. M. & Lee, S. I. A Unified approach to interpreting model predictions. In Proceedings of the 31st International Conference on Neural Information Processing Systems 4768–4777 (2017).
Center, M. M. P. Serum & Plasma Preparation Information (2018). https://med.uc.edu/docs/default-source/mmpc-docs/serum_plasma-preparation.pdf?sfvrsn=bbdec461_2
Lu, Y. et al. Alkali Dilution of blood samples for high throughput ICP-MS analysis-comparison with acid digestion. Clin. Biochem. 48, 140–147. https://doi.org/10.1016/j.clinbiochem.2014.12.003 (2015).
Yedomon, B. et al. Biomonitoring of 29 trace elements in whole blood from inhabitants of Cotonou (Benin) by ICP-MS. J. Trace Elem. Med. Biol. 43, 38–45. https://doi.org/10.1016/j.jtemb.2016.11.004 (2017).
Konz, T. et al. ICP-MS/MS-Based ionomics: A validated methodology to investigate the biological variability of the human ionome. J. Proteome Res. 16, 2080–2090. https://doi.org/10.1021/acs.jproteome.7b00055 (2017).
Gonzalez-Antuna, A. et al. Simultaneous quantification of 49 elements associated to e-waste in human blood by ICP-MS for routine analysis. MethodsX 4, 328–334. https://doi.org/10.1016/j.mex.2017.10.001 (2017).
Corbacioglu, S. K. & Aksel, G. Receiver operating characteristic curve analysis in diagnostic accuracy studies: A guide to interpreting the area under the curve value. Turk. J. Emerg. Med. 23, 195–198. https://doi.org/10.4103/tjem.tjem_182_23 (2023).
Li, P. et al. Prognostic analysis of early-onset and late-onset nasopharyngeal carcinoma: A retrospective study. Discov Oncol. 15, 687. https://doi.org/10.1007/s12672-024-01594-w (2024).
Warnakulasuriya, S. & Kerr, A. R. Oral cancer screening: Past, Present, and future. J. Dent. Res. 100, 1313–1320. https://doi.org/10.1177/00220345211014795 (2021).
Mulware, S. J. Trace elements and carcinogenicity: A subject in review. 3 Biotech. 3, 85–96. https://doi.org/10.1007/s13205-012-0072-6 (2013).
Saerens, A., Ghosh, M., Verdonck, J. & Godderis, L. Risk of cancer for workers exposed to antimony compounds: A systematic review. Int. J. Environ. Res. Public. Health. 16, 4474. https://doi.org/10.3390/ijerph16224474 (2019).
Chang, E. T. & Adami, H. O. The enigmatic epidemiology of nasopharyngeal carcinoma. Cancer Epidemiol. Biomarkers Prev. 15, 1765–1777. https://doi.org/10.1158/1055-9965.EPI-06-0353 (2006).
Wang, X., Li, C. & Li, Y. F. in Applied Metallomics 349–362 (2024).
Armstrong, R. W. et al. Nasopharyngeal carcinoma in Malaysian Chinese occupational exposures to particles, formaldehydes and heat. Int. J. Epidemiol. 29, 991–998. https://doi.org/10.1093/ije/29.6.991 (2000).
Roney, N., Abadin, H. G., Fowler, B. & Pohl, H. R. In Metal Ions in Toxicology: Effects, interactions, interdependencies Vol. 143–156, 143–155 (DE GRUYTER, 2015).
Baker, S. B. & Worthlet, L. I. G. The essentials of Calcium, magnesium and phosphate metabolism: Part I. Physiology. Basic. Sci. Rev. 4, 301–306 (2002).
Yu, G. et al. Whole-Exome sequencing of nasopharyngeal carcinoma families reveals novel variants potentially involved in nasopharyngeal carcinoma. Sci. Rep. 9, 9916–9910. https://doi.org/10.1038/s41598-019-46137-4 (2019).
Li, S. et al. Disrupting SOD1 activity inhibits cell growth and enhances lipid accumulation in nasopharyngeal carcinoma. Cell. Commun. Signal. 16, 28. https://doi.org/10.1186/s12964-018-0240-3 (2018).
Mabdavi, R., Faramarzi, E., Mobammad-Zadeb, M., Ghaeammagbami, J. & Jabbari, M. V. Consequences of radiotherapy on nutritional status, dietary intake, serum zinc and copper levels in patients with Gastrointestinal tract and head and neck cancer. Saudi Med. J. 28, 435–440 (2007).
Nicholls, J. M. et al. Negative plasma Epstein-Barr virus DNA nasopharyngeal carcinoma in an endemic region and its influence on liquid biopsy screening programmes. Br. J. Cancer. 121, 690–698. https://doi.org/10.1038/s41416-019-0575-6 (2019).
Lee, A. W. M. et al. A systematic review and recommendations on the use of plasma EBV DNA for nasopharyngeal carcinoma. Eur. J. Cancer. 153, 109–122. https://doi.org/10.1016/j.ejca.2021.05.022 (2021).
Zablocka-Slowinska, K. et al. Serum and whole blood Zn, Cu and Mn profiles and their relation to redox status in lung cancer patients. J. Trace Elem. Med. Biol. 45, 78–84. https://doi.org/10.1016/j.jtemb.2017.09.024 (2018).
Zhang, X. & Yang, Q. Association between serum copper levels and lung cancer risk: A meta-analysis. J. Int. Med. Res. 46, 4863–4873. https://doi.org/10.1177/0300060518798507 (2018).
Funding
Kelvin S.-Y. Leung thanks the Hong Kong Research Grants Council (HKBU 12302821 and 12303122) for their financial support. Vivian Y.-Y. Chong is supported by a postgraduate studentship offered by the University Grants Committee.
Author information
Contributions
V.Y.Y.C., J.T.S.L. and K.S.Y.L. contributed to the study design and interpretation of the data. V.Y.Y.C. contributed to the drafting of the manuscript. Y.N.C. and K.H.C. contributed to the machine learning modelling. H.L.L. contributed to the provision of resources. H.G., J.T.S.L. and K.S.Y.L. contributed to the revision and edition of the manuscript. K.S.Y.L. provided supervision in the study. All authors reviewed and approved the submitted manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Chong, V.YY., Chan, YN., Lum, J.TS. et al. Early diagnosis of nasopharyngeal carcinoma based on machine learning modelling and blood plasma metallomics analysis. Sci Rep 16, 3646 (2026). https://doi.org/10.1038/s41598-025-33760-7
DOI: https://doi.org/10.1038/s41598-025-33760-7




