Abstract
In this study, we conducted a targeted quantitative metabolomic analysis of 630 metabolites in the plasma of 78 West African patients at the time of breast cancer diagnosis and before any treatment. Most of these patients had advanced-stage disease. Their data were compared with those of 79 healthy controls using a combination of machine learning approaches and statistical analyses. The predictive models obtained with the different machine learning algorithms performed comparably, with the best AUC of 0.878 obtained with ridge logistic regression using Boruta feature selection. Of the 63 discriminating metabolites identified cumulatively by univariate analyses with Benjamini-Hochberg correction, OPLS-DA, and the best machine learning approach, thirteen were identified consistently by all three. This signature highlights several key biological processes, including oxidative stress, disrupted neurotransmitter profiles, altered nitric oxide and xanthine oxidase metabolism, and impaired energy metabolism. New metabolites significantly deregulated in breast cancer, such as asymmetric dimethylarginine and hexosylceramides, were also identified. The identified metabolomic signature provides a comprehensive, global view of the blood biochemical phenotype associated with advanced breast cancer at the time of diagnosis.
Introduction
Globally, breast cancer is the most common cancer in women and the leading cause of cancer mortality in women in almost all countries, according to the World Health Organization1. In 2020, 2.3 million women were diagnosed with breast cancer, and approximately 685,000 died from the disease1. The global mortality rate of breast cancer is 18 per 100,000, ranging from 8 to 20 per 100,000 across countries2. Over the past three decades, breast cancer mortality has declined in the most industrialized countries, except in East Asia. In Western Europe, mortality rates peaked between 1985 and 1990 and have since declined by an average of 1–3% annually3. This consistent downward trend is likely attributable to earlier diagnosis and improved treatment.
In contrast to the declining incidence rates of breast cancer in industrialized countries, the number of cases is rising in low-income countries. The World Health Organization and the International Agency for Research on Cancer (IARC) project that by 2040 there will be 3.06 million new cases of breast cancer per year, primarily in low-income countries4. Incidence rates have increased by 88% in countries undergoing transition, such as those in Africa, particularly Central Africa, where incidence reaches 55.9 per 100,000 women5. Breast cancer incidence has also risen markedly in sub-Saharan Africa, growing by more than 5% per year in Malawi, Nigeria, and Seychelles, and by 3–4% per year in South Africa and Zimbabwe between the mid-1990s and mid-2010s5.
The latest data from the Global Cancer Observatory (GLOBOCAN) for 2020 show that Africa reported 186,598 cases of breast cancer and 85,787 deaths from the disease6. In their literature review and meta-analysis, Adeloye et al. reported an incidence rate of 22.4 per 100,000 women in sub-Saharan Africa, comparable to the rate of 24.0 per 100,000 women in North Africa7. The current cumulative incidence of breast cancer per 100,000 women in Southern (46.2), West (37.3), East (29.9), and Central (27.9) Africa is associated with estimated mortality rates of 15.6, 17.8, 15.4, and 15.8, respectively, in these regions8. Mortality rates in sub-Saharan Africa are currently among the highest in the world. The relative age-standardized five-year survival rate in 12 sub-Saharan African countries was 66% for cases diagnosed between 2008 and 2015, in stark contrast to the 85–90% survival rate for cases diagnosed in developed countries between 2010 and 20145,9. The low survival rates in sub-Saharan Africa are primarily attributable to late detection, which leads to diagnosis at advanced stages. A report summarizing 83 studies conducted in 17 sub-Saharan African countries found that 77% of all classified cases were stage III or IV at diagnosis10. A study conducted in five sub-Saharan African countries identified delayed management as a significant contributor to the high mortality in the region and estimated that 28–37% of breast cancer deaths in these countries could be prevented by early diagnosis and appropriate treatment11.
Mali is experiencing a notable rise in breast cancer cases, consistent with broader trends across Africa. The country has maintained a cancer registry since 1986; it primarily covers the Bamako district and is a valuable resource for cancer epidemiology. A 2011 study by Mawadzoue reported a 20% increase in breast cancer incidence between 1988 and 200912, during which breast cancer became the second most common cancer among women. The increase was particularly marked among women under 55, whose risk rose by 30%, while incidence among post-menopausal women remained stable. These trends have persisted: in 2018, breast cancer accounted for 21.1% of all cancers and ranked as the second most common cancer among women in Mali, representing 23.7% of all cancers diagnosed in women, after cervical cancer (25.4%)5,13.
Clinical metabolomics involves measuring as many metabolites as possible in samples from patients affected by a disease in order to identify metabolic signatures that predict disease status. The objectives are to characterize the deep biochemical phenotype of the disease, better understand its pathophysiology, identify novel biomarkers for early detection, prognosis, classification, or response to treatment, and uncover new therapeutic avenues. The published literature on breast cancer metabolomics is difficult to synthesize because the number of patients studied, the clinical inclusion criteria, the breast cancer subtypes, the types of samples analyzed, and the metabolomic assays used vary considerably from one study to another. As a result, it is difficult to obtain a comprehensive picture of the metabolic profiles specifically associated with breast cancer. A literature review published in 2020 by Yang et al.14 identified 39 articles, 22 of which analyzed blood samples and the remainder urine, cancer tissue, saliva, or ductal fluid. Their analysis showed that the most frequently disrupted metabolic pathways included aminoacyl-tRNA biosynthesis; glycerophospholipid metabolism; glycolysis and gluconeogenesis; alanine, aspartate, and glutamate metabolism; and glycine, serine, and threonine metabolism. Subramani’s review, published in 2022, confirmed that the metabolic pathways most often disrupted in breast cancer were carbohydrate, lipid, and amino acid metabolism15.
To the best of our knowledge, no clinical metabolomics study of breast cancer has been conducted in African populations to date. We conducted a targeted quantitative metabolomics study, measuring 630 metabolites in the plasma of newly diagnosed West African women, most of whom had advanced-stage disease, before any treatment. The objective was to characterize the blood biochemical phenotype of breast cancer at the time of diagnosis.
Materials and methods
Ethical approvals
This study was approved by the Ethics Committee of the University of Science, Techniques, and Technologies of Bamako (USTTB), Faculty of Medicine and Pharmacy, under reference number 2021/236/USTTB. Before participating, all individuals received a comprehensive information document outlining the study’s objectives. After giving written informed consent, all case and control women completed a questionnaire during an interview. The questionnaire covered sociodemographic information, cancer history, parity, body mass index, duration of breastfeeding, place of residence, menopausal status, family history of cancer, past abortions, chronic diseases (such as diabetes mellitus and high blood pressure), and, for cases, TNM classification.
The study participants were Malian women enrolled in a prospective case-control study conducted from August 2021 to March 2022. Breast cancer cases were recruited concurrently, after histological confirmation, in the oncology departments of two university hospitals: the Luxembourg “Mère-Enfant” Hospital and the “Point G” Hospital. In total, 80 women with breast cancer and 80 women without breast cancer, aged 18 years or older, were enrolled.
Inclusion criteria
The study included all patients who had been residing in Mali for at least one year prior to inclusion and who had been newly diagnosed with breast cancer in the two departments, regardless of grade, before the initiation of any chemotherapy treatment. The clinical stage of the patients was determined by the oncologists using the tumor-node-metastasis (TNM) classification. This classification categorizes breast cancer into four stages, ranging from I to IV, based on the following criteria: localized invasive breast cancer (stages I and II), inoperable locally advanced invasive breast cancer (stage III), and metastatic disease (stage IV).
Healthy controls were matched to cases on age, with a maximum difference of five years. The control group consisted of apparently healthy women with no history of cancer who either accompanied patients to the hospital (Gynecology Department) or visited the Laboratoire Rodolphe Mérieux of Bamako. To be included, controls had to have had a recent consultation or follow-up with a gynecologist confirming their health status, with normal mammography results.
Exclusion criteria
Individuals with a history of cancer other than breast cancer were excluded from the study. Participants whose blood samples showed hemolysis were also excluded from the analysis.
Blood sample collection
A 5 mL blood sample was collected from each participant into a sodium heparin tube after an 8-hour fast. The samples were placed in a refrigerated bag with a gel pack to maintain a temperature between 2 °C and 8 °C and were immediately sent to the laboratory (Charles-Mérieux Center for Infectiology, Bamako). Upon arrival, the samples were centrifuged for 10 min at 4 °C and 2314 × g (relative centrifugal force), and the plasma was aliquoted into cryovials (at least 200 µL each) before storage at -80 °C. The plasma samples of three participants were excluded due to hemolysis, reducing the number of cases to 78 and the number of controls to 79. To preserve anonymity, each sample was coded by its origin and sampling order. The samples were then shipped on dry ice to the Biochemistry Laboratory of the University Hospital of Angers, France, for metabolomic analysis.
Metabolomic analysis
A targeted quantitative metabolomic analysis was performed using the Biocrates Quant 500 MxP® kit (Biocrates Life Sciences AG, Innsbruck, Austria) and a QTRAP 5500 mass spectrometer (SCIEX, Villebon-sur-Yvette, France). This equipment, calibrated every three months, was used to quantify 630 polar metabolites and lipids belonging to 26 biochemical classes representing various metabolic pathways. Sample preparation and analysis followed the instructions in the user manual. After thawing, each plasma sample was vortexed and then centrifuged at 4 °C for 5 min at 5,000 rpm. Twenty microliters of each sample were pipetted onto the filters in the upper wells of a 96-well plate. Metabolites were extracted and derivatized to enable quantification of amino acids and biogenic amines. The extracts were then diluted with MS solvent prior to flow injection analysis (FIA) and LC-MS/MS analysis.
The Biocrates Quant 500 MxP® kit uses multiple reaction monitoring (MRM) to quantify carnitine, acylcarnitines, lipids, and sugars by flow injection analysis coupled with tandem mass spectrometry (FIA-MS/MS). It quantifies amino acids, biogenic amines, bile acids, carboxylic acids, cresols, fatty acids, hormones, indoles, nucleobases, and choline after liquid chromatography (LC) separation. For the LC-separated metabolites, the column and precolumn supplied with the kit were maintained at 50 °C, with an 11-minute run and a 5 µL injection volume. In both positive and negative modes, the flow rate varied from 500 to 800 µL/min, and the mobile phase shifted from 100% aqueous to 100% acetonitrile (both containing 0.2% formic acid) before returning to initial conditions. The remaining metabolites were quantified by FIA-MS/MS in both positive and negative modes, with a 20 µL injection and a methanolic solvent flow rate starting at 30 µL/min and increasing to 200 µL/min, for a total run time of three minutes. Specific mass spectrometry parameters are provided in Supplementary Table S1. All reagents not supplied with the Biocrates Quant 500 MxP® kit were of LC-MS grade and were purchased from VWR (Fontenay-sous-Bois, France) and Merck (Molsheim, France).
Data processing and statistical workflow
Data were processed with the MetIDQ software, developed by Biocrates for peak picking, identification, and quantification. The kit provides seven calibration points for LC quantification and a one-point calibration sample for FIA semi-quantification. In addition, three external quality control samples of human plasma (low, medium, and high concentration levels) were used to assess analytical performance. Metabolites were validated if their accuracy was within the kit-defined error margins and their coefficient of variation (CV) was below 20%.
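The acceptance rule above can be sketched as follows. This is a minimal illustration, assuming replicate measurements of one QC level are available as an array; the ±30% accuracy `tolerance` is a hypothetical placeholder, since the kit-defined error margins are not reproduced here, and `validate_metabolites` is an illustrative name, not a MetIDQ function.

```python
import numpy as np


def validate_metabolites(qc_measured, qc_expected, max_cv=20.0, tolerance=0.30):
    """Flag metabolites passing a simple CV and accuracy rule.

    qc_measured: array (n_qc_replicates, n_metabolites) of QC concentrations
    qc_expected: array (n_metabolites,) of target QC concentrations
    max_cv: maximum coefficient of variation, in percent
    tolerance: hypothetical relative accuracy margin (not the kit's values)
    """
    qc_measured = np.asarray(qc_measured, dtype=float)
    qc_expected = np.asarray(qc_expected, dtype=float)
    mean = qc_measured.mean(axis=0)
    # CV in percent, using the sample standard deviation of the QC replicates
    cv = 100.0 * qc_measured.std(axis=0, ddof=1) / mean
    # Accuracy as the ratio of the measured mean to the expected QC value
    accuracy = mean / qc_expected
    passes = (cv < max_cv) & (np.abs(accuracy - 1.0) <= tolerance)
    return passes, cv, accuracy
```

For example, three replicates of 10.0, 10.5, and 9.5 against an expected 10.0 give a CV of 5% and pass, whereas 5.0, 8.0, and 2.0 against 5.0 give a CV of 60% and fail.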
Metabolomic data were filtered to exclude metabolites with more than 20% missing values (values below the lower limit of quantification, LLOQ), in accordance with the 80% rule16. The remaining missing values were imputed using the k-nearest neighbor (KNN) method. The data were then preprocessed, including calculation of the fold change for each compound, matrix standardization by mean centering and variance scaling according to the method of van den Berg et al. (2006), and log transformation17.
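A minimal sketch of the filtering, imputation, fold-change, and scaling steps with pandas and scikit-learn's `KNNImputer`, assuming a samples × metabolites table in which values below the LLOQ are stored as NaN. The `n_neighbors` value, the application of the 20% threshold across all samples rather than per group, and applying the log transformation before autoscaling are assumptions; `preprocess` is a hypothetical helper.

```python
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer


def preprocess(df, groups, case="case", control="control", max_missing=0.20, n_neighbors=5):
    """df: samples x metabolites (NaN where < LLOQ); groups: one label per sample."""
    groups = np.asarray(groups)
    # 80% rule (assumed global): keep metabolites with at most 20% missing values
    keep = df.isna().mean(axis=0) <= max_missing
    filtered = df.loc[:, keep]
    # K-nearest-neighbour imputation of the remaining missing values
    imputer = KNNImputer(n_neighbors=n_neighbors)
    imputed = pd.DataFrame(
        imputer.fit_transform(filtered), index=filtered.index, columns=filtered.columns
    )
    # Fold change: mean concentration in cases divided by mean in controls
    fold_change = imputed[groups == case].mean() / imputed[groups == control].mean()
    # Log transform, then autoscale (mean-centre and divide by standard deviation)
    logged = np.log(imputed)
    scaled = (logged - logged.mean()) / logged.std(ddof=1)
    return scaled, fold_change
```

Fold changes are computed on the imputed but untransformed concentrations, so they remain interpretable as ratios of means.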
Univariate analyses of all metabolites were conducted using Mann-Whitney U tests18 with Benjamini-Hochberg correction; tests were considered statistically significant if the corrected p-value was below 0.05. Unsupervised principal component analysis (PCA) was used to identify outliers and spontaneous clusters. A machine learning pipeline was then applied to the dataset.
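The univariate screen can be sketched with SciPy's `mannwhitneyu` and a hand-written Benjamini-Hochberg adjustment, written out here rather than imported from a statistics package to keep the example self-contained. The function names are illustrative.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale each sorted p-value by m / rank
    ranked = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


def univariate_screen(X_cases, X_controls, alpha=0.05):
    """X_cases, X_controls: arrays (n_samples, n_metabolites)."""
    pvals = np.array([
        mannwhitneyu(X_cases[:, j], X_controls[:, j], alternative="two-sided").pvalue
        for j in range(X_cases.shape[1])
    ])
    qvals = benjamini_hochberg(pvals)
    return pvals, qvals, qvals < alpha
```

For instance, raw p-values of 0.01, 0.04, 0.03, and 0.5 are adjusted to 0.04, 0.0533, 0.0533, and 0.5, so only the first would be declared significant at 0.05.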
To evaluate the performance of several machine learning algorithms, namely logistic regression with lasso and ridge penalization, random forest, and support vector machine (SVM), we used repeated nested cross-validation19. The nested framework consisted of two loops: an outer loop for model evaluation and an inner loop for hyperparameter tuning. In the outer loop, we performed 10-fold cross-validation, splitting the dataset into 10 approximately equal parts (outer folds); each fold was used once as the test set, while the remaining 90% of the data served as the training set. This procedure was repeated 10 times. Feature selection was performed with the Boruta algorithm20 within the training data of each outer fold, leaving the outer test fold untouched, before proceeding to the inner loop. Boruta assesses the relevance of each feature by comparing it with randomly generated shadow features: a random forest classifier is trained on an extended dataset containing both the original and the shadow features, and feature importance is assessed using the Gini impurity metric. Features deemed less important than the shadow features were removed. In the inner loop, an additional 5-fold cross-validation was performed within the training data of each outer fold, without using the outer test fold at any stage. The outer training set was divided into inner training and validation folds; models with different hyperparameters were trained on the inner training folds, and the best-performing configuration was identified on the inner validation folds (a strategy known as grid search).
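To illustrate the shadow-feature idea behind Boruta, the sketch below implements a simplified variant with scikit-learn's random forest. It is not BorutaPy: Boruta proper iterates with a statistical test on hit counts and handles tentative features, whereas this hypothetical `shadow_feature_selection` simply keeps features that beat the best shadow importance in more than half of a fixed number of rounds.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def shadow_feature_selection(X, y, n_rounds=10, n_estimators=200, random_state=0):
    """Simplified, Boruta-inspired selection (not BorutaPy).

    Each round permutes every column of X to build 'shadow' features, fits a
    random forest on [X, shadows], and records a hit for each real feature whose
    Gini importance exceeds the largest shadow importance.
    """
    rng = np.random.default_rng(random_state)
    X = np.asarray(X, dtype=float)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for r in range(n_rounds):
        # Shuffle each column independently to break its link with y
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        forest = RandomForestClassifier(n_estimators=n_estimators, random_state=random_state + r)
        forest.fit(np.hstack([X, shadows]), y)
        importances = forest.feature_importances_  # impurity (Gini) based
        hits += importances[:n_features] > importances[n_features:].max()
    # Keep features that beat the best shadow in more than half of the rounds
    return np.flatnonzero(hits > n_rounds / 2)
```

Permuting columns preserves each feature's distribution while destroying any association with the class label, so the best shadow importance acts as a data-driven noise threshold.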
Once the optimal features and hyperparameters were selected, the model was trained on the entire outer training set and evaluated on the outer test fold. To ensure robustness and account for variability in data splitting, the entire 10-fold nested cross-validation was repeated 10 times with different random splits, and the mean area under the receiver operating characteristic curve (AUC-ROC) was computed across all folds, that is, averaged over a total of 100 test folds.
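The nested cross-validation loop can be sketched with scikit-learn as below, here for ridge (L2-penalized) logistic regression. Several details are assumptions rather than the authors' exact settings: the outer and inner splits are stratified, the grid of inverse regularization strengths `C` is hypothetical, features are standardized inside the model pipeline, and Boruta is replaced by a hypothetical single-pass shadow-feature filter, `select_features`, used only as a simplified stand-in.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, RepeatedStratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def select_features(X, y, seed):
    """Simplified stand-in for Boruta: keep features whose Gini importance
    beats the best importance among column-permuted shadow copies (one pass)."""
    rng = np.random.default_rng(seed)
    shadows = rng.permuted(X, axis=0)  # each column shuffled independently
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(np.hstack([X, shadows]), y)
    imp = forest.feature_importances_
    n = X.shape[1]
    cols = np.flatnonzero(imp[:n] > imp[n:].max())
    # Fall back to the single most important feature if none beats the shadows
    return cols if cols.size else np.array([np.argmax(imp[:n])])


def nested_cv_auc(X, y, n_splits=10, n_repeats=10, inner_splits=5, seed=0):
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    outer = RepeatedStratifiedKFold(n_splits=n_splits, n_repeats=n_repeats, random_state=seed)
    aucs = []
    for i, (train, test) in enumerate(outer.split(X, y)):
        # Feature selection uses the outer training fold only
        cols = select_features(X[train], y[train], seed + i)
        # LogisticRegression's default penalty is L2, i.e. ridge
        model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))
        # Inner loop: grid search over C on the outer training fold
        search = GridSearchCV(
            model,
            {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
            cv=inner_splits,
            scoring="roc_auc",
        )
        search.fit(X[train][:, cols], y[train])
        # The best configuration is refitted on the whole outer training fold
        scores = search.predict_proba(X[test][:, cols])[:, 1]
        aucs.append(roc_auc_score(y[test], scores))
    return float(np.mean(aucs)), aucs
```

With `n_splits=10` and `n_repeats=10`, the returned list holds 100 outer-fold AUCs, matching the averaging scheme described above. Because feature selection and `GridSearchCV` only ever see `X[train]`, the outer test fold cannot influence the choice of features or hyperparameters.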
To facilitate interpretation of the machine learning models, the mean SHAP (SHapley Additive exPlanations) value of each selected metabolite across all outer folds of the 10 repetitions was calculated to estimate average feature importance. SHAP values, which are based on cooperative game theory, distribute credit among features by quantifying each feature's contribution to a model's predictions. Because they account for all possible combinations of features, SHAP values provide a consistent and readily interpretable measure of each feature's influence on the model’s predictions.
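For a linear model such as ridge logistic regression, SHAP values of the log-odds output have a closed form when features are treated as independent: each feature's contribution is its coefficient multiplied by its deviation from the background mean, which, to our understanding, is what the SHAP library's linear explainer computes in its interventional setting. The sketch below uses that formula directly instead of the library. Averaging absolute SHAP values, and aligning features across folds (for example, with zeros for features not selected in a fold), are assumptions; both function names are hypothetical.

```python
import numpy as np


def linear_shap_values(coef, X_background, X_explain):
    """SHAP values of a linear model's log-odds output, assuming independent
    features: phi[i, j] = coef[j] * (X_explain[i, j] - mean_j(X_background))."""
    coef = np.asarray(coef, dtype=float).ravel()
    baseline = np.asarray(X_background, dtype=float).mean(axis=0)
    return (np.asarray(X_explain, dtype=float) - baseline) * coef


def mean_abs_shap(per_fold_shap):
    """Average |SHAP| per feature over samples within each fold, then over folds.
    Each element must be an array (n_samples, n_features) with aligned columns."""
    return np.mean([np.abs(s).mean(axis=0) for s in per_fold_shap], axis=0)
```

By construction, the SHAP values of a sample sum to the difference between its predicted log-odds and the mean predicted log-odds over the background data.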
In addition, a supervised orthogonal partial least-squares discriminant analysis (OPLS-DA) was performed as a benchmark for relevant metabolomic signatures. Models were built with SIMCA-P+ v17.0 (Umetrics, Umeå, Sweden), which allows optimization through variable exclusion. Metabolites of interest were selected by combining information from several plots: the S-plot (which depicts both magnitude and reliability), the loading column plot with jackknife confidence intervals, the coefficient plot, and the variable importance in projection (VIP) plot. This strategy aimed to minimize the risk of overfitting and reduce prediction variability. Only metabolites with a VIP value greater than 1 were considered relevant to the metabolomic footprint. OPLS-DA model performance was evaluated using Q2Y(cum) (predictive ability), R2Y(cum) (goodness of fit), CV-ANOVA (cross-validated analysis of variance), and a permutation test (assessing overfitting risk), using 7-fold cross-validation. To compare OPLS-DA performance with the machine learning approaches, the AUC was calculated with the ROPLS package using the selected metabolites, following the nested cross-validation procedure described above21.
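As a rough illustration of what OPLS-DA does, the sketch below implements a textbook single-response OPLS with one predictive and one orthogonal component, following the steps described by Trygg and Wold (2002), and computes S-plot coordinates as the covariance and correlation of each variable with the predictive score. It is not SIMCA's implementation, omits VIP and jackknife computations, and assumes `X` is already autoscaled and `y` is a 0/1 class vector; both function names are hypothetical.

```python
import numpy as np


def opls_one_orthogonal(X, y):
    """Single-y OPLS with one predictive and one orthogonal component.
    X: autoscaled matrix (n_samples, n_variables); y: 0/1 class labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float) - np.mean(y)   # centred dummy response
    w = X.T @ y / (y @ y)
    w /= np.linalg.norm(w)                        # predictive weight vector
    t = X @ w
    p = X.T @ t / (t @ t)                         # predictive loading
    w_o = p - (w @ p) * w                         # part of p orthogonal to w
    w_o /= np.linalg.norm(w_o)
    t_o = X @ w_o                                 # orthogonal score (uncorrelated with y)
    p_o = X.T @ t_o / (t_o @ t_o)
    X_filtered = X - np.outer(t_o, p_o)           # remove y-orthogonal variation
    t_p = X_filtered @ w                          # predictive score
    return t_p, t_o


def s_plot(X, t_p):
    """S-plot coordinates: covariance (magnitude) and correlation (reliability)
    between each variable and the predictive score."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    tc = t_p - t_p.mean()
    cov = Xc.T @ tc / (len(tc) - 1)
    corr = cov / (Xc.std(axis=0, ddof=1) * tc.std(ddof=1))
    return cov, corr
```

Because the orthogonal weight is orthogonal to the predictive weight, which itself is proportional to the covariance between X and y, the orthogonal score carries no information about class membership; removing it leaves the class-related variation in the predictive component.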
All data processing was performed using SIMCA (17.0), R (4.2.2) with the ROPLS (1.28.2) package, and Python (3.9) with the following packages: NumPy (1.26.3), Pandas (2.1.4), scikit-learn (1.3.0), BorutaPy (0.3), SHAP (0.41.0), seaborn (0.12.2), matplotlib_venn_wordcloud (0.11.9), and matplotlib (3.8.0).
Results
The targeted metabolomic study was conducted on the plasma samples of 78 breast cancer cases and 79 healthy controls. Table 1 presents the sociodemographic and clinical data of both cases and controls.
Of the 630 metabolites analyzed in plasma, 394 were considered accurately measured. The raw concentrations (in µM) of these 394 metabolites in the 157 participants, representing a total of 61,803 metabolite concentrations, are provided in Supplementary Table S1.
The univariate analysis, conducted using the Mann-Whitney U test, showed that 25 metabolites had significantly altered concentrations in individuals with breast cancer compared with healthy controls after Benjamini-Hochberg correction for the false discovery rate (Table 2). The results are further illustrated in a volcano plot (Supplementary Fig. 1), which visualizes the relative discriminating weight of each metabolite, and in a heatmap (Supplementary Fig. 2), which depicts inter-individual variability.
Unsupervised PCA did not reveal any spontaneous separation between the groups. However, a supervised OPLS-DA model was successfully built, with a Q2Y(cum) value of 0.55, indicating good predictive ability (> 0.5). The risk of overfitting was low, as shown by a cross-validated analysis of variance (CV-ANOVA) p-value below 6.67 × 10^-10 and a permutation test R2 of 0.389, below the 0.4 threshold (Fig. 1).
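The Q2Y value quoted above compares cross-validated prediction errors with the total variance of the class variable. For a single component it can be sketched as 1 - PRESS/SS, as below; SIMCA's cumulative Q2Y(cum) combines such ratios over components, which this hypothetical `q2` helper does not attempt.

```python
import numpy as np


def q2(y, y_pred_cv):
    """Q2 = 1 - PRESS / SS, where PRESS is the sum of squared cross-validated
    prediction errors and SS is the total sum of squares of y around its mean."""
    y = np.asarray(y, dtype=float)
    press = np.sum((y - np.asarray(y_pred_cv, dtype=float)) ** 2)
    ss = np.sum((y - y.mean()) ** 2)
    return 1.0 - press / ss
```

For example, class labels 0, 0, 1, 1 predicted in cross-validation as 0.2, 0.1, 0.8, and 0.7 give PRESS = 0.18 and SS = 1.0, hence Q2 = 0.82; a value of 0 means the cross-validated predictions are no better than predicting the mean.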
The discriminating metabolites underlying this OPLS-DA model are presented in the volcano plot depicted in Fig. 2, based on their Variable Importance in Projection (VIP) scores and loadings. A total of 46 metabolites were identified as discriminant, with 41 exhibiting reduced concentrations in breast cancer and 5 showing increased concentrations.
Volcano plot showing the contribution of each metabolite to the OPLS-DA model. Metabolites in red (VIP > 1 and |pcorr| > 0.2) were considered the most discriminant. Metabolites with decreased concentrations in cancer are shown on the left, and those with increased concentrations on the right.
The performance of several machine learning algorithms (logistic regression with lasso and ridge penalization, random forest, and SVM), combined with different feature selection techniques, was assessed using AUC-ROC and benchmarked against that of the best OPLS-DA model (Fig. 3). The strategies performed comparably, with AUC values ranging from 0.807 to 0.878. The best performance was achieved by ridge logistic regression with Boruta feature selection (AUC 0.878). The best OPLS-DA model showed intermediate performance, with an AUC of 0.840.
Supplementary Fig. 3 shows the mean AUC-ROC curve of the aforementioned best model. Supplementary Fig. 4 shows the SHAP values of the best discriminating metabolites contributing to this model. The discriminant metabolites are also presented in a volcano plot format in Fig. 4.
Volcano plot of the most discriminating metabolites by SHAP value for the best model (ridge logistic regression with Boruta feature selection). Metabolites with decreased concentrations in breast cancer compared with controls are shown on the left, and those with increased concentrations on the right.
Overall, the combination of machine learning, univariate, and multivariate analyses identified 63 metabolites with altered concentrations in the blood of patients affected by breast cancer (Fig. 5). Of the 63 metabolites identified, 13 were identified by all three approaches, 16 by two approaches only, and 34 by a single approach.
Venn diagram showing the metabolites significantly altered in the blood of patients with breast cancer according to a combination of multivariate (OPLS-DA, VIP > 1), machine learning (mean SHAP value > 0.01), and univariate analyses after Benjamini-Hochberg correction (p-value < 0.05). Metabolites with increased concentrations in breast cancer are shown in red, and those with decreased concentrations in blue.
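The Venn counts (metabolites flagged by one, two, or all three approaches) follow from simple set counting. The sketch below uses placeholder metabolite names `m1` to `m5`, not the study's results, and `overlap_summary` is a hypothetical helper.

```python
from collections import Counter


def overlap_summary(*method_sets):
    """Count how many methods flagged each metabolite and summarise the overlap.

    Returns the total number of distinct metabolites and a dict mapping
    k -> number of metabolites flagged by exactly k methods.
    """
    counts = Counter(m for s in method_sets for m in set(s))
    by_n = Counter(counts.values())
    return len(counts), {k: by_n.get(k, 0) for k in range(1, len(method_sets) + 1)}
```

Applied to the three real lists, this count yields the breakdown reported above: 63 distinct metabolites, of which 13 are shared by all three approaches, 16 by two, and 34 by one.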
Discussion
We present a blood metabolic profile of breast cancer patients at the time of diagnosis and before any treatment, obtained using targeted quantitative metabolomics combined with statistical and machine learning approaches. To our knowledge, this is the first metabolomic study of breast cancer in West African patients. To interpret this metabolic profile, we focused our discussion of the literature on the 13 most consistent discriminating metabolites identified by all three approaches, and then considered other, less consistently identified discriminating metabolites.
A notable aspect of our findings is the involvement of two neurotransmitters, gamma-aminobutyric acid (GABA) and serotonin (5-hydroxytryptamine, 5-HT), whose concentrations were reduced in the blood of breast cancer patients. Both metabolites were among the most discriminant in each of the three analyses. These alterations are further supported by other features of the signature. Alpha-aminobutyric acid (AABA), an isomer of GABA, was also decreased according to two different analytical approaches. Tryptophan, the precursor of serotonin, was one of the 13 most consistent metabolites and was also reduced in the blood of affected patients. Lastly, 3-indoleacetic acid (3-IAA), an indole derived from tryptophan catabolism and identified solely by the machine learning approach, showed elevated concentrations in breast cancer patients, suggesting a potential shift in tryptophan metabolism.
Serotonin is a biogenic monoamine primarily known as a neurotransmitter that regulates mood, sleep, and appetite in the central nervous system. There is evidence that serotonin may stimulate cancer cell proliferation, invasion, dissemination, and tumor angiogenesis in most cancers22. The mammary gland expresses serotonin receptors that contribute to its development and maintenance22. Serotonin has been shown to stimulate the growth of neoplastic breast cells through 5-HT2A receptor signaling pathways that promote cancer cell survival and proliferation23. Human breast cancer cells and breast epithelial cells (MCF-10A) have been shown to synthesize serotonin endogenously, and serotonin levels have been reported to be higher in breast cancer tissue than in para-carcinoma tissue24. Our findings are consistent with a disruption of serotonin metabolism, which may also be associated with the mood disorders and depression that can follow a cancer diagnosis25.
GABA is a neurotransmitter that plays a role in inhibiting neuronal activity in the central nervous system. It is primarily recognized for its ability to promote relaxation, reduce anxiety, and facilitate sleep. Further research is required to determine whether GABA and its receptors are involved in cancer biology, including breast cancer. GABA receptors have been identified in breast cancer tissues, indicating the possibility of interactions between GABA signaling and cancer progression26. A reduction in GABA levels within breast tumors was demonstrated to have a significant prognostic value27.
We found no clear data regarding variations in serotonin and GABA levels in the blood of breast cancer patients. However, a case-control study of male breast cancer patients revealed a notable decline in serotonin and GABA levels in their blood, aligning with our findings28.
A second key feature of the most consistent part of our metabolic signature is the decreased concentration of six amino acids: tryptophan, lysine, threonine, leucine, methionine, and 3-methylhistidine. Seven other amino acids (valine, asparagine, tyrosine, histidine, aspartate, homoarginine, and taurine) were also decreased, albeit less consistently across the different analyses. In addition, the machine learning approach identified an increased concentration of methionine sulfoxide (MetSO). Increased methionine sulfoxide combined with decreased methionine yields an elevated MetSO/Met ratio, a well-known biomarker of oxidative stress. This ratio was a particularly effective discriminating feature in both the machine learning model and the univariate analysis.
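Because methionine sulfoxide rises while methionine falls, their ratio combines both shifts into a single per-sample value. A minimal sketch, assuming both concentrations are in the same units; the function names are hypothetical, any concentrations passed to them are illustrative, and the group comparison reuses the Mann-Whitney U test described in the Methods.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def metso_met_ratio(met_so, met):
    """Per-sample methionine sulfoxide / methionine ratio (same units for both)."""
    return np.asarray(met_so, dtype=float) / np.asarray(met, dtype=float)


def compare_ratio(cases, controls):
    """cases, controls: tuples (met_so, met) of per-sample concentrations.
    Returns the two-sided Mann-Whitney U p-value for the ratio."""
    r_cases = metso_met_ratio(*cases)
    r_controls = metso_met_ratio(*controls)
    return mannwhitneyu(r_cases, r_controls, alternative="two-sided").pvalue
```

A sample with 2 µM MetSO and 20 µM methionine has a ratio of 0.1; a modest rise in MetSO together with a modest fall in methionine moves the ratio more than either concentration alone.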
A decrease in blood amino acids was also reported in a metabolomic study comparing the serum of 112 breast cancer patients with two groups of 95 and 112 healthy controls29. This finding has been corroborated by several other studies, including that of Eniu et al., which showed that reduced levels of certain amino acids may be associated with breast cancer progression30. Tumors are known to increase their uptake of amino acids, which are used for protein and nucleotide synthesis and also serve as energy substrates and cell signaling factors29. The decreased blood concentrations of amino acids may also reflect altered nutritional status, a common issue among patients with advanced cancer.
Taurine and hypotaurine metabolism has been identified as one of the most frequently altered metabolic pathways in breast cancer31. Plasma taurine concentrations have been shown to be lower in patients with breast cancer than in healthy controls, consistent with our findings32. Further research is needed to determine whether taurine has a protective effect against breast cancer; notably, taurine has been shown to inhibit the growth of human breast cancer cells and to induce apoptosis by regulating apoptosis-related mitochondrial proteins33.
Three phosphatidylcholines (PC aa C32:3, PC ae C40:1, and PC ae C34:3) are central to the metabolic signature, as shown in the Venn diagram, and twelve additional phosphatidylcholines lie more peripherally in the diagram. All of these phosphatidylcholines, along with one lysophosphatidylcholine, were present at lower levels in the blood of breast cancer patients, as were choline, an essential precursor of phosphatidylcholines, and three sphingomyelins. This broader phospholipid signature was revealed mostly by OPLS-DA. Other studies have reported an overall reduction in phosphatidylcholines, lysophosphatidylcholines, and sphingomyelins in the blood of breast cancer patients compared with healthy controls34,35,36. This lipidomic remodeling may be attributable to alterations in lipoprotein metabolism and cell membrane remodeling in tumors.
One hexosylceramide, HexCer(d18:1/16:0), appears central to the metabolic signature, ranking among the 13 most consistent discriminant metabolites, and three other hexosylceramides lie more peripherally in the diagram. All of these hexosylceramides showed elevated concentrations in the blood of breast cancer patients, whereas four ceramides showed decreased levels. Ceramides are of growing importance in breast cancer research and treatment37,38. These sphingolipids are components of cell membranes, where they play a crucial role in cell signaling, particularly in apoptosis. Hexosylceramides, ceramides linked to a neutral hexose sugar, serve within cell membranes as precursors of complex glycosphingolipids such as gangliosides. Ceramide levels have also been reported to be altered in breast tumors and blood39,40,41. However, to our knowledge, increased concentrations of hexosylceramides in patient blood have not been reported previously.
Finally, among the 13 most consistent metabolites, xanthine was decreased in the blood of breast cancer patients, and hypoxanthine was also lower, although only according to OPLS-DA. Xanthine and hypoxanthine are intermediates in purine degradation: they are produced by the breakdown of purine nucleotides and are converted into uric acid by the enzyme xanthine oxidase. Xanthine oxidase has been shown to play a crucial role in differentiation, and its reduced expression is associated with greater aggressiveness of breast cancer42.
Beyond the 13 consensus metabolites, the overall signature reveals other metabolites of interest.
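The contrast between the 13-metabolite core and the cumulative signature of 63 metabolites can be expressed as set operations over the metabolites selected by each method. The minimal Python sketch below assumes that each method's output is available as a set of metabolite names; the method labels and the placeholder metabolites (M1 to M6) are hypothetical and do not reproduce the study's results.

```python
from __future__ import annotations

from collections import Counter


def consensus_signature(selections: dict[str, set[str]]) -> tuple[set[str], set[str]]:
    """Return (core, cumulative) signatures.

    core: metabolites selected by every method (the centre of a Venn diagram).
    cumulative: metabolites selected by at least one method.
    """
    sets = list(selections.values())
    if not sets:
        return set(), set()
    core = set.intersection(*sets)
    cumulative = set.union(*sets)
    return core, cumulative


def selection_counts(selections: dict[str, set[str]]) -> Counter:
    """Count how many methods selected each metabolite (peripheral ones have low counts)."""
    return Counter(m for selected in selections.values() for m in selected)


# Hypothetical selections from three methods (placeholder names, not study data).
selections = {
    "univariate_bh": {"M1", "M2", "M3"},
    "opls_da": {"M1", "M2", "M4", "M5"},
    "ridge_boruta": {"M1", "M2", "M6"},
}

core, cumulative = consensus_signature(selections)
counts = selection_counts(selections)
print(sorted(core))     # ['M1', 'M2']
print(len(cumulative))  # 6
print(counts["M4"])     # 1
```

Metabolites with a count equal to the number of methods form the core; those selected by a single method correspond to the periphery of the Venn diagram.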
Asymmetric dimethylarginine (ADMA) was identified by both machine learning and OPLS-DA as increased in the blood of breast cancer patients, whereas univariate analysis and machine learning indicated a decrease in homoarginine. ADMA is released during the proteolysis of proteins containing methylated arginine residues. It acts as a competitive inhibitor of nitric oxide synthase (NOS), the enzyme family that produces nitric oxide (NO), which regulates various physiological processes, including vasodilation and immune function. By disrupting NO production, ADMA could influence tumor progression, potentially promoting cell proliferation, angiogenesis, and metastasis and thereby creating an environment favorable to tumor growth. We found no previous study reporting elevated ADMA levels in patients with breast cancer; however, increased ADMA has been observed in mice with metastatic breast cancer43.
Four triglycerides showed increased concentrations in the blood of breast cancer patients, while three others were decreased. Gumà et al. (2021) reported a positive correlation between triglyceride-enriched lipoproteins and breast cancer in patient blood samples44. Furthermore, elevated plasma triglycerides have been linked to an increased risk of premenopausal breast cancer45.
Two bile acids were found at higher concentrations in the blood of breast cancer patients: glycocholic acid (GCA), identified by both univariate and machine learning analyses, and glycoursodeoxycholic acid (GUDCA), identified by machine learning only. Notably, GCA showed a particularly strong increase, with a 12-fold rise in concentration. A marked elevation of three bile acids has previously been reported in the plasma of breast cancer patients: glycoursodeoxycholic acid (FC 2.62), glycochenodeoxycholic acid (FC 4.46), and tauroursodeoxycholic acid (FC 6.04)46.
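Fold changes such as the 12-fold rise in GCA compare a summary concentration in patients with that in controls; values above 1 indicate an increase and values below 1 a decrease. The sketch below assumes that FC is the ratio of group means (some studies use medians instead) and also reports log2 FC, which places increases and decreases symmetrically around zero. The concentrations are invented placeholders, not study data.

```python
from __future__ import annotations

import math
from statistics import mean


def fold_change(cases: list[float], controls: list[float]) -> float:
    """Ratio of the mean concentration in cases to the mean in controls.

    Assumption: group means are used; some studies use medians instead.
    """
    control_mean = mean(controls)
    if control_mean <= 0:
        raise ValueError("control mean must be positive")
    return mean(cases) / control_mean


def log2_fold_change(cases: list[float], controls: list[float]) -> float:
    """log2 FC is symmetric around zero: +1 means doubled, -1 means halved."""
    return math.log2(fold_change(cases, controls))


# Hypothetical concentrations (arbitrary units), for illustration only.
print(fold_change([10.0, 14.0, 12.0], [1.0, 1.0]))  # 12.0
print(log2_fold_change([1.0, 3.0], [4.0, 4.0]))     # -1.0
```

On the log2 scale, a 12-fold increase corresponds to about +3.58, and a halving to exactly -1.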
Bile acids also regulate the growth of gut bacteria, while the gut microbiota in turn biotransforms bile acids47. Two metabolites linked to microbiome metabolism showed reduced concentrations in breast cancer patients: hippuric acid (HipAcid) and indoxyl sulfate (Ind-SO4). Hippuric acid results from the hepatic conjugation of benzoic acid with glycine or from the bacterial metabolism of phenylalanine in the intestine; benzoic acid itself is typically produced by intestinal microbial pathways following the ingestion of plant foods rich in polyphenolic compounds. A previous metabolomic study also reported a lower concentration of this metabolite in the plasma of breast cancer patients than in controls32. Hippuric acid has also been proposed as a biomarker of aging, as its plasma and urinary levels can be affected by several age-related conditions, including frailty, sarcopenia, and cognitive impairment48. Indoxyl sulfate, in turn, derives from tryptophan: intestinal bacteria convert this amino acid into indole, which is then oxidized and sulfonated in the liver to form 3-indoxyl sulfate, subsequently excreted in the urine.
Finally, machine learning identified a decreased lactate concentration in the blood of breast cancer patients. Although the Warburg effect (increased glycolysis with lactate production even in the presence of oxygen) might suggest an increase in tumor lactate, a reduction in blood lactate (fold change 0.69) has already been reported in breast cancer49.
Methodologically, this study compared several machine learning approaches with more traditional univariate and multivariate (OPLS-DA) analyses. All methods yielded comparable performance, with ridge logistic regression achieving the highest AUC. Although the combined approach identified a core signature of 13 consistently discriminating metabolites, the specific metabolites selected differed between methods, and OPLS-DA produced the most extensive signature.
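The AUC used to compare the models equals the probability that a randomly chosen patient receives a higher model score than a randomly chosen control, with ties counted as one half; this is the Mann-Whitney U statistic divided by the number of patient-control pairs. The dependency-free sketch below assumes per-subject scores from some classifier (for example, predicted probabilities from a ridge logistic regression); the scores are hypothetical.

```python
from __future__ import annotations


def auc(case_scores: list[float], control_scores: list[float]) -> float:
    """Empirical ROC AUC: fraction of (case, control) pairs ranked correctly.

    Ties count as half a correct pair, so the result equals the
    Mann-Whitney U statistic divided by n_cases * n_controls.
    """
    if not case_scores or not control_scores:
        raise ValueError("both groups need at least one score")
    correct = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                correct += 1.0
            elif case == control:
                correct += 0.5
    return correct / (len(case_scores) * len(control_scores))


# Hypothetical predicted probabilities, for illustration only.
patients = [0.9, 0.8, 0.4]
controls = [0.7, 0.3, 0.4]
print(auc(patients, controls))  # 0.8333333333333334
```

The pairwise loop is O(n·m), which is negligible for cohorts of roughly 80 patients and 80 controls; for much larger cohorts, a rank-based computation of the same statistic is faster.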
Conclusion
In conclusion, a distinct blood metabolomic signature is clearly discernible in West African patients with advanced breast cancer at the time of diagnosis and before any anticancer treatment. This signature reflects the systemic metabolic impact of the tumor and provides a comprehensive overview of the metabolic dysregulations associated with this cancer (Table 3). The most notable features include alterations in neurotransmitter, amino acid, phospholipid, ceramide, hexosylceramide, triglyceride, purine, bile acid, nitric oxide (specifically ADMA), and lactate metabolism, accompanied by oxidative stress. To our knowledge, this is the first study to identify deregulated ADMA and hexosylceramide levels in the blood of breast cancer patients.
Data availability
The original data presented in the study are included in the article and supplementary file. Further inquiries can be directed to the corresponding author.
References
World Health Organization (WHO). Global breast cancer initiative implementation framework: assessing, strengthening and scaling-up of services for the early detection and management of breast cancer: executive summary (February 2023). https://www.who.int/publications/i/item/9789240067134. Accessed April 2024.
Ferlay, J. et al. (eds). Cancer Today, powered by GLOBOCAN 2018. World Health Organization (WHO) (2018). https://publications.iarc.fr/Databases/Iarc-Cancerbases/Cancer-Today-Powered-By-GLOBOCAN-2018--2018. Accessed April 2024.
Ferlay, J. et al. Cancer incidence and mortality patterns in Europe: estimates for 40 countries in 2012. Eur. J. Cancer. 49, 1374–1403. https://doi.org/10.1016/j.ejca.2012.12.027 (2013).
Ezeome, E. R. et al. The African female breast cancer epidemiology study protocol. Front. Oncol. 12, 856182. https://doi.org/10.3389/fonc.2022.856182 (2022).
Sung, H. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 71, 209–249. https://doi.org/10.3322/caac.21660 (2021).
Anyigba, C. A., Awandare, G. A. & Paemka, L. Breast cancer in sub-Saharan Africa: the current state and uncertain future. Exp. Biol. Med. (Maywood). 246, 1377–1387. https://doi.org/10.1177/15353702211006047 (2021).
Adeloye, D. et al. Estimating the incidence of breast cancer in Africa: a systematic review and meta-analysis. J. Glob Health. 8, 010419. https://doi.org/10.7189/jogh.08.010419 (2018).
Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424. https://doi.org/10.3322/caac.21492 (2018).
Allemani, C. et al. Global surveillance of trends in cancer survival 2000-14 (CONCORD-3): analysis of individual records for 37 513 025 patients diagnosed with one of 18 cancers from 322 population-based registries in 71 countries. Lancet 391, 1023–1075. https://doi.org/10.1016/S0140-6736(17)33326-3 (2018).
Jedy-Agba, E., McCormack, V., Adebamowo, C. & Dos-Santos-Silva, I. Stage at diagnosis of breast cancer in sub-Saharan Africa: a systematic review and meta-analysis. Lancet Glob Health. 4, e923–e935. https://doi.org/10.1016/S2214-109X(16)30259-5 (2016).
McCormack, V. et al. Breast cancer survival and survival gap apportionment in sub-Saharan Africa (ABC-DO): a prospective cohort study. Lancet Glob Health. 8, e1203–e1212. https://doi.org/10.1016/S2214-109X(20)30261-8 (2020).
Mawadzoue, F. D. S. Cancers du sein (féminin) et du foie en Afrique de l’Ouest: évolution temporelle de l’incidence et évaluation des facteurs de risque en Gambie et au Mali. PhD thesis, Université Claude Bernard - Lyon I (2011). https://tel.archives-ouvertes.fr/tel-01138101. Accessed April 2024.
World Health Organization. Cancer Today. WHO (2022). https://gco.iarc.who.int/today/. Accessed April 2024.
Yang, L. et al. Application of metabolomics in the diagnosis of breast cancer: a systematic review. J. Cancer. 11, 2540–2551. https://doi.org/10.7150/jca.37604 (2020).
Subramani, R., Poudel, S., Smith, K. D., Estrada, A. & Lakshmanaswamy, R. Metabolomics of breast cancer: A review. Metabolites 12, 643. https://doi.org/10.3390/metabo12070643 (2022).
Bijlsma, S. et al. Large-scale human metabolomics studies: A strategy for data (pre-) processing and validation. Anal. Chem. 78, 567–574. https://doi.org/10.1021/ac051495j (2006).
van den Berg, R. A., Hoefsloot, H. C., Westerhuis, J. A., Smilde, A. K. & van der Werf, M. J. Centering, scaling, and transformations: improving the biological information content of metabolomics data. BMC Genomics. 7, 142. https://doi.org/10.1186/1471-2164-7-142 (2006).
Lokhov, P. G., Trifonova, O. P., Maslov, D. L., Lichtenberg, S. & Balashova, E. E. Personal metabolomics: A global challenge. Metabolites 11, 715. https://doi.org/10.3390/metabo11110715 (2021).
Zhong, Y., Chalise, P. & He, J. Nested cross-validation with ensemble feature selection and classification model for high-dimensional biological data. Commun. Stat. – Simul. Comput. 52, 110–125. https://doi.org/10.1080/03610918.2020.1850790 (2023).
Kursa, M. B. & Rudnicki, W. R. Feature selection with the Boruta package. J. Stat. Softw. 36, 1–13. https://doi.org/10.18637/jss.v036.i11 (2010).
Thévenot, E. A., Roux, A., Xu, Y., Ezan, E. & Junot, C. Analysis of the human adult urinary metabolome variations with age, body mass index and gender by implementing a comprehensive workflow for univariate and OPLS statistical analyses. J. Proteome Res. 14, 3322–3335. https://doi.org/10.1021/acs.jproteome.5b00354 (2015).
Balakrishna, P., George, S., Hatoum, H. & Mukherjee, S. Serotonin pathway in cancer. Int. J. Mol. Sci. 22, 1268. https://doi.org/10.3390/ijms22031268 (2021).
Sonier, B., Arseneault, M., Lavigne, C., Ouellette, R. J. & Vaillancourt, C. The 5-HT2A serotoninergic receptor is expressed in the MCF-7 human breast cancer cell line and reveals a mitogenic effect of serotonin. Biochem. Biophys. Res. Commun. 343, 1053–1059. https://doi.org/10.1016/j.bbrc.2006.03.080 (2006).
Xie, Q. E. et al. Identification of serotonin as a predictive marker for breast cancer patients. Int. J. Gen. Med. 14, 1939–1948. https://doi.org/10.2147/IJGM.S310591 (2021).
Perez-Tejada, J. et al. Anxiety and depression after breast cancer: the predictive role of monoamine levels. Eur. J. Oncol. Nurs. 52, 101953. https://doi.org/10.1016/j.ejon.2021.101953 (2021).
Jayachandran, P. et al. Breast cancer and neurotransmitters: emerging insights on mechanisms and therapeutic directions. Oncogene 42, 627–637. https://doi.org/10.1038/s41388-022-02584-4 (2023).
Brzozowska, A., Burdan, F., Duma, D., Solski, J. & Mazurkiewicz, M. γ-amino butyric acid (GABA) level as an overall survival risk factor in breast cancer. Ann. Agric. Environ. Med. 24, 435–439. https://doi.org/10.26444/aaem/75891 (2017).
Ahmed Abdelsalam, K. E. et al. A case control study on serum levels of potential biomarkers in male breast cancer patients. Int. J. Environ. Res. Public. Health. 18, 4852. https://doi.org/10.3390/ijerph18094852 (2021).
Mrowiec, K. et al. Profiling of serum metabolome of breast cancer: multi-cancer features discriminate between healthy women and patients with breast cancer. Front. Oncol. 14, 1377373. https://doi.org/10.3389/fonc.2024.1377373 (2024).
Eniu, D. T. et al. The decrease of some serum free amino acids can predict breast cancer diagnosis and progression. Scand. J. Clin. Lab. Invest. 79, 17–24. https://doi.org/10.1080/00365513.2018.1542541 (2019).
Huang, S. et al. Novel personalized pathway-based metabolomics models reveal key metabolic pathways for breast cancer diagnosis. Genome Med. 8, 34. https://doi.org/10.1186/s13073-016-0289-9 (2016).
Jové, M. et al. A plasma metabolomic signature discloses human breast cancer. Oncotarget 8, 19522–19533. https://doi.org/10.18632/oncotarget.14521 (2017).
Zhang, X. et al. Taurine induces the apoptosis of breast cancer cells by regulating apoptosis-related proteins of mitochondria. Int. J. Mol. Med. 35, 218–226. https://doi.org/10.3892/ijmm.2014.2002 (2015).
His, M. et al. Prospective analysis of circulating metabolites and breast cancer in EPIC. BMC Med. 17, 178. https://doi.org/10.1186/s12916-019-1408-4 (2019).
Qiu, Y. et al. Mass spectrometry-based quantitative metabolomics revealed a distinct lipid profile in breast cancer patients. Int. J. Mol. Sci. 14, 8047–8061. https://doi.org/10.3390/ijms14048047 (2013).
Chen, Y. et al. Simultaneous quantification of serum monounsaturated and polyunsaturated phosphatidylcholines as potential biomarkers for diagnosing non-small cell lung cancer. Sci. Rep. 8, 7137. https://doi.org/10.1038/s41598-018-25552-z (2018).
Pal, P., Atilla-Gokcumen, G. E. & Frasor, J. Emerging roles of ceramides in breast cancer biology and therapy. Int. J. Mol. Sci. 23, 11178. https://doi.org/10.3390/ijms231911178 (2022).
Corsetto, P. A., Zava, S., Rizzo, A. M. & Colombo, I. The critical impact of sphingolipid metabolism in breast cancer progression and drug response. Int. J. Mol. Sci. 24, 2107 (2023).
Xiao, Y. et al. Comprehensive metabolomics expands precision medicine for triple-negative breast cancer. Cell. Res. 32, 477–490. https://doi.org/10.1038/s41422-022-00614-0 (2022).
Giallourou, N. et al. Characterizing the breast cancer lipidome and its interaction with the tissue microbiota. Commun. Biol. 4, 1229. https://doi.org/10.1038/s42003-021-02710-0 (2021).
Cui, M., Wang, Q. & Chen, G. Serum metabolomics analysis reveals changes in signaling lipids in breast cancer patients. Biomed. Chromatogr. 30, 42–47. https://doi.org/10.1002/bmc.3556 (2016).
Fini, M. A., Monks, J., Farabaugh, S. M. & Wright, R. M. Contribution of Xanthine oxidoreductase to mammary epithelial and breast cancer cell differentiation in part modulates inhibitor of differentiation-1. Mol. Cancer Res. 9, 1242–1254. https://doi.org/10.1158/1541-7786.MCR-11-0176 (2011).
Kus, K. et al. Alterations in arginine and energy metabolism, structural and signaling lipids in metastatic breast cancer in mice detected in plasma by targeted metabolomics and lipidomics. Breast Cancer Res. 20, 148. https://doi.org/10.1186/s13058-018-1075-y (2018).
Gumà, J. et al. Altered serum metabolic profile assessed by advanced ¹H-NMR in breast cancer patients. Cancers (Basel). 13, 4281. https://doi.org/10.3390/cancers13174281 (2021).
Goodwin, P. J. et al. Elevated levels of plasma triglycerides are associated with histologically defined premenopausal breast cancer risk. Nutr. Cancer. 27, 284–292. https://doi.org/10.1080/01635589709514539 (1997).
Park, J., Shin, Y., Kim, T. H. & Lee, A. Plasma metabolites as possible biomarkers for diagnosis of breast cancer. PLoS One. 14, e0225129. https://doi.org/10.1371/journal.pone.0225129 (2019).
Ramírez-Pérez, O., Cruz-Ramón, V., Chinchilla-López, P. & Mendez-Sanchez, N. The role of the gut microbiota in bile acid metabolism. Ann. Hepatol. 16 (Suppl.), s15–s20. https://doi.org/10.5604/01.3001.0010.5672 (2017).
De Simone, G., Balducci, C., Forloni, G., Pastorelli, R. & Brunelli, L. Hippuric acid: could became a barometer for frailty and geriatric syndromes? Ageing Res. Rev. 72, 101466. https://doi.org/10.1016/j.arr.2021.101466 (2021).
Xie, G. et al. Lowered circulating aspartate is a metabolic feature of human breast cancer. Oncotarget 6, 33369–33381. https://doi.org/10.18632/oncotarget.5409 (2015).
Acknowledgements
We are grateful to Lydie Tessier, Céline Wetterwald and Justine Faure for their technical support. We acknowledge support from the Mohammed V University of Rabat, the University of Sciences, Techniques, Technologies of Bamako (USTTB), the University of Angers, and the University Hospitals of Bamako (Point G and Luxembourg) and Angers.
Funding
This work was supported by the Mohammed V University of Rabat, Morocco; the University of Sciences, Techniques, Technologies of Bamako (USTTB), Mali; the University of Angers, France; and the University Hospitals of Bamako (Point G and Luxembourg), Mali, and Angers, France.
Author information
Contributions
A.D.T.B., Z.O. and P.R. conceived and designed the study. F.M.S., M.L., and B.S.K. recruited patients and acquired clinical data. A.D.T.B., X.D., C.B., A.E.H.A., N.O.K.B., K.C.D., B.C., B.K., G.S. and D.M.P. conducted experiments, data analyses and interpretation of data. M.M., X.D. and J.M.C.B. performed statistical analyses. A.D.T.B., M.M., Z.O. and P.R. drafted the manuscript. All authors approved the submitted version.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval and consent to participate
This study was approved by the Ethics Committee of the University of Sciences, Techniques and Technologies of Bamako (USTTB), within the Faculty of Medicine and Pharmacy, under reference number 2021/236/USTTB. Prior to participation, all individuals were given a comprehensive information document outlining the study’s objectives and signed an informed consent form.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Bissan, A.D.T., Michel, M., Dieu, X. et al. Machine learning-assisted quantitative metabolomics of West African patients with advanced breast cancer. Sci Rep 15, 29603 (2025). https://doi.org/10.1038/s41598-025-13475-5
DOI: https://doi.org/10.1038/s41598-025-13475-5