Abstract
Endometriosis is a common benign gynecological condition characterized by the growth of endometrial glands and stroma outside the uterine cavity. Current approaches for its detection are invasive and expensive, which limits their clinical utility, so there is a need for cost-effective and minimally invasive approaches to facilitate the diagnosis of this disease. Attenuated total reflection Fourier-transform infrared (ATR-FTIR) and near-infrared (NIR) spectroscopy combined with multivariate classification were applied as a new tool to analyze blood plasma samples from women with endometriosis (n = 41) and healthy individuals (n = 34). In addition, the use of advanced data fusion strategies and multivariate analysis techniques improved the classification models and facilitated diagnostic segregation of both sample categories in a fast and non-destructive way, generating high levels of accuracy, sensitivity and specificity. 2D correlation analysis revealed strong positive correlations between the spectrochemical biomarkers identified in both IR regions. To the best of our knowledge, this is the first study demonstrating the efficacy of a new tool for fast and non-invasive diagnosis of endometriosis using blood plasma samples analyzed with IR spectroscopy combined with multivariate classification.
Introduction
Endometriosis is a common benign gynecological condition characterized by the growth of endometrial glands and stroma outside the uterine cavity, resulting in scar tissue, adhesions and inflammatory reactions1. The main reported symptoms include dysmenorrhea, dyspareunia, pelvic pain and infertility, which have a negative impact on quality of life2.
Ultrasonography has a sensitivity of over 90% for identifying an endometrioma cyst with its typical appearance3. However, its diagnostic value is limited in women with suspected early-stage disease and in cases of deeply penetrating endometriosis. In these cases, diagnosis depends on histopathologic examination of tissue excised during invasive laparoscopic surgery, which remains the only accurate standard diagnostic investigation4. Therefore, researchers are directing their attention towards novel non-invasive methodologies for endometriosis diagnosis, including the analysis of blood biomarkers and genetic predispositions. However, to date, no serum marker has been found to diagnose endometriosis with adequate sensitivity and specificity.
Attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy provides insight into the chemical composition of biological samples at a molecular level. FTIR vibrational analysis identifies the functional groups that make up lipids, proteins, carbohydrates and deoxyribonucleic acid (DNA)5. Previous reports have shown the potential of vibrational spectroscopy in categorizing gynecological conditions such as ovarian, endometrial and cervical cancer6,7,8. Additionally, near-infrared (NIR) imaging is a promising technique for the enhanced detection of endometriotic lesions. This technique relies on the application of a fluorescent dye, which is absorbed by the tissue or transported intraluminally.
Infrared spectroscopy measures the IR radiation absorbed by biomolecules within the mid-infrared (MIR) region of the electromagnetic spectrum9. Each chemical bond has unique vibrational properties, leading to its absorption of IR radiation at specific frequencies, usually expressed as wavenumbers. Plotting absorbance intensity against wavenumber creates a spectrum representing the types and amounts of bonds present in the material. In biological samples, this pattern is called the 'biochemical-cell fingerprint'. Considering that endometriosis leads to chemical alterations that are reflected in the IR spectrum, this technique has the potential to distinguish women with this condition from healthy individuals. IR spectroscopy together with chemometric methods has played an increasingly important role in the field of medical and biological analysis by rapidly detecting gynecological conditions at early stages. This is the first description of NIR and ATR-FTIR spectroscopy applied to blood plasma from patients diagnosed with early endometriosis, elucidating potential future applications of this technique.
Results
The pre-processed IR spectra of blood plasma for both the control group (healthy women) and the endometriosis group are illustrated in Fig. 1. The dataset comprises 75 samples, generating a total of 225 spectra, with each sample providing three spectra. To assess the reproducibility of the spectral data, the mean and the standard deviation were calculated for each ATR-FTIR and NIR replicate and compared. For ATR-FTIR, the values obtained were 0.0625 ± 0.0027 (replicate 1), 0.627 ± 0.0029 (replicate 2) and 0.626 ± 0.024 (replicate 3) for endometriosis and 0.0618 ± 0.0021 (replicate 1), 0.0618 ± 0.0023 (replicate 2) and 0.0614 ± 0.0018 (replicate 3) for the control group. For NIR, the values were 0.5569 ± 0.0725 (replicate 1), 0.5274 ± 0.0542 (replicate 2) and 0.5262 ± 0.0583 (replicate 3) for endometriosis and 0.4734 ± 0.0198 (replicate 1), 0.4637 ± 0.0207 (replicate 2) and 0.4645 ± 0.0189 (replicate 3) for healthy controls. The small differences between the mean ± standard deviation values demonstrate that the data are highly reproducible. These spectra were averaged before constructing the model. In the ATR-FTIR region (Fig. 1A), several characteristic absorption bands are discernible. Notable among these are the Amide I band of proteins (~1650 cm−1), a minor band at 1552 cm−1 corresponding to the C=O bond of Amide II, and a feature at 3366 cm−1 associated with O–H stretching10,11.
The NIR spectra (Fig. 1B) show the –CH₂ second overtone at 1179 nm, as well as the first overtone of N–H stretching and the first overtone of O–H stretching. The band at 1788 nm is attributed to lipid structures, while the bands at 1950 nm and 2332 nm correspond to the second overtone of C–O stretching in carbohydrates and the stretching and bending of CH associated with methylene, respectively12. In addition, the average signal-to-noise ratio (SNR) values for the spectral dataset were investigated. The calculated SNR values for the ATR-FTIR spectra were 145.19 and 160.19 for the endometriosis and the control classes, respectively, while for NIR the SNR values were 103.29 and 113.86 for endometriosis and controls, respectively. These values suggest a relatively high quality for both spectroscopic measurements13.
To obtain the spectral variability within each of the classes, the relative standard deviation (RSD) was calculated. For ATR-FTIR data, the RSD values were 6.43% and 4.05% for the control and endometriosis classes, respectively, whereas for NIR, the RSD values were 18.31% and 18.59% for controls and endometriosis, respectively. These results suggest a low spectral variability for the studied classes, especially in relation to the ATR-FTIR spectra. The standard deviation values for each wavenumber/wavelength in the IR spectra (Fig. 1C, D) show the variation of the IR variables around their average spectra. The NIR data present more signal fluctuations, resulting in higher standard deviation values.
Figure 1E, F shows the average second-derivative spectra of ATR-FTIR and NIR. By differentiating the spectra and then averaging the derivative spectra, bands with characteristic differences between the healthy women and the endometriosis group can be identified. In the ATR-FTIR region (Fig. 1E), the characteristic differences between the two groups were mainly at 1632 cm−1 and 1548 cm−1, which correspond to the Amide I and Amide II bands11, respectively, indicating that these bands are responsible for the separation between the two groups. Meanwhile, the characteristic differences in the NIR spectra were more evident. In the NIR range, the characteristic differences between the two groups were mainly at 1852 nm, 1900 nm, 2244 nm and 2332 nm, which correspond to the O–H combination band, the combination frequency of water absorption, the stretching and bending of CH associated with methylene, and the –CH3 methyl combination band, respectively12.
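As an illustration of how a second-derivative spectrum can be obtained, the sketch below applies a Savitzky-Golay second-derivative filter in plain Python. The 7-point window and quadratic polynomial are assumptions borrowed from the ATR-FTIR smoothing settings given in the Methods (the derivative settings themselves are not reported), and the function name `sg_second_derivative` is hypothetical.

```python
# Savitzky-Golay second-derivative filter for a 7-point window and a
# quadratic fit. The weights are the standard tabulated convolution
# coefficients for this window/order; the choice of window is an
# assumption, not a setting reported in the study.
SG_D2_WEIGHTS = [5, 0, -3, -4, -3, 0, 5]
SG_D2_NORM = 42.0


def sg_second_derivative(y, step=1.0):
    """Return the second derivative of an evenly spaced signal.

    y    : list of absorbance values
    step : spacing between consecutive points (e.g. in cm-1 or nm)
    The first and last three points are dropped (no edge padding).
    """
    half = len(SG_D2_WEIGHTS) // 2
    out = []
    for i in range(half, len(y) - half):
        window = y[i - half:i + half + 1]
        acc = sum(w * v for w, v in zip(SG_D2_WEIGHTS, window))
        out.append(acc / (SG_D2_NORM * step ** 2))
    return out


# Sanity check on a parabola, whose second derivative is exactly 2.
parabola = [float(x * x) for x in range(10)]
d2 = sg_second_derivative(parabola)
```

Averaging the derivative spectra of each class, point by point, would then give curves comparable to those in Fig. 1E, F.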
Both the NIR and ATR-FTIR spectra display significant overlap in spectral features between the control and endometriosis groups, necessitating the use of multivariate tools to distinguish between these categories. Several supervised multivariate classification tools were tested, and their performances are summarized in Table 1.
The best discrimination results for ATR-FTIR data were obtained using PCA-SVM (77% accuracy), followed by GA-SVM (73% accuracy). This performance is consistent with F-scores of 86% and 72%, respectively, indicating that classification was not affected by imbalanced class sizes. For PCA-SVM, three principal components (PCs) were selected, representing 97.18% of the accumulated variance. Figure 2D–F depicts the PCA scores, giving an overview of the class separation. However, this exploratory analysis alone did not separate the groups satisfactorily, probably because of the high spectral overlap between the two classes. A discriminant algorithm, such as LDA, QDA or SVM, is therefore needed to classify the samples accurately. The loading profiles of these PCs, used for spectral assignment, are shown in Fig. 2A–C. The wavenumbers contributing to this discrimination are summarized in Supplementary Table S1.
GA-SVM selected 12 spectral wavenumbers from the 1726 variables in the entire spectrum for class differentiation. These wavenumbers are 3602, 3476, 2644, 2404, 1850, 1502, 1454, 1156, 1094, 978, 702, and 640 cm−1 (Supplementary Figure S1A). The GA-SVM discriminant function (DF) is depicted in Supplementary Figure S1B, with the tentative biomarker assignments for selected wavenumbers shown in Supplementary Table S2. The best fit for NIR data was achieved using PCA-LDA and GA-LDA, both with 100% accuracy. The assignments for PCA-LDA loadings are depicted in Supplementary Table S3, while the loadings and the scores are shown in Fig. S3. Although the PC1 vs. PC2 scores do not show separation between the classes, separation between the endometriosis and control groups is evident in both scores plots in Fig. S3E–F, reflecting compositional differences between the two classes. Figure 3 shows the discrimination plot and the 14 selected variables for NIR using GA-LDA.
The 14 spectral wavelengths responsible for class separation in NIR data using GA-LDA are 2332, 2289, 1498, 1479, 1393, 1370, 1361, 1304, 1223, 1187, 1096, 1073, 1037, and 901 nm. The assignments for these variables are summarized in Supplementary Table S4. To improve results, a low-level data fusion strategy was applied to IR data, with results depicted in Table 2.
The results indicate good performance for the PCA-LDA, SPA-LDA, and GA-LDA models in discriminating data using fused NIR and MIR spectra. The combination of overtones and combination bands from NIR with fundamental vibrations found in ATR-FTIR provides comprehensive data characterization14. SPA-LDA selected 15 variables, while GA-LDA selected 11 variables (Supplementary Table S5), with tentative biomarker assignments shown in Table 3.
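Low-level data fusion amounts to joining the two spectral blocks of each sample into a single row before modelling. The sketch below shows one way this could be done in plain Python. The scaling of each block to unit vector length is an assumption (the text states only that the fused data were normalized because of their different scales), and the function names are hypothetical.

```python
import math


def unit_norm(spectrum):
    """Scale one spectrum to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in spectrum))
    return [v / norm for v in spectrum] if norm > 0 else list(spectrum)


def low_level_fusion(mir_block, nir_block):
    """Concatenate the MIR and NIR spectra of each sample, block-scaled.

    mir_block, nir_block : lists of spectra, one per sample, same order.
    """
    if len(mir_block) != len(nir_block):
        raise ValueError("both blocks must contain the same samples")
    return [unit_norm(m) + unit_norm(n) for m, n in zip(mir_block, nir_block)]


def mean_center(matrix):
    """Subtract the column mean from every variable."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[v - mu for v, mu in zip(row, means)] for row in matrix]


# Toy example: two samples, 2 MIR variables and 3 NIR variables each.
mir = [[3.0, 4.0], [0.0, 2.0]]
nir = [[10.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
fused = low_level_fusion(mir, nir)
centered = mean_center(fused)
```

Scaling each block separately keeps the block with the larger raw intensities from dominating the fused model, which is the usual motivation for normalizing before concatenation.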
Additionally, 2D correlation spectroscopy was applied to study the relationship between different ATR-FTIR and NIR wavenumbers/wavelengths (Fig. 4). This correlation aids in identifying the molecular origin of NIR markers, as MIR bands are sharper and well documented in the specialized literature15. A correlation marker for endometriosis patients was found at ~1650 cm−1 in the MIR spectra combined with the 1170 nm NIR band, which is related to the amide I band. The marker at 1552 cm−1, attributable to amide II, is consistent with the NIR band at 1100 nm, representing the second overtone of secondary amides10,16. Other markers include the MIR bands at 2782 cm−1, 1402 cm−1 and 1416 cm−1, attributed to N–H stretching, symmetric CH₃ bending modes of the methyl groups of proteins, and N–H deformation/C–N stretching, respectively.
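For readers unfamiliar with hetero-spectral 2D correlation, a synchronous map can be sketched as the covariance between every MIR variable and every NIR variable across the set of samples. The plain-Python sketch below follows the usual generalized 2D correlation definition, using mean-centered ("dynamic") spectra and a 1/(m − 1) factor. It is an assumption that the study used this exact form, and the function names are hypothetical.

```python
def dynamic_spectra(block):
    """Mean-center each variable across samples."""
    m = len(block)
    means = [sum(col) / m for col in zip(*block)]
    return [[v - mu for v, mu in zip(row, means)] for row in block]


def synchronous_map(mir_block, nir_block):
    """Synchronous hetero-spectral 2D correlation map.

    Returns a matrix phi where phi[i][j] relates MIR variable i to NIR
    variable j. Positive values mean the two intensities change in the
    same direction across samples.
    """
    m = len(mir_block)
    if m != len(nir_block) or m < 2:
        raise ValueError("need the same samples (at least two) in both blocks")
    y = dynamic_spectra(mir_block)
    z = dynamic_spectra(nir_block)
    n_mir, n_nir = len(y[0]), len(z[0])
    return [
        [sum(y[s][i] * z[s][j] for s in range(m)) / (m - 1) for j in range(n_nir)]
        for i in range(n_mir)
    ]


# Toy example: 3 samples, 1 MIR variable, 2 NIR variables.
mir = [[1.0], [2.0], [3.0]]
nir = [[2.0, 3.0], [4.0, 2.0], [6.0, 1.0]]
phi = synchronous_map(mir, nir)
```

In this toy case the first NIR variable rises with the MIR variable (positive entry) and the second falls (negative entry), which is how positive and negative markers on a synchronous map are read.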
Discussion
Few studies in the specialized literature have sought new tools for diagnosing endometriosis using non-invasive methods. These methods include the analysis of blood biomarkers and genetic predispositions, demonstrating high sensitivity and specificity17,18. Kokot et al. (2022) developed a partial least squares discriminant analysis (PLS-DA) model based on selected serum biochemical parameters, specific regions of the serum’s ATR-FTIR spectra, and the combined data for diagnostic screening of advanced endometriosis, reaching an overall accuracy of 87.5%19. In this context, a novel method using IR spectroscopy coupled with multivariate classification tools was applied to analyze blood plasma collected from women with endometriotic patterns and healthy individuals to discriminate between the two groups.
In this study, 75 blood plasma samples from women were analyzed using ATR-FTIR and NIR spectroscopy to stratify patients with endometriosis (41 out of 75) and healthy controls (34 out of 75). Various chemometric algorithms were tested for category discrimination, with the wavenumbers/wavelengths most responsible for segregating the different classes designated as potential diagnostic biomarkers.
ATR-FTIR spectroscopy is a powerful tool for analyzing different biological structures based on spectral analysis, proving to be highly useful in clinical studies20. Previously, this tool has been successfully applied to detect gestational diabetes mellitus by our research group21 and endometrial cancer22. Herein, ATR-FTIR spectroscopy combined with chemometric algorithms was employed to detect patients with endometriosis. The best results were obtained using PCA-SVM, which achieved a classification accuracy of 77% and a sensitivity of 83%, followed by GA-SVM with an accuracy of 73%. Twelve spectral wavenumbers were responsible for class separation based on GA-SVM (Table S2). Bands related to Amide I at 1640 cm−1, Amide II at 1548 cm−1, and some features related to proteins at 1538 cm−1 were identified as potential discriminant biomarkers.
In addition, NIR spectroscopy is a valuable tool for analyzing different types of diseases in biologically derived samples. It has been successfully applied to detect HIV in pregnant women23. PCA-LDA and GA-LDA were capable of discriminating patients with endometriosis from healthy women with 100% accuracy, 100% sensitivity, and 100% specificity. Furthermore, the NIR technique allows the use of portable instruments, which represents an operational advantage. However, NIR spectroscopy has limited chemical specificity due to the high degree of overlapping spectral features, making pure biomarker identification using only this technique infeasible24. In this context, IR data fusion (ATR-FTIR and NIR) appears as an alternative strategy, combining the advantages of both IR spectroscopies in a single model. This approach increases the reliability of classification algorithms compared to using a single analytical technique25, as demonstrated by Yang et al. (2021), who applied NIR-MIR spectral feature fusion based on PLS-DA for the rapid and accurate diagnosis of Alzheimer’s disease, obtaining an accuracy of 100%26.
Applying the data fusion strategy, the classification models using all studied feature extraction/selection algorithms combined with LDA and SVM achieved 100% accuracy, 100% sensitivity, and 100% specificity, confirming the feasibility of this approach for identifying endometriosis patients. The successful application of this method in a clinical environment could aid in the early diagnosis of endometriosis in a low-cost and non-invasive manner.
Furthermore, the multimodal spectroscopy or data fusion approach can be considered an advanced analytical tool that provides comprehensive and complementary information previously inaccessible, enabling the elucidation of intricate details. The application of 2D correlation spectroscopy allowed for the study of relationships between different MIR and NIR variables (wavenumbers/wavelengths). Markers found in the synchronous map were related to the Amide I and Amide II MIR bands, which matched the variables selected by the chemometric models. Finally, this approach demonstrates that NIR bands are correlated with specific ATR-FTIR bands, enhancing the interpretability of NIR spectra.
Conclusions
To the best of our knowledge, this is the first study demonstrating the efficacy of a new tool for fast and non-invasive diagnosis of endometriosis using blood plasma samples analyzed with IR spectroscopy combined with multivariate classification. The PCA-LDA, SPA-LDA, and GA-LDA models showed excellent performance when applied to the fused ATR-FTIR and NIR data, with high discriminant power between the studied classes. In this context, spectroscopy emerges as a potential method for studying endometriosis and could be applied in clinical practice in the future, either as a screening or diagnostic test.
Methods
The methodology used in this study is summarized in the workflow in Fig. 5.
Study design and population
We conducted a case–control study at the Januário Cicco Maternity School, affiliated with the Federal University of Rio Grande do Norte. A total of forty-one women (n = 41) with a clinical diagnosis of endometriosis were recruited, alongside thirty-four healthy women (n = 34). Subjects were excluded if they had autoimmune diseases, immunodeficiencies, chronic illnesses, or genetic syndromes linked to sex chromosomes. The characteristics of the included patients are presented in Table 4. All methods used in this study adhered to approved guidelines. The study received ethical approval from the ethics committee at the Januário Cicco Maternity School, Federal University of Rio Grande do Norte, under protocol number 44352921.6.0000.5292, and written informed consent was obtained from all subjects. All procedures were conducted in compliance with the Declaration of Helsinki.
Sample collection and determination for analysis with ATR-FTIR and NIR
Venous blood samples were collected from all participants. The samples were centrifuged at 3600 rpm for 7 min to separate erythrocytes from the plasma. Subsequently, 100 µL aliquots of plasma were transferred to Eppendorf tubes and stored at −80 °C until ATR-FTIR and NIR analysis. This sample collection resulted in a total of 75 samples, with 41 from patients with endometriosis and 34 from the healthy control group.
ATR-FTIR spectroscopy
The blood plasma samples were left to thaw for 30–40 min at room temperature before spectrochemical measurement. Aliquots of 10 µL were used for analysis, and all measurements were obtained in triplicate using wet samples, with a new aliquot applied for each replicate. The spectral data were acquired on a PerkinElmer Spectrum Two FTIR spectrometer (PerkinElmer, USA) equipped with a single-reflection diamond ATR accessory. Measurements were performed using a KBr beamsplitter and a LiTaO3 (lithium tantalate) detector. The spectra were recorded with a total of 32 scans. Next, they underwent Norton-Beer apodization and Fourier transformation using a zero-filling factor of 2, with a resolution of 4 cm−1 over the range between 4000 and 550 cm−1. The ATR crystal was cleaned with ethyl alcohol (70% v/v) for each sample, before acquiring the experimental background and after reading each sample, to minimize atmospheric variations over time.
NIR spectroscopy
NIR spectra were obtained using an ARCoptix FT-NIR Rocket spectrophotometer (Arcoptix S.A., Switzerland). The measurements were taken in transflectance mode with a resolution of 5 nm. For each measurement, 10 µL of the sample was transferred to an aluminum foil surface and an optical fiber was positioned over it. The measurements were obtained in triplicate for each sample, with a new aliquot being prepared for each replicate, in the 900–2600 nm range.
Data analysis
The data analysis was performed in MATLAB R2014b (version 8.4; MathWorks, Inc., USA), with PLS Toolbox version 7.9.3 (Eigenvector Research, Inc., USA) and the Classification toolbox (version 7.0) from the Milano Chemometrics and QSAR Research Group27. The raw ATR-FTIR spectra were loaded and pre-processed by Savitzky-Golay (SG) smoothing (window of 7 points, 2nd order polynomial fitting) and automatic weighted least squares (AWLS) baseline correction. NIR spectra were pre-processed with multiplicative signal correction (MSC) and SG smoothing (window of 15 points, 2nd order polynomial fitting). In addition, a low-level data fusion strategy was applied by combining the NIR and the ATR-FTIR information to achieve more accurate results28. The ATR-FTIR/NIR fused data were normalized due to the different scales of the spectra. Both datasets were mean-centered before the analysis. The samples were divided into training (70%) and test (30%) subsets by applying the Kennard-Stone (KS) sampling algorithm29, with the training set used for model construction and the test set for validation. To build the models, feature extraction/selection algorithms were combined with discriminant analysis and support vector machine (SVM). The genetic algorithm (GA)30 and the successive projections algorithm (SPA)31 were applied for feature selection, while principal component analysis (PCA)32 was used for feature extraction. The discriminant analysis was performed by linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). The algorithms PCA-LDA/QDA/SVM, SPA-LDA/QDA/SVM and GA-LDA/QDA/SVM were tested independently to achieve the best classification model.
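The Kennard-Stone split can be sketched in a few lines of plain Python. The version below uses Euclidean distances and selects training samples from the whole dataset at once. Whether the study applied the algorithm per class, and on which variables, is not stated, so both choices are assumptions, and the function names are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two spectra."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kennard_stone(samples, n_train):
    """Return (train_indices, test_indices) chosen by Kennard-Stone.

    1. Start with the two samples that are farthest apart.
    2. Repeatedly add the sample whose distance to its nearest
       already-selected sample is the largest.
    """
    n = len(samples)
    if not 2 <= n_train <= n:
        raise ValueError("n_train must be between 2 and the number of samples")
    dist = [[euclidean(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    pairs = ((i, j) for i in range(n) for j in range(i + 1, n))
    first, second = max(pairs, key=lambda p: dist[p[0]][p[1]])
    selected = [first, second]
    remaining = [i for i in range(n) if i not in selected]
    while len(selected) < n_train:
        nxt = max(remaining, key=lambda i: min(dist[i][s] for s in selected))
        selected.append(nxt)
        remaining.remove(nxt)
    return selected, remaining


# Toy example with one variable per sample.
toy = [[0.0], [1.0], [2.0], [10.0], [5.0]]
train_idx, test_idx = kennard_stone(toy, n_train=3)
```

For a 70/30 split, `n_train` would be set to roughly 70% of the number of samples. Because the selection is deterministic and spreads the training set over the data space, the test samples tend to fall inside the region covered by the training samples.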
In the PCA model, each PC is composed of scores (variance in the sample direction), used to assess similarities/dissimilarities between the samples; and the loadings (variance in the variables direction), used to show the weight of each variable (wavenumber or wavelength) towards the scores pattern. The PCA decomposition of a spectral dataset X can be calculated by Eq. 1:
$$\mathbf{X} = \mathbf{T}\mathbf{P}^{\mathrm{T}} + \mathbf{E} \quad (1)$$
where T is the score matrix; P is the loadings matrix; and E is the residual matrix. The spectral variables selected by SPA and GA are based on minimizing the cost function G according to Eq. 2:
$$G = \frac{1}{N_{V}} \sum_{n=1}^{N_{V}} g_{n} \quad (2)$$
where NV represents the number of validation samples and gn can be calculated according to Eq. 3:
$$g_{n} = \frac{r^{2}\left(x_{n}, m_{I(n)}\right)}{\min_{I(m) \ne I(n)} r^{2}\left(x_{n}, m_{I(m)}\right)} \quad (3)$$
where r2(xn,mI(n)) is the squared Mahalanobis distance between the object xn and the center of its true category, while r2(xn,mI(m)) represents the squared Mahalanobis distance between the object xn and the center of the closest wrong category mI(m).
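The cost function described here, the distance to the true class center divided by the distance to the nearest wrong class center, averaged over the validation samples, can be sketched as follows. The sketch takes the squared Mahalanobis distances as already computed inputs. The function name and the dictionary layout are hypothetical.

```python
def cost_g(sq_distances, true_labels):
    """Average risk of misclassification over the validation samples.

    sq_distances : one dict per validation sample, mapping each class
                   label to the squared Mahalanobis distance between the
                   sample and that class center
    true_labels  : the true class label of each validation sample
    """
    total = 0.0
    for dists, true_label in zip(sq_distances, true_labels):
        # Distance to the closest class other than the true one.
        nearest_wrong = min(d for label, d in dists.items() if label != true_label)
        total += dists[true_label] / nearest_wrong
    return total / len(true_labels)


# Toy example with two validation samples.
example_distances = [
    {"endometriosis": 1.0, "control": 4.0},
    {"endometriosis": 9.0, "control": 3.0},
]
example_labels = ["endometriosis", "control"]
g_value = cost_g(example_distances, example_labels)
```

A small value of G means that validation samples sit close to their own class center and far from the others, which is why variable subsets that minimize G are preferred by the selection algorithms.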
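The LDA (Lik) and QDA (Qik) discriminant scores are calculated in a non-Bayesian form as described below33,34:
$$L_{ik} = \left(x_{i} - \overline{x}_{k}\right)^{\mathrm{T}} C_{\mathrm{pooled}}^{-1} \left(x_{i} - \overline{x}_{k}\right) \quad (4)$$
$$Q_{ik} = \left(x_{i} - \overline{x}_{k}\right)^{\mathrm{T}} C_{k}^{-1} \left(x_{i} - \overline{x}_{k}\right) \quad (5)$$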
where xi is a vector containing the input variables for each sample; \({\overline{x} }_{k}\) is the mean spectrum of category k; and, Cpooled and Ck are the pooled covariance matrix and the variance–covariance matrix for category k, respectively.
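A non-Bayesian discriminant score of this kind is a squared Mahalanobis distance between a sample and each class mean, using the pooled covariance matrix for LDA and the class covariance matrix for QDA, with the sample assigned to the class giving the smallest score. The plain-Python sketch below illustrates this under stated assumptions: the pooled covariance is weighted by (n_k − 1) and divided by (n − K), which may differ from the weighting used in the study, and all function names are hypothetical.

```python
def mean_vector(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def scatter(rows, mean):
    """Sum of outer products of the deviations from the class mean."""
    p = len(mean)
    s = [[0.0] * p for _ in range(p)]
    for row in rows:
        d = [v - m for v, m in zip(row, mean)]
        for i in range(p):
            for j in range(p):
                s[i][j] += d[i] * d[j]
    return s


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def sq_mahalanobis(x, mean, cov):
    d = [v - m for v, m in zip(x, mean)]
    return sum(di * si for di, si in zip(d, solve(cov, d)))


def fit(classes):
    """classes: dict mapping label -> list of feature vectors (e.g. PC scores)."""
    means = {k: mean_vector(v) for k, v in classes.items()}
    scat = {k: scatter(v, means[k]) for k, v in classes.items()}
    covs = {k: [[e / (len(classes[k]) - 1) for e in row] for row in scat[k]]
            for k in classes}
    n_total = sum(len(v) for v in classes.values())
    p = len(next(iter(means.values())))
    dof = n_total - len(classes)  # assumed pooling: (n - K) degrees of freedom
    pooled = [[sum(scat[k][i][j] for k in classes) / dof for j in range(p)]
              for i in range(p)]
    return means, covs, pooled


def classify(x, means, covs, pooled, quadratic=False):
    """Assign x to the class with the smallest LDA (or QDA) score."""
    scores = {k: sq_mahalanobis(x, means[k], covs[k] if quadratic else pooled)
              for k in means}
    return min(scores, key=scores.get)


# Toy example with two well-separated classes and two variables.
toy = {
    "control": [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    "endometriosis": [[4.0, 4.0], [5.0, 4.0], [4.0, 5.0], [5.0, 5.0]],
}
means, covs, pooled = fit(toy)
label_lda = classify([1.0, 1.2], means, covs, pooled)
label_qda = classify([4.0, 4.0], means, covs, pooled, quadratic=True)
```

The only difference between the two classifiers in this sketch is which covariance matrix is inverted, which is why QDA can follow classes with different spreads while LDA assumes a common one.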
SVM is a binary linear classifier that uses a nonlinear step to transform the input sample space into a feature space through the application of a kernel function35. The SVM classification rule can be calculated by the following equation36 (Eq. 6):
$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N_{SV}} \alpha_{i} y_{i} k\left(x_{i}, z_{j}\right) + b\right) \quad (6)$$
where NSV represents the number of support vectors; αi, yi, and k(xi,zj) are the Lagrange multiplier, the class membership (± 1) and the kernel function, respectively; and b is the bias parameter.
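To make the roles of these quantities concrete, the sketch below evaluates an SVM decision rule for a sample that is to be classified, given support vectors, multipliers, class memberships and a bias that are assumed to have been obtained by training. The radial basis function kernel and its `gamma` parameter are assumptions (the kernel used in the study is not stated), and the function names are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Radial basis function kernel (an assumed choice of kernel)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def svm_decision(z, support_vectors, alphas, memberships, bias, kernel=rbf_kernel):
    """Return +1 or -1 for the sample z.

    support_vectors : the training samples retained by the SVM
    alphas          : their Lagrange multipliers
    memberships     : their class memberships (+1 or -1)
    bias            : the bias parameter b
    """
    total = sum(a * y * kernel(x, z)
                for x, a, y in zip(support_vectors, alphas, memberships))
    return 1 if total + bias >= 0 else -1


# Toy model with one support vector per class (values are illustrative only).
svs = [[0.0, 0.0], [2.0, 2.0]]
alphas = [1.0, 1.0]
memberships = [1, -1]
near_first = svm_decision([0.1, 0.1], svs, alphas, memberships, bias=0.0)
near_second = svm_decision([1.9, 2.0], svs, alphas, memberships, bias=0.0)
```

Only the support vectors enter the sum, so a sample is classified by its kernel similarity to the few training samples that define the class boundary.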
Model quality evaluation
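The classification performance of each method was evaluated based on the model accuracy, sensitivity and specificity. The accuracy (AC) represents the proportion of samples correctly classified, while the specificity (SPEC) and the sensitivity (SENS) measure the proportion of negatives and positives that are correctly identified, respectively. The F-Score measures the overall classification performance considering imbalanced data, while the G-Score measures the overall classification performance not accounting for class sizes37. These validation parameters are calculated by the following equations:
$$\mathrm{AC}\,(\%) = \frac{TP + TN}{TP + FP + TN + FN} \times 100 \quad (7)$$
$$\mathrm{SENS}\,(\%) = \frac{TP}{TP + FN} \times 100 \quad (8)$$
$$\mathrm{SPEC}\,(\%) = \frac{TN}{TN + FP} \times 100 \quad (9)$$
$$\mathrm{F\text{-}Score} = \frac{2 \times \mathrm{SENS} \times \mathrm{SPEC}}{\mathrm{SENS} + \mathrm{SPEC}} \quad (10)$$
$$\mathrm{G\text{-}Score} = \sqrt{\mathrm{SENS} \times \mathrm{SPEC}} \quad (11)$$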
where TP, TN, FP and FN represent the numbers of true positives, true negatives, false positives and false negatives, respectively.
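These quality parameters can be computed directly from the four counts of a confusion matrix. In the sketch below, accuracy, sensitivity and specificity follow their standard definitions, whereas the F-Score and G-Score are written as the harmonic and geometric means of sensitivity and specificity, a form found in the chemometrics literature that is assumed here. The function name and the example counts are hypothetical.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return the quality parameters (in %) from confusion-matrix counts.

    Assumes both classes are present (tp + fn > 0 and tn + fp > 0) and
    that sensitivity and specificity are not both zero.
    """
    sens = 100.0 * tp / (tp + fn)
    spec = 100.0 * tn / (tn + fp)
    acc = 100.0 * (tp + tn) / (tp + tn + fp + fn)
    # Assumed definitions: harmonic and geometric means of SENS and SPEC.
    f_score = 2.0 * sens * spec / (sens + spec)
    g_score = math.sqrt(sens * spec)
    return {
        "accuracy": acc,
        "sensitivity": sens,
        "specificity": spec,
        "f_score": f_score,
        "g_score": g_score,
    }


# Illustrative counts only, not results from this study.
metrics = classification_metrics(tp=10, tn=9, fp=1, fn=2)
```

Because sensitivity and specificity are each computed within a single class, the F-Score and G-Score in this form do not change when one class is simply larger than the other, unlike the accuracy.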
Data availability
The datasets used and analysed during the current study are available from the corresponding author on reasonable request.
References
Dastur, A. E. et al. John A Sampson and the origins of endometriosis. J Obstet Gynaecol India. 60(4), 299–300. https://doi.org/10.1007/s13224-010-0046-8 (2011).
Fuldeore, M. J. et al. Prevalence and symptomatic burden of diagnosed endometriosis in the United States: National estimates from a cross-sectional survey of 59,411 women. Gynecol Obstet Invest. 82(5), 453–461. https://doi.org/10.1159/000452660 (2017).
Moore, J. et al. A systematic review of the accuracy of ultrasound in the diagnosis of endometriosis. Ultrasound Obstet Gynecol 20(6), 630–634. https://doi.org/10.1046/j.1469-0705.2002.00862.x (2002).
Hsu, A. L. et al. Invasive and noninvasive methods for the diagnosis of endometriosis. Clin Obstet Gynecol. 53(2), 413–419. https://doi.org/10.1097/GRF.0b013e3181db7ce8 (2010).
Johnson, N. P. et al. World endometriosis society consensus on the classification of endometriosis. Hum Reprod. 32(2), 315–324. https://doi.org/10.1093/humrep/dew293 (2017).
Sindhuphak, R. et al. A new approach for the detection of cervical cancer in Thai women. Gynecol Oncol. 90(1), 10–14. https://doi.org/10.1016/s0090-8258(03)00196-3 (2003).
Kluz-Barłowska, M. et al. FT-Raman and FTIR spectroscopy as a tools showing marker of platinum-resistant phenomena in women suffering from ovarian cancer. Sci Rep. 14(1), 11025. https://doi.org/10.1038/s41598-024-61775-z (2024).
Taylor, S. E. et al. Infrared spectroscopy with multivariate analysis to interrogate endometrial tissue: A novel and objective diagnostic approach. Br J Cancer. 104(5), 790–797. https://doi.org/10.1038/sj.bjc.6606094 (2011).
Martin, F. L. et al. Distinguishing cell types or populations based on the computational analysis of their infrared spectra. Nat Protoc. 5(11), 1748–1760. https://doi.org/10.1038/nprot.2010.133 (2010).
Movasaghi, Z. et al. Fourier transform infrared (FTIR) spectroscopy of biological tissues. Appl Spectrosc Rev. 43(2), 134–179. https://doi.org/10.1080/05704920701829043 (2008).
Silva, L. G. et al. ATR-FTIR spectroscopy in blood plasma combined with multivariate analysis to detect HIV infection in pregnant women. Sci Rep. 10(1), 20156. https://doi.org/10.1038/s41598-020-77378-3 (2020).
Workman, J. J. et al. Interpretive spectroscopy for near infrared. Appl Spectrosc Rev. 31(3), 251–320. https://doi.org/10.1080/05704929608000571 (1996).
Yang, Z. et al. Challenges of Raman spectra to estimate carbonyl index of microplastics: A case study with environmental samples from sea surface. Mar. Pollut. Bull. 194, 115362. https://doi.org/10.1016/j.marpolbul.2023.115362 (2023).
Câmara, A. B. F. et al. Multivariate assessment for predicting antioxidant activity from clove and pomegranate extracts by MCR-ALS and PLS models combined to IR spectroscopy. Food Chem. 384, 132321. https://doi.org/10.1016/j.foodchem.2022.132321 (2022).
Chakkumpulakkal, P. V. T. et al. A multimodal spectroscopic approach combining mid-infrared and near-infrared for discriminating gram-positive and gram-negative bacteria. Anal Chem. 96(46), 18392–18400. https://doi.org/10.1021/acs.analchem.4c03060 (2024).
Siesler, H. W. et al. Near-infrared spectroscopy: Principles, instruments, applications (ed. Siesler, H.W., Ozaki, Y., Kawata, S., Heise, H.M.) 1–568 (Wiley, 2008).
Letsiou, S. et al. Endometriosis is associated with aberrant metabolite profiles in plasma. Fertil Steril. 107(3), 699-706.e6. https://doi.org/10.1016/j.fertnstert.2016.12.032 (2017).
Loy, S. L. et al. Discovery and validation of peritoneal endometriosis biomarkers in peritoneal fluid and serum. Reprod Biomed Online. 43(4), 727–737. https://doi.org/10.1016/j.rbmo.2021.07.002 (2021).
Kokot, et al. ATR-IR spectroscopy application to diagnostic screening of advanced endometriosis. Oxid. Med. Cell. Longev. 2022(1), 4777434. https://doi.org/10.1155/2022/4777434 (2022).
Kelly, J. G. et al. Biospectroscopy to metabolically profile biomolecular structure: A multistage approach linking computational analysis with biomarkers. J Proteome Res. 10(4), 1437–1448. https://doi.org/10.1021/pr101067u (2011).
Bernardes-Oliveira, E. et al. Spectrochemical differentiation in gestational diabetes mellitus based on attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy and multivariate analysis. Sci Rep. 10(1), 19259. https://doi.org/10.1038/s41598-020-75539-y (2020).
Paraskevaidi, M. et al. Detecting endometrial cancer by blood spectroscopy: A diagnostic cross-sectional study. Cancers (Basel). 12(5), 1256. https://doi.org/10.3390/cancers12051256 (2020).
Freitas, D. L. D. et al. Near-infrared spectroscopy of blood plasma with chemometrics towards HIV discrimination during pregnancy. Sci Rep. 11(1), 22609. https://doi.org/10.1038/s41598-021-02105-5 (2021).
Assis, C. et al. Combining mid infrared spectroscopy and paper spray mass spectrometry in a data fusion model to predict the composition of coffee blends. Food Chem. 281, 71–77. https://doi.org/10.1016/j.foodchem.2018.12.044 (2019).
Veettil, T. C. P. et al. A combined near-infrared and mid-infrared spectroscopic approach for the detection and quantification of glycine in human serum. Sensors. 22(12), 4528. https://doi.org/10.3390/s22124528 (2022).
Yang, et al. Early rapid diagnosis of Alzheimer’s disease based on fusion of near- and mid-infrared spectral features combined with PLS-DA. Optik 241, 166485. https://doi.org/10.1016/j.ijleo.2021.166485 (2021).
Ballabio, D. et al. Classification tools in chemistry. Part 1: linear models. PLS-DA. Anal Methods. 5(16), 3790–3798. https://doi.org/10.1039/C3AY40582F (2013).
Yu, S. et al. Qualitative and quantitative assessment of flavor quality of Chinese soybean paste using multiple sensor technologies combined with chemometrics and a data fusion strategy. Food Chem. 405, 134859. https://doi.org/10.1016/j.foodchem.2022.134859 (2023).
Kennard, R. et al. Computer aided design of experiments. Technometrics. 11(1), 137–148. https://doi.org/10.1080/00401706.1969.10490666 (1969).
McCall, J. Genetic algorithms for modelling and optimisation. J Comput Appl Math. 184(1), 205–222. https://doi.org/10.1016/j.cam.2004.07.034 (2005).
Pontes, M. J. C. et al. The successive projections algorithm for spectral variable selection in classification problems. Chemom Intell Lab Syst. 78(1–2), 11–18. https://doi.org/10.1016/j.chemolab.2004.12.001 (2005).
Bro, R. et al. Principal component analysis. Anal. Methods. 6(9), 2812–2831. https://doi.org/10.1039/C3AY41907J (2014).
Morais, C. L. et al. Principal component analysis with linear and quadratic discriminant analysis for identification of cancer samples based on mass spectrometry. J. Braz. Chem. Soc. 29(3), 472–481. https://doi.org/10.21577/0103-5053.20170159 (2018).
Sarah, J. et al. Comparison of performance of five common classifiers represented as boundary methods: Euclidean Distance to Centroids, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Learning Vector Quantization and Support Vector Machines, as dependent on data structure. Chemom Intell Lab Syst. 95(1), 1–17. https://doi.org/10.1016/j.chemolab.2008.07.010 (2009).
Cortes, C. et al. Support-vector networks. Mach Learn. 20, 273–297. https://doi.org/10.1007/BF00994018 (1995).
Morais, C. L. et al. Variable selection with a support vector machine for discriminating Cryptococcus fungal species based on ATR-FTIR spectroscopy. Anal Methods. 9(20), 2964–2970. https://doi.org/10.1039/C7AY00428A (2017).
Morais, C. L. et al. Comparing unfolded and two-dimensional discriminant analysis and support vector machines for classification of EEM data. Chemom Intell. Lab. Syst. 170, 1–12. https://doi.org/10.1016/j.chemolab.2017.09.001 (2017).
Acknowledgements
The authors acknowledge the support provided by the Post-Graduate Chemistry Program (PPGQ/UFRN) and the Laboratório de Pesquisa em Petróleo—LAPET (IQ/UFRN). A. B. F. Câmara would like to thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq/FAPERN, Brazil), code 153445/2024-6 for the financial support.
Author information
Contributions
A.T.B.S., A.B.F.C. and C.M.A.M.S. designed the experiments. A.T.B.S. contributed to the collection of biological samples. K.M.G.L. and J.C.O.C. analyzed the data and contributed with reagents, materials, and/or analysis tools. A.T.B.S. and A.B.F.C. contributed to manuscript preparation. K.M.G.L., J.C.O.C. and C.L.M.M. refined the manuscript for publication. A.B.F.C. and K.M.G.L., data analysis. All authors read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
de Souza, A.T.B., Câmara, A.B.F., de Araújo Medeiros Santos, C.M. et al. Spectrochemical differentiation in endometriosis based on infrared spectroscopy advanced data fusion and multivariate analysis. Sci Rep 15, 5071 (2025). https://doi.org/10.1038/s41598-025-89504-0
DOI: https://doi.org/10.1038/s41598-025-89504-0







