Spectrochemical differentiation in endometriosis based on infrared spectroscopy advanced data fusion and multivariate analysis

de Souza, Amaxsell Thiago Barros; Câmara, Anne Beatriz Figueira; de Araújo Medeiros Santos, Cristina Maria; de Lelis Medeiros de Morais, Camilo; de Oliveira Crispim, Janaina Cristiana; de Lima, Kássio Michell Gomes

doi:10.1038/s41598-025-89504-0


Article
Open access
Published: 11 February 2025


Amaxsell Thiago Barros de Souza¹^na1,
Anne Beatriz Figueira Câmara²^na1,
Cristina Maria de Araújo Medeiros Santos³,
Camilo de Lelis Medeiros de Morais⁴,
Janaina Cristiana de Oliveira Crispim¹,⁵ &
Kássio Michell Gomes de Lima²

Scientific Reports volume 15, Article number: 5071 (2025)



Abstract

Endometriosis is a common benign gynecological condition characterized by the growth of endometrial glands and stroma outside the uterine cavity. Current approaches for its detection are invasive and expensive, which limits their clinical utility, so cost-effective and minimally invasive diagnostic approaches are needed. Attenuated total reflection Fourier-transform infrared (ATR-FTIR) and near-infrared (NIR) spectroscopy combined with multivariate classification were applied as a new tool to analyze blood plasma samples from women with endometriosis (n = 41) and healthy individuals (n = 34). In addition, advanced data fusion strategies and multivariate analysis techniques improved the classification models and enabled fast, non-destructive diagnostic segregation of both sample categories, with high levels of accuracy, sensitivity and specificity. 2D correlation analysis revealed strong positive correlations between the spectrochemical biomarkers identified in both IR regions. To the best of our knowledge, this is the first study demonstrating the efficacy of a new tool for fast and non-invasive diagnosis of endometriosis using blood plasma samples analyzed with IR spectroscopy combined with multivariate classification.


Introduction

Endometriosis is a common benign gynecological condition characterized by the growth of endometrial glands and stroma outside the uterine cavity, resulting in scar tissue, adhesions and inflammatory reactions¹. Previous studies have reported its main symptoms as dysmenorrhea, dyspareunia, pelvic pain and infertility, which have a negative impact on quality of life².

Ultrasonography has a sensitivity above 90% for identifying endometriotic cysts (endometriomas) with a typical sonographic appearance³. However, its diagnostic value is limited in women with suspected early-stage disease and in cases of deep infiltrating endometriosis. In these cases, diagnosis depends on histopathological examination of lesions excised during laparoscopic surgery, which remains the only accurate standard diagnostic investigation⁴. Therefore, researchers are directing their attention towards novel non-invasive methodologies for endometriosis diagnosis, including the analysis of blood biomarkers and genetic predispositions. However, to date, no serum marker has been found to diagnose endometriosis with adequate sensitivity and specificity.

Attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy provides insight into the chemical composition of biological samples at the molecular level. FTIR vibrational analysis identifies the functional groups that make up lipids, proteins, carbohydrates and deoxyribonucleic acid (DNA)⁵. Previous reports have shown the potential of vibrational spectroscopy for categorizing gynecological conditions such as ovarian, endometrial and cervical cancer⁶⁻⁸. Additionally, near-infrared (NIR) imaging is a promising technique for enhanced detection of endometriotic lesions; it relies on the application of a fluorescent dye, which is absorbed by the tissue or transported intraluminally.

Infrared spectroscopy measures the energy absorbed by biomolecules in the mid-infrared (MIR) region of the electromagnetic spectrum⁹. Each chemical bond has characteristic vibrational properties and therefore absorbs IR radiation at specific frequencies, usually expressed as wavenumbers (cm⁻¹). Plotting absorbance against wavenumber produces a spectrum that reflects the types and amounts of bonds present in the material; in biological samples this pattern is called the 'biochemical-cell fingerprint'. Because endometriosis is expected to cause biochemical alterations that are reflected in the IR spectrum, this technique has the potential to distinguish affected women from healthy individuals. IR spectroscopy combined with chemometric methods has played an increasingly important role in medical and biological analysis by enabling rapid detection of gynecological conditions at early stages. This is the first description of NIR and ATR-FTIR spectroscopy applied to blood plasma from patients diagnosed with early endometriosis, elucidating potential future applications of this technique.
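The study reports MIR band positions as wavenumbers (cm⁻¹) and NIR band positions as wavelengths (nm). Since 1 cm = 10⁷ nm, the two scales are related by wavenumber (cm⁻¹) = 10⁷ / wavelength (nm). The short Python sketch below illustrates the conversion; the helper names are hypothetical and not part of the authors' workflow.

```python
def nm_to_wavenumber(wavelength_nm: float) -> float:
    """Wavelength in nm -> wavenumber in cm^-1 (1 cm = 1e7 nm)."""
    return 1e7 / wavelength_nm


def wavenumber_to_nm(wavenumber_cm: float) -> float:
    """Wavenumber in cm^-1 -> wavelength in nm."""
    return 1e7 / wavenumber_cm


# Express the NIR window of this study (900-2600 nm) in wavenumbers
for nm in (900, 2600):
    print(f"{nm} nm = {nm_to_wavenumber(nm):.1f} cm^-1")
```

With these helpers, the 900–2600 nm NIR window corresponds to roughly 11111–3846 cm⁻¹, so it overlaps the 4000–550 cm⁻¹ ATR-FTIR window only between about 4000 and 3846 cm⁻¹.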

Results

The pre-processed IR spectra of blood plasma for both the control group (healthy women) and the endometriosis group are illustrated in Fig. 1. The dataset comprises 75 samples measured in triplicate, giving a total of 225 spectra. To assess the reproducibility of the spectral data, the mean and standard deviation were calculated for each ATR-FTIR and NIR replicate. For ATR-FTIR, the values were 0.0625 ± 0.0027 (replicate 1), 0.0627 ± 0.0029 (replicate 2) and 0.0626 ± 0.0024 (replicate 3) for endometriosis, and 0.0618 ± 0.0021 (replicate 1), 0.0618 ± 0.0023 (replicate 2) and 0.0614 ± 0.0018 (replicate 3) for the control group. For NIR, the values were 0.5569 ± 0.0725 (replicate 1), 0.5274 ± 0.0542 (replicate 2) and 0.5262 ± 0.0583 (replicate 3) for endometriosis, and 0.4734 ± 0.0198 (replicate 1), 0.4637 ± 0.0207 (replicate 2) and 0.4645 ± 0.0189 (replicate 3) for healthy controls. The small differences between replicates indicate high reproducibility, and the three replicate spectra of each sample were averaged before model construction. In the ATR-FTIR region (Fig. 1A), several characteristic absorption bands are discernible, notably the protein Amide I band (~1650 cm⁻¹), a minor band at 1552 cm⁻¹ assigned to Amide II (mainly N–H bending coupled with C–N stretching), and a feature at 3366 cm⁻¹ associated with O–H stretching¹⁰,¹¹.
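The replicate check above can be reproduced along the following lines. This is a minimal sketch under an assumption the article does not spell out: each spectrum is first reduced to its mean absorbance, and the mean ± SD for each replicate is then taken across samples. The array layout (samples × replicates × variables), the function names and the synthetic data are illustrative only.

```python
import numpy as np


def replicate_summary(spectra: np.ndarray) -> list:
    """Mean ± SD for each replicate.

    spectra: array of shape (n_samples, n_replicates, n_variables).
    Assumption: each spectrum is reduced to its mean absorbance, and the
    mean and SD are then taken across samples for each replicate.
    """
    per_spectrum = spectra.mean(axis=2)  # (n_samples, n_replicates)
    return [(float(per_spectrum[:, r].mean()), float(per_spectrum[:, r].std(ddof=1)))
            for r in range(spectra.shape[1])]


def average_replicates(spectra: np.ndarray) -> np.ndarray:
    """Average the replicate spectra of each sample -> (n_samples, n_variables)."""
    return spectra.mean(axis=1)


# Synthetic stand-in for one class: 41 samples x 3 replicates x 1726 variables
rng = np.random.default_rng(0)
demo = 0.06 + 0.002 * rng.standard_normal((41, 3, 1726))
for r, (m, s) in enumerate(replicate_summary(demo), start=1):
    print(f"replicate {r}: {m:.4f} ± {s:.4f}")
model_input = average_replicates(demo)  # one averaged spectrum per sample
```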

The NIR spectra (Fig. 1B) show the –CH₂ second overtone at 1179 nm, as well as the first overtones of N–H and O–H stretching. The band at 1788 nm is attributed to lipid structures, while the bands at 1950 nm and 2332 nm correspond to the second overtone of C–O stretching in carbohydrates and to the stretching and bending of CH associated with methylene, respectively¹². In addition, the average signal-to-noise ratio (SNR) of the spectral dataset was investigated. The calculated SNR of the ATR-FTIR spectra was 145.19 and 160.19 for the endometriosis and control classes, respectively, while for NIR it was 103.29 and 113.86 for endometriosis and controls, respectively. These values suggest a relatively high quality of both spectroscopic measurements¹³.
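The article does not state how the SNR was computed. One common convention, sketched below in Python as an assumption, treats the residual between a spectrum and its Savitzky–Golay smoothed version as noise and divides the mean smoothed intensity by the standard deviation of that residual. The window length, polynomial order and function names are hypothetical choices, and the spectra are synthetic.

```python
import numpy as np
from scipy.signal import savgol_filter


def estimate_snr(spectrum: np.ndarray, window: int = 15, polyorder: int = 2) -> float:
    """Rough SNR of one spectrum.

    Assumption: the residual between the spectrum and its Savitzky-Golay
    smoothed version is treated as noise, and the signal is the mean absolute
    smoothed intensity. This is one common convention, not necessarily the
    article's.
    """
    smooth = savgol_filter(spectrum, window_length=window, polyorder=polyorder)
    noise = spectrum - smooth
    return float(np.mean(np.abs(smooth)) / np.std(noise, ddof=1))


def class_snr(spectra: np.ndarray) -> float:
    """Average SNR over all spectra (rows) of one class."""
    return float(np.mean([estimate_snr(row) for row in spectra]))


# Synthetic example: the same smooth band with low and high added noise
rng = np.random.default_rng(1)
x = np.linspace(0.0, 1.0, 500)
clean = 1.0 + 0.5 * np.exp(-((x - 0.5) / 0.05) ** 2)
quiet = clean + 0.001 * rng.standard_normal((5, x.size))
noisy = clean + 0.05 * rng.standard_normal((5, x.size))
print(class_snr(quiet), class_snr(noisy))
```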

To assess the spectral variability within each class, the relative standard deviation (RSD) was calculated. For ATR-FTIR data, the RSD values were 6.43% and 4.05% for the control and endometriosis classes, respectively, whereas for NIR they were 18.31% and 18.59% for controls and endometriosis, respectively. These results suggest low spectral variability within the studied classes, particularly for the ATR-FTIR spectra. The standard deviation at each wavenumber/wavelength (Fig. 1C, D) shows how the IR variables vary around their average spectra; the NIR data show larger signal fluctuations and, consequently, higher standard deviations.
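A per-class RSD and the per-variable standard deviation profile of the kind plotted in Fig. 1C, D can be sketched as follows. How the article aggregated the per-variable values into a single RSD is not stated; averaging the per-variable RSDs is an assumption made here, and the function names are hypothetical.

```python
import numpy as np


def sd_profile(spectra: np.ndarray) -> np.ndarray:
    """Standard deviation across samples at each wavenumber/wavelength."""
    return spectra.std(axis=0, ddof=1)


def class_rsd(spectra: np.ndarray) -> float:
    """Mean relative standard deviation (%) of one class.

    spectra: (n_samples, n_variables). The per-variable SD is divided by the
    per-variable mean and the ratios are averaged. Assumption: the article does
    not say how per-variable values were aggregated; variables whose mean is
    close to zero (e.g. after baseline correction) can inflate this value.
    """
    return float(np.mean(sd_profile(spectra) / np.abs(spectra.mean(axis=0))) * 100)


demo = np.array([[1.0, 2.0],
                 [3.0, 4.0]])
print(sd_profile(demo), class_rsd(demo))
```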

Figure 1E, F shows the average second-derivative spectra for ATR-FTIR and NIR. Differentiating each spectrum and then averaging the derivative spectra within each group highlights the bands that differ characteristically between healthy women and the endometriosis group. In the ATR-FTIR region (Fig. 1E), the main differences between the two groups were at 1632 cm⁻¹ and 1548 cm⁻¹, corresponding to the Amide I and Amide II bands¹¹, respectively, suggesting that these bands contribute to the separation between the groups. The characteristic differences were more evident in the NIR spectra, mainly at 1852 nm, 1900 nm, 2244 nm and 2332 nm, which correspond to the O–H combination band, a water combination band, the stretching and bending of CH associated with methylene, and the –CH₃ methyl combination band, respectively¹².
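The differentiate-then-average procedure can be sketched with SciPy's `savgol_filter`, which returns Savitzky–Golay derivatives when `deriv` is set. The derivative window and polynomial order are assumptions (the article reports only the smoothing settings), the helper names are hypothetical, and the spectra below are synthetic bands, not study data.

```python
import numpy as np
from scipy.signal import savgol_filter


def mean_second_derivative(spectra: np.ndarray, spacing: float,
                           window: int = 15, polyorder: int = 2) -> np.ndarray:
    """Savitzky-Golay second derivative of each spectrum (rows), then the class mean.

    spacing: distance between adjacent points on the x-axis (e.g. 2 cm^-1).
    The window and polynomial order are assumptions, not the article's settings.
    """
    d2 = savgol_filter(spectra, window_length=window, polyorder=polyorder,
                       deriv=2, delta=spacing, axis=1)
    return d2.mean(axis=0)


def largest_difference(axis_values: np.ndarray, d2_a: np.ndarray, d2_b: np.ndarray) -> float:
    """x-axis position where two mean second-derivative spectra differ most."""
    return float(axis_values[np.argmax(np.abs(d2_a - d2_b))])


axis = np.arange(1500.0, 1800.0, 2.0)            # hypothetical 2 cm^-1 grid
band = np.exp(-((axis - 1650.0) / 30.0) ** 2)    # synthetic band centred at 1650
class_a = np.vstack([1.0 * band, 1.1 * band])    # synthetic "control" spectra
class_b = np.vstack([1.5 * band, 1.6 * band])    # synthetic "disease" spectra
d2_a = mean_second_derivative(class_a, spacing=2.0)
d2_b = mean_second_derivative(class_b, spacing=2.0)
print(largest_difference(axis, d2_a, d2_b))
```

Absorption bands appear as negative minima in second-derivative spectra, and differentiation also removes constant and linear baseline offsets, which is why small shifts between groups become easier to see.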

Both the NIR and ATR-FTIR spectra display significant overlap in spectral features between the control and endometriosis groups, necessitating the use of multivariate tools to distinguish between these categories. Several supervised multivariate classification tools were tested, and their performances are summarized in Table 1.

Table 1 Quality parameters calculated for the test set using different supervised classification algorithms to distinguish women with endometriosis from healthy control women.


The best discrimination results for ATR-FTIR data were obtained using PCA-SVM (77% accuracy), followed by GA-SVM (73% accuracy). These models reached F-scores of 86% and 72%, respectively, indicating that classification was not affected by the imbalanced class sizes. For PCA-SVM, three principal components (PCs) were selected, accounting for 97.18% of the cumulative explained variance. Figure 2D–F depicts the PCA scores, giving an overview of class separation. However, this exploratory technique alone did not separate the groups satisfactorily, probably because of the high spectral overlap between the classes. A discriminant algorithm such as LDA, QDA or SVM is therefore necessary to classify the samples accurately. The loading profiles of these PCs, used for spectral assignment, are shown in Fig. 2A–C, and the wavenumbers contributing to the discrimination are summarized in Supplementary Table S1.
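A PCA-SVM model of the kind reported here can be sketched with scikit-learn: PCA compresses each spectrum into three PC scores, and an SVM classifies the scores. The authors used MATLAB toolboxes; the scikit-learn pipeline, the RBF kernel and its `C`/`gamma` values are assumptions, the function name is hypothetical, and the data are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC


def build_pca_svm(n_components: int = 3, C: float = 1.0, gamma="scale"):
    """PCA feature extraction followed by an SVM classifier on the PC scores.

    scikit-learn's PCA mean-centres the data internally. The RBF kernel and the
    C/gamma values are assumptions; the article does not report them.
    """
    return make_pipeline(PCA(n_components=n_components),
                         SVC(kernel="rbf", C=C, gamma=gamma))


# Synthetic two-class data standing in for averaged, pre-processed spectra
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 50))
y = np.repeat([0, 1], 30)          # 0 = control, 1 = endometriosis (illustrative)
X[y == 1, :10] += 3.0              # class difference confined to 10 variables

model = build_pca_svm().fit(X, y)
pca = model.named_steps["pca"]
print(pca.explained_variance_ratio_.sum())   # cumulative variance of the 3 PCs
print(model.score(X, y))                     # training accuracy (optimistic)
```

In a real analysis the pipeline would be fitted on the Kennard–Stone training set and evaluated on the test set; the training accuracy printed here is only a sanity check.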

GA-SVM selected 12 wavenumbers out of the 1726 variables in the entire spectrum for class differentiation: 3602, 3476, 2644, 2404, 1850, 1502, 1454, 1156, 1094, 978, 702 and 640 cm⁻¹ (Supplementary Figure S1A). The GA-SVM discriminant function (DF) is depicted in Supplementary Figure S1B, and the tentative biomarker assignments for the selected wavenumbers are given in Supplementary Table S2. For NIR data, the best fit was achieved using PCA-LDA and GA-LDA, both with 100% accuracy. The assignments for the PCA-LDA loadings are listed in Supplementary Table S3, and the loadings and scores are shown in Fig. S3. Although the PC1 vs. PC2 scores plot does not show separation between the classes, separation between the endometriosis and control groups is evident in the scores plots in Fig. S3E–F, reflecting compositional differences between the two classes. Figure 3 shows the discrimination plot and the 14 variables selected for NIR using GA-LDA.

The 14 spectral wavelengths responsible for class separation in NIR data using GA-LDA are 2332, 2289, 1498, 1479, 1393, 1370, 1361, 1304, 1223, 1187, 1096, 1073, 1037, and 901 nm. The assignments for these variables are summarized in Supplementary Table S4. To improve results, a low-level data fusion strategy was applied to IR data, with results depicted in Table 2.
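Low-level fusion concatenates the pre-processed ATR-FTIR and NIR variables of each sample into one row before modeling. The sketch below assumes a specific block normalization (mean-centring each block and dividing it by its Frobenius norm) because the Methods state only that the fused data were normalized and mean-centred; the function name, array sizes and the NIR variable count are hypothetical.

```python
import numpy as np


def low_level_fusion(mir: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Low-level fusion: concatenate the ATR-FTIR and NIR blocks sample by sample.

    Rows of both blocks must be the same samples in the same order.
    Assumption: each block is mean-centred and divided by its Frobenius norm so
    that both techniques contribute the same total variance; the article states
    only that the fused data were normalized and mean-centred.
    """
    if mir.shape[0] != nir.shape[0]:
        raise ValueError("both blocks must contain the same samples")
    scaled = []
    for block in (mir, nir):
        centred = block - block.mean(axis=0)
        scaled.append(centred / np.linalg.norm(centred))  # Frobenius norm
    return np.hstack(scaled)


rng = np.random.default_rng(3)
mir = 0.06 + 0.002 * rng.standard_normal((75, 1726))   # absorbances ~0.06
nir = 0.50 + 0.050 * rng.standard_normal((75, 851))    # hypothetical NIR size and scale
fused = low_level_fusion(mir, nir)
print(fused.shape)
```

Block scaling keeps the technique with more variables or larger absorbances from dominating the fused model. For a predictive model, the centring and scaling statistics should be estimated on the training samples only and then applied to the test samples.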

Table 2 Quality parameters calculated for the test set using different supervised classification algorithms applied to the fused data to distinguish women with endometriosis from healthy control women.


The results indicate good performance of the PCA-LDA, SPA-LDA and GA-LDA models in discriminating the classes using the fused NIR and MIR spectra. Combining the overtone and combination bands of NIR with the fundamental vibrations observed in ATR-FTIR provides a more comprehensive characterization of the samples¹⁴. SPA-LDA selected 15 variables and GA-LDA selected 11 variables (Supplementary Table S5); their tentative biomarker assignments are shown in Table 3.
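SPA selects variables with minimal collinearity. The sketch below shows only its projection step, assuming the standard formulation: starting from one variable, each iteration keeps the variable with the largest norm after projection onto the subspace orthogonal to the previous selection. A full SPA-LDA run would repeat this chain from every starting variable and for each subset size, keeping the subset that minimizes the cost G described in the Methods; that outer loop is omitted, and the function name is hypothetical.

```python
import numpy as np


def spa_chain(X: np.ndarray, start: int, n_select: int) -> list:
    """One chain of the successive projections algorithm (SPA).

    X: (n_samples, n_variables), typically mean-centred training data.
    Starting from column `start`, each step projects every column onto the
    subspace orthogonal to the most recently selected column and keeps the
    column with the largest projected norm, i.e. the variable least collinear
    with those already chosen. n_select should not exceed n_samples.
    """
    Xp = np.asarray(X, dtype=float).copy()
    selected = [start]
    for _ in range(n_select - 1):
        xk = Xp[:, selected[-1]]
        Xp = Xp - np.outer(xk, xk @ Xp) / (xk @ xk)   # orthogonal projection
        norms = np.linalg.norm(Xp, axis=0)
        norms[selected] = -1.0                        # never pick a variable twice
        selected.append(int(np.argmax(norms)))
    return selected


# Column 1 is nearly collinear with column 0, so SPA prefers column 2 next
X = np.array([[1.0, 1.0, 0.0],
              [0.0, 0.1, 0.0],
              [0.0, 0.0, 1.0]])
print(spa_chain(X, start=0, n_select=3))   # [0, 2, 1]
```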

Table 3 Wavenumbers and wavelengths selected by SPA-LDA and GA-LDA applied to the ATR-FTIR and NIR fused data to discriminate between healthy controls vs. endometriosis samples.


Additionally, 2D correlation spectroscopy was applied to study the relationship between ATR-FTIR wavenumbers and NIR wavelengths (Fig. 4). This correlation helps identify the molecular origin of NIR markers, because MIR bands are sharper and well documented in the specialized literature¹⁵. For endometriosis patients, a correlation marker related to the amide I band was found between the ~1650 cm⁻¹ MIR band and the 1170 nm NIR band. The marker at 1552 cm⁻¹, attributable to amide II, is consistent with the NIR band at 1100 nm, which represents the second overtone of secondary amides¹⁰,¹⁶. Other markers include the MIR bands at 2782, 1402 and 1416 cm⁻¹, attributed to N–H stretching, symmetric CH₃ bending of protein methyl groups, and N–H deformation/C–N stretching, respectively.
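The article does not detail how its 2D correlation maps were computed. Under the generalized 2D correlation formalism of Noda, a synchronous hetero-spectral map is the covariance, across samples, between each MIR variable and each NIR variable; the sketch below assumes that formalism, with the sample series acting as the perturbation. Because it is a sum over samples, the synchronous map does not depend on sample order. The function name and toy data are hypothetical.

```python
import numpy as np


def synchronous_hetero_2dcos(mir: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """Synchronous hetero-spectral 2D correlation map (generalized 2D-COS).

    Rows of both blocks are the same m samples (acting as the perturbation),
    columns are MIR wavenumbers and NIR wavelengths. Returns an
    (n_mir_variables, n_nir_variables) matrix; positive entries mean the two
    intensities increase and decrease together across samples.
    """
    dyn_mir = mir - mir.mean(axis=0)   # dynamic spectra: deviation from the mean
    dyn_nir = nir - nir.mean(axis=0)
    m = mir.shape[0]
    return dyn_mir.T @ dyn_nir / (m - 1)


t = np.array([1.0, 2.0, 3.0, 4.0])            # toy intensity trend over 4 samples
mir = t[:, None]                              # one MIR variable
nir = np.column_stack([2 * t, -t])            # one in-phase, one opposite NIR variable
print(synchronous_hetero_2dcos(mir, nir))     # positive, then negative correlation
```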

Discussion

Few studies in the specialized literature have sought new tools for the non-invasive diagnosis of endometriosis. Such methods include the analysis of blood biomarkers and genetic predispositions, with reported high sensitivity and specificity¹⁷,¹⁸. Kokot et al. (2022) developed a partial least squares discriminant analysis (PLS-DA) model based on selected serum biochemical parameters, specific regions of the serum ATR-FTIR spectra, and the combined data for diagnostic screening of advanced endometriosis, reaching an overall accuracy of 87.5%¹⁹. Here, a novel method using IR spectroscopy coupled with multivariate classification tools was applied to blood plasma collected from women with endometriosis and healthy individuals to discriminate between the two groups.

In this study, 75 blood plasma samples from women were analyzed using ATR-FTIR and NIR spectroscopy to stratify patients with endometriosis (41 out of 75) and healthy controls (34 out of 75). Various chemometric algorithms were tested for category discrimination, with the wavenumbers/wavelengths most responsible for segregating the different classes designated as potential diagnostic biomarkers.

ATR-FTIR spectroscopy is a powerful tool for analyzing biological structures through their spectra and has proven highly useful in clinical studies²⁰. It has previously been applied successfully to detect gestational diabetes mellitus, by our research group²¹, and endometrial cancer²². Herein, ATR-FTIR spectroscopy combined with chemometric algorithms was employed to detect patients with endometriosis. The best results were obtained using PCA-SVM, which achieved a classification accuracy of 77% and a sensitivity of 83%, followed by GA-SVM with an accuracy of 73%. Twelve wavenumbers were responsible for class separation with GA-SVM (Supplementary Table S2). Bands related to Amide I at 1640 cm⁻¹, Amide II at 1548 cm⁻¹ and other protein features at 1538 cm⁻¹ were identified as potential discriminant biomarkers.

NIR spectroscopy is also a valuable tool for analyzing diseases in biologically derived samples and has been successfully applied to detect HIV in pregnant women²³. Here, PCA-LDA and GA-LDA discriminated patients with endometriosis from healthy women with 100% accuracy, 100% sensitivity and 100% specificity. Furthermore, NIR spectroscopy can be performed with portable instruments, which is an operational advantage. However, it has limited chemical specificity because of the high degree of overlap between spectral features, which makes the identification of pure biomarkers with this technique alone infeasible²⁴. In this context, IR data fusion (ATR-FTIR and NIR) is an alternative strategy that combines the advantages of both IR spectroscopies in a single model. This approach increases the reliability of classification algorithms compared with a single analytical technique²⁵, as demonstrated by Yang et al. (2021), who applied NIR–MIR spectral feature fusion with PLS-DA for rapid and accurate diagnosis of Alzheimer's disease and obtained 100% accuracy²⁶.

With the data fusion strategy, the classification models using all studied feature extraction/selection algorithms combined with LDA and SVM achieved 100% accuracy, 100% sensitivity and 100% specificity on the test set, supporting the feasibility of this approach for identifying endometriosis patients. If successfully applied in a clinical environment, this method could aid the early diagnosis of endometriosis in a low-cost and non-invasive manner.

Furthermore, multimodal spectroscopy, or data fusion, can be considered an advanced analytical approach that combines complementary information that neither technique provides alone. 2D correlation spectroscopy allowed the relationships between MIR and NIR variables (wavenumbers/wavelengths) to be studied. Markers found in the synchronous map were related to the Amide I and Amide II MIR bands, which matched the variables selected by the chemometric models. Finally, this approach showed that NIR bands are correlated with specific ATR-FTIR bands, enhancing the interpretability of NIR spectra.

Conclusions

To the best of our knowledge, this is the first study demonstrating the efficacy of a new tool for fast and non-invasive diagnosis of endometriosis using blood plasma samples analyzed with IR spectroscopy combined with multivariate classification. The PCA-LDA, SPA-LDA, and GA-LDA models showed excellent performance when applied to the fused ATR-FTIR and NIR data, with high discriminant power between the studied classes. In this context, spectroscopy emerges as a potential method for studying endometriosis and could be applied in clinical practice in the future, either as a screening or diagnostic test.

Methods

The methodology used in this study is summarized in the workflow in Fig. 5.

Study design and population

We conducted a case–control study at the Januário Cicco Maternity School, affiliated with the Federal University of Rio Grande do Norte. A total of forty-one women (n = 41) with a clinical diagnosis of endometriosis were recruited, alongside thirty-four healthy women (n = 34). Subjects were excluded if they had autoimmune diseases, immunodeficiencies, chronic illnesses, or genetic syndromes linked to sex chromosomes. The characteristics of the included patients are presented in Table 4. All methods used in this study adhered to approved guidelines. The study received ethical approval from the ethics committee at the Januário Cicco Maternity School, Federal University of Rio Grande do Norte, under protocol number 44352921.6.0000.5292, and written informed consent was obtained from all subjects. All procedures were conducted in compliance with the Declaration of Helsinki.

Table 4 Demographic and clinical characteristics of included women with and without diagnosis of endometriosis.


Sample collection and preparation for ATR-FTIR and NIR analysis

Venous blood samples were collected from all participants. The samples were centrifuged at 3600 rpm for 7 min to separate erythrocytes from the plasma. Subsequently, 100 µL aliquots of plasma were transferred to Eppendorf tubes and stored at −80 °C until ATR-FTIR and NIR analysis. This sample collection resulted in a total of 75 samples, with 41 from patients with endometriosis and 34 from the healthy control group.

ATR-FTIR spectroscopy

The blood plasma samples were thawed for 30–40 min at room temperature before spectrochemical measurement. Aliquots of 10 µL were analyzed as wet samples in triplicate, with a new aliquot applied for each replicate. Spectra were acquired on a PerkinElmer Spectrum Two FTIR spectrometer (PerkinElmer, USA) equipped with a single-reflection diamond ATR accessory, a KBr beamsplitter and a LiTaO₃ (lithium tantalate) detector. Each spectrum was recorded with 32 scans at 4 cm⁻¹ resolution over the 4000–550 cm⁻¹ range, using Norton–Beer apodization and Fourier transformation with a zero-filling factor of 2. For each sample, the ATR crystal was cleaned with 70% (v/v) ethanol before the background was acquired and again after the sample was read, to minimize atmospheric variations over time.

NIR spectroscopy

NIR spectra were acquired with an ARCoptix FT-NIR Rocker spectrophotometer (Arcoptix S.A., Switzerland) in transflectance mode at a resolution of 5 nm over the 900–2600 nm range. For each measurement, 10 µL of sample was deposited on an aluminum paper surface and an optical fiber was positioned over it. Measurements were obtained in triplicate for each sample, with a new aliquot prepared for each replicate.

Data analysis

Data analysis was performed in MATLAB R2014b (version 8.4; The MathWorks, Inc., USA) with PLS Toolbox version 7.9.3 (Eigenvector Research, Inc., USA) and the Classification Toolbox (version 7.0) from the Milano Chemometrics and QSAR Research Group²⁷. The raw ATR-FTIR spectra were pre-processed by Savitzky–Golay (SG) smoothing (7-point window, second-order polynomial) and automatic weighted least squares (AWLS) baseline correction. The NIR spectra were pre-processed with multiplicative signal correction (MSC) and SG smoothing (15-point window, second-order polynomial). In addition, a low-level data fusion strategy combining the NIR and ATR-FTIR information was applied to achieve more accurate results²⁸. Because the two techniques produce spectra on different scales, the fused ATR-FTIR/NIR data were normalized, and all data were mean-centered before analysis. The samples were divided into training (70%) and test (30%) subsets with the Kennard–Stone (KS) sampling algorithm²⁹; the training set was used for model construction and the test set for validation. The models combined feature extraction/selection algorithms with discriminant analysis or a support vector machine (SVM). The genetic algorithm (GA)³⁰ and the successive projections algorithm (SPA)³¹ were used for feature selection, and principal component analysis (PCA)³² for feature extraction. Discriminant analysis was performed by linear discriminant analysis (LDA) and quadratic discriminant analysis (QDA). The PCA-LDA/QDA/SVM, SPA-LDA/QDA/SVM and GA-LDA/QDA/SVM models were tested independently to find the best classification model.
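The Kennard–Stone split mentioned above can be sketched as follows. The article does not state whether the split was applied to the whole dataset or within each class; the function name and toy data are illustrative, and Euclidean distance in the original variable space is assumed.

```python
import numpy as np
from scipy.spatial.distance import cdist


def kennard_stone(X: np.ndarray, n_train: int):
    """Kennard-Stone split; returns (train_indices, test_indices).

    Starts from the two most distant samples (Euclidean distance) and then
    repeatedly adds the sample whose distance to its nearest already-selected
    sample is largest, so the training set spans the data space uniformly.
    """
    if not 2 <= n_train <= X.shape[0]:
        raise ValueError("n_train must be between 2 and the number of samples")
    dist = cdist(X, X)
    i, j = np.unravel_index(np.argmax(dist), dist.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(X.shape[0]) if k not in selected]
    while len(selected) < n_train:
        nearest = dist[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(nearest))]
        selected.append(pick)
        remaining.remove(pick)
    return np.array(selected), np.array(remaining, dtype=int)


X = np.array([[0.0], [1.0], [2.0], [10.0]])
train, test = kennard_stone(X, n_train=3)
print(train, test)   # [0 3 2] [1]
```

For a 70/30 split, `n_train` would be set to about 70% of the samples passed in, e.g. `round(0.7 * len(X))`.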

In the PCA model, each PC is composed of scores (variance in the sample direction), used to assess similarities/dissimilarities between the samples; and the loadings (variance in the variables direction), used to show the weight of each variable (wavenumber or wavelength) towards the scores pattern. The PCA decomposition of a spectral dataset can be calculated by Eq. 1:

$$X=T{P}^{T}+E$$

(1)

where T is the scores matrix, P is the loadings matrix and E is the residual matrix. The spectral variables selected by SPA and GA are those that minimize the cost function G given in Eq. 2:

$$G=\frac{1}{{N}_{V}}\sum_{n=1}^{{N}_{V}}{g}_{n}$$

(2)

where N_V represents the number of validation samples and g_n can be calculated according to Eq. 3:

$$g_{n}=\frac{r^{2}\left(x_{n},m_{I(n)}\right)}{\min_{I(m)\ne I(n)} r^{2}\left(x_{n},m_{I(m)}\right)}$$

(3)

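where I(n) is the index of the true class of object x_n; r²(x_n, m_I(n)) is the squared Mahalanobis distance between x_n and the center m_I(n) of its true class; and r²(x_n, m_I(m)) is the squared Mahalanobis distance between x_n and the center m_I(m) of a wrong class I(m). Taking the minimum over all wrong classes makes the denominator the distance to the closest wrong class, so g_n is small when x_n lies much closer to its own class than to any other.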
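Eqs. 2 and 3 can be evaluated directly once class centres and a covariance matrix are available. The sketch below assumes training-set class means as centres and the pooled training covariance for the Mahalanobis metric, as in LDA; inside SPA or GA it would be called with only the currently selected variables, and the subset with the smallest G would be kept. The function name and toy data are hypothetical.

```python
import numpy as np


def cost_G(X_train, y_train, X_val, y_val) -> float:
    """Average risk of misclassification G (Eq. 2) on a validation set.

    g_n (Eq. 3) is the squared Mahalanobis distance from x_n to its own class
    centre divided by the smallest squared distance to any other class centre.
    Assumptions: class centres are training-set means, distances use the pooled
    training covariance (as in LDA), and pinv guards against singular matrices.
    """
    classes = np.unique(y_train)
    centres = {c: X_train[y_train == c].mean(axis=0) for c in classes}
    resid = np.vstack([X_train[y_train == c] - centres[c] for c in classes])
    inv = np.linalg.pinv(resid.T @ resid / (len(X_train) - len(classes)))

    def r2(x, c):
        d = x - centres[c]
        return float(d @ inv @ d)

    g = [r2(x, c_true) / min(r2(x, c) for c in classes if c != c_true)
         for x, c_true in zip(X_val, y_val)]
    return float(np.mean(g))


X_train = np.array([[-1.0], [1.0], [9.0], [11.0]])
y_train = np.array([0, 0, 1, 1])
X_val = np.array([[1.0], [10.0]])
y_val = np.array([0, 1])
print(cost_G(X_train, y_train, X_val, y_val))  # small: both samples lie near their own class
```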

The LDA (L_ik) and QDA (Q_ik) classification scores are calculated in a non-Bayesian form as follows³³,³⁴:

$${L}_{ik}=({x}_{i}-{\overline{x} }_{k}{)}^{T}{C}_{pooled}^{-1}({x}_{i}-{\overline{x} }_{k})$$

(4)

$${Q}_{ik}=({x}_{i}-{\overline{x} }_{k}{)}^{T}{C}_{k}^{-1}({x}_{i}-{\overline{x} }_{k})$$

(5)

where x_i is the vector of input variables for sample i; $\overline{x}_k$ is the mean spectrum of category k; and C_pooled and C_k are the pooled covariance matrix and the variance–covariance matrix of category k, respectively. Each sample is assigned to the category with the smallest score.
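Because Eqs. 4 and 5 are squared Mahalanobis distances without prior-probability or log-determinant terms, classification reduces to assigning each sample to the nearest class centre under the pooled (LDA) or class-specific (QDA) covariance. The NumPy sketch below assumes exactly that reading; the function names are hypothetical, and `pinv` is used instead of a plain inverse in case a covariance matrix is singular.

```python
import numpy as np


def fit_class_stats(X: np.ndarray, y: np.ndarray):
    """Class means, class covariance matrices and pooled covariance (training set)."""
    classes = np.unique(y)
    means = np.array([X[y == c].mean(axis=0) for c in classes])
    covs = [np.atleast_2d(np.cov(X[y == c], rowvar=False)) for c in classes]
    dof = [int(np.sum(y == c)) - 1 for c in classes]
    # Pooled covariance: class covariances weighted by their degrees of freedom
    pooled = sum(d * C for d, C in zip(dof, covs)) / sum(dof)
    return classes, means, covs, pooled


def predict(X, classes, means, covs, pooled, quadratic=False):
    """Assign each sample to the class with the smallest L_ik (Eq. 4) or Q_ik (Eq. 5)."""
    scores = np.empty((X.shape[0], len(classes)))
    for k, mu in enumerate(means):
        inv = np.linalg.pinv(covs[k] if quadratic else pooled)
        diff = X - mu
        scores[:, k] = np.einsum("ij,jk,ik->i", diff, inv, diff)  # squared Mahalanobis
    return classes[np.argmin(scores, axis=1)]


rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0.0, 1.0, (40, 2)), rng.normal(5.0, 1.0, (40, 2))])
y = np.repeat([0, 1], 40)
stats = fit_class_stats(X, y)
print((predict(X, *stats) == y).mean(), (predict(X, *stats, quadratic=True) == y).mean())
```

Covariance matrices estimated from full spectra (hundreds or thousands of variables, tens of samples) are singular, which is one reason LDA and QDA are typically applied only after PCA, SPA or GA has reduced the number of variables.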

SVM is a binary linear classifier that uses a kernel function to map the input sample space nonlinearly into a feature space, where a linear decision boundary is constructed³⁵. The SVM decision function is given by Eq. 6³⁶:

$$f\left(z_j\right)=\operatorname{sign}\left(\sum_{i=1}^{N_{SV}}\alpha_{i}y_{i}k\left(x_{i},z_{j}\right)+b\right)$$

(6)

where N_SV is the number of support vectors; α_i, y_i and k(x_i, z_j) are the Lagrange multiplier, the class membership (±1) of support vector x_i and the kernel function, respectively; z_j is the sample being classified; and b is the bias parameter.
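Eq. 6 can be checked against scikit-learn's `SVC`, whose fitted attributes store the quantities in the equation. The sketch assumes an RBF kernel with `gamma = 0.5`, since the article does not report the kernel settings; the helper name is hypothetical and the data are synthetic.

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC


def svm_decision(model: SVC, Z: np.ndarray, gamma: float) -> np.ndarray:
    """Argument of sign() in Eq. 6: sum_i alpha_i*y_i*k(x_i, z_j) + b for each row z_j.

    For a binary SVC, scikit-learn stores alpha_i*y_i in `dual_coef_`, the
    support vectors x_i in `support_vectors_` and b in `intercept_`.
    """
    K = rbf_kernel(model.support_vectors_, Z, gamma=gamma)   # (N_SV, n_samples)
    return (model.dual_coef_ @ K).ravel() + model.intercept_[0]


rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(2.0, 1.0, (20, 3))])
y = np.repeat([-1, 1], 20)                       # class membership coded as ±1
model = SVC(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)
f = np.sign(svm_decision(model, X, gamma=0.5))   # Eq. 6
print((f == model.predict(X)).mean())
```

A positive value of the sum corresponds to `model.classes_[1]` (here +1), so taking its sign reproduces the class labels predicted by `SVC`.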

Model quality evaluation

The classification performance of each model was evaluated in terms of accuracy, sensitivity, specificity, F-score and G-score. Accuracy (AC) is the proportion of samples correctly classified, while specificity (SPEC) and sensitivity (SENS) are the proportions of negatives and positives, respectively, that are correctly identified. The F-score measures overall classification performance taking imbalanced data into account, while the G-score measures overall classification performance without accounting for class sizes³⁷. These parameters are calculated as follows:

$$AC\left({\%}\right)=\left(\frac{TP+TN}{TP+FP+TN+FN}\right)\times 100$$

(7)

$$SPEC\left({\%}\right)=\left(\frac{TN}{TN+FP}\right)\times 100$$

(8)

$$SENS\left({\%}\right)=\left(\frac{TP}{TP+FN}\right)\times 100$$

(9)

$$\text{F-Score}\left(\%\right)=\frac{2\times SENS\times SPEC}{SENS+SPEC}$$

(10)

$$\text{G-Score}\left(\%\right)=\sqrt{SENS\times SPEC}$$

(11)

where TP, TN, FP and FN represent the numbers of true positives, true negatives, false positives and false negatives, respectively.
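Eqs. 7–11 translate directly into code. The sketch below uses hypothetical confusion-matrix counts, not results from this study, and a hypothetical function name; note that Eq. 10 combines sensitivity and specificity rather than precision and recall.

```python
import math


def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Eqs. 7-11, all returned in percent."""
    ac = 100 * (tp + tn) / (tp + fp + tn + fn)
    spec = 100 * tn / (tn + fp)
    sens = 100 * tp / (tp + fn)
    # Eq. 10 is the harmonic mean of SENS and SPEC (not the precision/recall F1)
    f_score = 2 * sens * spec / (sens + spec) if (sens + spec) > 0 else 0.0
    g_score = math.sqrt(sens * spec)  # Eq. 11: geometric mean of SENS and SPEC
    return {"AC": ac, "SPEC": spec, "SENS": sens, "F-Score": f_score, "G-Score": g_score}


# Hypothetical confusion-matrix counts (not results from this study)
print(classification_metrics(tp=8, tn=6, fp=2, fn=2))
```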

Data availability

The datasets used and analysed during the current study are available from the corresponding author on reasonable request.

References

Dastur, A. E. et al. John A Sampson and the origins of endometriosis. J Obstet Gynaecol India. 60(4), 299–300. https://doi.org/10.1007/s13224-010-0046-8 (2011).
Article PubMed Central MATH Google Scholar
Fuldeore, M. J. et al. Prevalence and symptomatic burden of diagnosed endometriosis in the United States: National estimates from a cross-sectional survey of 59,411 women. Gynecol Obstet Invest. 82(5), 453–461. https://doi.org/10.1159/000452660 (2017).
Article PubMed Google Scholar
Moore, J. et al. A systematic review of the accuracy of ultrasound in the diagnosis of endometriosis. Ultrasound Obstet Gynecol 20(6), 630–634. https://doi.org/10.1046/j.1469-0705.2002.00862.x (2002).
Article PubMed MATH CAS Google Scholar
Hsu, A. L. et al. Invasive and noninvasive methods for the diagnosis of endometriosis. Clin Obstet Gynecol. 53(2), 413–419. https://doi.org/10.1097/GRF.0b013e3181db7ce8 (2010).
Article PubMed PubMed Central MATH Google Scholar
Johnson, N. P. et al. World endometriosis society consensus on the classification of endometriosis. Hum Reprod. 32(2), 315–324. https://doi.org/10.1093/humrep/dew293 (2017).
Article PubMed MATH Google Scholar
Sindhuphak, R. et al. A new approach for the detection of cervical cancer in Thai women. Gynecol Oncol. 90(1), 10–14. https://doi.org/10.1016/s0090-8258(03)00196-3 (2003).
Article PubMed Google Scholar
Kluz-Barłowska, M. et al. FT-Raman and FTIR spectroscopy as a tools showing marker of platinum-resistant phenomena in women suffering from ovarian cancer. Sci Rep. 14(1), 11025. https://doi.org/10.1038/s41598-024-61775-z (2024).
Article ADS PubMed PubMed Central CAS Google Scholar
Taylor, S. E. et al. Infrared spectroscopy with multivariate analysis to interrogate endometrial tissue: A novel and objective diagnostic approach. Br J Cancer. 104(5), 790–797. https://doi.org/10.1038/sj.bjc.6606094 (2011).
Article PubMed PubMed Central MATH CAS Google Scholar
Martin, F. L. et al. Distinguishing cell types or populations based on the computational analysis of their infrared spectra. Nat Protoc. 5(11), 1748–1760. https://doi.org/10.1038/nprot.2010.133 (2010).
Article ADS PubMed MATH CAS Google Scholar
Movasaghi, Z. et al. Fourier transform infrared (FTIR) spectroscopy of biological tissues. Appl Spectrosc Rev. 43(2), 134–179. https://doi.org/10.1080/05704920701829043 (2008).
Article ADS CAS Google Scholar
Silva, L. G. et al. ATR-FTIR spectroscopy in blood plasma combined with multivariate analysis to detect HIV infection in pregnant women. Sci Rep. 10(1), 20156. https://doi.org/10.1038/s41598-020-77378-3 (2020).
Article PubMed PubMed Central CAS Google Scholar
Workman, J. J. et al. Interpretive spectroscopy for near infrared. Appl Spectrosc Rev. 31(3), 251–320. https://doi.org/10.1080/05704929608000571 (1996).
Article ADS MATH CAS Google Scholar
Yang, Z. et al. Challenges of Raman spectra to estimate carbonyl index of microplastics: A case study with environmental samples from sea surface. Mar. Pollut. Bull. 194, 115362. https://doi.org/10.1016/j.marpolbul.2023.115362 (2023).
Article PubMed MATH CAS Google Scholar
Câmara, A. B. F. et al. Multivariate assessment for predicting antioxidant activity from clove and pomegranate extracts by MCR-ALS and PLS models combined to IR spectroscopy. Food Chem. 384, 132321. https://doi.org/10.1016/j.foodchem.2022.132321 (2022).
Article PubMed CAS Google Scholar
Chakkumpulakkal, P. V. T. et al. A multimodal spectroscopic approach combining mid-infrared and near-infrared for discriminating gram-positive and gram-negative bacteria. Anal Chem. 96(46), 18392–18400. https://doi.org/10.1021/acs.analchem.4c03060 (2024).
Article CAS Google Scholar
Siesler, H. W. et al. Near-infrared spectroscopy: Principles, instruments, applications (ed. Siesler, H.W., Ozaki, Y., Kawata, S., Heise, H.M.) 1–568 (Wiley, 2008).
Letsiou, S. et al. Endometriosis is associated with aberrant metabolite profiles in plasma. Fertil Steril. 107(3), 699-706.e6. https://doi.org/10.1016/j.fertnstert.2016.12.032 (2017).
Article PubMed MATH CAS Google Scholar
Loy, S. L. et al. Discovery and validation of peritoneal endometriosis biomarkers in peritoneal fluid and serum. Reprod Biomed Online. 43(4), 727–737. https://doi.org/10.1016/j.rbmo.2021.07.002 (2021).
Article PubMed MATH CAS Google Scholar
Kokot, et al. ATR-IR spectroscopy application to diagnostic screening of advanced endometriosis. Oxid. Med. Cell. Longev. 2022(1), 4777434. https://doi.org/10.1155/2022/4777434 (2022).
Article PubMed PubMed Central MATH CAS Google Scholar
Kelly, J. G. et al. Biospectroscopy to metabolically profile biomolecular structure: A multistage approach linking computational analysis with biomarkers. J. Proteome Res. 10(4), 1437–1448. https://doi.org/10.1021/pr101067u (2011).
Bernardes-Oliveira, E. et al. Spectrochemical differentiation in gestational diabetes mellitus based on attenuated total reflection Fourier-transform infrared (ATR-FTIR) spectroscopy and multivariate analysis. Sci. Rep. 10(1), 19259. https://doi.org/10.1038/s41598-020-75539-y (2020).
Paraskevaidi, M. et al. Detecting endometrial cancer by blood spectroscopy: A diagnostic cross-sectional study. Cancers 12(5), 1256. https://doi.org/10.3390/cancers12051256 (2020).
Freitas, D. L. D. et al. Near-infrared spectroscopy of blood plasma with chemometrics towards HIV discrimination during pregnancy. Sci. Rep. 11(1), 22609. https://doi.org/10.1038/s41598-021-02105-5 (2021).
Assis, C. et al. Combining mid infrared spectroscopy and paper spray mass spectrometry in a data fusion model to predict the composition of coffee blends. Food Chem. 281, 71–77. https://doi.org/10.1016/j.foodchem.2018.12.044 (2019).
Veettil, T. C. P. et al. A combined near-infrared and mid-infrared spectroscopic approach for the detection and quantification of glycine in human serum. Sensors 22(12), 4528. https://doi.org/10.3390/s22124528 (2022).
Yang et al. Early rapid diagnosis of Alzheimer’s disease based on fusion of near- and mid-infrared spectral features combined with PLS-DA. Optik 241, 166485. https://doi.org/10.1016/j.ijleo.2021.166485 (2021).
Ballabio, D. et al. Classification tools in chemistry. Part 1: linear models. PLS-DA. Anal. Methods 5(16), 3790–3798. https://doi.org/10.1039/C3AY40582F (2013).
Yu, S. et al. Qualitative and quantitative assessment of flavor quality of Chinese soybean paste using multiple sensor technologies combined with chemometrics and a data fusion strategy. Food Chem. 405, 134859. https://doi.org/10.1016/j.foodchem.2022.134859 (2023).
Kennard, R. W. et al. Computer aided design of experiments. Technometrics 11(1), 137–148. https://doi.org/10.1080/00401706.1969.10490666 (1969).
McCall, J. Genetic algorithms for modelling and optimisation. J. Comput. Appl. Math. 184(1), 205–222. https://doi.org/10.1016/j.cam.2004.07.034 (2005).
Pontes, M. J. C. et al. The successive projections algorithm for spectral variable selection in classification problems. Chemom. Intell. Lab. Syst. 78(1–2), 11–18. https://doi.org/10.1016/j.chemolab.2004.12.001 (2005).
Bro, R. et al. Principal component analysis. Anal. Methods. 6(9), 2812–2831. https://doi.org/10.1039/C3AY41907J (2014).
Morais, C. L. et al. Principal component analysis with linear and quadratic discriminant analysis for identification of cancer samples based on mass spectrometry. J. Braz. Chem. Soc. 29(3), 472–481. https://doi.org/10.21577/0103-5053.20170159 (2018).
Dixon, S. J. et al. Comparison of performance of five common classifiers represented as boundary methods: Euclidean Distance to Centroids, Linear Discriminant Analysis, Quadratic Discriminant Analysis, Learning Vector Quantization and Support Vector Machines, as dependent on data structure. Chemom. Intell. Lab. Syst. 95(1), 1–17. https://doi.org/10.1016/j.chemolab.2008.07.010 (2009).
Cortes, C. et al. Support-vector networks. Mach. Learn. 20, 273–297. https://doi.org/10.1007/BF00994018 (1995).
Morais, C. L. et al. Variable selection with a support vector machine for discriminating Cryptococcus fungal species based on ATR-FTIR spectroscopy. Anal. Methods 9(20), 2964–2970. https://doi.org/10.1039/C7AY00428A (2017).
Morais, C. L. et al. Comparing unfolded and two-dimensional discriminant analysis and support vector machines for classification of EEM data. Chemom. Intell. Lab. Syst. 170, 1–12. https://doi.org/10.1016/j.chemolab.2017.09.001 (2017).

Acknowledgements

The authors acknowledge the support provided by the Post-Graduate Chemistry Program (PPGQ/UFRN) and the Laboratório de Pesquisa em Petróleo—LAPET (IQ/UFRN). A. B. F. Câmara would like to thank the Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq/FAPERN, Brazil), code 153445/2024-6 for the financial support.

Author information

Amaxsell Thiago Barros de Souza and Anne Beatriz Figueira Câmara contributed equally to this work.

Authors and Affiliations

Graduate Program in Science Applied to Women’s Health, Federal University of Rio Grande Do Norte, Natal, RN, 59012-310, Brazil
Amaxsell Thiago Barros de Souza & Janaina Cristiana de Oliveira Crispim
Biological Chemistry and Chemometrics, Institute of Chemistry, Federal University of Rio Grande Do Norte, Natal, RN, 5072-970, Brazil
Anne Beatriz Figueira Câmara & Kássio Michell Gomes de Lima
Post-Graduate Program in Technological Development and Innovation in Medicines, Federal University of Rio Grande Do Norte, Natal, RN, 59072-97, Brazil
Cristina Maria de Araújo Medeiros Santos
Center for Education, Science and Technology of the Inhamuns Region, State University of Ceará, Tauá, CE, 63660-000, Brazil
Camilo de Lelis Medeiros de Morais
Department of Clinical and Toxicological Analysis, Federal University of Rio Grande Do Norte, Natal, RN, 59072-970, Brazil
Janaina Cristiana de Oliveira Crispim

Contributions

A.T.B.S., A.B.F.C. and C.M.A.M.S. designed the experiments. A.T.B.S. contributed to the collection of biological samples. K.M.G.L. and J.C.O.C. analyzed the data and contributed reagents, materials and/or analysis tools. A.T.B.S. and A.B.F.C. contributed to manuscript preparation. K.M.G.L., J.C.O.C. and C.L.M.M. refined the manuscript for publication. A.B.F.C. and K.M.G.L. performed the data analysis. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Kássio Michell Gomes de Lima.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

de Souza, A.T.B., Câmara, A.B.F., de Araújo Medeiros Santos, C.M. et al. Spectrochemical differentiation in endometriosis based on infrared spectroscopy advanced data fusion and multivariate analysis. Sci Rep 15, 5071 (2025). https://doi.org/10.1038/s41598-025-89504-0

Received: 27 November 2024
Accepted: 05 February 2025
Published: 11 February 2025
Version of record: 11 February 2025
DOI: https://doi.org/10.1038/s41598-025-89504-0