Introduction

Fourier transform infrared (FTIR) spectroscopy is a technique used for the chemical characterization of samples and for the detection of chemical differences between samples1. FTIR is non-destructive, so samples are not consumed by the measurement. Moreover, FTIR is an inexpensive and very fast method of analysis for samples in all physical states2, although samples containing water should be dried first. These features make FTIR a practically universal method for analyzing samples of various origins. Consequently, plants3, nanoparticles4, oxides5, oils6 and other materials can be assayed using FTIR spectroscopy. Biological materials such as cells7, tissues8, urine9, serum10,11 and plasma11 can be analyzed as well. FTIR spectroscopy is therefore also used to determine chemical changes in biological materials caused by different diseases, which means that it can serve as a diagnostic method in medicine12.

For instance, the potential of serum and plasma for detecting diseases has been investigated using FTIR. Mennickent et al. developed a machine learning (ML) model based on FTIR data obtained from serum measurements to predict gestational diabetes mellitus13. In another work, the FTIR range 1250 cm−1 – 1306 cm−1 was found to be the most important for differentiating patients with and without breast cancer14. In the case of liver cancer, serum collected from sick and control patients could be differentiated in the range of 2800 cm−1 to 2900 cm−1 using FTIR spectroscopy15. Wang et al. showed that the ratio between the peaks at 1080 cm−1 and 1170 cm−1 might be useful for distinguishing the serum of lung cancer patients from that of healthy persons by FTIR16. In another work, PLS-DA of first-derivative FTIR spectra in the nucleic acid band region (1250 cm−1 – 1000 cm−1) yielded 80% sensitivity, 91.89% specificity, and 87.10% accuracy in the detection of lung cancer from serum17.

For diagnostics, the use of body fluids such as blood, serum or plasma would be less invasive, more repeatable and easier to carry out than the use of other tissues18. Currently, only tissue biopsy followed by histopathological examination confirms the presence of solid tumors. Thus, additional information obtained from spectroscopic analysis of body fluids might prove useful for diagnosing cancer patients and for predicting treatment success or failure, both of which are of great importance.

In the case of body fluids, samples free from morphotic elements (cells), that is, plasma or serum19, seem to be good materials for FTIR analysis and for storage in the frozen state. This is chiefly due to the inherent advantages of plasma and serum over whole blood. One disadvantage of whole blood in FTIR is the presence of hemoglobin, the main component of erythrocytes, which masks other molecules20. Moreover, the majority of biochemical and immunological diagnostic tests are performed on serum (blood count, but also the erythrocyte sedimentation rate test, cholesterol measurement and glucose determination)21, while plasma, in turn, is recommended for the analysis of lipoproteins or apolipoproteins22. This enables correlations with spectral data that show which bands can be ascribed to which biochemical parameters. Since both serum and plasma are used in medical diagnostics, FTIR analysis could be an additional indicator for accurate diagnosis or monitoring of a particular disease. However, it has not yet been indicated which biological fluid (serum or plasma) is better suited to this type of analysis.

Most lung cancers develop latently, and early symptoms are usually non-specific. Prolonged cough, hoarseness, chest pain, shortness of breath and increased fatigue can easily be attributed to other diseases, especially chronic ones23. The first step in diagnosing lung cancer is an X-ray examination. If the results show cancerous changes in the mediastinum, atelectasis or emphysema, a computed tomography (CT) scan is performed24. In oncological diagnostics, the aim is to perform histopathological and molecular tests as quickly as possible. The treatment course is determined on the basis of the histopathological and molecular test results25. Unfortunately, so far there have been no serum or plasma markers specific to lung cancer that could help in diagnosis. Recently, it has been shown that FTIR spectroscopy can indicate the presence or absence of lung cancer26,27. However, it was not shown which body fluid (serum or plasma) gives the smallest detection error. Therefore, in this study, we compared FTIR measurements of serum and plasma collected from lung cancer patients and from individuals without lung cancer symptoms, followed by machine learning analyses, to verify which biofluid yields more accurate results in the detection of lung cancer. Principal component analysis (PCA), support vector machine (SVM), receiver operating characteristic (ROC), and logistic regression (LR) analyses were used to determine the possibility of differentiating materials (serum and plasma) collected from individuals without lung cancer symptoms (controls) and from lung cancer patients in terms of three diagnostic parameters: sensitivity, specificity, and accuracy. PCA was used to extract the variables that explained the greatest variance in the measured spectra, the so-called principal components (PCs). Accuracy was calculated using SVM, while ROC showed the sensitivity and specificity of the FTIR technique in the detection of lung cancer from serum and plasma. The LR algorithm was added to determine the percentage of cases correctly classified.

Results

Fourier transform infrared (FTIR) spectra of serum and plasma were analyzed to verify which body fluid could be used to diagnose the presence of lung cancer with a high level of accuracy. In the present study, 34 samples each of serum and plasma were collected from lung cancer patients, while 18 samples each of serum and plasma were collected from patients without lung cancer symptoms. Then, 4 µl of each sample was deposited on CaF2 slides and air-dried for approximately ten minutes, followed by FTIR measurements. As can be seen in Fig. 1, the spectral profiles obtained from serum and plasma were similar in peak shapes and positions. In both body fluids, functional groups building phospholipid, amide and lipid structures were visible. Phospholipid vibrations were located around 1070 cm−1 and 1260 cm−1 and were assigned to symmetric and asymmetric PO2− vibrations26,27. Functional groups of proteins and lipids were visible as CH3 vibrations and as C=O vibrations from COO28 around 1400 cm−1 and 1450 cm−1. Vibrations characteristic of proteins only, referred to as amides29,30,31, were observed around 1310 cm−1, 1550 cm−1, 1650 cm−1 and 3292 cm−1. These wavenumbers originated from amide III, amide II, amide I, and amide A, respectively. Finally, lipid structures were visible as C=O vibrations at 1750 cm−1 and as CH2 and CH3 symmetric and asymmetric stretching vibrations32 in the range from 2800 cm−1 to 3000 cm−1.

In both analyzed fluids (plasma, Fig. 1a, and serum, Fig. 1b), higher absorbance values for all described functional groups were observed in the samples collected from lung cancer patients. This suggested that the spectra alone could not show which body fluid would be more precise in detecting lung cancer using FTIR. Therefore, we performed multivariate analysis of the measured spectra to mine subtle spectral differences invisible to the human eye, which could enable lung cancer detection with a high level of accuracy.

Fig. 1

FTIR spectra of (a) plasma and (b) serum collected from individuals without lung cancer symptoms (controls) and lung cancer subjects, with marked regions corresponding to phospholipid vibrations (red), amides (yellow), and lipids (green).

Principal component analysis (PCA) of the FTIR spectra in the 800 cm−1 to 1800 cm−1 region demonstrated the possibility of differentiating plasma samples collected from individuals without lung cancer symptoms (controls) and lung cancer patients (Fig. 2a1). Samples collected from patients without cancer symptoms were chiefly located at negative values of principal component 1 (PC1). Plasma samples collected from cancer patients were located in more than 70% of cases at positive values of PC1, at both positive and negative values of PC2. Consequently, PC1 played the main role in plasma differentiation. The loading plots for PC1 and PC2 are presented in Fig. 2a3. Examination of the peaks marked in these loading plots suggests that all chemical compounds building plasma played a significant role in differentiating patients with and without lung cancer. PCA of the FTIR spectra in the same region showed that it was also possible to differentiate serum collected from lung cancer patients and from patients without cancer symptoms (Fig. 2b1). Samples collected from patients without cancer symptoms were mostly located at negative values of PC1, while more than 80% of samples collected from cancer patients were at positive values of PC1, suggesting that PC1 was useful in discriminating cancer from non-cancer patients. According to the loading plots for serum in Fig. 2b3, all vibrations characteristic of serum were located in the positive part of the loading plots, which also suggested an important role in differentiating the two analyzed groups of patients. For the second analyzed FTIR region (2800 cm−1 – 3000 cm−1), which is mainly assigned to lipids, plasma samples collected from cancer patients and from patients without cancer symptoms overlapped substantially (Fig. 2a2), meaning that the region was not useful for lung cancer detection. The loading plots are provided in Fig. 2a4. The same observation was made for serum samples (Fig. 2b2), with the respective loading plots presented in Fig. 2b4. In the context of this study, only the 800 cm−1 to 1800 cm−1 spectral region could be used to differentiate plasma and serum collected from cancer patients and patients without cancer symptoms.
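To illustrate how PC1 scores can separate two groups of spectra, the following is a minimal, dependency-free sketch that estimates the first principal component by power iteration on mean-centered data. It is not the Past 4.0 procedure used in this study; the function name `first_principal_component` and the four-point toy "spectra" are hypothetical and serve only as an illustration. The sign of a principal component is arbitrary, so only the separation of the two groups along PC1 is meaningful.

```python
import math
import random


def first_principal_component(spectra, n_iter=200, seed=0):
    """Return (loadings, scores) of PC1 for a list of equally long spectra."""
    n, p = len(spectra), len(spectra[0])
    # Mean-center every variable (wavenumber) across spectra.
    means = [sum(row[j] for row in spectra) / n for j in range(p)]
    centered = [[row[j] - means[j] for j in range(p)] for row in spectra]

    rng = random.Random(seed)
    loadings = [rng.random() for _ in range(p)]
    for _ in range(n_iter):
        # One power-iteration step: loadings <- X^T (X loadings), then normalize.
        scores = [sum(c * v for c, v in zip(row, loadings)) for row in centered]
        update = [sum(centered[i][j] * scores[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(u * u for u in update))
        loadings = [u / norm for u in update]
    scores = [sum(c * v for c, v in zip(row, loadings)) for row in centered]
    return loadings, scores


# Hypothetical toy "spectra" (absorbance at four wavenumbers), not study data.
controls = [[0.10, 0.20, 0.30, 0.10], [0.12, 0.18, 0.33, 0.11], [0.09, 0.22, 0.29, 0.10]]
cancer = [[0.30, 0.45, 0.60, 0.25], [0.28, 0.50, 0.62, 0.27], [0.33, 0.47, 0.58, 0.24]]

loadings, scores = first_principal_component(controls + cancer)
control_scores, cancer_scores = scores[:3], scores[3:]
print("PC1 scores, controls:", [round(s, 3) for s in control_scores])
print("PC1 scores, cancer:  ", [round(s, 3) for s in cancer_scores])
```

In this toy case the two groups fall on opposite sides of zero along PC1, which mirrors the kind of separation described for the fingerprint region.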

Fig. 2

PCA scatter plots for FTIR spectra collected from (a1) plasma, (b1) serum in 800–1800 cm−1 region; (a2) plasma, (b2) serum in 2800–3000 cm−1 region. Respective loading vectors are provided in (a3), (b3), (a4), and (b4). Each black, blue, and pink symbol in a1, a2, b1, and b2 represents one spectrum.

As depicted in Fig. 2a1 and b1, it was possible to differentiate serum as well as plasma collected from individuals without cancer symptoms and lung cancer patients in the 800 cm−1 – 1800 cm−1 spectral region. To assess the accuracy of the obtained results, SVM was used, while ROC analysis was performed to show sensitivity and specificity. Logistic regression was used to present the percentage of cases correctly classified. For the LR analysis, only sensitivity was calculated. SVM analysis yielded 111 true positive and 51 false positive cases for plasma and serum in the range between 800 cm−1 and 1800 cm−1 (Fig. 3a1,a2) and for serum in the range from 2800 cm−1 to 3000 cm−1 (Fig. 3b2). However, for plasma analyzed in the fingerprint FTIR range (800 cm−1 to 1800 cm−1), SVM analysis yielded 111 true positive, 40 false positive, and 11 true negative cases (Fig. 3b1).
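As an illustration of how a linear-kernel SVM assigns spectra to the two groups, the sketch below fits a linear SVM by full-batch subgradient descent on the L2-regularized hinge loss. This is an assumed, simplified training scheme, not the algorithm of the online calculator used in this study; the function names, hyperparameters (`lam`, `lr`, `epochs`) and the toy PC scores are hypothetical. The labels follow the convention stated in the methods (control = 1, cancer = -1).

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=500):
    """Fit w, b by full-batch subgradient descent on the L2-regularized hinge loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularization term
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # sample is inside the margin or misclassified
                for j in range(p):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, X):
    """Return +1 (control) or -1 (cancer) according to the side of the hyperplane."""
    return [1 if sum(wj * xj for wj, xj in zip(w, xi)) + b >= 0 else -1 for xi in X]


# Hypothetical PC1/PC2 scores; labels follow the text: control = 1, cancer = -1.
X_train = [[-2.0, 0.3], [-1.5, -0.2], [-1.8, 0.1], [1.6, 0.4], [2.1, -0.3], [1.2, 0.0]]
y_train = [1, 1, 1, -1, -1, -1]

w, b = train_linear_svm(X_train, y_train)
y_pred = predict(w, b, X_train)
accuracy = sum(pred == true for pred, true in zip(y_pred, y_train)) / len(y_train)
print("weights:", [round(wj, 3) for wj in w], "bias:", round(b, 3))
print("accuracy on toy data:", accuracy)
```

Because the toy groups are well separated along the first coordinate, the fitted hyperplane classifies all six points correctly; real spectra would overlap and should be evaluated on held-out data.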

Fig. 3

SVM analysis results of FTIR spectra in 800 cm−1 to 1800 cm−1 ((a1) plasma; (a2) serum) and 2800 cm−1 to 3000 cm−1 ((b1) plasma; (b2) serum).

As demonstrated in Table 1, the same SVM accuracy (69%) was obtained for serum and plasma in the range from 2800 cm−1 to 3000 cm−1, as well as for plasma in the range 800 cm−1 – 1800 cm−1. For serum in the FTIR range from 800 cm−1 to 1800 cm−1, an accuracy of 75% was obtained. For this material (serum) in the fingerprint FTIR range, the highest values of precision (0.71) and F1 score (0.85) were also observed.
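The relation between confusion-matrix counts and the reported accuracy can be checked with a short calculation. The sketch below uses the counts given in the text for the SVM analysis; false negatives are not reported there and are assumed to be zero, which is an assumption of this example. The helper name `classification_metrics` is hypothetical, and precision derived this way need not coincide with Table 1, which may follow a different convention.

```python
def classification_metrics(tp, fp, tn=0, fn=0):
    """Accuracy, precision, recall and F1 score from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Counts quoted in the text; fn = 0 is assumed because it is not reported.
case_a = classification_metrics(tp=111, fp=51)         # 111 TP, 51 FP
case_b = classification_metrics(tp=111, fp=40, tn=11)  # 111 TP, 40 FP, 11 TN

for name, metrics in (("case_a", case_a), ("case_b", case_b)):
    print(name, {key: round(value, 2) for key, value in metrics.items()})
```

Under these assumptions the two sets of counts give accuracies of about 69% and 75% and F1 scores of about 0.81 and 0.85, in line with the accuracy and F1 ranges quoted in the text.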

Table 1 Accuracy, F1 score and precision metrics of SVM analysis on serum and plasma spectra.

The receiver operating characteristic (ROC) results presented in Fig. 4 showed the highest area under the curve (AUC, 0.98) for serum analyzed in the range between 800 cm−1 and 1800 cm−1 (Fig. 4a). For plasma in the same FTIR range, the AUC was 0.96. Smaller AUC values were observed for serum and plasma in the lipid FTIR range (2800 cm−1 – 3000 cm−1): 0.74 for plasma and 0.73 for serum (Fig. 4b).
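For readers less familiar with ROC analysis, the following sketch shows how AUC, sensitivity and specificity can be computed from classifier scores. It uses the rank-based (Mann-Whitney) definition of AUC and a fixed decision threshold; the scores, the threshold of 0.5 and the function names are hypothetical and are not taken from the MedCalc analysis. Labels follow the coding used in this study (control = 0, cancer = 1).

```python
def roc_auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity_specificity(scores, labels, threshold):
    """Classify score >= threshold as positive; return (sensitivity, specificity)."""
    tp = sum(1 for s, l in zip(scores, labels) if l == 1 and s >= threshold)
    fn = sum(1 for s, l in zip(scores, labels) if l == 1 and s < threshold)
    tn = sum(1 for s, l in zip(scores, labels) if l == 0 and s < threshold)
    fp = sum(1 for s, l in zip(scores, labels) if l == 0 and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical classifier scores: first four are controls (0), last four cancer (1).
toy_scores = [0.10, 0.30, 0.35, 0.80, 0.40, 0.60, 0.70, 0.90]
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]

auc = roc_auc(toy_scores, toy_labels)
sens, spec = sensitivity_specificity(toy_scores, toy_labels, threshold=0.5)
print("AUC:", auc, "sensitivity:", sens, "specificity:", spec)
```

Moving the threshold trades sensitivity against specificity, whereas the AUC summarizes performance over all thresholds, which is why both kinds of values are reported.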

Fig. 4

ROC curves of FTIR based on the Y values of the spectral range (a) 800 cm−1 – 1800 cm−1 and (b) 2800 cm−1 – 3000 cm−1, where blue and red lines originate from plasma and serum samples, respectively.

The sensitivity, specificity and AUC values presented in Table 2 showed higher sensitivity for both serum and plasma in the fingerprint range than in the lipid range. Within each analyzed range (800 cm−1 – 1800 cm−1 and 2800 cm−1 – 3000 cm−1), similar values of these parameters were observed for serum and plasma. For plasma, a sensitivity of 95.5% was observed in the range from 800 cm−1 to 1800 cm−1, while for serum this value was 93.7%. In the lipid range, a sensitivity of 47.7% was observed for plasma and 50.5% for serum. A specificity of 100% was observed for plasma in the lipid range (2800 cm−1 to 3000 cm−1) and for serum in the fingerprint range (800 cm−1 – 1800 cm−1). On the other hand, a specificity of 94.1% was observed for plasma and serum in the 800–1800 cm−1 and 2800–3000 cm−1 regions, respectively.

Table 2 Sensitivity, specificity and AUC of ROC curve.

A logistic regression model was fitted for both sample materials using the same FTIR ranges as for SVM and ROC. The resultant LR model is shown in Fig. 5 and its diagnostic metrics are provided in Table 3. The area under the curve of this model was highest (0.98) for serum in the fingerprint range and smallest (around 0.75) for serum and plasma in the lipid range. For plasma in the range between 800 cm−1 and 1800 cm−1, the AUC was 0.96 (Fig. 5a1). Moreover, for plasma samples, the LR model correctly classified 95.1% and 67.9% of cases in the fingerprint (Fig. 5a1) and lipid (Fig. 5a2) FTIR spectral ranges, respectively. For serum in the same two ranges, the percentages of cases correctly classified were 94.4% and 69.8%, respectively (Fig. 5b1, 5b2). Thus, the percentage of correctly classified cases was greater by 0.7 percentage points for plasma than for serum in the fingerprint region, and greater by 1.9 percentage points for serum than for plasma in the lipid region.
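The "percentage of cases correctly classified" can be illustrated with a minimal logistic regression fitted by gradient descent. This is only a sketch under stated assumptions: it is not the MedCalc implementation, the learning rate and number of epochs are arbitrary, and the one-dimensional toy scores are hypothetical and deliberately overlapping so that the result is below 100%. Labels follow the coding used in this study (control = 0, cancer = 1).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(p):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def percent_correctly_classified(w, b, X, y):
    """Share of cases whose predicted class (probability >= 0.5) matches the label."""
    preds = [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
             for xi in X]
    return 100.0 * sum(pred == true for pred, true in zip(preds, y)) / len(y)


# Hypothetical one-dimensional PC1 scores with one overlapping case per group.
X_toy = [[-2.0], [-1.5], [-1.0], [0.4], [-0.4], [1.0], [1.5], [2.0]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

w_lr, b_lr = fit_logistic(X_toy, y_toy)
percent_correct = percent_correctly_classified(w_lr, b_lr, X_toy, y_toy)
print("cases correctly classified (%):", percent_correct)
```

Here the two overlapping cases fall on the wrong side of the decision boundary, so six of the eight toy cases are classified correctly.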

Fig. 5

Logistic regression curves obtained for plasma (a) and serum (b) in the FTIR ranges 800 cm−1 – 1800 cm−1 (1) and 2800 cm−1 – 3000 cm−1 (2).

Table 3 Sensitivity and AUC values of logistic regression analysis on serum and plasma spectra.

Discussion

The diagnosis and successful treatment of lung cancer is a challenge for clinicians, primarily because most patients are diagnosed at an advanced stage and the disease has an aggressive natural course with frequent metastases and treatment resistance33. A better patient outcome should be sought through more effective primary prevention, early detection and efficient treatment34. Currently, unfortunately, lung cancer is often detected in advanced stages (IIIB or IV), which limits the possibility of surgical intervention. In most cases, patients receive radiotherapy and/or chemotherapy, which are still not satisfactory in terms of the 5-year survival rate despite the introduction of personalized therapies35. Therefore, new diagnostic methods are needed that could ideally be performed as part of routine biochemical blood work. For this purpose, this study aimed to analyze which body fluid (serum or plasma) could be a better candidate material for the diagnostics of lung cancer using FTIR spectroscopy.

In our experimental settings, higher absorbances of the functional groups of phospholipids, lipids and proteins were observed in both serum and plasma collected from lung cancer patients when compared to the control group. Similar results for the absorbances of amide I, amide II, the lipid range and wavenumbers between 1000 cm−1 and 1250 cm−1 were observed by Yang et al.17. When the amide II and amide I absorbances in the FTIR spectra of serum were compared with those of the same bands in plasma spectra, a higher absorbance (by around 0.1) was visible in plasma (Fig. 1). The lower absorbances of the protein-derived bands in serum samples could be due to the anticoagulant effect, which leads to osmotic changes resulting from the redistribution of fluid between blood cells and plasma. Differences in amide vibrations between plasma and serum could be explained by the different amounts of proteins present in the two fluids, as reported by Ayache et al.36. In one study, it was observed that higher plasma fibrinogen and osteopontin, compared to serum, could affect the results for the protein fraction (mainly amide bonds)37.

Regarding multivariate analysis, PCA of the FTIR spectra showed that plasma and serum differentiated lung cancer patients from individuals without lung cancer symptoms exclusively in the fingerprint region (from 800 cm−1 to 1800 cm−1) (Fig. 2). Consistently, the ROC and LR algorithms (Tables 2 and 3) showed higher values of accuracy, specificity and sensitivity, as well as AUC, for this range than for the FTIR range from 2800 cm−1 to 3000 cm−1. Another study, in which FTIR spectroscopy was used on serum or plasma collected from patients with lung cancer, also showed lower accuracy for the lipid range17. This was also visible in other works, in which ratios between respective peaks were used to differentiate serum collected from lung cancer patients and control subjects16,17. Wang et al.16 showed that the A1080/A1170 ratio might be useful for distinguishing serum collected from lung cancer patients and healthy persons. They also showed that the ratios of α-helix to antiparallel β-sheet were lower for the serum of lung cancer patients than for that of healthy persons16. In another report, the values of sensitivity, specificity and accuracy for the FTIR fingerprint range were higher17 than in the present paper. This could have been due to the different FTIR range used for analysis (1250–1000 cm−1) as well as the different type of analysis (partial least squares discriminant analysis, PLS-DA) adopted in that study17. In both these works, only the signal from the fingerprint FTIR range was analyzed38,39. In this paper, by contrast, the values of sensitivity, specificity and accuracy were shown not only for the fingerprint range but also for the lipid range and, importantly, differences in lung cancer detection between serum and plasma were examined, with both fluids obtained from the same consenting participants.

In the case of plasma, a larger number of samples were outliers in the FTIR spectra, probably because of the larger quantities and types of biochemical components (fibrinogen, citrate, higher probability of erythrocytes and platelets) present in plasma when compared to serum11. This may be the reason for the greater scattering of plasma samples in the PCA plot compared to serum samples. However, Tables 2 and 3 show that the values of accuracy, specificity and sensitivity, as well as AUC, for the detection of lung cancer were very similar for serum and plasma. Differences in these parameters were observed only when the two analyzed regions were compared with each other (the values were higher for the fingerprint FTIR range than for the lipid FTIR range).

For FTIR spectroscopy to have great potential as a tool for medical diagnostics, it is very important that its potential be investigated in homogeneous groups of consenting participants (in terms of age, sex, and absence or presence of the same comorbidities). However, such groups are known to be difficult to identify in a natural setting. Therefore, in our view, the obtained results were certainly influenced by the personal characteristics of the patients who were examined. Moreover, including a larger number of serum and plasma samples in similar studies would yield more conclusive findings on lung cancer detection. In the case of the presented research, however, it was important that both serum and plasma came from the same patients. It would therefore be necessary to collect a larger number of samples from a more homogeneous group of patients to clearly determine which fluid (serum or plasma) can be used, and to what extent, to distinguish patients with lung cancer from individuals without lung cancer symptoms using FTIR.

Conclusions

In this study, serum and plasma collected from lung cancer patients and patients without cancer symptoms were measured with an FTIR spectrometer, and two FTIR ranges (800 cm−1 – 1800 cm−1 and 2800 cm−1 – 3000 cm−1) were then analyzed by chemometric methods to show (1) which body fluid and (2) which range gives better differentiation between the two studied groups. The support vector machine showed similar values of precision (around 70%) and F1 score (from 81 to 85%) for serum and plasma in the lipid range (2800 cm−1 – 3000 cm−1) as well as in the fingerprint range (800 cm−1 – 1800 cm−1). Moreover, SVM also showed similar accuracy (from 69 to 75%) for the analyzed fluids and ranges. The LR results showed higher sensitivity (~95%) and AUC (~0.98) for the fingerprint range than for the lipid range (~70% and 0.75, respectively). Consequently, LR showed that better differentiation between lung cancer patients and patients without cancer symptoms was obtained for the range from 800 cm−1 to 1800 cm−1. However, LR analysis also did not indicate which body fluid was better for the detection of lung cancer.

Materials and methods

Materials

In this study, serum and plasma collected from lung cancer patients (diseased group) and individuals without cancer symptoms (control group), treated at the Holycross Cancer Centre (Świętokrzyskie province, Poland), were analyzed. All procedures and methods used in this study were conducted in accordance with the Declaration of Helsinki and approved by the Bioethics Committee of the Regional Chamber of Physicians in Kielce (No. KB7/2012, approved on 18th December 2012). Consequently, all methods were performed in accordance with the relevant guidelines and regulations. Moreover, informed consent was obtained from all subjects. Thirty-four samples each of serum and plasma from the lung cancer group (diseased group) and 18 samples each from the control group were collected. Serum and plasma were obtained from the same 34 patients (25 men) aged 52.36 ± 9.12 years. All cancer patients were diagnosed with the same histological type of lung cancer (non-small cell lung cancer); 8.8%, 41.2%, and 50.0% of lung cancer patients were in clinical stages I or II, III, and IV, respectively. Samples were collected from these patients prior to any oncological treatment, including chemotherapy and immunotherapy. Once the whole blood was collected, serum and plasma were isolated and stored frozen at −80 °C until the day of measurement with the FTIR spectrometer.

Methods

FTIR measurements and analysis methodology

In this study, a Thermo Nicolet 6700 FT-IR spectrometer was used to obtain FTIR spectra of serum and plasma samples. A deuterated triglycine sulfate (DTGS) detector and the attenuated total reflection (ATR) technique with a diamond crystal were used. All spectra were measured using 32 scans over a range of 4000–400 cm−1 at a spectral resolution of 4 cm−1. Before measurement, 4 µL of each serum or plasma sample was placed on a calcium fluoride (CaF2) slide to dry for around 10 min. During that time, background spectra were collected. Next, serum and plasma were measured, with the ATR crystal cleaned with 70% ethanol solution after each sample. The number of spectra depended on the number of samples collected from each group. In this study, 34 serum and plasma samples were obtained from lung cancer patients and 18 such samples were collected from individuals without lung cancer symptoms. FTIR spectra were measured three times for each sample. Therefore, for the lung cancer patient group, 101 spectra of serum and 101 spectra of plasma were obtained, while for the individuals without lung cancer symptoms, 54 spectra of serum and 54 spectra of plasma were collected. Consequently, 155 spectra of serum and the same number of plasma spectra were analyzed in this study. Before further analysis, baseline correction using the rubberband method with 64 points was performed, followed by min-max normalization. The obtained spectra were analyzed using principal component analysis (PCA), receiver operating characteristic (ROC) analysis, the support vector machine (SVM) algorithm, and a logistic regression (LR) model.
Principal component analysis was used to reduce the dimensionality of the FTIR data while preserving crucial information and to obtain new, uncorrelated variables called principal components, which would primarily reveal the possibility of differentiating serum or plasma collected from lung cancer patients and individuals without lung cancer symptoms. These dimensionally reduced spectral data were fed, separately, to the SVM, ROC, and LR models. SVM was used to calculate accuracy, LR analysis showed the percentage of cases correctly classified, while ROC yielded sensitivity and specificity values. For clarity, ROC was used as a separate method for measuring the performance of the SVM and LR models. For each analysis, two FTIR spectral ranges were investigated: 800 cm−1 – 1800 cm−1 and 2800 cm−1 – 3000 cm−1. In the present study, 2076 and 417 wavenumbers of the measured spectra were used in the first (800–1800 cm−1) and second (2800–3000 cm−1) spectral ranges, respectively. PCA was performed using Past 4.0 software (Oyvind Hammer, March 2022). In the ROC and LR models, samples collected from control subjects were marked as 0, and those from diseased subjects (cancer patients) as 1. For ROC, the sensitivity and specificity were calculated, while for LR, it was the percentage of cases correctly classified. ROC and LR were done using MedCalc software (version 11.3.0.0, https://www.medcalc.org/) at the 95% confidence level. In the SVM model, control subjects were assigned a value of 1, while cancer patients were assigned a value of −1. Moreover, in the SVM analysis, a linear kernel function was used; the analysis was done using Koshegio software (Copyright©2023 Koshegio, https://www.koshegio.com/support-vector-machine-calculator).
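The preprocessing steps named above (rubberband baseline correction followed by min-max normalization) can be sketched as follows. This is an assumed, simplified variant: the baseline is taken as the lower convex hull of the spectrum, and the 64-point setting of the instrument software is not reproduced. The function names and the seven-point toy spectrum are hypothetical.

```python
def rubberband_baseline(x, y):
    """Lower convex hull of the spectrum, linearly interpolated at every x.

    Assumes x is strictly ascending and contains at least two points.
    """
    hull = []  # indices of the points that lie on the lower hull
    for i in range(len(x)):
        while len(hull) >= 2:
            i1, i2 = hull[-2], hull[-1]
            cross = (x[i2] - x[i1]) * (y[i] - y[i1]) - (y[i2] - y[i1]) * (x[i] - x[i1])
            if cross <= 0:  # middle point lies on or above the chord: drop it
                hull.pop()
            else:
                break
        hull.append(i)

    baseline, seg = [], 0
    for i in range(len(x)):
        while x[hull[seg + 1]] < x[i]:  # advance to the hull segment containing x[i]
            seg += 1
        a, b = hull[seg], hull[seg + 1]
        t = (x[i] - x[a]) / (x[b] - x[a])
        baseline.append(y[a] + t * (y[b] - y[a]))
    return baseline


def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1] (assumes non-constant input)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy spectrum: a sloping baseline with one band near 1100 cm-1.
wavenumbers = [800, 900, 1000, 1100, 1200, 1300, 1400]
absorbance = [0.10, 0.12, 0.30, 0.50, 0.28, 0.18, 0.22]

baseline = rubberband_baseline(wavenumbers, absorbance)
corrected = [a - b for a, b in zip(absorbance, baseline)]
normalized = min_max_normalize(corrected)
print([round(v, 3) for v in normalized])
```

After correction the band maximum is scaled to 1 and the points lying on the baseline to 0, so that spectra of different overall intensity become comparable before multivariate analysis.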