Abstract
New bronchoscopy techniques such as radial probe endobronchial ultrasound have been developed for real-time sampling characterization, but their use is still limited. This study aims to use classification algorithms with minimally invasive electrical impedance spectroscopy to improve the identification of neoplastic lung tissue during biopsies. Decision Tree, Support Vector Machines (SVM), Ensemble Method, K-Nearest Neighbors, Naïve Bayes and Discriminant Analysis classifiers were applied to the time-averaged bioimpedance modulus and phase angle spectra of lung tissue at 15 frequencies (15–307 kHz). The Mann-Whitney U test was used to assess statistical significance between neoplasm and other tissues. A grid search with 5-fold cross-validation was conducted to determine the optimal hyperparameter configuration for each model. Model performance was evaluated using Receiver Operating Characteristic (ROC) curves, together with the Area Under the Curve (AUC), precision, recall and F1-score. At all the frequencies used to train and test the algorithms, neoplasm differed highly significantly from the other tissue types (P < 0.001). All the algorithms except Naïve Bayes obtained an accuracy, AUC and F1-score above 0.95. Decision Tree, Discriminant Analysis and SVM algorithms are suitable for the implementation of a new low-cost guidance method during bronchoscopy.
Introduction
Diagnosing lung diseases often requires tissue characterization, so lung samples are needed for an accurate final diagnosis. Current imaging methods can guide the diagnosis but are limited because they do not provide real-time guidance for collecting lung tissue samples. Studies in the literature1,2 report inconsistent sensitivity for bronchoscopic biopsy of lung neoplasm tissue, ranging from 60% to 88%, and the biopsy method used is a further source of variation. Recently, new bronchoscopy techniques such as radial probe endobronchial ultrasound and electromagnetic navigation bronchoscopy have been developed for real-time sampling characterization. However, their high cost limits their availability in Interventional Pulmonology units, and their use remains suboptimal3,4. Table 1 summarizes the pros and cons of the existing methods for tissue sampling.
To enhance current diagnostic methods, affordable real-time guidance techniques are needed for tissue sampling. Combining electrical impedance spectroscopy (EIS) with artificial intelligence algorithms could enable electronic biopsies, allowing classification and identification of neoplasm tissue. This would ensure accurate tissue sampling and reduce the occurrence of negative and unnecessary biopsies.
Bioimpedance (Z) is the opposition of a tissue to the flow of electrical current, and it varies with frequency when alternating current is applied. Measuring it across a wide range of frequencies is known as electrical impedance spectroscopy (EIS). Bioimpedance consists of resistance (R), related to the extracellular and intracellular media, and reactance (\(X_c\)), related to cell membrane capacitance. Two parameters are derived from these: the impedance modulus, \(|Z| = \sqrt{R^{2} + X_c^{2}}\), and the phase angle, \(\mathrm{PA} = \tan^{-1}(X_c / R)\). At low frequencies, current flows predominantly through the extracellular medium, while at high frequencies it penetrates cell membranes and flows through both the intra- and extracellular media5,6,7. Changes in bioimpedance values are therefore expected depending on tissue characteristics.
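As a minimal sketch of the two derived parameters, the Python function below computes |Z| and PA from a measured R and Xc. The function name and the choice of degrees as the unit of PA are assumptions for illustration; `math.atan2` is used because it equals \(\tan^{-1}(X_c/R)\) whenever R > 0, which holds for passive tissue.

```python
import math
from typing import Tuple


def modulus_and_phase(r_ohm: float, xc_ohm: float) -> Tuple[float, float]:
    """Return (|Z| in ohm, PA in degrees) from resistance R and reactance Xc.

    Hypothetical helper for illustration only.
    """
    modulus = math.hypot(r_ohm, xc_ohm)  # |Z| = sqrt(R^2 + Xc^2)
    # atan2(Xc, R) equals atan(Xc / R) for R > 0 and avoids dividing by zero.
    phase_deg = math.degrees(math.atan2(xc_ohm, r_ohm))
    return modulus, phase_deg


# Example with made-up values: R = 300 ohm, Xc = 40 ohm.
z, pa = modulus_and_phase(300.0, 40.0)
print(f"|Z| = {z:.1f} ohm, PA = {pa:.2f} deg")
```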
Previous studies have explored using bioimpedance for lung tissue characterization8,9,10,11,12. Meroni et al.8 developed an impedance meter for living tissues to test whether electrical impedance spectroscopy could provide information about the structure and properties of tissues. To validate the instrument, they performed ex-vivo impedance measurements on 6 tissue types from 3 rabbits and found statistically significant differences among the tissues. Toso et al.9 evaluated the distribution of the impedance vectors obtained at 50 kHz in 63 adult male patients with lung cancer and compared them with those of 56 healthy subjects, finding significant differences between cancer patients and controls due to a reduced reactance component. Baarends et al.10 predicted total body water (TBW) and extracellular water (ECW) in patients with chronic obstructive pulmonary disease (COPD) using bioelectrical impedance spectroscopy (BIS). They concluded that TBW predicted by BIS was comparable to measured TBW but did not improve on the TBW prediction obtained with bioelectrical impedance analysis (BIA) at 50 kHz, and that the prediction of ECW still had limitations. These three studies applied impedance measurements to purposes other than that of the present study. Two studies have applied electrical impedance spectroscopy to lung neoplasm differentiation. Baghbani et al.11 built a bioimpedance sensor shaped like biopsy forceps to measure the electrical conductivity of tissue inside the body. They derived and verified the relationship between the electrical conductivity of the tissue and the measured electrical potential with COMSOL software, and they designed and experimentally validated a prototype of the sensor.
Furthermore, they measured the impedance of pulmonary tissue in three different tissue samples and found that the sensor could potentially help discriminate tumoral from healthy tissue during the biopsy process. Baghbani et al.12 introduced a method to localize deep pulmonary nodules intraoperatively using a bioimpedance probe with four spherical electrodes. They collected in-vitro bioimpedance data from 286 lung tissue samples and applied principal component analysis (PCA) followed by classification algorithms (support vector machine (SVM), linear discriminant analysis (LDA) and K-nearest neighbors (KNN)) to localize the pulmonary nodules from the bioimpedance spectrum of the lung tissue.
Apart from these two studies, to the best of the authors' knowledge, no studies other than those of our research group have applied minimally invasive EIS to lung tissue differentiation. Baghbani et al.11 obtained the measurements from a biopsy sample, while Baghbani et al.12 applied the classification algorithms to in-vitro samples. Our research group, in contrast, measures the lung tissue directly during bronchoscopy to help locate the sampling site before a biopsy is performed. In the first study of our group, Sanchez et al.13 designed and validated a bioimpedance device for performing minimally invasive bioimpedance measurements through bronchoscopy.
Later work focused on validating the best electrode configuration, implementing a calibration method to reduce data variability, and statistically differentiating between lung tissue types14,15,16. To the best of the authors' knowledge, no reports describe the application of Machine Learning classification algorithms to classify neoplasm lung tissue from electrical impedance spectroscopy measurements as an electronic biopsy method that complements current guidance systems for bronchoscopy.
The application of Machine Learning (ML) algorithms in clinical practice has gained importance in recent years, as they offer opportunities to predict outcomes, develop new diagnostic methods and improve prognosis17. In pneumology, several studies18,19,20,21 have applied classification algorithms to lung cancer, focusing on the early diagnosis of the disease from genetic data, computed tomography images and dosimetric features. Most of these studies focus on disease prediction based on medical images.
Accordingly, the aim of this study is to compare different Machine Learning algorithms for neoplasm lung tissue classification using minimally-invasive electrical impedance spectroscopy measurements, in order to implement a low-cost guidance method that aids in precise detection of the biopsy region during bronchoscopy.
Materials and methods
Participants
Minimally invasive EIS measurements were performed between November 2021 and August 2022 in 102 patients (age: 66 ± 14 years; weight: 74.5 ± 17.2 kg; BMI: 26.8 ± 4.3 kg/m²) with a bronchoscopy prescribed at the “Hospital de la Santa Creu i Sant Pau” of Barcelona. A total of 116 samples were obtained: 29 samples of lung neoplasm tissue and 87 samples of other lung tissue types (emphysema (N = 23), healthy lung tissue (N = 30), pneumonia (N = 22) and fibrosis (N = 12)).
EIS measurements
Minimally invasive EIS measurements with the 3-electrode method are obtained by injecting a multisine current signal (from 1 kHz to 1000 kHz) between the distal electrode of a tetrapolar catheter and a skin electrode during a bronchoscopy procedure. The voltage induced by the injected current is measured between the distal electrode and a second skin electrode. The impedance acquisition time was 12 s at a rate of 60 spectra per second. The impedance device and the calibration of the measurements are described in full in Company-Se et al.15. Measurements with abnormally large impedance values due to loss of contact between the catheter electrodes and the tissue were discarded from the analysis. Radiological images (CT or PET/CT) are taken of each patient before bronchoscopy as part of the diagnostic process. The catheter used to obtain the bioimpedance data is inserted through the working channel of the bronchoscope. During bioimpedance acquisition, patients are placed in a supine position with the upper airways anaesthetized, and intravenous sedation is also provided. A biopsy was obtained to confirm the neoplasm diagnosis. Before bioimpedance and biopsy acquisition, saline solution is injected to clean the tissue and homogenize tissue conditions among patients.
Ethical statement
Ethics approval was obtained from the Hospital de la Santa Creu i Sant Pau (CEIC-73/2020) in accordance with the principles of the Declaration of Helsinki for experiments involving human beings. All patients provided written informed consent to participate in this study.
Variables included in the study
The features used by the classification algorithms are the 12 s time-averaged spectra of the bioimpedance modulus |Z| and phase angle PA at 15 frequencies ranging from 15 kHz to 307 kHz. With 29 neoplasm samples and 87 samples of other lung tissue types, each described by 15 |Z| values and 15 PA values, the original dataset has dimensions of 116 samples × 30 features.
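The feature construction described above can be sketched as follows, assuming each 12 s acquisition is available as a list of 720 spectra (12 s × 60 spectra per second) with 15 values each. The function name and input layout are hypothetical; the sketch only averages each frequency over time and concatenates the 15 mean |Z| values with the 15 mean PA values.

```python
from statistics import fmean

N_FREQ = 15          # frequencies between 15 kHz and 307 kHz
SPECTRA_PER_S = 60   # acquisition rate reported in the text
ACQ_TIME_S = 12      # acquisition time reported in the text


def feature_vector(modulus_spectra, phase_spectra):
    """Time-average each frequency and concatenate |Z| and PA (30 features).

    Hypothetical layout: each argument is a list of spectra, and each spectrum
    is a list of N_FREQ values (e.g. 12 s x 60 spectra/s = 720 spectra).
    """
    for spectra in (modulus_spectra, phase_spectra):
        if any(len(s) != N_FREQ for s in spectra):
            raise ValueError("each spectrum must have one value per frequency")
    mean_mod = [fmean(s[k] for s in modulus_spectra) for k in range(N_FREQ)]
    mean_pa = [fmean(s[k] for s in phase_spectra) for k in range(N_FREQ)]
    return mean_mod + mean_pa


# Made-up constant spectra, only to show the output shape.
n_spectra = SPECTRA_PER_S * ACQ_TIME_S
fv = feature_vector([[100.0] * N_FREQ] * n_spectra, [[5.0] * N_FREQ] * n_spectra)
print(len(fv))
```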
Data preprocessing
The Synthetic Minority Oversampling Technique (SMOTE) is applied twice22. First, SMOTE is applied to balance the dataset: 58 synthetic neoplasm samples were created from the 29 original ones, giving 87 lung neoplasm samples and 87 samples of other tissue types. SMOTE is then applied again to increase the sample size by approximately 50%, from 174 to 262 samples (44 synthetic samples added per class). Synthetic samples were generated using k = 5 nearest neighbors. After data augmentation, the dataset has dimensions of 262 samples × 30 features.
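SMOTE generates each synthetic sample by interpolating between an existing sample of a class and one of its k nearest same-class neighbors. The plain-Python sketch below illustrates that idea with k = 5; it is not the implementation used in the study, the function name is hypothetical, and seed samples are drawn at random rather than in a fixed order.

```python
import math
import random


def smote(samples, n_synthetic, k=5, rng=None):
    """SMOTE-style oversampling of one class (minimal sketch, not a library API).

    Each synthetic sample lies on the segment between a randomly chosen sample x
    and one of its k nearest same-class neighbors x_nn:
        x_new = x + u * (x_nn - x),  with u drawn uniformly from [0, 1).
    """
    if rng is None:
        rng = random.Random(0)
    if len(samples) <= k:
        raise ValueError("need more than k samples in the class")
    # k nearest neighbors of every sample (Euclidean distance, excluding itself).
    neighbors = []
    for i, x in enumerate(samples):
        dists = sorted((math.dist(x, y), j) for j, y in enumerate(samples) if j != i)
        neighbors.append([j for _, j in dists[:k]])
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(samples))
        x = samples[i]
        x_nn = samples[rng.choice(neighbors[i])]
        u = rng.random()
        synthetic.append([a + u * (b - a) for a, b in zip(x, x_nn)])
    return synthetic
```

With the counts reported above, the first pass would call `smote(neoplasm, 58)` to go from 29 to 87 neoplasm samples, and the second pass would add 44 samples to each class (174 + 88 = 262). Because distances are computed on the values as given, features with large numeric ranges (|Z| in ohms) dominate the neighbor search unless the features are scaled first; the text does not state whether scaling preceded SMOTE.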
Data analysis
The normality of all features was assessed using the Kolmogorov-Smirnov test. Because the variables were not normally distributed, they are described as median (interquartile range, IQR) and (minimum – maximum). The Mann-Whitney U test was used to assess the statistical significance of differences between the neoplasm tissue group and the other tissue group. Statistical significance was set at P < 0.05.
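For readers without MATLAB or SciPy, the Mann-Whitney U test can be sketched in plain Python. This minimal version assumes the large-sample normal approximation with a tie correction and no continuity correction, so its p-values may differ slightly from library routines that use exact or continuity-corrected methods. It returns the U statistic of the first sample; the statistic of the second sample is n1·n2 − U, and some reports give the smaller of the two.

```python
import math
from statistics import NormalDist


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U, p), where U is the statistic of sample x.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted((v, g) for g, sample in enumerate((x, y)) for v in sample)
    ranks = [0.0] * len(pooled)
    tie_term = 0.0
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        midrank = (i + j) / 2 + 1  # tied values share their average 1-based rank
        for t in range(i, j + 1):
            ranks[t] = midrank
        size = j - i + 1
        tie_term += size ** 3 - size
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, pooled) if g == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    n = n1 + n2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:  # all values tied: no evidence of a difference
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p
```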
Classification models
Decision Tree, Support Vector Machines (SVM), Ensemble Method, K-Nearest Neighbors (KNN), Naïve Bayes and Discriminant Analysis classifiers were evaluated on the impedance modulus and phase angle at 15 frequencies between 15 kHz and 307 kHz. Because the impedance modulus and the phase angle span different ranges of values, each dataset was normalized to improve model performance and training efficiency. A grid search was conducted to determine the optimal hyperparameter (HP) configuration for each model using 5-fold cross-validation, an approach commonly used in prior medical research23,24. In each cross-validation fold, the dataset was partitioned into training (≈ 80% of the data) and test (≈ 20% of the data) sets, and the model was trained and evaluated on that partition. The optimized HPs and their search ranges are listed in Table 2.
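The following sketch illustrates a grid search with stratified 5-fold cross-validation in plain Python. It is not the MATLAB Classification Learner workflow used in the study: z-score normalization is assumed (the text does not state which normalization was applied), a simple KNN classifier stands in for the six models, and the grid of k values and all function names are hypothetical. The scaler is fitted on the training folds only, so test-fold statistics do not leak into training, and every k is evaluated on the same folds.

```python
import math
import random
from statistics import fmean, pstdev, stdev


def fit_zscore(rows):
    """Per-feature mean and SD, computed on training rows only."""
    cols = list(zip(*rows))
    means = [fmean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid division by zero for constant features
    return means, sds


def apply_zscore(rows, means, sds):
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def knn_predict(train_x, train_y, x, k):
    """Majority vote (labels 0/1) among the k nearest training samples."""
    nearest = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))[:k]
    return 1 if 2 * sum(train_y[i] for i in nearest) > k else 0


def stratified_folds(y, n_folds=5, seed=0):
    """Fold index per sample, keeping class proportions similar across folds."""
    rng = random.Random(seed)
    fold_of = [0] * len(y)
    for label in sorted(set(y)):
        idx = [i for i, t in enumerate(y) if t == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            fold_of[i] = pos % n_folds
    return fold_of


def cv_accuracy(x, y, k, n_folds=5):
    """Mean and SD of test accuracy over the folds for one hyperparameter value."""
    fold_of = stratified_folds(y, n_folds)
    accs = []
    for f in range(n_folds):
        tr = [i for i in range(len(y)) if fold_of[i] != f]
        te = [i for i in range(len(y)) if fold_of[i] == f]
        means, sds = fit_zscore([x[i] for i in tr])  # fit on training folds only
        x_tr = apply_zscore([x[i] for i in tr], means, sds)
        x_te = apply_zscore([x[i] for i in te], means, sds)
        y_tr = [y[i] for i in tr]
        correct = sum(knn_predict(x_tr, y_tr, xi, k) == y[i] for xi, i in zip(x_te, te))
        accs.append(correct / len(te))
    return fmean(accs), stdev(accs)


def grid_search(x, y, grid=(1, 3, 5, 7)):
    """Return (best k, mean accuracy, SD) over a hypothetical grid of odd k values."""
    results = {k: cv_accuracy(x, y, k) for k in grid}
    best_k = max(results, key=lambda k: results[k][0])
    return (best_k, *results[best_k])
```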
The algorithms were implemented with the Classification Learner App in MATLAB (version 23.2.0.2485118, R2023b; The MathWorks Inc., Natick, Massachusetts, 2023).
Classification models interpretability and performance assessment
To evaluate the performance of each model, a receiver operating characteristic (ROC) graph was generated. ROC graphs are valuable tools for analyzing the effectiveness of classification models because they plot the true positive rate against the false positive rate25. The diagonal line in a ROC graph represents random guessing, and models below this diagonal perform worse than random chance. An ideal classifier lies in the top-left corner of the graph, with a true positive rate of 1 and a false positive rate of 0. The area under the ROC curve (AUC) is calculated to summarize the model's classification performance quantitatively. In addition to the ROC graph and AUC, the evaluation includes precision, defined as the ratio of correctly identified positive instances (true positives) to all predicted positive instances (true positives + false positives), and recall, the proportion of correctly identified positive instances (true positives) relative to all actual positive instances (true positives + false negatives). The F1-score, the harmonic mean of precision and recall, is also computed as a single measure that balances the two26. Finally, the implementation time of each algorithm is recorded.
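The metric definitions above can be written out directly. The plain-Python sketch below treats neoplasm as the positive class (label 1, an assumption for illustration) and computes the AUC through its rank interpretation: the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. Function names are hypothetical.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """True positives, false positives and false negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return tp, fp, fn


def precision_recall_f1(y_true, y_pred, positive=1):
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def roc_auc(y_true, scores, positive=1):
    """AUC as P(score of a random positive > score of a random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels, predictions and scores for illustration.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]
print(precision_recall_f1(y_true, y_pred), roc_auc(y_true, scores))
```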
The overall process from data acquisition to model evaluation for classification to aid in a precise biopsy region detection is shown in the schematic diagram represented in Fig. 1.
Results
Impedance |Z| and PA mean impedance spectrum
Figures 2 and 3 show the mean impedance modulus and phase angle spectra, respectively, for neoplasm (black) and other tissue (healthy and pathological, blue) across all frequencies (15 kHz to 307 kHz), for the original data (left) and for the dataset augmented with SMOTE (right). The continuous line represents the mean, and the dashed and dotted lines represent ± SD. The figures show how |Z| and PA vary with frequency for both tissue groups.
Analyzed variables
Table 3 shows descriptive statistics, expressed as median (IQR) (minimum – maximum), of the variables included in the classification models for the neoplasm group and for the group including the other types of lung tissue (Other tissue group), together with the Mann-Whitney U statistic and its statistical significance (P).
Classification models
Table 4 shows the optimal hyperparameter configuration for each classification model obtained from the grid search with 5-fold cross-validation, together with the corresponding test accuracy and the standard deviation of the accuracy across the cross-validation folds.
Figure 4 shows the confusion matrices, representing the relationship between actual and predicted classes, for each of the classification models implemented.
Classification models performance assessment
Figure 5 shows the ROC curves, which visualize model performance, for each classification model, together with the precision, recall and F1-score.
Table 5 shows the implementation time of each algorithm.
Discussion
Machine Learning classification algorithms are promising tools for the future of medicine, as they can help diagnose diseases through tissue characterization.
Different pathologies can affect the respiratory system, such as emphysema, neoplasm, fibrosis or pneumonia. Each of these disorders has its own anatomical and histological changes, so differences in bioimpedance values are expected27. Neoplasm is characterized by increased cell concentration and tissue vascularization28, which lowers the impedance modulus and increases the phase angle relative to other types of tissue (Figs. 2 and 3).
Regarding data augmentation, large datasets generally improve classification accuracy, whereas small datasets are prone to overfitting. Data augmentation techniques can mitigate these problems by generating additional training samples, thereby extracting more meaningful information from limited data, and researchers use them to increase dataset diversity29. SMOTE creates each new sample by interpolating between an original sample and one of its 5 nearest neighbors of the same class. Figures 2 and 3 show that the mean impedance spectrum, for both the impedance modulus and the phase angle, does not change between the original data and the dataset that includes synthetic data, which indicates that the synthetic samples preserve the mean spectral behavior of the original data.
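The visual comparison of mean spectra can also be expressed as a number. The sketch below, with a hypothetical function name and rows as samples and columns as the 30 features, returns the largest relative change of any per-feature mean between the original and the augmented dataset. Matching means are a necessary but not sufficient check, since the spread and neighborhood structure of the data could still differ.

```python
from statistics import fmean


def max_relative_mean_shift(original, augmented):
    """Largest |mean_aug - mean_orig| / |mean_orig| over all features (columns).

    Falls back to the absolute difference when an original mean is exactly zero.
    """
    shifts = []
    for col_o, col_a in zip(zip(*original), zip(*augmented)):
        m_o, m_a = fmean(col_o), fmean(col_a)
        diff = abs(m_a - m_o)
        shifts.append(diff / abs(m_o) if m_o else diff)
    return max(shifts)


# Made-up two-feature example: adding the midpoint of two rows keeps both means.
print(max_relative_mean_shift([[1, 2], [3, 4]], [[1, 2], [3, 4], [2, 3]]))
```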
The classification algorithms were implemented using the impedance modulus and phase angle at 15 frequencies from 15 kHz to 307 kHz. The Mann-Whitney U test can only assess the significance of each parameter on its own, whereas classification algorithms can use all the variables together and exploit complex interrelations among them to achieve a robust classification of neoplasm versus other lung tissue. Because both modulus and phase angle reached high statistical significance (P < 0.001) at all frequencies, and the U statistic does not differ much among variables (Table 3), all the variables were used to implement the classification models.
With respect to the classification models, the learning problem consists of finding a decision function that classifies the training samples as well as possible, minimizing the classification error while maximizing classification accuracy on unseen data. To this end, the characteristic hyperparameters of each algorithm were optimized with the grid search method, which remains highly effective and identifies the best-performing combination of model parameters within the searched ranges. The models implemented are therefore adequately optimized (Table 4). To minimize overfitting as far as possible, validation was performed using 5-fold cross-validation30. All classification models obtained an accuracy higher than 0.95 except Naïve Bayes, which obtained an accuracy of 0.92 together with the highest SD of the accuracy (0.05). Furthermore, the HPs in Table 4 show that K-Nearest Neighbors used a single neighbor to classify. Although the bias of the 1-nearest-neighbor estimate is often low, its variance is high31.
Published studies on the use of classification algorithms for medical applications highlight the high performance of Support Vector Machines, although there is no consensus on which algorithm fits medical applications best32,33,34. It is therefore important to compare the performance of different algorithms to decide which best suits the application under study. In our case, all the algorithms except Naïve Bayes and KNN could serve to detect neoplasm lung tissue through an electronic biopsy, with special focus on Support Vector Machines. The other study in the field that applied classification algorithms12 found that SVM performed slightly better than LDA and KNN, although the differences among the three algorithms were minor. In our case, the differences in performance between SVM and LDA are also minor. However, as the literature suggests, we focused especially on SVM, since it appears to perform best on medical data. These findings suggest that established classification algorithms are useful and safe for new medical applications or to complement existing medical approaches.
To fully evaluate the performance of the algorithms, the classification report was obtained for all of them (Fig. 5). We plotted the ROC curve, which is used to evaluate the performance of a binary diagnostic classification method, together with the AUC, a measure of the accuracy of the test25. All the classification methods reached an AUC higher than 0.95, indicating excellent discrimination. To complete the assessment, precision, recall and F1-score were also calculated. Precision is most important when false positives must be avoided, whereas recall is most important when false negatives must be avoided35. The F1-score offers a trade-off between precision and recall. In our clinical problem, false positives are undesirable, because biopsies of tissue other than neoplasm are not of clinical interest. False negatives are also undesirable, because we want to accurately localize neoplasm tissue to help select the sampling location during bronchoscopy. We therefore focus on the F1-score to evaluate the performance of the classification methods.
Because the classification method is intended to detect tissue types during bronchoscopy, the duration of bioimpedance data acquisition and processing is critical, and shorter execution times are a significant advantage. The implementation times of the classification methods are given in Table 5; the Decision Tree, SVM and Discriminant Analysis algorithms had the shortest times.
According to Fig. 5, all the algorithms except Naïve Bayes obtained an F1-score higher than 0.95. Together with the accuracy reported in Table 4 and the execution times in Table 5, this indicates that Decision Tree, SVM and Discriminant Analysis are suitable for our clinical application. In future studies, the performance of Decision Tree, Discriminant Analysis and, especially, SVM will be tested in real time. In addition, the usefulness of the classification algorithms for detecting peripheral nodules will be evaluated using electromagnetic navigation bronchoscopy.
Clinical significance
This study demonstrated the usefulness of machine learning algorithms for detecting neoplasm lung tissue during bronchoscopy by performing an electronic biopsy that measures the bioimpedance of the tissue. With machine learning, healthcare professionals can take advantage of variations in tissue electrical properties that current navigation systems may not capture.
In clinical practice, this minimally-invasive sampling localizer could be used in the interventional pulmonology units of hospitals to locate biopsy sampling sites accurately during bronchoscopy, reducing the number of negative biopsies due to sampling errors.
Limitations
Lung nodules can be classified by location as central or peripheral. During bronchoscopy, patients receive sedation rather than general anesthesia, so patient movement is frequent. To minimize the risk of piercing the pleura, this study only included central nodules, which can be examined with conventional bronchoscopy.
Conclusions
The comparison of the algorithms implemented for neoplasm lung tissue classification using minimally-invasive electrical impedance spectroscopy measurements shows that the Decision Tree, Discriminant Analysis and, particularly, Support Vector Machines algorithms are suitable for implementing a low-cost guidance method during bronchoscopy, which could form the basis of a new guidance tool.
Data availability
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.
References
Andolfi, M. et al. The role of bronchoscopy in the diagnosis of early lung cancer: a review. J. Thorac. Dis. 8, 3329–3337 (2016).
Rivera, M. P., Mehta, A. C. & Wahidi, M. M. Establishing the diagnosis of lung cancer: diagnosis and management of lung cancer, 3rd ed: American college of chest physicians Evidence-Based clinical practice guidelines. Chest 143, e142S–e165S (2013).
Herth, F. J. F., Eberhardt, R., Becker, H. D. & Ernst, A. Endobronchial ultrasound-guided transbronchial lung biopsy in fluoroscopically invisible solitary pulmonary nodules: a prospective trial. Chest 129, 147–150 (2006).
Folch, E. E. et al. Electromagnetic navigation bronchoscopy for peripheral pulmonary lesions: One-Year results of the prospective, multicenter NAVIGATE study. J. Thorac. Oncol. 14, 445–458 (2019).
Lukaski, H. C. Biological indexes considered in the derivation of the bioelectrical impedance analysis. Am. J. Clin. Nutr. 64, 397S–404S (1996).
Lukaski, H. C., Vega Diaz, N., Talluri, A. & Nescolarde, L. Classification of hydration in clinical conditions: indirect and direct approaches using bioimpedance. Nutrients 11, 809 (2019).
Khalil, S., Mohktar, M. & Ibrahim, F. The theory and fundamentals of bioimpedance analysis in clinical status monitoring and diagnosis of diseases. Sensors 14, 10895–10928 (2014).
Meroni, D., Bovio, D., Frisoli, P. A. & Aliverti, A. Measurement of electrical impedance in different ex-vivo tissues. in 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 2311–2314 (IEEE, Orlando, FL, USA, 2016). https://doi.org/10.1109/EMBC.2016.7591192
Toso, S. et al. Altered tissue electric properties in lung cancer patients as detected by bioelectric impedance vector analysis. 5, (2000).
Baarends, E. M., Van Marken Lichtenbelt, W. D., Wouters, E. F. M. & Schols, A. M. W. J. Body-water compartments measured by bio-electrical impedance spectroscopy in patients with chronic obstructive pulmonary disease. Clin. Nutr. 17, 15–22 (1998).
Baghbani, R., Moradi, M. H., Shadmehr, M. B. & Momayez Sanat, Z. A new bio-impedance forceps sensor for measuring electrical conductivity of the biological tissues. IEEE Sens. J. 19, 11721–11731 (2019).
Baghbani, R., Shadmehr, M. B., Ashoorirad, M., Molaeezadeh, S. F. & Moradi, M. H. Bioimpedance spectroscopy measurement and classification of lung tissue to identify pulmonary nodules. IEEE Trans. Instrum. Meas. 70, 1–7 (2021).
Sanchez, B. et al. In vivo electrical bioimpedance characterization of human lung tissue during the bronchoscopy procedure. A feasibility study. Med. Eng. Phys. 35, 949–957 (2013).
Company-Se, G. et al. Minimally invasive lung tissue differentiation using electrical impedance spectroscopy: A comparison of the 3- and 4-Electrode methods. IEEE Access. 10, 7354–7367 (2022).
Company-Se, G. et al. Effect of calibration for tissue differentiation between healthy and neoplasm lung using minimally invasive electrical impedance spectroscopy. IEEE Access. 10, 103150–103163 (2022).
Company-Se, G. et al. Differentiation using Minimally-Invasive bioimpedance measurements of healthy and pathological lung tissue through bronchoscopy. Front. Med. https://doi.org/10.3389/fmed.2023.1108237 (2023).
May, M. Eight ways machine learning is assisting medicine. Nat. Med. 27, 2–3 (2021).
Hosni, M., Carrillo-de-Gea, J. M., Idri, A., Fernandez-Aleman, J. L. & Garcia-Berna, J. A. Using ensemble classification methods in lung cancer disease. in 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC) 1367–1370 (IEEE, Berlin, Germany, 2019). https://doi.org/10.1109/EMBC.2019.8857435
Bharati, S., Podder, P. & Mondal, M. R. H. Hybrid deep learning for detecting lung diseases from X-ray images. Inf. Med. Unlocked. 20, 100391 (2020).
Tekerek, A. & Al-Rawe, I. A. M. A novel approach for prediction of lung disease using chest X-ray images based on densenet and MobileNet. Wirel. Pers. Commun. https://doi.org/10.1007/s11277-023-10489-y (2023).
Jasmine, P. et al. Lung Diseases Detection Using Various Deep Learning Algorithms. J. Healthc. Eng. 1–13 (2023).
Sreejith, S., Nehemiah, H. K. & Kannan, A. Clinical data classification using an enhanced SMOTE and chaotic evolutionary feature selection. Comput. Biol. Med. 126, 103991 (2020).
Al-Zaiti, S. S. et al. A clinician’s guide to Understanding and critically appraising machine learning studies: a checklist for ruling out bias using standard tools in machine learning (ROBUST-ML). Eur. Heart J. - Digit. Health. 3, 125–140 (2022).
Palmieri, F. et al. Machine learning allows robust classification of visceral fat in women with obesity using common laboratory metrics. Sci. Rep. 14, 17263 (2024).
Bradley, A. P. The use of the area under the ROC curve in the evaluation of machine learning algorithms. Pattern Recognit. 30, 1145–1159 (1997).
Richardson, E. et al. The receiver operating characteristic curve accurately assesses imbalanced datasets. Patterns 5, 100994 (2024).
Weinberger, S. E., Cockrill, B. A. & Mandel, J. Principles of Pulmonary Medicine (Elsevier, 2019).
Hammer, G. D. & McPhee, S. J. Pathophysiology of Disease: An Introduction to Clinical Medicine (McGraw-Hill Education, 2019).
Hassan, H. et al. Review and classification of AI-enabled COVID-19 CT imaging models based on computer vision tasks. Comput. Biol. Med. 141, 105123 (2022).
Müller, A. C. & Guido, S. Introduction to Machine Learning with Python: A Guide for Data Scientists (O’Reilly Media, 2017).
Hastie, T., Tibshirani, R. & Friedman, J. The Elements of Statistical Learning (Springer New York Inc., 2001).
Chang, R. F., Wu, W. J., Moon, W. K., Chou, Y. H. & Chen, D. R. Support vector machines for diagnosis of breast tumors on US images. Acad. Radiol. 10, 189–197 (2003).
Dreiseitl, S. et al. A comparison of machine learning methods for the diagnosis of pigmented skin lesions. J. Biomed. Inf. 34, 28–36 (2001).
Gholamzadeh, M., Abtahi, H. & Safdari, R. Comparison of different machine learning algorithms to classify patients suspected of having sepsis infection in the intensive care unit. Inf. Med. Unlocked. 38, 101236 (2023).
Hicks, S. A. et al. On evaluation metrics for medical applications of artificial intelligence. Sci. Rep. 12, 5979 (2022).
Acknowledgements
To the patients, without whom this study would not have been possible. We also thank Marta Navarro Colom, Laura Romero Roca and Margarita Castro Jiménez from the Interventional Pulmonology Unit, Respiratory Medicine Department, Hospital de la Santa Creu i Sant Pau, for their invaluable support.
Funding
This work was supported by the Spanish Ministry of Science and Innovation (PID2021-128602OB-C21) and supported by the Secretariat of Universities and Research of the Generalitat de Catalunya and the European Social Fund.
Author information
Contributions
GC: designed the experiments, performed the experiments, performed the data processing, analyzed the data, drafted the manuscript, prepared the tables and figures, revised and approved the final version of the manuscript. VP: designed the experiments, performed the experiments, revised and approved the final version of the manuscript. AR: designed the experiments, performed the experiments, revised and approved the final version of the manuscript. PR: designed the experiments, revised and approved the final version of the manuscript. JR: designed the experiments, revised and approved the final version of the manuscript. RB: designed the experiments, revised and approved the final version of the manuscript. LN: designed the experiments, performed the data processing, analyzed the data, drafted the manuscript, prepared the tables and figures, revised and approved the final version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Company-Se, G., Pajares, V., Rafecas-Codern, A. et al. Machine learning allows robust classification of lung neoplasm tissue using an electronic biopsy through minimally-invasive electrical impedance spectroscopy. Sci Rep 15, 9716 (2025). https://doi.org/10.1038/s41598-025-94826-0
DOI: https://doi.org/10.1038/s41598-025-94826-0







