Introduction

Multiple myeloma (MM) is the second most prevalent hematological malignancy, characterized by the abnormal proliferation of malignant plasma cells (PCs) within the bone marrow. These PCs produce excessive amounts of monoclonal immunoglobulin. MM is identified by a set of clinical features known as CRAB: hypercalcemia, renal insufficiency, anemia, and bone lesions. Current diagnostic criteria have shifted treatment to earlier stages; however, bone marrow biopsy remains the gold standard for diagnosis1,2.

Extramedullary disease (EMD) in multiple myeloma occurs when a subclone of malignant PCs migrates out of the bone marrow, potentially forming tumors. EMD is categorized as primary (present at the time of MM diagnosis) or secondary (found at MM relapse). EMD can also be classified by location as paraskeletal (where PCs remain partially dependent on the bone marrow) or extraskeletal (where PCs are entirely independent of the bone marrow)3,4,5. The diagnosis is typically confirmed by imaging; the International Myeloma Working Group (IMWG) recommends 18F-FDG PET/CT6. EMD is generally an adverse prognostic factor5,7,8,9,10.

Liquid biopsies encompass a range of tests that use body fluids, primarily blood and urine, to identify biomarkers. Their potential advantages include minimal invasiveness, simplicity, repeatability, suitability for serial analyses, and lower cost. Furthermore, liquid biopsies can capture the heterogeneity of diseases such as MM, which may not be fully represented in single-site tissue biopsies. Assessed analytes include circulating tumor cells11, cell-free DNA12,13, various non-coding RNA molecules14,15,16, proteins and peptides17,18, and extracellular vesicles and their contents19. Matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) has proven to be a potent tool for differentiating and identifying biomarkers in body fluids20,21,22,23,24,25,26. Although MALDI-TOF MS coupled with machine learning algorithms is unlikely to replace imaging techniques, it could prove useful for the early prediction of EMD.

The primary objective of our study was to develop a sensitive and minimally invasive screening method to identify patients with primary EMD. To this end, we analyzed peripheral blood plasma collected from 172 patients and built a supervised partial least squares-discriminant analysis (PLS-DA) prediction model to discriminate between primary EMD and MM patients.

Results

Mass spectra comparison

As previously published27, the MALDI-TOF MS conditions for quantitative measurement were carefully optimized for the highest possible robustness by selecting an appropriate matrix and laser energy and by extensive signal averaging.

The mass spectra profiles in Fig. 1 illustrate the differences between MM and primary EMD samples, which lie mainly in signal intensity within the 2.0–6.0 kDa and 8.0–10.2 kDa ranges.

Figure 1

Representative mass spectra of (A) a multiple myeloma sample and (B) a primary extramedullary disease sample. Signals in the 2.0–6.0 kDa (red) and 8.0–10.2 kDa (blue) ranges have a higher predictive ability than the highest-intensity signals.

Multivariate statistics

An unsupervised principal component analysis (PCA) model was first built using the original 67 variables; however, it did not differentiate between MM and primary EMD (data not shown). To achieve clear group differentiation, supervised orthogonal projections to latent structures discriminant analysis (OPLS-DA) models were built on the same dataset.

The optimal OPLS-DA model used five components. The ideal number of components was determined by evaluating model fit and predictive capability: too few components may fail to differentiate the classes (i.e., health conditions), whereas too many may lead to overfitting and diminished predictive power. R2X and R2Y describe the proportion of the variance (inertia) of the independent (X) and dependent (Y) variables, respectively, that is explained by the model. The optimal number of orthogonal components was selected using the Q2Y parameter, which estimates the predictive performance of the model by fivefold cross-validation; the number of components at which Q2Y reaches its maximum gives the best-performing model. R2Y increases with the number of components, whereas Q2Y does not necessarily do so. Model diagnostics were R2X = 0.644, R2Y = 0.419, and Q2Y = 0.197, indicating a positive, albeit moderate, predictive ability, and permutation tests with 20 replicates confirmed the model's robustness. The OPLS-DA scores plot (Fig. 2), which uses the predefined group labels, showed a greater degree of separation than the unsupervised PCA.
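R2Y and Q2Y share one formula and differ only in which predictions enter it: in-sample fitted values give R2Y, while predictions for samples held out during cross-validation give Q2Y (the residual sum of squares then being PRESS). The minimal Python sketch below illustrates this commonly used definition. It assumes the classes are coded numerically (hypothetically, 1 for primary EMD and 0 for MM) and that out-of-fold predictions are already available; the numbers are made up and are not study data. R2X is analogous but is computed on the X matrix.

```python
from statistics import fmean


def explained_fraction(y_true, y_pred):
    """Return 1 - SS_residual / SS_total for a response vector.

    With in-sample fitted values this is R2Y; with out-of-fold
    (cross-validated) predictions, SS_residual is PRESS and the
    result is Q2Y.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    y_bar = fmean(y_true)
    ss_total = sum((y - y_bar) ** 2 for y in y_true)
    if ss_total == 0:
        raise ValueError("y_true has zero variance")
    ss_residual = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    return 1.0 - ss_residual / ss_total


# Hypothetical class coding: 1 = primary EMD, 0 = MM (illustrative values only)
y = [1, 1, 1, 0, 0, 0]
fitted = [0.9, 0.8, 0.7, 0.2, 0.3, 0.1]   # in-sample predictions
cv_pred = [0.7, 0.4, 0.6, 0.5, 0.3, 0.2]  # predictions for held-out folds

r2y = explained_fraction(y, fitted)
q2y = explained_fraction(y, cv_pred)
print(f"R2Y = {r2y:.3f}, Q2Y = {q2y:.3f}")
```

With these invented numbers the script prints R2Y = 0.813, Q2Y = 0.340. R2Y exceeding Q2Y is the usual pattern: adding components keeps raising R2Y, while Q2Y peaks and then falls once the model starts to overfit, which is why Q2Y rather than R2Y guides the choice of component number.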

Figure 2

Orthogonal projections to latent structures discriminant analysis (OPLS-DA) scores plot of the evaluated samples. Blue: primary extramedullary disease (EMD); red: multiple myeloma (MM).

OPLS-DA performance was assessed on both gender-separated data (73 samples from women, 99 samples from men) and the original dataset, with negligible differences. This suggests that the approach used in this work is independent of gender (data not shown).

Predictive model building and assessment

The feature matrix was used to develop a predictive model able to categorize plasma samples into MM and primary EMD-predicted classes, using the caret R package, which implements algorithms for building predictive models28. Machine learning (ML) performance was evaluated using a training dataset comprising 70% of the original dataset (70 primary EMD, 51 MM) and a test dataset (22 primary EMD, 29 MM); the data were divided randomly to minimize bias.
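The random 70/30 division described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the sample IDs are hypothetical, the seed is arbitrary, and rounding the training size up (ceil(0.7 × 172) = 121) is an assumption chosen to reproduce the reported set sizes of 121 and 51. Because this sketch does not stratify by class, the number of primary EMD samples that land in the training set depends on the seed.

```python
import math
import random


def random_split(sample_ids, train_fraction=0.7, seed=2024):
    """Randomly partition sample IDs into disjoint training and test sets.

    The training size is rounded up (an assumption that yields 121/51
    for 172 samples); the split is not stratified by class.
    """
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    n_train = math.ceil(len(sample_ids) * train_fraction)
    train = rng.sample(sample_ids, n_train)
    chosen = set(train)
    test = [s for s in sample_ids if s not in chosen]
    return train, test


# Hypothetical IDs matching the cohort sizes: 92 primary EMD + 80 MM = 172
ids = [f"EMD{i:03d}" for i in range(92)] + [f"MM{i:03d}" for i in range(80)]
train, test = random_split(ids)
n_emd_train = sum(s.startswith("EMD") for s in train)
print(len(train), len(test), n_emd_train)
```

A stratified alternative, such as caret's createDataPartition in R, would instead keep the class ratio similar in both sets.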

Five machine learning algorithms were trained using fivefold cross-validation repeated ten times: PLS-DA, k-NN (k-nearest neighbors), DT (decision tree), RF (random forest), and ANN (artificial neural network). These algorithms are well suited to analyzing small datasets and impose no strict distributional assumptions on the data. The models were evaluated based on their overall accuracy in predicting the outcome (Fig. 3).

Figure 3

The accuracy of predictive models based on 10 × repeated fivefold cross-validation.

The PLS-DA algorithm, which we previously identified as the most effective model for distinguishing between MM and plasma cell leukemia (PCL) samples27, again performed best in cross-validation: the PLS-DA predictive model with eight components achieved an overall accuracy of 70.8%.

Subsequently, the variable importance in projection (VIP) was calculated; the VIP plot is presented in Fig. 4A. The two most important variables for the PLS-DA predictive model are shown as box plots (Fig. 4B,C).

Figure 4

(A) Variable importance in projection (VIP) for the PLS-DA predictive model. Box plots of the two most important variables for the model, (B) 6692 Da and (C) 6837 Da (significant difference between the EMD and MM groups, p < 0.05).
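VIP summarizes how strongly each m/z variable contributes to the PLS components, weighting each component by the amount of Y variance it explains. The study does not state which VIP implementation was used, so the Python sketch below follows the standard textbook definition, VIP_j = sqrt(p · Σ_a SS_a (w_ja / ‖w_a‖)² / Σ_a SS_a). It assumes the PLS weight matrix and the explained Y sum of squares per component have already been extracted from a fitted model; the weights and SS values shown are hypothetical.

```python
from math import sqrt


def vip_scores(weights, ss_per_component):
    """Variable importance in projection for each X variable.

    weights: p rows (variables), each holding A loading weights w[j][a]
    ss_per_component: A values, the Y sum of squares explained by each
    PLS component (commonly SS_a = q_a**2 * t_a . t_a)
    """
    p = len(weights)
    n_comp = len(ss_per_component)
    # Euclidean norm of each weight vector (each column of W)
    norms = [sqrt(sum(weights[j][a] ** 2 for j in range(p))) for a in range(n_comp)]
    total_ss = sum(ss_per_component)
    vip = []
    for j in range(p):
        weighted = sum(ss_per_component[a] * (weights[j][a] / norms[a]) ** 2
                       for a in range(n_comp))
        vip.append(sqrt(p * weighted / total_ss))
    return vip


# Hypothetical weights for 4 m/z variables and 2 components (illustrative only)
W = [[0.8, 0.1], [0.5, -0.3], [0.2, 0.9], [0.1, 0.3]]
SS = [3.0, 1.0]
scores = vip_scores(W, SS)
print([round(v, 2) for v in scores])
```

By construction the squared VIP values average to 1, which is why VIP > 1 is a common rule of thumb for flagging influential variables.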

The PLS-DA predictive model was first evaluated on the training dataset (accuracy 87.6%), as shown in Fig. 5A. Of the 121 training samples, 106 were correctly assigned to the MM and primary EMD categories, corresponding to 90.0% sensitivity and 84.3% specificity, with primary EMD as the positive class. Figure 5B shows the performance of the PLS-DA predictive model on the test data (accuracy 78.4%): 40 of 51 samples were correctly classified, including 19 of 22 primary EMD and 21 of 29 MM samples, corresponding to 86.4% sensitivity and 72.4% specificity (Table 1).

Figure 5

The predictive capability of the PLS-DA model. (A) In the training dataset, 63/70 primary EMD and 43/51 MM samples were correctly identified. (B) In the test dataset, 19/22 primary EMD and 21/29 MM samples were correctly identified.
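The accuracy, sensitivity, and specificity values follow directly from the correctly and incorrectly classified counts. The small Python sketch below recomputes them from the counts given in the text, treating primary EMD as the positive class (so specificity is the proportion of MM samples correctly identified). The function name is illustrative and not part of any library.

```python
def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity, with primary EMD as the positive class.

    tp: EMD predicted as EMD, fn: EMD predicted as MM,
    tn: MM predicted as MM,   fp: MM predicted as EMD.
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Counts taken from the text: training 63/70 EMD and 43/51 MM correct,
# test 19/22 EMD and 21/29 MM correct.
for name, counts in {"training": (63, 7, 43, 8), "test": (19, 3, 21, 8)}.items():
    acc, sens, spec = classification_metrics(*counts)
    print(f"{name}: accuracy {acc:.1%}, sensitivity {sens:.1%}, specificity {spec:.1%}")
```

Running it reproduces the percentages reported in the text: 87.6%, 90.0%, and 84.3% for the training set and 78.4%, 86.4%, and 72.4% for the test set.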

Table 1. Performance of the PLS-DA predictive model, assessed by accuracy, sensitivity, and specificity with 95% Wilson confidence intervals (CI).
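The Wilson score interval used for the confidence intervals can be computed with the Python standard library alone. The sketch below is a generic implementation of the textbook formula, not the authors' code. For illustration, it is applied to the test-set sensitivity count given in the text (19 of 22); the printed bounds are computed here and are not taken from Table 1.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a binomial proportion."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return centre - half_width, centre + half_width


# Test-set sensitivity from the text: 19 of 22 primary EMD samples
low, high = wilson_interval(19, 22)
print(f"sensitivity 95% CI: {low:.3f} to {high:.3f}")
```

Unlike the simple Wald interval, the Wilson interval always stays within [0, 1] and behaves better for small groups and proportions near 0 or 1, which matters for subsets as small as the 22 test-set EMD samples.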

Discussion

Mass spectrometry (MS) is an analytical technique that enables highly specific and sensitive measurement of disease biomarkers, with possible applications in liquid biopsies24,29. In MM and related diagnoses, this approach offers the opportunity for a more comprehensive assessment of the disease in all its heterogeneity, particularly compared with single-site bone marrow biopsy29. Several recent studies have used MALDI-TOF MS to identify or monitor biomarkers in MM20,23,30,31,32,33, and MALDI-TOF MS has also been employed for the proteomic analysis of various cancers, including MM18,34,35,36,37.

In MM, MS is predominantly being explored for isotyping, quantifying, or continuously monitoring the M-protein, a monoclonal immunoglobulin with a specific molecular weight stemming from its characteristic amino acid sequence, which serves as an individualized patient biomarker for diagnosis or for monitoring tumor burden38,39,40,41,42,43. MS has repeatedly demonstrated higher sensitivity and specificity than the conventionally used immunofixation or protein gel electrophoresis, as highlighted by a recent IMWG report33,43,44.

In contrast, few studies have focused on broader peptide and/or protein profiling in peripheral blood samples. Wang et al. combined magnetic beads, MALDI-TOF MS detection, and a decision tree algorithm to develop a diagnostic tool that correctly classified 86.67% of MM patients and 100% of healthy donors (HD); the authors also included samples from patients with other plasma cell dyscrasias and from patients with malignant diseases with skeletal metastases, but these findings were less conclusive18. A biomarker model designed in 2012 distinguished serum samples from MM patients and HD by pinpointing four proteins corresponding to the peaks with the highest separation power within the 0.7–10 kDa range; the model attained 86.36% sensitivity and 87.5% specificity45. Barceló et al. achieved nearly 90% sensitivity, specificity, and accuracy by combining 2–10 kDa MALDI-TOF MS fingerprinting with machine learning to differentiate between samples from patients with monoclonal gammopathy of undetermined significance (MGUS) and HD20.

In our 2019 study, an artificial neural network (ANN)-based model discriminated between samples from MM patients and HD with high sensitivity (100%), specificity (95%), and accuracy (98%)17. Similarly, MALDI-TOF MS coupled with ANN distinguished between HD and MM samples with 93.55% sensitivity and 92.19% specificity46. Nevertheless, to the best of our knowledge, these methods have not yet been used to discriminate satisfactorily between MM and primary EMD; studies have focused primarily on comparing MM patients with HD. Therefore, we aimed to harness the well-established advantages of MALDI-TOF MS and liquid biopsies with a modified approach27.

Our goal was to achieve the highest possible prediction accuracy for samples from primary EMD patients, without evaluating the biological significance of the plasma spectral features. To this end, we constructed multiple models using machine learning algorithms (k-NN, DT, RF, ANN, and PLS-DA). The PLS-DA predictive model with eight components emerged as the best-performing candidate. The training set comprised 121 samples (accuracy 87.6%): 70 primary EMD (sensitivity 90.0%) and 51 MM (specificity 84.3%). The test set comprised 51 samples (22 primary EMD and 29 MM) and achieved an accuracy of 78.4%, sensitivity of 86.4%, and specificity of 72.4%. The lower performance on the test set may be attributable to several factors. MM is a heterogeneous disease, which gives rise to heterogeneous EMD. Moreover, newly diagnosed MM and primary EMD are not biologically separate entities; their similarities are greater than previously suspected9.

The key outcome of our analysis is the 86.4% sensitivity for primary EMD classification. These results, together with the advantages of liquid biopsies, suggest that the method could be used to rapidly identify patients with a higher probability of primary EMD. The specificity (i.e., correct identification of MM) of 72.4% falls short of expectations and results in an elevated false-positive rate; nevertheless, to our knowledge, no comparable method currently detects primary EMD from peripheral blood with this level of accuracy.

Consequently, the method could serve as a fast pre-screening tool to select patients for more expensive imaging, or be used when imaging is not readily available. However, our results need to be confirmed by additional studies, and we intend to expand and further test the model. Future directions for MALDI-TOF MS coupled with machine learning may include the early prediction of secondary EMD or the assessment of treatment efficacy in EMD. The difference in the proteome profile of EMD compared with MM likely reflects the increased autonomy of plasma cells and their ability to migrate out of the bone marrow, possibly supported by alterations in cell adhesion pathways and the acquisition of an epithelial-mesenchymal transition (EMT)-like phenotype. Our analysis of peripheral blood suggests that these changes are detectable even beyond the bone marrow microenvironment and may be clinically relevant as an initial indicator of EMD.

Conclusion

The findings of this study highlight the potential of liquid biopsy using MALDI-TOF MS protein and peptide fingerprinting combined with a partial least squares-discriminant analysis (PLS-DA) machine learning predictive model. This approach correctly identified 19 of 22 primary EMD patients in the test dataset, a sensitivity of 86.4%. In addition, the method is fast, cost-effective, minimally invasive, and repeatable, making it a suitable complementary tool for the prompt identification of primary EMD patients.

Methods

Sample collection

A total of 172 peripheral blood plasma samples were used in the study: 80 MM samples and 92 primary EMD samples. All samples were obtained at diagnosis, prior to treatment initiation. The diagnosis of primary EMD was established by imaging (PET-CT, MRI, low-dose CT) and confirmed by histopathology of biopsied samples. Patients included in the analysis were diagnosed at the University Hospital Brno (112 patients) and the University Hospital Ostrava (60 patients), Czech Republic (Table 1). All patients signed informed consent approved by the Ethics Committees of the hospitals, in accordance with the current version of the Declaration of Helsinki. This research was approved by the Ethics Committee of the University Hospital Brno (10/6/2020, 12-100620/EK).

Sample preparation and matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS)

Proteins and peptides were extracted as described in our previous study27. Sinapinic acid (SA) and trifluoroacetic acid (TFA) were purchased from Sigma-Aldrich (Steinheim, Germany), acetonitrile (ACN) from Penta (Prague, Czech Republic), and the protein calibration mix ProMix1 from LaserBio Labs (Valbonne, France). Briefly, 50 μL of ACN was added to 25 μL of sample; the supernatant was discarded, and the precipitate was treated with 50 μL of 50% ACN containing 0.1% TFA. The resulting extract (the supernatant containing peptides and proteins) was then subjected to MALDI-TOF MS analysis.

The collected extract was mixed 1:1 with the SA matrix (20 mg/mL in 50% ACN supplemented with 2.5% TFA). Immediately afterwards, 2 μL of the homogenized mixture was transferred to the metal target in five technical replicates, as described previously17,27,47.

Samples were measured on a MALDI-7090™ MALDI TOF-TOF mass spectrometer (Shimadzu, Japan) equipped with a 2 kHz ultra-fast solid-state UV laser (Nd:YAG, 355 nm) with a variable beam focus ranging from 10 μm to over 100 μm. Mass spectra were acquired using MALDI Solutions Data Acquisition software v 2.1.1.0 and recorded in linear positive ion mode over a mass range of 2–20 kDa, with pulsed extraction set at 12.5 kDa. Each spectrum was produced by averaging 5,000 individual laser pulses (10 shots at each of 500 different positions) at a laser frequency of 1 kHz and a laser diameter of 100 μm. External calibration was performed with the protein calibration mix 1 (ProMix1; ions of 2.8–17 kDa), with a mass error of ± 200 mDa.

Mass spectra processing

The raw mass spectra in mzML format were pre-processed in the R programming language (version 4.0.4) using the MALDIquant and MALDIrppa packages, as described previously27,48,49. Low-quality spectra were identified and excluded using a screening function based on a robust scale estimator of median intensities and derivative spectra (implemented in the MALDIrppa package). Processing of individual mass spectra included smoothing (Savitzky-Golay filter, halfWindowSize = 100), baseline correction (SNIP algorithm, 500 iterations), and spectra alignment using a set of spectral peaks as a reference (MAD noise estimation, signal-to-noise ratio (S/N) = 10, half-window size = 20)50,51,52. Representative mass spectra were calculated as the median of the five technical replicates of each sample. Peaks detected with an S/N threshold of 10 were extracted as spectral features only if found in at least 10% of all spectra, to avoid artifacts arising from sample collection or processing, the patient's medication, or health condition. The corresponding intensity matrix was used for further statistical analysis.
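SNIP (statistics-sensitive non-linear iterative peak-clipping) estimates the baseline by repeatedly replacing each point with the smaller of its own value and the mean of its neighbours at a given distance, so that peaks are gradually clipped away while the slowly varying background remains. The following standard-library Python sketch is a simplified illustration of that idea, not MALDIquant's implementation, which may differ in details such as edge handling, any intensity transformation, and the default window order. Treating `iterations` as the largest clipping half-window is an assumption about how the 500 iterations used in the study map onto the algorithm; the toy spectrum is hypothetical.

```python
def snip_baseline(intensities, iterations=100, decreasing=True):
    """Estimate a baseline with a simplified SNIP peak-clipping algorithm.

    For each clipping half-window k, every point farther than k from both
    edges is replaced by the smaller of its own value and the mean of its
    neighbours k positions away. Edge points are left unchanged.
    """
    y = [float(v) for v in intensities]
    n = len(y)
    if decreasing:
        windows = range(iterations, 0, -1)  # wide windows first
    else:
        windows = range(1, iterations + 1)  # narrow windows first
    for k in windows:
        prev = y[:]  # clip against values from the previous pass only
        for i in range(k, n - k):
            y[i] = min(prev[i], (prev[i - k] + prev[i + k]) / 2)
    return y


# Toy spectrum (hypothetical): flat background of 10 with a rectangular peak of +50
spectrum = [10.0 + (50.0 if 18 <= i <= 22 else 0.0) for i in range(41)]
baseline = snip_baseline(spectrum, iterations=10)
corrected = [s - b for s, b in zip(spectrum, baseline)]
print(max(corrected), min(corrected))
```

On this toy input the estimated baseline is the flat background of 10, so the corrected peak height is 50.0 and the corrected background is 0.0. The half-window must be wider than the peaks to be removed; too small a value leaves part of each peak in the baseline.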

Predictive machine learning algorithms

The raw mass spectra obtained from the 172 plasma samples (80 MM and 92 primary EMD) were pre-processed as described above. After alignment and binning, 67 m/z variables met the selection criteria. The resulting matrix includes sample identifiers and biological group flags (MM and primary EMD) for further multivariate statistical analysis and predictive model building. Multivariate models were built using all available data, including unsupervised (PCA) and supervised (k-NN, DT, RF, ANN, and PLS-DA) approaches. The caret R package was used to train and optimize the predictive models28. Five predictive models were constructed to classify plasma samples into MM or primary EMD-predicted classes.
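The 10% peak-frequency criterion applied during pre-processing, and the construction of the sample-by-feature intensity matrix from which the m/z variables were obtained, can be illustrated with a short standard-library Python sketch. It assumes that peaks have already been detected and binned so that matching peaks share an identical m/z value; filling undetected peaks with zero is a simplifying assumption. MALDIquant provides its own function for the frequency step (filterPeaks with a minFrequency argument). The m/z values and intensities below are hypothetical.

```python
def frequent_features(peak_lists, min_frequency=0.10):
    """Return binned m/z values detected in at least `min_frequency` of spectra."""
    counts = {}
    for peaks in peak_lists:
        for mz in set(peaks):  # count each m/z at most once per spectrum
            counts[mz] = counts.get(mz, 0) + 1
    threshold = min_frequency * len(peak_lists)
    return sorted(mz for mz, c in counts.items() if c >= threshold)


def intensity_matrix(spectra, features):
    """Build a sample-by-feature matrix; undetected peaks are filled with 0.0."""
    return [[spectrum.get(mz, 0.0) for mz in features] for spectrum in spectra]


# Hypothetical binned peak lists (m/z -> intensity) for 20 representative spectra;
# repeated dicts are shared objects, which is fine because they are only read.
spectra = (
    [{4000.0: 120.0, 5000.0: 80.0}] * 3  # 4000 Da in 3/20 = 15% of spectra
    + [{5000.0: 95.0}] * 16              # 5000 Da in 19/20 = 95%
    + [{9000.0: 15.0}]                   # 9000 Da in 1/20 = 5%, dropped
)
features = frequent_features([set(s) for s in spectra])
matrix = intensity_matrix(spectra, features)
print(features, matrix[0], matrix[-1])
```

Only the peaks at 4000 and 5000 Da pass the 10% threshold, so each row of the resulting matrix has two columns, and the last spectrum, whose only peak was rare, becomes a row of zeros.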

Fivefold cross-validation repeated 10 times was used to assess model performance and tune the predictive models53. For each model, a confusion matrix cross-tabulating the biological group against the predicted class was used to calculate accuracy, sensitivity, and specificity.
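Repeated k-fold cross-validation reshuffles the samples in every repetition, splits them into k folds, and holds out each fold once, so fivefold cross-validation repeated 10 times gives 50 fit-and-evaluate rounds. The standard-library Python sketch below generates such splits by index. It is an illustration only: the seed is arbitrary, and the folds are not stratified by class.

```python
import random


def repeated_kfold(n_samples, k=5, repeats=10, seed=1):
    """Yield (repeat, fold, train_idx, test_idx) for repeated k-fold cross-validation.

    Each repetition reshuffles the samples and splits them into k folds of
    near-equal size; every sample is held out exactly once per repetition.
    """
    rng = random.Random(seed)
    order = list(range(n_samples))
    for r in range(repeats):
        rng.shuffle(order)
        folds = [sorted(order[f::k]) for f in range(k)]
        for f, test_idx in enumerate(folds):
            held_out = set(test_idx)
            train_idx = [i for i in range(n_samples) if i not in held_out]
            yield r, f, train_idx, test_idx


# 121 training samples, fivefold CV repeated 10 times -> 50 fit/evaluate rounds
splits = list(repeated_kfold(121))
print(len(splits), sorted({len(test) for _, _, _, test in splits}))
```

This prints 50 rounds with held-out folds of 24 or 25 samples. In each round a model would typically be fitted on train_idx and its predictions for test_idx tallied into a confusion matrix, with performance averaged across rounds. In caret the equivalent resampling is requested with trainControl(method = "repeatedcv", number = 5, repeats = 10), and caret's fold construction additionally balances the classes across folds.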