Background & Summary

Covid-19

Covid-19 is an infection caused by the SARS-CoV-2 virus. Individuals infected with SARS-CoV-2 exhibit a range of symptoms that might include fever, cough, loss of taste or smell and shortness of breath1, as well as a range of other “common cold like” symptoms such as fatigue, congestion or runny nose, nausea or vomiting, and diarrhoea2.

On the molecular level, a Covid-19 infection can trigger many alterations in the haemostatic system3. The immune response, for instance, can be significantly altered; studies have reported a massive upregulation of cytokines and chemokines leading to a cytokine storm4, which may result in acute respiratory complications and even multiorgan failure5. Moreover, recent studies suggest that blood clotting and coagulation pathways activated by the virus can lead to deep vein thrombosis (DVT) and pulmonary embolism (PE)6,7. On the metabolic level, Covid-19 infections seem to instigate alterations in glucose, lipid, and amino acid metabolism8,9,10.

Metabolomics and lipidomics

Omics analysis enables a deeper understanding of biological mechanisms. Although each omics layer contains enough complexity to elucidate a certain mechanism, combining two or more omics datasets unlocks new insights into cellular function, which in turn helps in understanding the underlying biology of complex pathologies11,12. Metabolomics, for instance, is widely considered to provide the link from genotype to phenotype (given the availability of genomics data): metabolite screening allows the examination of metabolic changes, revealing alterations in pathways13,14. Lipidomics, especially the targeted approach, stands as one of the most well-established tools to evaluate, diagnose and better understand human pathologies15,16.

Many recent studies17,18,19 have highlighted the power of multi-omics in describing the metabolic alterations caused by a Covid-19 infection. In these studies, extensive genomics, metabolomics and lipidomics analyses revealed distinct alterations associated with the infection.

Few studies, however, considered the phenotype effect on the analysis and interpretation. By correlating phenotypic clinical readings with untargeted analysis, a new way emerges to better understand the mechanisms of a disease13,20.

One major challenge in the field of untargeted omics is the lack of substantiated, cross-validated libraries21,22,23. A sizeable portion of the analysed chromatographic features in each study thus remains undeciphered and is usually annotated as unknowns24. More often than not, these unknowns are regulated (up or down), and their regulation correlates with numerous identified targets and biomarkers25.

The field of omics is developing: the undeciphered portion is steadily getting smaller, and many tools for annotating these unknowns are emerging. Until better tools and more conclusive libraries are available, it is very important to conserve the data that have already been acquired26. Saving and sharing data in public repositories and repurposing them later is an advancing trend with many advantages, including conservation of resources and reduction of redundant clinical and animal trials26,27,28.

Motives and aims

In this work we describe the metabolome and lipidome profiles of 39 Covid-19 patients in contrast to 39 healthy individuals. We strove to produce a comprehensive dataset by employing what are, to our knowledge, state-of-the-art metabolomics and lipidomics methodologies. Furthermore, this dataset is supported by comprehensive phenotype data including clinical and co-morbidity information for each study sample. The study samples were analysed using an extensive workflow (see Fig. 1) to ensure the acquisition of all relevant “potential” metabolites. Specifically, we implemented both reversed-phase ultra-high performance liquid chromatography (RP-UHPLC) for the separation of lipids and hydrophilic interaction liquid chromatography (HILIC) for the separation of polar metabolites. In both chromatography modes, data were acquired on a quadrupole time-of-flight (QTOF) mass spectrometer using IDA (Information Dependent Acquisition) in positive and negative ionization modes, selecting the top 3 ions and using 2 fragmentation energies to ensure comprehensive coverage. Additionally, we implemented SWATH-DIA (Sequential Window Acquisition of All Theoretical Mass Spectra, Data Independent Acquisition) because we aimed for a comprehensive, non-biased dataset that allows the acquisition of fragment ion spectra for all detectable metabolites29 regardless of their parent m/z. We also wanted the option of re-visiting this dataset retrospectively with future, “yet to be developed” tools30.
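The "top 3 ions" selection used in IDA can be illustrated with a minimal sketch. It only shows the general idea of picking the most intense precursors from a survey scan; the function name, the intensity threshold and the example peaks are hypothetical, and the vendor acquisition software applies additional rules (for example exclusion lists) that are not modelled here.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (m/z, intensity)


def select_top_n_precursors(ms1_peaks: List[Peak], n: int = 3,
                            min_intensity: float = 0.0) -> List[Peak]:
    """Return the n most intense MS1 peaks above min_intensity, most intense first."""
    candidates = [p for p in ms1_peaks if p[1] >= min_intensity]
    return sorted(candidates, key=lambda p: p[1], reverse=True)[:n]


# Hypothetical survey scan (values are illustrative, not taken from the dataset).
scan = [(184.07, 5200.0), (496.34, 18000.0), (520.34, 950.0),
        (760.59, 42000.0), (782.57, 7300.0)]
selected = select_top_n_precursors(scan, n=3, min_intensity=1000.0)
print(selected)  # the three most intense peaks, in decreasing intensity
```

In contrast, a data-independent method such as SWATH fragments every precursor inside each isolation window, so no such intensity-based selection takes place.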

Fig. 1
Workflow of the study.

Methods

Study design

Between December 2020 and February 2021, the Analyses Blood Covid DNA (ABCD) study was carried out as a prospective case-control study at the Emergency Department of the Clinic Donaustadt (Vienna, Austria) and at the Department of Nutritional Sciences, University of Vienna. Covid-19 infected participants (n = 48) were enrolled at the Emergency Department of the Clinic Donaustadt. Age- and gender-matched controls (n = 48) were recruited at the University of Vienna and the recruiting hospital (mainly staff members). Older adults were drawn from current projects (‘NutriAging’ protein study: https://clinicaltrials.gov/ct2/show/NCT04023513 and vitamin D study: https://clinicaltrials.gov/ct2/show/NCT04341818). A summary of the demographics and some key clinical parameters of the patient and healthy subject groups is shown in Table 1. The study design is illustrated in Fig. 2. The inclusion criteria for Covid-19 patients were sex (male/female), age (≥ 40 years), and the ability to provide written informed consent. Additionally, patients with suspected Covid-19 had to be admitted to the emergency department or hospital during the acute infection. Patients who were not hospitalized and individuals without definitive Covid-19 confirmation were excluded. The inclusion criteria for healthy controls were age (≥ 40 years), the ability to provide written informed consent, and absence of severe illnesses. Controls with clinically significant diseases were excluded, and pregnancy was an exclusion criterion for both groups.
All relevant clinical parameters (Ct values qPCR, leukocytes, thrombocytes, erythrocytes, hemoglobin, hematocrit, MCV, MCH, MCHC, lymphocyte abs, monocyte abs, eosinophile granulocyte abs, basophile granulocytes abs, lymphocytes rel, monocytes rel, eosinophile granulocytes rel, basophile granulocytes rel, CRP, creatinine, uric acid, ASAT (GOT), ALAT (GPT), glucose, cholesterol, HDL cholesterol, triglyceride, IL-6, alkaline phosphatase, LDH, iron blood levels, transferrin, transferrin saturation, ferritin, albumin, bilirubin, vitamin D) were measured on the day of admission and are shown in the descriptive file in our repository (Clinical Metadata.xls)31. The study was approved by the Ethical Commission of the City of Vienna (No. EK_20_284_1120) and was conducted in accordance with the Declaration of Helsinki. Written informed consent to take part in the study, as well as to the sharing of all relevant data, was obtained prior to participation. The study was registered at ClinicalTrials.gov (Identifier: NCT04784468).

Table 1 Demographic characteristics of study participants: Healthy vs. COVID-19 hospital patients.
Fig. 2
Flowchart depicting the workflow of the ABCD-Covid-19 study, employing LC-MS analysis. The flowchart outlines the sequential steps involved in data acquisition, preprocessing, and analysis and serves as a guide to illustrate the methodology employed in the study, facilitating reproducibility and transparency in the research process.

Plasma collection

Venous blood samples were collected by venipuncture into EDTA-coated vacutainers. Samples were kept at room temperature for 30 minutes prior to separation of plasma (centrifugation at 3500 RCF for 15 minutes at 4 °C); no haemolysis was observed. The plasma was then separated into aliquots and stored in 1.5 mL Eppendorf tubes at −80 °C until analysed.

Extraction

Samples (50 μL plasma aliquots) were thawed immediately before analysis, and each sample was spiked with 10 μL of LC-MS internal standards (SPLASH™ Lipidomix® Mass Spec Standard, Avanti Polar Lipids, Inc.). Afterwards, samples were extracted using a modified methyl tert-butyl ether (MTBE) (VWR Chemicals, Radnor, Pennsylvania, USA) extraction protocol as suggested by Matyash et al.32 Briefly, in a 1.5 mL Eppendorf tube, a 50 μL plasma aliquot was homogenized with 300 µL ice-cold methanol (VWR Chemicals, Radnor, Pennsylvania, USA) using an ultrasonic bath for 10 minutes. Afterwards, 1 mL of MTBE was added and the mixture was vortexed vigorously. The Eppendorf tubes were incubated on a cooled shaker for 60 minutes. The mixture was then transferred into a new Eppendorf tube and 250 µL Milli-Q H2O (Merck, Darmstadt, Germany) was added. After vortexing and centrifugation (2000 g for 1 minute), 2 phases were formed: a top MTBE phase containing the lipophilic compounds for the lipidomics analysis, and a bottom (water:methanol) phase for the metabolomics analysis. Both phases were dried using a speed vac (SpeedVac SPD1030, Thermofisher Scientific, Bremen, Germany) at room temperature and 5.1 torr and stored until analysis.

QC Samples and blanks and order of acquisition

QC samples were prepared separately for lipidomics and metabolomics. For both QCs, 5 μL from each sample (healthy and Covid-19) were pooled together in one vial. Blanks were the same solution used to dissolve the dried samples: 100% methanol for lipidomics and 80% methanol for metabolomics. Each workflow was acquired in 5 analytical batches (named B1-5 in the repository); the order of the blanks (12 blanks per batch) and QCs (5 QC samples per batch) can be seen in the sequence files tabs in the repository31.

Lipidomics using RP-UHPLC-QTOF

The dried lipidomics phases were dissolved (assisted by vortexing and sonication) in 150 µL methanol prior to analysis. Analysis was performed using an adapted 15-minute gradient as suggested by Fiehn et al.33 using an ACQUITY UPLC BEH C18 column (Waters, Milford, Massachusetts, USA), 130 Å, 1.7 µm, 2.1 mm × 100 mm and the following mobile phases: (A) 60:40 (v/v) acetonitrile:water with 10 mM ammonium formate in positive mode or 10 mM ammonium acetate in negative mode; (B) 90:10 (v/v) isopropanol:acetonitrile with 10 mM ammonium formate in positive mode or 10 mM ammonium acetate in negative mode. All chemicals were purchased from VWR Chemicals (Radnor, Pennsylvania, USA). Gradient details are listed in Table 2 below.

Table 2 Gradient details used in the RP-UHPLC-QTOF approach.

Untargeted lipid profiling was performed using Sciex X500R QTOF (AB Sciex, Darmstadt, Germany). Data were acquired using the IDA method (all metabolites option) using the original Sciex OS ver 2.0.1 acquisition software with the parameters shown in Table 3 below.

Table 3 Experimental parameters used in RP-UHPLC-QTOF approach.

Data were also acquired using the SWATH-DIA method with the original Sciex OS ver 2.0.1 acquisition software, with the parameters shown in Table 4 and the isolation windows shown in Table 5 below.

Table 4 Experimental parameters used in SWATH-DIA approach.
Table 5 Time window settings used in SWATH-DIA approach.

Metabolomics using HILIC-UHPLC-QTOF

The dried metabolomics phases were dissolved in 150 µL 80% methanol prior to analysis. Analysis was performed using an adapted 10-minute gradient as suggested by Fiehn et al.33 using a Phenomenex HILIC column (130 Å, 1.7 µm, 2.1 mm × 100 mm) and the following mobile phases: (A) 95:5 (v/v) acetonitrile:water with 10 mM ammonium formate; (B) 50:50 (v/v) acetonitrile:water with 10 mM ammonium formate. Untargeted metabolite profiling was performed using the Sciex X500R QTOF. Data were acquired using the IDA method (all metabolites option) with the same parameters mentioned above using the original Sciex OS ver 2.0.1 acquisition software. All chemicals were purchased from VWR Chemicals (Radnor, Pennsylvania, USA). Gradient details are listed in Table 6.

Table 6 Gradient details used in the HILIC-UHPLC-QTOF approach.

Lipids and metabolites identification and statistics

Raw data were analyzed using MSDIAL ver.4.9.221218 Windowsx6434. Key MS-DIAL parameters were as follows. Peak detection: min peak height = 1000 amplitude, mass slice width = 0.05 Da; MS2Dec: sigma window value = 0.5, MS/MS abundance cutoff = 10 amplitude; identification: accurate mass tolerance = 0.005 Da, identification score cutoff = 80%; alignment: RT tolerance = 0.05 min, MS1 tolerance = 0.005 Da. The raw dataset, comprising four groups (Healthy, Covid-19, QC and Blanks), was first normalized and batch corrected based on the QC sample pools using the LOESS algorithm35 and the internal standards workflows of MSDIAL. Processed data were then filtered for high-quality peaks based on the 2-way ANOVA p-value score and their RSD (relative standard deviation) values.

Lipids were identified using MSDIAL via the integrated Lipidblast36 package. The identification of the lipids was pursued in both negative and positive modes; we ensured that the right modifier type was chosen in the MSDIAL Lipidblast MSP file tab and ticked all the possible adducts available in the adducts tab. Using this workflow, we detected a total of 3195 features, of which 2067 had MS2 spectra and only 1095 had reference spectra in Lipidblast. In positive mode, we detected a total of 4637 features, of which 2717 had MS2 spectra and only 1318 had reference spectra in Lipidblast.

Metabolites were identified using MSDIAL ver.4.9.221218 Windowsx6434 via the spectral database36 package “ESI(+)-MS/MS from standards + bio + in silico (16,995 unique compounds), last edit 21.08.2022”. When peak annotation was not possible using the included spectral library, we used HMDB37 and METLIN Gen238 (purchased 20.01.2023). The identification of the metabolites was pursued in positive mode; we ensured that the right modifier type was chosen in the MSDIAL MSP file tab and ticked all the possible adducts available in the adducts tab. Using this workflow, we detected a total of 3041 features, of which 1982 had MS2 spectra and only 175 had reference spectra.

Finally, we manually curated the identification hits (MS2 spectrum match based on top dot scores (>0.75), and adduct and duplicate removal) and combined them into a single list (782 molecules), as shown in the descriptive file in our repository (identification peaklists.xls)31. We also included HMDB, KEGG, PubChem, ChEBI, METLIN and SMILES identifiers, which we uploaded in a separate CSV file called “HMDB_KEGG,PubChem,ChEBI,METLIN,SMILES -ID.csv”, found in the others tab in the repository31.
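The QC-based LOESS correction and the RSD filter described above can be sketched as follows. This is a simplified illustration and not the MS-DIAL implementation: the local linear fit with tricube weights, the span `frac`, the 30% RSD threshold and the example intensities are all assumptions made for the sketch.

```python
import math
import statistics
from typing import List, Sequence


def _tricube(u: float) -> float:
    """Tricube kernel weight for a scaled distance u."""
    u = abs(u)
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def loess_predict(x0: float, xs: Sequence[float], ys: Sequence[float],
                  frac: float = 0.75) -> float:
    """Locally weighted linear fit of (xs, ys), evaluated at x0."""
    n = len(xs)
    k = min(n, max(3, math.ceil(frac * n)))
    # Bandwidth: distance to the k-th nearest point, widened slightly so that
    # this point keeps a small non-zero weight.
    h = sorted(abs(x - x0) for x in xs)[k - 1] * 1.001
    if h == 0.0:
        h = 1.0
    w = [_tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 1e-12 else 0.0
    return my + slope * (x0 - mx)


def drift_correct(orders: Sequence[float], intensities: Sequence[float],
                  qc_orders: Sequence[float], qc_intensities: Sequence[float],
                  frac: float = 0.75) -> List[float]:
    """Divide each intensity by the QC trend at its injection order and
    rescale to the median QC intensity."""
    target = statistics.median(qc_intensities)
    return [y * target / loess_predict(x, qc_orders, qc_intensities, frac)
            for x, y in zip(orders, intensities)]


def rsd_percent(values: Sequence[float]) -> float:
    """Relative standard deviation in percent."""
    return 100.0 * statistics.stdev(values) / statistics.mean(values)


def passes_rsd_filter(qc_values: Sequence[float], max_rsd: float = 30.0) -> bool:
    """Keep a feature only if its QC RSD is below an assumed threshold."""
    return rsd_percent(qc_values) <= max_rsd


# Hypothetical feature: QC injections show a linear signal loss over the run.
qc_orders = [1, 6, 11, 16, 21]
qc_raw = [1000.0, 950.0, 900.0, 850.0, 800.0]
corrected = drift_correct(qc_orders, qc_raw, qc_orders, qc_raw)
print(round(rsd_percent(qc_raw), 2))     # QC RSD before correction, about 8.8 %
print(round(rsd_percent(corrected), 6))  # QC RSD after correction, close to 0 %
```

Study samples would be corrected in the same way by passing their injection orders and intensities together with the QC values of the same batch.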

It is worth mentioning that, although we strove to employ state-of-the-art libraries (as mentioned above) for lipids and metabolites and made every effort to carefully annotate the provided data, we do not assert the comprehensiveness or complete validity of our identification or annotation. These objectives were not the primary focus of the current study. Instead, our primary goal was to comprehensively acquire lipid and metabolite data in a format conducive to enhanced and more valid annotations through future tools and algorithms.

Data Records

All data are uploaded to the Center for Computational Mass Spectrometry MassIVE data repository. The dataset31 can be accessed via the identifier MSV000092887, or via the link: https://doi.org/10.25345/C5V40K90Q. All the spectra were uploaded in their native form (Sciex wiff2 files) and also as mzXML files. Moreover, we uploaded all the sequence files and the result files from the MS-DIAL analyses as mzdata files. The dataset folder includes the following directories and subdirectories:

  1. Metadata: f.MSV000092887/metadata/Clinical MetaData.xlsx: Includes all clinical parameters for the participants.

  2. Peak (mzXML files): f.MSV000092887/peak/: Includes all mzXML-converted files.

  3. Raw (Sciex wiff2 files): f.MSV000092887/raw/: Includes all raw files.

  4. Search: f.MSV000092887/search: Includes all MS-DIAL data processing parameters as well as the result files as mzdata files.

  5. Sequences (Sciex OS acquisition sequence files): Includes all the sequences for the data acquisition performed in the experimental parts.

MS-DIAL settings files (.med2A) can be opened using MS-DIAL. A Readme file is also included in the data repository; it explains how the repository is structured and helps readers and users find the spectra and files.

Technical Validation

Statistical analysis

The dataset of the identified hits was analyzed using MetaboAnalyst39,40,41. First, we performed significance analyses using volcano plotting: out of 782 identified molecules, listed in the descriptive file in our repository (identification peaklists.xls)31, 296 were downregulated, 70 were upregulated and the others were not significantly changed. The results of the analysis are shown in Fig. 3; a detailed table of the underlying metabolites shown in the figure is also included in the data repository31 in the others tab.

Fig. 3
Volcano plot illustrating the differential regulation of the OMICS dataset, including metabolites and lipids. The thresholds used for significance determination are Fold Change (FC) ≥ 2 and a raw p-value of 0.05. Each point on the plot represents a unique metabolite or lipid, with those above the threshold indicating upregulation (in red) and those below indicating downregulation (in blue). For a comprehensive list of the underlying metabolites depicted in the figure, please refer to the detailed table available in the data repository31, accessible in the ‘Others’ tab. This plot provides a visual representation of the significant changes in metabolite and lipid expression, aiding in the identification of potential biomarkers or pathways associated with the studied conditions.
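The thresholds stated in the caption (FC ≥ 2, raw p-value of 0.05) translate into a simple classification rule, sketched below. The direction of the fold change (Covid-19 over Healthy), the function name and the example values are assumptions for illustration; the p-values are taken as given rather than computed here.

```python
import math


def classify_feature(mean_covid: float, mean_healthy: float, p_value: float,
                     fc_threshold: float = 2.0, p_threshold: float = 0.05) -> str:
    """Label a feature as 'up', 'down' or 'not significant'.

    Assumes fold change is defined as Covid-19 / Healthy (an assumption,
    the direction is not stated in the text).
    """
    log2_fc = math.log2(mean_covid / mean_healthy)
    if p_value < p_threshold and abs(log2_fc) >= math.log2(fc_threshold):
        return "up" if log2_fc > 0 else "down"
    return "not significant"


# Hypothetical features: (mean in Covid-19, mean in Healthy, raw p-value).
print(classify_feature(400.0, 100.0, 0.001))  # four-fold higher, significant
print(classify_feature(100.0, 400.0, 0.001))  # four-fold lower, significant
print(classify_feature(150.0, 100.0, 0.001))  # fold change below threshold
```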

As evident, a considerable number of unknowns could not be annotated. Despite their regulatory relevance (Healthy vs. Covid-19), and despite efforts to annotate them using available libraries and in-silico methods, these molecules could not be confidently identified. These unannotated molecules, shown in Fig. 3 and in the descriptive file in our repository (identification peaklists.xls)31, were a primary motivation for presenting the current study in this form.

Furthermore, to ensure the quality of the workflow, we subjected the two cohorts (Control and Covid-19) to PLS-DA to verify the uniqueness of the identified metabolites to the relevant cohort, as shown in Fig. 4. Components 1, 2 and 3 account for 18.1%, 10.0% and 2.6% of the total variance, respectively, and the limited overlap between the cohorts in the score plots indicates that the two cohorts contain unique compounds that are independently regulated.

Fig. 4
PLS-DA score plots illustrating the distribution of samples across the first three components. Component 1, Component 2, and Component 3 account for 18.1%, 10.0%, and 2.6% of the total variance, respectively. Each point on the plot represents an individual sample, with its position determined by its score along each component. The plot provides insight into the separation and clustering of samples based on their metabolic profiles or other relevant features. These PLS-DA score plots offer a visual representation of the multivariate relationships within the data, facilitating the identification of relevant patterns or trends associated with the studied conditions.

Moreover, we wanted to check whether our identified lipidome and metabolome profiles match the currently described profiles. Hence, a hierarchical clustering analysis of the top 75 up/down regulated compounds was carried out and the results were plotted using the heatmap analysis tool in MetaboAnalyst, as shown in Fig. 5. The up/down regulated compounds are in accordance with recently published data17,42,43.

Fig. 5
Heatmap generated using the MetaboAnalyst platform. The heatmap provides a visual representation of the relative changes in metabolite abundance between Healthy and Covid-19 conditions, with colours ranging from red to blue indicating fold changes (FC). Red hues represent upregulated metabolites, while blue hues indicate downregulated metabolites, with intensity correlating to the magnitude of the FC. The heatmap is constructed based on the correlation between p-values and FC, allowing for the simultaneous visualization of statistical significance and biological relevance. Each row and column in the heatmap corresponds to a unique metabolite, while the clustering of rows and columns enables the identification of metabolite groups or patterns associated with the experimental conditions.

Limitations of the annotation

A major known limitation of metabolomics and lipidomics data interpretation is how trustworthy the identification is. Usually, the quality of an identification starts with (a) a tentative annotation based on HRMS MS1 precursor ion masses with low mass drifts (± ppm). If this tentative annotation is coupled with (b) an MS2 spectrum, it becomes more reliable. Finally, the gold standard would be to couple these with (c) a match with an authentic analytical standard (RT and spectral match). Since option (c), with authentic standards, is very tedious, expensive, and unrealistic when trying to profile hundreds of metabolites, most shotgun MS assays tend to base their identification workflows on HRMS MS1 (a) coupled to MS2 annotations (b) only.

In our workflow, we always had the tentative MS1 annotation with a maximum drift of 5 ppm, and we did our best to curate the data by matching each MS2 spectrum to its best match. In MS-DIAL, and using the MS-FINDER package, we aligned each measured spectrum and matched it against the possible reference library spectra; for each spectrum alignment, a dot score (0-1) is calculated, with 1 being a perfect match. We always strove to pick the spectrum with the highest dot product. It must be clearly said that generating perfect matches and annotations is not the main aim of this work; the aim is rather to generate a well-measured dataset that can be used later to obtain better identifications. With the up-to-date spectral libraries we used, this is the best we could achieve with manual curation of the data. That is why we included, for each workflow, two fragmentation energies and an “all ions” SWATH fragmentation to enhance the chances of obtaining a comprehensive MS2 fragmentation spectrum. We must also say that this is still not fully comprehensive: that would require acquiring the same metabolite on different mass spectrometer architectures with different fragmentation and ionization arrangements, and only the sum of all these collected spectra could be called a comprehensive spectrum of a metabolite.
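The two quantities used in this curation, the precursor mass error in ppm and the dot score between a measured and a reference MS2 spectrum, can be sketched as below. The dot score here is a plain cosine similarity with greedy fragment matching; it is a generic illustration and not the scoring function implemented in MS-DIAL, and the fragment tolerance and example spectra are assumptions.

```python
import math
from typing import List, Tuple

Spectrum = List[Tuple[float, float]]  # (fragment m/z, intensity)


def ppm_error(measured_mz: float, theoretical_mz: float) -> float:
    """Signed mass error of a measured m/z in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def dot_score(query: Spectrum, reference: Spectrum, mz_tol: float = 0.01) -> float:
    """Cosine similarity (0-1) between two centroided spectra.

    Each query fragment is paired with the closest unused reference
    fragment within mz_tol Da (assumed tolerance).
    """
    used = set()
    dot = 0.0
    for q_mz, q_int in query:
        best = None
        for i, (r_mz, _) in enumerate(reference):
            if i in used or abs(q_mz - r_mz) > mz_tol:
                continue
            if best is None or abs(q_mz - r_mz) < abs(q_mz - reference[best][0]):
                best = i
        if best is not None:
            used.add(best)
            dot += q_int * reference[best][1]
    norm_q = math.sqrt(sum(i * i for _, i in query))
    norm_r = math.sqrt(sum(i * i for _, i in reference))
    if norm_q == 0.0 or norm_r == 0.0:
        return 0.0
    return dot / (norm_q * norm_r)


# Hypothetical values: a precursor measured 0.0025 Da above its theoretical m/z.
print(round(ppm_error(500.0025, 500.0), 2))  # 5.0 ppm, at the stated limit

# Hypothetical spectra sharing one of two fragments.
measured = [(100.0, 3.0), (200.0, 4.0)]
library = [(100.005, 3.0), (300.0, 4.0)]
print(round(dot_score(measured, library), 2))  # 0.36, below the 0.75 cutoff
```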

For the analytical validation of our workflow, we first checked for analytical variation by inspecting the data before any QC-based LOESS normalization or batch correction. As shown in Fig. 6A, the PCA score plot shows 2 distinct clusters, Blanks vs. Others (Samples and QCs), which argues against heavy carry-over or strong outliers. After eliminating the blanks, as seen in Fig. 6B, we can observe how the QC samples cluster in relation to both groups. The QC samples are not perfectly centred, which might indicate some variability and spread; however, considering the length of the analytical batch measurements (10 days) and the frequency of samples versus QCs and blanks, as detailed in the sequence information in the repository, we believe the data still demonstrate very acceptable analytical validity.

Fig. 6
(A) Principal Component Analysis (PCA) score plot illustrating the distribution of control (blue), Covid-19 (red), blanks (black), and quality control (QC) (green) samples in the lipidomics analysis conducted in RP positive mode. PC1, representing the primary source of variation, explains 59.8% of the total variance observed across samples, while PC2 contributes 2.7% to the overall variance. Each point on the plot represents an individual sample, with clustering indicating similarities or differences in lipidomic profiles between sample groups. (B) Another PCA score plot showcasing the distribution of control, Covid-19, and QC samples in the lipidomics analysis conducted in RP positive mode. PC1 and PC2 explain 13.4% and 4.3% of the total variance, respectively. The distinct clustering of samples based on their lipidomic profiles provides insight into the metabolic differences between healthy controls, Covid-19 patients, and quality control samples.

Moreover, to verify the stability and reproducibility of our system, we monitored 5 internal standard lipids (18:1(d7) Lyso PE, 15:0-18:1(d7) DAG, 18:1(d7) Chol Ester, d18:1-18:1(d9) SM and 18:1(d7) Lyso PC) from the spiked SPLASH™ Lipidomix® mixture and plotted their peak areas over the whole sample population, as shown in Fig. 7A. We show only 5 lipids, representing different species of the lipidome. The SPLASH™ mixture contains 14 lipids with concentrations ranging from 2 µg/mL to 350 µg/mL; we spiked only 10 µL, which was then diluted as explained in the extraction section above. With this dilution it is not possible to detect all lipids in the SPLASH™ mixture, and we purposely did not inject more in order to avoid possible ion suppression. All 5 deuterated standards shown exhibit a relatively stable response (in terms of peak area), which indicates batch, system and performance stability and reproducibility.
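As a rough worked example of this dilution, the sketch below estimates the upper bound of each standard's concentration in the reconstituted lipid extract. It assumes that the whole spike partitions into the MTBE phase and is fully recovered before being dissolved in 150 µL methanol; real recoveries will be lower, and the function name is hypothetical.

```python
def reconstituted_conc_ug_per_ml(stock_ug_per_ml: float,
                                 spike_ul: float = 10.0,
                                 reconstitution_ul: float = 150.0,
                                 recovery: float = 1.0) -> float:
    """Upper-bound concentration of a spiked standard in the final extract."""
    spiked_mass_ug = stock_ug_per_ml * spike_ul / 1000.0  # µg added with the spike
    return spiked_mass_ug * recovery / (reconstitution_ul / 1000.0)


low = reconstituted_conc_ug_per_ml(2.0)     # least concentrated standard
high = reconstituted_conc_ug_per_ml(350.0)  # most concentrated standard
print(round(low, 3), round(high, 2))  # 0.133 23.33 (µg/mL, assuming full recovery)
```

Under these assumptions the least concentrated standards end up roughly 15-fold more dilute than in the stock mixture, which is consistent with not all 14 lipids being detectable.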

Fig. 7
(A) Quality control assessment of Sciex QTOF performance using selected deuterated lipids (18:1(d7) Lyso PE, 15:0-18:1(d7) DAG, 18:1(d7) Cholesterol Ester, d18:1-18:1(d9) Sphingomyelin, and 18:1(d7) Lyso PC) from the SPLASH™ Lipidomix®. The plots display the reproducibility of peak areas across all samples, including healthy control participants (nrCL) and Covid-19 patients (nrFL), ensuring consistent instrument performance and data reliability throughout the study. (B) Evaluation of LC-MS run quality, focusing on minimal carryover observed in blank samples analysed across the study. Total ion chromatogram (TIC) plots of QC samples (red lines) and blank samples (black lines) illustrate the absence of significant contamination or interference, validating the robustness of the LC-MS system for lipidomics analysis.

Lastly, we investigated possible carry-over by overlaying, as shown in Fig. 7B, the total ion chromatograms of all blanks (in black) and all cohort QCs (in red) measured across all batches. We detected only minimal (not significant) carry-over that could affect the quality of the data in our workflow.

Study design limitations

Given that the study was conducted during the COVID-19 pandemic, we were unable to collect certain important parameters such as BMI, disease severity, symptoms, medications received, and other potential confounding factors. We acknowledge the significance of these parameters; unfortunately, they were not collected during the study period due to administrative and technical limitations.

Usage Notes

The dataset is available under Public Domain Dedication usage licence [dataset license: CC0 1.0 Universal (CC0 1.0)].