Abstract
Metabolomics is a powerful molecular phenotyping technology that can be used in population studies to identify metabolites underlying disease conditions. To identify plasma biomarkers potentially predicting chronic diseases, we applied 1H nuclear magnetic resonance (NMR) metabolomics using a 600 MHz spectrometer fitted with an In Vitro Diagnostics Research (IVDr) platform. We tested associations between the 18 known metabolites and 111 lipoprotein constituents that could be quantified and passed our quality control procedure and 944 phenotypes determined in 302 healthy participants of the Japanese Nagahama Study. We identified 907 statistically significant associations (p < 4.11 × 10⁻⁷) between 34 phenotypes and at least one metabolite or lipoprotein. Eight metabolites and 109 lipoprotein (sub)classes showed evidence of associations with phenotypes predominantly related to lipid and cholesterol metabolism, liver function, fatness and hematology. We confirmed previously reported associations between plasma trimethylamine-N-oxide (TMAO) and cholesterol, and between the branched-chain amino acids leucine and valine and body mass index (BMI). BMI and fatness were positively associated with components of plasma LDL-4 and VLDL-1 and the ratios of apolipoproteins A1 to B100 and LDL to HDL cholesterol, whereas they were inversely associated with HDL-1 constituents. HDL-1 and LDL-4 subclasses systematically follow the patterns of association of HDL and LDL, respectively, and we propose that these can be examined to improve cardiometabolic risk evaluation. Results from our study exemplify the power of quantitative NMR-based metabolome profiling to identify potentially disease-predicting metabolite biomarkers, even when applied to relatively small cohorts of healthy individuals extensively characterized for multiple phenotypes underlying unrelated clinical conditions.
Introduction
Disease-associated biomarker discovery has progressed with the development of genomic technologies that generate high-density datasets. Metabolomics is a powerful molecular phenotyping technology based on mass spectrometry or nuclear magnetic resonance (NMR) spectroscopy, allowing metabolic profiling of biological specimens. Qualitative and quantitative information can be obtained on a wide range of metabolites present in biofluids and organ extracts in order to define patterns of metabolic regulation in health and disease conditions¹.
Phenome-wide association study (PheWAS), the reversal of the genome-wide association study (GWAS) paradigm, systematically investigates the association between a biological target and a broad range of phenotypes². Adapting the PheWAS concept enables the discovery of previously unrecognized shared mechanisms and disease etiology³ as well as the identification of disease biomarkers⁴. In contrast to association studies of metabolome-derived molecular profiles and binary disease endpoints⁵, integrating phenome data of quantitative phenotypes from epidemiological studies enables a deeper understanding of the underlying pathophysiological mechanisms³. Associations with relevant human intermediate phenotypes of disease conditions in healthy individuals may allow the detection of biomarkers of early disease manifestations and point to potential disease predictive metabolite markers.
The power of quantitative 1H NMR spectroscopy with human serum lies in the simultaneous analysis of both small molecular weight metabolites and components of lipoprotein (sub)classes in a single biological specimen. Lipoproteins are important for lipid metabolism and transportation in the bloodstream. They are composed of a central core of cholesterol esters and triglycerides, surrounded by free cholesterol, phospholipids and apolipoproteins⁶. Lipoproteins are classified into subclasses (chylomicrons, chylomicron remnants, VLDL, IDL, LDL, HDL, and Lp(a)) based on size, density and the composition of apolipoproteins and lipids⁶. There is compelling evidence for the role of lipoproteins in cardiovascular and cardiometabolic diseases⁷,⁸ as well as in other pathologies, including neurodegenerative diseases⁹ and cancer¹⁰. Quantitative information obtained from a wide range of metabolic markers in biofluids may help define patterns of metabolic regulation in health and disease conditions.
The Nagahama Study is a unique community-based prospective cohort study of generally healthy individuals conducted in Nagahama in Shiga Prefecture, Japan¹¹. We took advantage of the wide range of biological and lifestyle-related phenotypes relevant to chronic diseases and health status, and of the analytical power of 1H NMR spectroscopy, which allowed quantitative measurement of 28 small molecular weight metabolites and 112 lipoprotein (sub)classes in plasma samples of 302 randomly selected individuals from this cohort study (106 men and 196 women). We examined their associations with 944 biochemical (e.g. creatinine), behavioral (e.g. food preference) and pathophysiological (e.g. blood pressure, body fat) intermediate phenotypes.
Material and methods
Study subjects from the Nagahama Study
Plasma samples and phenotype data were obtained from the first follow-up of the Nagahama Prospective Cohort for the Comprehensive Human Bioscience. This community-based prospective cohort, recruited in Nagahama in Shiga Prefecture, Japan, consists of over 11,644 middle-aged to elderly residents aged 35 to 80 in the first follow-up from 2012 to 2016¹¹. A subset of 302 subjects was randomly selected for the present metabolomic study. The main demographic, clinical and biochemical features of these individuals are shown in Table 1. The study was conducted in accordance with the guidelines of the 1975 Declaration of Helsinki. It was approved by the ethics committee of Kyoto University Graduate School of Medicine and by the Nagahama Municipal Review Board (no. 278). Written informed consent was obtained from all participants.
Phenotype description
A total of 944 phenotypes consistently collected in subjects of the Nagahama cohort were used in our analyses. They cover a broad spectrum of health aspects, including self-reported disease history and lifestyle-related questionnaires, hematology and immunological indices, urinary traits, data related to the cardio-ankle vascular index (CAVI), systolic and diastolic blood pressure, spirometry, electrocardiogram (ECG), dental examinations, ophthalmological phenotypes, sleep information, cognitive tests, and athletic activity (Supplementary Table S1).
For all subjects, the time (in hours) of the last meal prior to blood sample collection was recorded. The participants were requested to indicate whether they regularly take lipid-lowering medication. A total of 54 individuals answered “yes,” and 248 stated “no.”
Sample preparation for 1H NMR analysis
Blood samples were collected in EDTA-coated tubes and plasma was separated by centrifugation. All plasma samples remained stored at −80 °C until the day of analysis, when they were thawed on ice and kept at +4 °C throughout the sample preparation and spectroscopy processing. To enable consistency with other metabolomics studies, a commercially available pool of sterile human plasma collected on lithium heparin (NB-145726) (Clinisciences, Nanterre, France) was used as a quality control (QC) sample. NMR samples were prepared using a Gilson robot system for liquid handling with integrated temperature control. Study and QC samples were prepared as a mixture of plasma (225 μL) and an equivalent volume of phosphate buffer (75 mM Na2HPO4, 2 mM NaN3, 4.6 mM sodium trimethylsilyl propionate-[2,2,3,3-2H4] (TSP) in H2O/D2O 4:1, pH 7.4 ± 0.1). Samples were transferred into 5 mm outer diameter SampleJet NMR tubes (Bruker, Rheinstetten, Germany) and then stored at 5 °C inside the SampleJet automatic sample changer until measurement (< 24 h). Experimental procedures followed the recommendations for in vitro analytical and diagnostic procedures¹². QC samples were analysed after every 34 study samples across the run in order to detect possible drifts in spectral data acquisition. Prior to data acquisition, the NMR buffer was checked for contamination using deionized water and for correct pH using a solution of imidazole (1 mM).
1H NMR spectroscopy data acquisition and analytical processing
All data were acquired on a Bruker Avance III HD 600 MHz NMR spectrometer equipped with an In Vitro Diagnostics Research (IVDr) platform, a 5-mm TXI (inverse triple resonance) Z gradient probe with automated tuning and matching, and a SampleJet for refrigerated sample storage and preheating prior to automated data acquisition using Topspin 3.5pl7 and IconNMR v5.0.8 software (Bruker, Rheinstetten, Germany). To allow comparisons with other 1H NMR studies applying equivalent standardized methods, all experiments followed Bruker’s IVDr standard operational procedures¹²,¹³. The temperature was set to 310.00 ± 0.05 K and verified with a 4% methanol in 99.8% methanol-d4 sample. The magnetic field homogeneity and the H2O signal offset (O1) were verified with a 2 mM sucrose sample with 2 mM NaN3 and 0.5 mM sodium trimethylsilylpropanesulfonate (DSS) in 10% D2O at 40 mm filling. A third reference sample, QuantRefC (H144519_01, Bruker, Rheinstetten, Germany), was used to verify the stability of the signal intensities, and recalibration was performed when needed using the pulse length–based concentration (PULCON) determination principle¹⁴.
Metabolite and lipoprotein quantification and identification
The Bruker IVDr B.I. QUANT-PS™ quantification algorithm and the B.I. LISA™ method¹⁵ are designed to quantify 41 low molecular weight plasma metabolites and lipoprotein parameters, respectively. A total of 112 lipoprotein parameters were quantified, whereas only 28 metabolites could be detected and quantified in plasma (Supplementary Tables S2 and S3).
NMR data quality check (QC) and data transformation
A principal component analysis (PCA) plot was used to evaluate the data quality of QC samples (n = 7) and Nagahama samples (n = 302). For each of the 140 compounds, the distribution of the raw quantities was compared with that of the natural-log-transformed quantities using the Shapiro–Wilk test of normality. A compound was natural-log transformed if the transformation increased the Shapiro–Wilk p-value; otherwise, the raw data were kept. The probability density function of the transformed or raw data was estimated using kernel density estimation in the stats package of R (ver 3.2.2, n = 512). From this function, the vertex position \(p_{top}\) with the highest value \(h_{top}\) was extracted. We also extracted the other vertices at local maxima and minima. If any local maximum \(h_{max}\) and minimum \(h_{min}\) at positions \(p_{max}\) and \(p_{min}\), respectively, satisfied the following three criteria: (1) \(h_{max} > 0.1 \times h_{top}\), (2) \(h_{max} > 0.9 \times h_{min}\), and (3) \(p_{max} < p_{min} < p_{top}\) or \(p_{top} < p_{min} < p_{max}\), we assumed that the density was multimodally distributed. In this study, eleven compounds with non-unimodal distributions (creatine, glutamate, lysine, citrate, succinate, proline, 2-aminobutyrate, 2-hydroxybutyrate, 3-hydroxybutyrate, acetoacetate, and free cholesterol of the VLDL-1 subclass, V1FC) were identified and removed from the PheWAS analysis (Supplementary Tables S2 and S3). The remaining 129 compounds (18 metabolites and 111 lipoproteins) were z-score standardized and/or natural-log transformed and subjected to PheWAS (Supplementary Tables S2 and S3).
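The unimodality screen described above can be sketched in Python as follows. This is an illustration only, not the authors' R code: the Gaussian kernel, the rule-of-thumb bandwidth and the grid extending three bandwidths beyond the data (chosen to mirror the defaults of R's density function), and the function names kde_on_grid and is_multimodal are all assumptions. The three criteria are coded exactly as worded in the text, and the Shapiro–Wilk step is not covered.

```python
import math
from statistics import NormalDist, quantiles, stdev


def kde_on_grid(values, n_grid=512):
    """Gaussian kernel density estimate on an evenly spaced grid (assumed settings)."""
    n = len(values)
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    spread = min(stdev(values), (q3 - q1) / 1.34)
    bw = 0.9 * spread * n ** (-0.2)  # rule-of-thumb bandwidth (assumption)
    lo, hi = min(values) - 3 * bw, max(values) + 3 * bw
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    norm = 1.0 / (n * bw * math.sqrt(2.0 * math.pi))
    dens = [
        norm * sum(math.exp(-0.5 * ((x - v) / bw) ** 2) for v in values)
        for x in grid
    ]
    return grid, dens


def is_multimodal(values):
    """Apply the three criteria from the text to the estimated density."""
    grid, dens = kde_on_grid(values)
    top = max(range(len(dens)), key=dens.__getitem__)
    p_top, h_top = grid[top], dens[top]
    interior = range(1, len(dens) - 1)
    # Local maxima other than the global peak, and all local minima.
    maxima = [i for i in interior if dens[i - 1] < dens[i] > dens[i + 1] and i != top]
    minima = [i for i in interior if dens[i - 1] > dens[i] < dens[i + 1]]
    for i in maxima:
        for j in minima:
            crit1 = dens[i] > 0.1 * h_top
            crit2 = dens[i] > 0.9 * dens[j]  # coded as worded in the text
            crit3 = grid[i] < grid[j] < p_top or p_top < grid[j] < grid[i]
            if crit1 and crit2 and crit3:
                return True
    return False


# Synthetic illustration: normal quantiles, and a two-cluster mixture of them.
_nd = NormalDist()
unimodal_sample = [_nd.inv_cdf((i + 0.5) / 200) for i in range(200)]
bimodal_sample = [x - 3 for x in unimodal_sample[::2]] + [x + 3 for x in unimodal_sample[::3]]
```

Calling is_multimodal on each compound's transformed (or raw) values would flag candidates for exclusion; the synthetic samples only serve to exercise the function.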
The raw and processed NMR data are available in the MetaboBank repository (accession: MTBKS263) of the DNA Data Bank of Japan (DDBJ) under the BioProject accession PRJDB20611 (https://ddbj.nig.ac.jp/search/entry/bioproject/PRJDB20611).
Phenome-wide association study (PheWAS)
Association tests were carried out between the metabolites or lipoproteins and 944 intermediate phenotypes, which were classified into binary (71), ordinal (90) and continuous (783) traits (Fig. 1). Logistic regression was performed between binary traits and metabolites or lipoproteins, adjusted for sex, age, fasting time, and the use of lipid-lowering medication. Ordinal logistic regression was applied to ordinal phenotypes, using the same set of covariates as for binary traits, via the clm function from the ordinal R package. Continuous phenotypes were transformed by inverse-normal transformation, and linear regression was performed between each transformed continuous trait and the metabolites or lipoproteins, with sex, age, fasting time, and lipid-lowering medication status as covariates. Based on a Bonferroni correction for multiple testing of 129 compounds against 944 phenotypes, the threshold for statistical significance was p < 4.11 × 10⁻⁷. This analysis pipeline was built using R scripts (R version 4.3.1) within the Nextflow framework (version 24.04.4).
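As a worked check of two steps in this paragraph, the sketch below computes the Bonferroni threshold and applies a rank-based inverse-normal transformation. The family-wise alpha of 0.05 and the Blom offset (c = 3/8) are assumptions, since the text states neither, and the function names are illustrative.

```python
import math
from statistics import NormalDist

N_COMPOUNDS = 129   # compounds passing QC (from the text)
N_PHENOTYPES = 944  # phenotypes tested (from the text)
ALPHA = 0.05        # family-wise error rate (assumption, not stated in the text)


def bonferroni_threshold(alpha, n_compounds, n_phenotypes):
    """Per-test significance threshold for all compound-phenotype pairs."""
    return alpha / (n_compounds * n_phenotypes)


def inverse_normal_transform(values, c=3.0 / 8.0):
    """Rank-based inverse-normal transform; ties receive their average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    normal = NormalDist()
    return [normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


threshold = bonferroni_threshold(ALPHA, N_COMPOUNDS, N_PHENOTYPES)  # about 4.11e-07
neg_log10_threshold = -math.log10(threshold)
```

With 129 × 944 = 121,776 tests, 0.05 / 121,776 reproduces the reported threshold of 4.11 × 10⁻⁷ after rounding.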
Analytical pipeline applied for conducting phenome-wide association studies (PheWAS) in the Nagahama Study. A total of 129 1H NMR-measured compounds in plasma samples from 302 participants of the Nagahama study that passed quality check (QC) were used. The covariates adjusted in the regressions were sex, age, fasting time, and lipid-lowering medication status. This analysis pipeline was built with R scripts in the Nextflow framework.
Least absolute shrinkage and selection operator (LASSO) regression
LASSO regression was applied to select the contributing lipoproteins and/or metabolites for phenotypes showing more than one association with a metabolomic variable. From the PheWAS analysis, only continuous phenotypes showed evidence of statistical significance. Therefore, LASSO was applied to these 34 continuous traits. Each of these phenotypes was regressed on associated compounds as explanatory variables using LASSO. In addition, sex, age, fasting time, and the lipid-lowering medication status were also included as explanatory variables. To ensure reproducibility, we set the random seed using set.seed(123) in R. The hyperparameter lambda was determined by the mode of estimated lambda in tenfold cross-validation with 30 iterations.
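The lambda selection rule described above (the mode of the cross-validated lambda over repeated tenfold cross-validation) can be sketched in plain Python. This is a simplified stand-in for the authors' R workflow, not a reproduction of it: the text does not say which R package or which lambda rule was used, so the sketch assumes the lambda minimizing the mean cross-validated squared error, a fixed candidate grid, and a basic coordinate-descent solver without predictor standardization. All function names are hypothetical.

```python
import math
import random
from collections import Counter


def soft_threshold(z, g):
    """Soft-thresholding operator used in the LASSO coordinate update."""
    return math.copysign(max(abs(z) - g, 0.0), z)


def lasso_fit(X, y, lam, n_sweeps=50):
    """Coordinate descent for (1/2n)*||y - b0 - X b||^2 + lam * ||b||_1."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]  # centered predictors
    resid = [v - y_mean for v in y]
    scale = [sum(r[j] ** 2 for r in Xc) / n for j in range(p)]
    beta = [0.0] * p
    for _ in range(n_sweeps):
        for j in range(p):
            if scale[j] == 0.0:
                continue
            rho = sum(r[j] * e for r, e in zip(Xc, resid)) / n + scale[j] * beta[j]
            new = soft_threshold(rho, lam) / scale[j]
            step = new - beta[j]
            if step != 0.0:
                resid = [e - step * r[j] for r, e in zip(Xc, resid)]
                beta[j] = new
    intercept = y_mean - sum(b * m for b, m in zip(beta, means))
    return intercept, beta


def cv_best_lambda(X, y, lambdas, n_folds, rng):
    """Lambda with the lowest mean squared error in one K-fold cross-validation."""
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    errors = []
    for lam in lambdas:
        sse = 0.0
        for held_out in folds:
            held = set(held_out)
            train = [i for i in idx if i not in held]
            b0, beta = lasso_fit([X[i] for i in train], [y[i] for i in train], lam)
            for i in held_out:
                pred = b0 + sum(b * x for b, x in zip(beta, X[i]))
                sse += (y[i] - pred) ** 2
        errors.append(sse / len(y))
    return lambdas[errors.index(min(errors))]


def modal_lambda(X, y, lambdas, n_folds=10, n_repeats=30, seed=123):
    """Most frequently selected lambda across repeated cross-validation runs."""
    rng = random.Random(seed)
    picks = [cv_best_lambda(X, y, lambdas, n_folds, rng) for _ in range(n_repeats)]
    return Counter(picks).most_common(1)[0][0]
```

In use, each phenotype would be the response y, and its associated compounds plus the four covariates would form the columns of X; predictors whose coefficient is shrunk to zero at the modal lambda are dropped.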
Results
Clinical and biochemical features of individuals from the Nagahama Study
Clinical and biochemical data for the 302 subjects selected for the metabolome study are shown in Table 1. The average age of the subjects was 57.5 ± 0.7 years. There were 106 males and 196 females. All individuals were characterized for body weight (65.45 ± 0.99 kg in males and 52.34 ± 0.56 kg in females, p = 3.91 × 10⁻²⁹), body mass index (BMI) (22.98 ± 0.30 kg/m² in males and 21.45 ± 0.21 kg/m² in females, p = 3.40 × 10⁻⁵) and body fat percentage (20.20 ± 0.68% in males and 27.21 ± 0.46% in females, p = 1.62 × 10⁻¹⁶). Significant sex differences were also noted in markers of liver function (alanine aminotransferase (ALT), aspartate aminotransferase (AST), and gamma-glutamyl transferase (GGT)) as well as in triglycerides, and total and HDL cholesterol. Individuals self-reported family history of hypertension (n = 72, 23.84%), type 2 diabetes mellitus (n = 11, 3.64%) and hyperlipidemia (n = 48, 15.89%).
General patterns of regulation of plasma metabolites and lipoproteins in the Nagahama Study population
Targeted metabolomic data derived through the most recent In Vitro Diagnostics Research (IVDr) B.I. QUANT-PS™ Bruker algorithm enabled quantitative analysis of 28 small molecular weight metabolites and a comprehensive range of 112 lipoprotein-derived variables defined using the B.I. LISA™ platform¹⁵, including total plasma triglycerides, cholesterol, apolipoprotein-A1 (Apo-A1), apolipoprotein-A2 (Apo-A2), apolipoprotein-B100 (Apo-B100), very low-density lipoprotein (VLDL), low-density lipoprotein (LDL), intermediate-density lipoprotein (IDL) and high-density lipoprotein (HDL), as well as individual molecular constituents (triglycerides, cholesterol, free cholesterol, phospholipids, and apolipoprotein) for four subclasses of HDL (HDL-1, HDL-2, HDL-3, HDL-4), six subclasses of LDL (LDL-1, LDL-2, LDL-3, LDL-4, LDL-5, LDL-6) and five subclasses of VLDL (VLDL-1, VLDL-2, VLDL-3, VLDL-4, VLDL-5) (Supplementary Table S3). The numbering of the HDL, LDL and VLDL subclasses corresponds to their increasing density and decreasing size.
Close clustering of quality control (QC) samples in the Principal Component Analysis (PCA) demonstrated the high quality of both NMR runs and data acquisition (Supplementary Figure S1A), and the samples showed no batch effect across measurement plates (Supplementary Figure S1B). PCA also underlined the undistorted dispersion of sample data, which illustrated the metabolic variability of healthy subjects selected in our study. Eighteen metabolites and 111 lipoproteins passed the data QC procedure and were subjected to PheWAS (Supplementary Tables S2 and S3).
General summary of 1H NMR metabolome-based PheWAS
Results from the PheWAS in subjects of the Nagahama Study identified a total of 907 statistically significant associations (p < 4.11 × 10⁻⁷) between at least one metabolite or lipoprotein (sub)class and 34 phenotypes from five categories (only 3.60% of the 944 phenotypes) (Fig. 2A and B). In terms of metabolites, 24 significant metabolite-phenotype associations were found (Supplementary Table S4), whereas for lipoproteins, 883 lipoprotein-phenotype associations were identified (Supplementary Table S5). We did not detect significant associations among phenotype categories relevant to ECG, carotid ultrasound, central blood pressure, cognition, locomotion, ophthalmology, questionnaire, urine tests, and QT. Blood test measurements of lipids had the greatest number of associations, including total cholesterol (80 significant associations), LDL cholesterol (83 significant associations), HDL cholesterol (72 significant associations) and triglycerides (71 significant associations) (Fig. 2A and B). Intermediate phenotypes relevant to physical measurement (i.e. the Inbody traits) also showed a high number of statistically significant associations with metabolites and lipoprotein (sub)classes (up to 48 associations). Other significantly associated phenotype categories included sleep, respiratory traits and the cardio-ankle vascular index (CAVI) (Fig. 2B). Application of LASSO regression allowed an overall 40.90% (371/907) reduction of the significant associations and a selection of the most contributing metabolites and lipoproteins (Supplementary Table S6).
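The summary figures in this paragraph can be cross-checked with a few lines of Python. The sketch only re-derives the reported counts and percentages; the variable names are illustrative, and the number of associations retained after LASSO assumes that the 371 counted in the reduction are the associations that were dropped.

```python
def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100.0 * part / whole, 2)


metabolite_associations = 24    # Supplementary Table S4 (from the text)
lipoprotein_associations = 883  # Supplementary Table S5 (from the text)
total_associations = metabolite_associations + lipoprotein_associations

associated_phenotype_share = percent(34, 944)   # phenotypes with any association
lasso_reduction = percent(371, 907)             # associations dropped by LASSO
retained_after_lasso = total_associations - 371  # under the stated assumption
```

The two categories sum to the 907 associations reported, and the percentages match the stated 3.60% and 40.90%.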
Overview of statistically significant associations between plasma metabolites and lipoproteins and phenotypes characterized in the Nagahama Study population. 1H NMR metabolomics was used to quantify 28 metabolites and 112 lipoprotein classes and subclasses. A total of 18 metabolites and 111 lipoproteins passed quality check (QC) and were used to identify 907 statistically significant associations (p < 4.11 × 10⁻⁷) with intermediate phenotypes of chronic diseases characterized in 302 subjects of the Nagahama Study population. The percentage of associations with phenotype categories (A) and the number of associated compounds for each phenotype (B) are shown. In Figure (A), each label indicates the name of the measurement category, the percentage of metabolite-phenotype associations among the 907 significant associations, and the association count. BMI, Body Mass Index; CAVI, Cardio-Ankle Vascular Index; DBP, Diastolic Blood Pressure; FIVC, Forced Inspiratory Vital Capacity; R-, right; SpO2, blood oxygen saturation levels during sleep.
Identified associations between phenotypes and small molecular weight metabolites
We identified a total of 24 associations between eight small molecular weight metabolites and 20 phenotypes (Fig. 3A, Supplementary Table S4). No statistically significant association was identified for 10 metabolites. Besides the expected strongest association, between plasma creatinine concentration derived by NMR metabolomics and by biochemistry (i.e. blood test) (β = 0.60, 95% CI: [0.52, 0.68], p = 4.04 × 10⁻³⁶), the most statistically significant associations were obtained between acetone and plasma free fatty acids (β = 0.60, 95% CI: [0.51, 0.68], p = 1.96 × 10⁻³²), and between glutamine and the marker of liver function GGT (β = −0.35, 95% CI: [−0.44, −0.25], p = 6.68 × 10⁻¹²) (Fig. 3A, Supplementary Table S4). Whilst leucine and valine showed concordant evidence of significant associations with increased body weight, BMI and cholinesterase, leucine was more specifically associated with phenotypes relevant to muscle mass whereas valine was associated with several phenotypes directly relevant to fatness indicated by six body fat mass traits (Fig. 3B and C, Supplementary Table S4). Insulin was also positively associated with alanine (β = 0.31, 95% CI: [0.20, 0.42], p = 2.38 × 10⁻⁸) and lactate (β = 0.28, 95% CI: [0.18, 0.39], p = 2.35 × 10⁻⁷). Finally, we identified inverse associations between trimethylamine N-oxide (TMAO) and biochemically determined total cholesterol (β = −0.32, 95% CI: [−0.43, −0.21], p = 1.34 × 10⁻⁸) and LDL cholesterol (β = −0.38, 95% CI: [−0.49, −0.27], p = 8.89 × 10⁻¹¹).
Phenome-wide significant associations between plasma small molecular weight metabolites and phenotypes in the Nagahama Study population. Plasma concentrations of metabolites acquired by 1H NMR metabolomics were tested for evidence of statistically significant association (p < 4.11 × 10⁻⁷) with the 944 intermediate phenotypes of chronic diseases in a subset of subjects of the Nagahama Study population (n = 302). Forest plots show the significant associations for each metabolite (A). Manhattan plots illustrate associations of leucine (B) and valine (C) across the 944 phenotypes characterized in the Nagahama Study population. Symbols indicate positive (filled upward triangle) or inverse (filled downward triangle) associations between the metabolites and the phenotypes. Significant associations are labelled with phenotype names and shown by filled triangles. Details of statistics of association between plasma metabolites and phenotypes are shown in Supplementary Table S4. Abd. Cir, Abdominal Circumference; BFM, Body Fat Mass; BMI, Body mass index; CI, Confidence Interval; L-, left; R-, right.
Plasma NMR metabolomics in the subset of healthy individuals of the Nagahama population allowed the identification of significant associations between small molecular weight metabolites and several intermediate phenotypes relevant to cardiometabolic risk (e.g. elevated BMI and fatness). We found only two instances of statistically significant association between a small molecular weight metabolite (TMAO) and plasma lipoprotein variables determined by biochemical methods (total and LDL cholesterol) (Fig. 3A, Supplementary Table S4). Applying LASSO regression, 21 metabolite-phenotype associations remained significant (Supplementary Table S6).
Characterization of circulating lipoproteins from blood assays via NMR-based metabolomic dissection of lipoproteins in subclasses
As expected, the vast majority of constituents of the lipoprotein classes and subclasses in the NMR analyses showed evidence of statistically significant association with biochemically-derived total cholesterol, HDL cholesterol, LDL cholesterol and triglycerides (Supplementary Table S5). There were, however, a few exceptions, including, for instance, the lack of associations between components of the LDL-6 subclass and LDL cholesterol, and between constituents of the VLDL-1, HDL-1 and HDL-2 subclasses and total cholesterol.
Among LDL (sub)classes, components of the LDL-2 subclass except triglycerides, and only free cholesterol in the LDL-1, LDL-3 and LDL-6 subclasses, were positively associated with plasma HDL cholesterol determined by biochemistry (Fig. 4A). Triglycerides in the HDL-3 and HDL-4 subclasses and most of the components of the IDL class, the VLDL class and subclasses, and the LDL-4 and LDL-5 subclasses were systematically inversely associated with HDL cholesterol (Fig. 4A, Supplementary Table S5). As expected, biochemically-derived plasma LDL cholesterol was positively associated with all LDL (sub)classes (excluding triglycerides in the LDL-5 subclass), and also with the IDL class and the VLDL-2, VLDL-3, VLDL-4 and VLDL-5 subclasses (Fig. 4B). Among the HDL components, only free cholesterol in the HDL-3 and HDL-4 subclasses was positively associated with total cholesterol (Supplementary Figure S2A). Biochemically determined plasma triglycerides were mostly associated with IDL, all VLDL (sub)classes and most of the constituents of LDL subclasses 2 to 5, but constituents of the LDL-2 and LDL-3 subclasses showed opposite trends of association with plasma triglycerides to those of the LDL-4 and LDL-5 subclasses (Supplementary Figure S2B).
Association analyses between biochemically determined plasma HDL and LDL cholesterol in the Nagahama Study population and metabolites and lipoprotein (sub)classes. 1H NMR spectroscopy was used to derive quantitative data for small molecular weight metabolites and lipoprotein classes and subclasses which were tested for association with biochemically determined plasma HDL (A) and LDL (B) cholesterol in 302 subjects of the Nagahama Study population. The inner circle displays the -log10(p-value) of each association, and significant associations are shown in orange dots (–Log10(p-value) > 6.40) and lack of association in black dots. The outer circle represents the regression coefficient as beta values for NMR measured compounds associated positively (red dots) or negatively (blue dots) with HDL or LDL cholesterol. The nomenclature used for each lipoprotein (sub)class is given in Supplementary Table S3. Details of statistics of associations between NMR-derived compounds and HDL and LDL cholesterol are given in Supplementary Tables S4 and S5.
Plasma total and LDL cholesterol were positively associated with Apo-B100 in IDL and VLDL classes and in LDL-1, LDL-2, LDL-3, LDL-4 and LDL-5 subclasses (Fig. 4B, Supplementary Figure S2A, Supplementary Table S5). In contrast, plasma HDL cholesterol was inversely associated with Apo-B100 in the IDL and VLDL classes and in LDL-4 and LDL-5 subclasses (Fig. 4A). Apo-A1 and A2 in the HDL-1 subclass were inversely associated with LDL cholesterol, whereas they showed the opposite trend of association with the HDL-4 subclass (Fig. 4B).
These results highlight the heterogeneous lipid composition of lipoproteins and underline the importance of dissecting lipoprotein classes into the independent molecular components of their subclasses to accurately account for the complex regulation of lipoprotein metabolism.
Targeted metabolomics identifies associations between lipoprotein (sub)classes and pathophysiological intermediate phenotypes
In addition to the four biochemically derived lipid measurements, we identified 579 lipoprotein-phenotype associations derived from 27 intermediate phenotypes relevant to obesity (i.e. BMI, abdominal circumference, fatness), liver function and hematological traits and 63 lipoprotein (sub)classes (Fig. 5, Supplementary Table S5). Body weight, BMI, abdominal circumference, and intermediate phenotypes directly relevant to fatness indicated by the body fat percentage and mass were consistently positively associated with components of the IDL and VLDL classes and the LDL-4 subclass (particle number, Apo-B100, cholesterol, free cholesterol, phospholipids), the VLDL-1 subclass (cholesterol, phospholipids, triglycerides), the ratios of Apo-A1 to Apo-B100 and LDL to HDL cholesterol, and total plasma triglycerides (Fig. 5, Supplementary Table S5). In contrast, they showed the opposite trend of association with components of the HDL class (Apo-A1, cholesterol, free cholesterol, phospholipids) and HDL-1 subclass (Apo-A1 and Apo-A2, cholesterol, free cholesterol, phospholipids). In addition, visceral fat, body fat, abdominal circumference, body weight and BMI were associated with free cholesterol of the VLDL-2, VLDL-3 and VLDL-4 subclasses and phospholipids of VLDL-3 (Fig. 5, Supplementary Table S5). As expected, associations with lipoprotein (sub)classes were generally conserved for BMI, abdominal circumference and visceral fat (Fig. 6, Supplementary Figure S3, Supplementary Table S5). Deeper analysis showed that triglycerides in the HDL-3 subclass were associated with abdominal circumference as well as torso fat mass, but were not associated with BMI or other intermediate phenotypes of fatness (Supplementary Figure S3). In contrast, triglycerides in the HDL-4 subclass were positively associated specifically with BMI, abdominal circumference and body weight.
Total plasma Apo-A1 and Apo-A1 in HDL and HDL-2 (sub)classes were systematically negatively associated specifically with all phenotypes directly relevant to fatness, but did not show evidence of statistically significant association with BMI, abdominal circumference or body weight (Supplementary Figure S3).
Overview of associations between phenotypes in the Nagahama Study population and lipoprotein (sub)classes. 1H NMR spectroscopy was used to derive quantitative data for lipoprotein classes and subclasses which were tested for association with intermediate phenotypes relevant to chronic diseases in 302 subjects of the Nagahama Study population. The number of statistically significant associations (p < 4.11 × 10⁻⁷) between phenotypes of each category and the corresponding lipoprotein (sub)class is given. White rectangles indicate non-significant and coloured ones denote significant associations. The histograms show the number of significant associations for each phenotype or lipoprotein. The nomenclature used for each lipoprotein (sub)class is given in Supplementary Table S3. Details of statistics of associations between NMR-derived lipoprotein (sub)classes and phenotypes are given in Supplementary Table S5. BMI, Body Mass Index; CAVI, Cardio-Ankle Vascular Index; DBP, Diastolic Blood Pressure; FIVC, Forced Inspiratory Vital Capacity; R-, right; SpO2, blood oxygen saturation levels during sleep.
Association analyses between obesity phenotypes in the Nagahama Study population and metabolites and lipoprotein (sub)classes. 1H NMR spectroscopy was used to derive quantitative data for small molecular weight metabolites and lipoprotein classes and subclasses which were tested for association with body mass index (BMI) (A) and visceral fat (B) in 302 subjects of the Nagahama Study population. The inner circle displays the -log10(p-value) of each association, and significant associations are shown in orange dots (–Log10(p-value) > 6.40) and lack of association in black dots. The outer circle represents the regression coefficient as beta values for NMR measured compounds associated positively (red dots) or negatively (blue dots) with BMI or visceral fat. Forest plots show the significant associations between lipoprotein (sub)classes and BMI and visceral fat (C). The nomenclature used for each lipoprotein (sub)class is given in Supplementary Table S3. Details of the statistics of associations between NMR-derived compounds and BMI and visceral fat are given in Supplementary Tables S4 and S5.
Biological markers of liver dysfunction ALT and GGT were consistently associated with Apo-A2 in the HDL-3 subclass, whereas GGT was more specifically associated with triglycerides in the VLDL and IDL classes and both triglycerides and phospholipids in the VLDL-1 subclass (Table 2). Cholinesterase, another marker of liver function, was mostly associated with constituents of IDL, VLDL, LDL-4 and LDL-5 (sub)classes, Apo-A1 and Apo-A2 of the HDL-1 and HDL-4 subclasses, and cholesterol, free cholesterol, phospholipids and triglycerides of the VLDL-3 and VLDL-4 subclasses (Supplementary Table S5). Plasma insulin and phenotypes relevant to sleep quality (mean SpO2) also showed evidence of specific associations with constituents of the VLDL-1 and HDL-1 subclasses.
Three hematological phenotypes (red blood cell counts, hemoglobin and hematocrit) showed 54 significant associations with lipoprotein levels. Red blood cell counts were associated with 34 lipoprotein levels, such as components of the IDL (excluding triglycerides) and VLDL classes (excluding phospholipids), HDL-1 subclass (Apo-A1, Apo-A2, cholesterol, phospholipids), LDL-4 subclass (Apo-B100, cholesterol, free cholesterol, phospholipids, particle number), and LDL-5 subclass (Apo-B100, cholesterol, phospholipids, particle number). Hemoglobin and the hematocrit were consistently associated with cholesterol and free cholesterol of the IDL class, with constituents of the LDL-4 subclass (Apo-B100, cholesterol, free cholesterol, phospholipids, particle number) and with cholesterol and phospholipids of the LDL-5 subclass (Supplementary Table S5). Diastolic blood pressure also showed specific associations with components of the LDL-4 subclass.
LASSO regression was then applied to BMI and visceral fat. Of the two metabolites and 39 lipoprotein constituents significantly associated with BMI (Supplementary Tables S4 and S5), only 10 lipoprotein constituents, plus valine and leucine, were retained as main contributors, corresponding to a 70.73% (29/41) reduction. By contrast, the reduction was only 47.22% (17/36) for the associations between lipoproteins and visceral fat (Supplementary Table S6).
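The reduction percentages quoted above follow from simple counting, as the short check below shows. The helper name is hypothetical. The counts are taken from the text, with the number of retained visceral fat associations (19) inferred from the reported 17/36.

```python
def percent_reduction(n_before: int, n_retained: int) -> float:
    """Percentage of associations dropped after variable selection."""
    return 100.0 * (n_before - n_retained) / n_before


# BMI: 2 metabolites + 39 lipoprotein constituents before selection;
# 10 lipoprotein constituents + valine and leucine retained afterwards.
bmi = percent_reduction(n_before=2 + 39, n_retained=10 + 2)

# Visceral fat: 36 associations before selection, 17 removed (19 retained).
visceral = percent_reduction(n_before=36, n_retained=36 - 17)

print(f"BMI: {bmi:.2f}%  visceral fat: {visceral:.2f}%")
```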
Distinct lipoprotein (sub)classes share associations with multiple distinct phenotypes
In several instances, we identified consistent associations between apparently unrelated intermediate phenotypes and the same component in different lipoprotein (sub)classes. This is illustrated by Apo-B100 in the VLDL and IDL classes and the LDL-4 subclass, which was consistently associated with body weight, BMI, variables related to fatness, total, HDL and LDL cholesterol, triglycerides, cholinesterase and red blood cell count (Fig. 7, Supplementary Figure S4). Apo-B100 in the IDL class was specifically associated with forced inspiratory vital capacity (FIVC), whereas in the LDL-4 subclass it was associated with hematocrit, hemoglobin, plasma insulin and diastolic blood pressure. These findings underline the importance of extensive phenotype characterization of individuals in metabolome-wide association studies to identify sets of molecular markers underlying biological and behavioral mechanisms of potentially distinct etiology. As with BMI and visceral fat, LASSO regression resulted in a 66.67% (14/21) reduction in the number of significant associations between phenotypes and Apo-B100 in the LDL-4 subclass (L4AB), a modest 27.78% (5/18) reduction for Apo-B100 in the IDL class (IDAB), and a mild 22.22% (4/18) reduction for Apo-B100 in the VLDL class (VLAB) (Supplementary Table S6).
Phenotypes associated with Apo-B100 in the VLDL and IDL classes and the LDL-4 subclass in the Nagahama Study. Plasma concentrations of Apo-B100 in VLDL class (A), IDL class (B), and LDL-4 subclass (C) were acquired following 1H NMR metabolomic profiling and tested for evidence of association with phenotypes characterized in a subset of subjects of the Nagahama Study population (n = 302). The threshold of statistical significance of association (p < 4.11 × 10–7) was determined following the Bonferroni correction. Symbols indicate positive (filled upward triangle) or inverse (filled downward triangle) associations between the metabolites and the phenotypes. Significant associations are labelled with phenotype names and shown by filled triangles. Abd. Cir, Abdominal Circumference; BFM, Body Fat Mass; BMI, Body mass index; DBP, Diastolic Blood Pressure; FIVC, Forced Inspiratory Vital Capacity; L-, Left; R-, Right. Details of the statistics of associations are given in Supplementary Table S5 and illustrated in Supplementary Figure S3.
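The significance threshold quoted in this legend can be reproduced with a short calculation. The sketch rests on two assumptions that are not stated in this section: a family-wise error rate of 0.05, and a number of tests equal to the number of quantified analytes (18 metabolites plus 111 lipoprotein constituents, as given in the abstract) multiplied by the 944 phenotypes.

```python
ALPHA = 0.05                # assumed family-wise error rate
N_ANALYTES = 18 + 111       # metabolites + lipoprotein constituents (abstract)
N_PHENOTYPES = 944          # phenotypes tested (abstract)

# Assumed number of tests: every analyte against every phenotype.
n_tests = N_ANALYTES * N_PHENOTYPES

# Bonferroni correction: divide alpha by the number of tests.
threshold = ALPHA / n_tests

print(n_tests)
print(f"{threshold:.2e}")
```

Under these assumptions the result matches the reported threshold of p < 4.11 × 10–7.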
These results illustrate the power of quantitative NMR plasma spectroscopy to investigate simultaneously, in a PheWAS, the involvement of small molecular weight metabolites and distinct molecular components of lipoprotein classes and subclasses, in order to deepen association analyses with multiple intermediate phenotypes, including clinical and biochemical phenotypes relevant to obesity and liver dysfunction.
Discussion
We report results from a 1H NMR plasma metabolite and lipoprotein-based PheWAS in a population of generally healthy individuals extensively characterized for a large number of clinical, biochemical and behavioral variables. The study identified a series of associations between intermediate phenotypes of chronic diseases and small molecular weight metabolites and constituents of lipoprotein (sub)classes. Considering the depth of phenotype characterization of subjects of the Nagahama Study, these associations provide a comprehensive set of biomarkers predicting the risk of several chronic diseases, as well as targets for therapeutic applications.
With progress in increasing the throughput and the number of small molecular weight metabolites and lipoproteins that can be simultaneously quantified in a single analysis, metabolomic studies have been successfully used in molecular epidemiology16, covering applications to a wide range of disease conditions17,18,19. Extensive phenotype characterization of generally healthy individuals in the Nagahama Study population makes it suitable for a phenome-wide approach that tests for association between metabolites and lipoproteins and multiple phenotypes in order to identify disease-related biomarkers. The small number of associations between metabolites and phenotypes may be explained by the limited number of metabolites analyzed with the IVDr NMR approach and by potentially important fluctuations of metabolite concentrations in plasma collected from subjects who were not systematically fasted. However, we were able to detect associations for metabolites previously proposed as disease markers. For example, the association between TMAO and total and LDL cholesterol in our study concurs with the conclusions of a prospective study which showed that elevated plasma TMAO in healthy subjects predicts the risk of cardiovascular disease20. Likewise, positive associations of the branched-chain amino acids (BCAAs) valine and leucine with intermediate phenotypes underlying cardiometabolic risk (e.g. BMI, fatness, lipid metabolism) accord with the literature21. In addition, the depth of phenotype characterization of subjects in our study allowed the detection of distinct associations for the two BCAAs: valine was associated with BMI and fatness, and leucine with BMI, muscle mass, red blood cell count, and hematocrit.
We also found strong associations between BMI and intermediate phenotypes of fatness on the one hand and lipoprotein classes and their molecular components (triglycerides, Apo-A1, Apo-B100) on the other, which confirms previous observations in Mexican subjects of a negative association between adiposity and Apo-A1 and positive associations between adiposity and Apo-B and VLDL, which is a triglyceride-rich lipoprotein22.
The power of metabolome-based PheWAS lies in quantitative analysis of a wide range of metabolites that are not necessarily involved in the same biological processes but respond to the same biological stimulus or pathophysiological condition and may, therefore, be coordinately regulated (metabotypes). The 1H NMR plasma spectroscopy method used in our study allowed us to quantify in a single analysis a series of plasma small molecular weight metabolites and lipoprotein (sub)classes, which, when combined, can define metabotypes specifically associated with phenotypes characterized in the study population and can potentially provide improved tools for disease diagnosis. Based on the compounds analyzed by NMR metabolomics in our study, we propose that a metabotype composed of glutamine, Apo-A2 of the HDL-3 subclass, triglycerides of the IDL, VLDL and VLDL-1 (sub)classes and phospholipids of the VLDL-1 subclass is a novel molecular marker of liver dysfunction, which we assessed by measures of plasma ALT and GGT. Along the same lines, a metabotype for BMI variation includes the BCAAs leucine and valine and, most predominantly, components of the HDL-1, LDL-4, VLDL-1, VLDL-3 and VLDL-4 subclasses.
Our results highlight the diversity of associations between phenotypes and specific classes of lipoproteins, which may reflect their particular molecular composition and function. Triglycerides are the predominant components of VLDL, whereas phospholipids and cholesterol are the major constituents of LDL and HDL. Variations in concentrations of specific lipoproteins are associated with risk of several disease conditions, including cardiometabolic diseases and neurodegenerative disorders. Strong evidence from the literature indicates that elevated plasma LDL cholesterol increases the risk of atherosclerotic cardiovascular disease, whereas plasma HDL cholesterol reduces the formation of atherosclerotic plaques23. Our results showing that BMI and all intermediate phenotypes for fatness associate negatively with HDL and positively with IDL, LDL, VLDL, and the ratio LDL to HDL agree with this concept and with published metabolomics data in healthy individuals24. Knowledge of the role of lipoproteins as markers of disease risk, which was initially based on classes, is progressing with deeper profiling of subclasses enabled by NMR metabolome profiling. The importance of quantitative analysis of lipoprotein subclasses is illustrated by the identification of specific associations between low plasma HDL-3 and LDL-5 subclasses and microvascular health25 and severe coronary artery disease26, between low LDL-2 components and macrovascular dysfunction25, between elevated serum HDL-4 and LDL-2 and Alzheimer’s disease (AD), and between serum VLDL-1 and IDL and cognitive impairment and AD27. Results from our NMR analysis provide additional information indicating that components of the HDL-1, LDL-4 and VLDL-1, VLDL-3 and VLDL-4 subclasses are the main contributors to the associations with BMI and fatness. These findings are in line with NMR metabolomic profiling data in UK Biobank, which showed that smaller VLDL subclasses (i.e. VLDL-3 and VLDL-4 in our study) account for the strongest associations to CHD risk28.
Limitations of our study
Despite the comprehensive characterization of individuals in the Nagahama Study with multiple phenotypes and extensive metabolome analyses, we were unable to identify series of associations between NMR quantitative data and phenotypes such as those related to cardiovascular function, physical movement, dental health and ophthalmology. The low variability of these phenotypes in generally healthy individuals and the relatively modest sample size of the population used in our study may prevent the detection of statistically significant associations for such traits, in contrast to the power we have shown for other, more variable phenotypes. Moreover, future studies should consider additional covariates (besides sex, age, fasting time, and lipid-lowering medication status) and/or confounding factors to validate our findings. Alternative mass spectrometry-based metabolomic methods should increase the coverage of metabolites that can be quantified in blood plasma and tested for association with phenotypes in the Nagahama Study. Finally, our study illustrates a challenge of omics-based PheWAS, namely the large volume of association output. Multiple associations for a single phenotype and/or target metabolite make it difficult to narrow down the most biologically relevant ones. To address this issue, we implemented LASSO regression, which led to an approximately 41% reduction in the number of associations.
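The way LASSO prunes weakly contributing predictors can be shown with a minimal coordinate-descent sketch. It is not the implementation used in the study: the software and the choice of penalty are not described in this section, the function names are hypothetical, and the data are toy values. The point is only the mechanism, in which the L1 penalty sets small coefficients exactly to zero through soft-thresholding.

```python
def soft_threshold(value: float, penalty: float) -> float:
    """Shrink value towards zero by penalty; return 0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, n_iter=200):
    """Minimise (1/2n)*||y - X b||^2 + penalty*||b||_1 by cyclic coordinate descent.

    X is a list of rows. Columns of X and y are assumed centred (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Mean squared value of each column, used to scale the update.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of predictor j with the residual that excludes j.
            rho = 0.0
            for i in range(n):
                fit_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_without_j)
            rho /= n
            beta[j] = soft_threshold(rho, penalty) / col_sq[j]
    return beta


# Toy data (hypothetical): y depends strongly on column 0, weakly on column 1.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, -1.0], [2.0, 1.0]]
y = [3.0 * a + 0.1 * b for a, b in X]

beta = lasso_coordinate_descent(X, y, penalty=0.5)
print(beta)  # the weak predictor's coefficient is shrunk exactly to zero
```

In this toy case the strong predictor keeps a shrunken coefficient while the weak one is dropped, which is the behaviour exploited to reduce the number of retained associations.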
Conclusions
Results from our study underline the power of quantitative IVDr NMR plasma spectroscopy for systematic and simultaneous analysis of large series of metabolites and lipoprotein (sub)classes to uncover associations between phenotypes potentially underlying disease risk and metabolic/metabotype biomarkers, even in relatively small samples, provided they are well characterized. Dissecting lipoproteins into subclasses in metabolome-based PheWAS may enable disease risk stratification through refined analysis of associations with intermediate phenotypes underlying various chronic diseases. The identification of associations between multiple phenotypes and candidate biomarkers combining metabolites and lipoprotein (sub)classes may eventually contribute to improving our knowledge of disease risk and characterizing therapeutic targets and advanced preventive approaches for chronic diseases. Future deeper metabolome profiling using extensive metabolite panels, refinement of associated phenotypes, and follow-up clinical explorations of the individuals involved in this study will allow further assessment of the prodromal relevance of candidate metabolites and lipoproteins.
Data availability
The raw and processed NMR data are available in the MetaboBank repository (accession: MTBKS263) of the DNA Data Bank of Japan (DDBJ) under the BioProject accession PRJDB20611 (https://ddbj.nig.ac.jp/search/entry/bioproject/PRJDB20611).
References
Beger, R. D. et al. Metabolomics enables precision medicine: “A White Paper, Community Perspective”. Metabolomics 12(10), 149 (2016).
Bush, W. S., Oetjens, M. T. & Crawford, D. C. Unravelling the human genome-phenome relationship using phenome-wide association studies. Nat. Rev. Genet. 17(3), 129–145 (2016).
Wang, L. et al. Methodology in phenome-wide association studies: a systematic review. J. Med. Genet. 58(11), 720–728 (2021).
Pendergrass, S. A. & Ritchie, M. D. Phenome-wide association studies: leveraging comprehensive phenotypic and genotypic data for discovery. Curr. Genet. Med. Rep. 3(2), 92–100 (2015).
Julkunen, H. et al. Atlas of plasma NMR biomarkers for health and disease in 118,461 individuals from the UK Biobank. Nat. Commun. 14(1), 604 (2023).
Feingold, K. R. Lipid and lipoprotein metabolism. Endocrinol. Metab. Clin. North Am. 51(3), 437–458 (2022).
Brandts, J. & Ray, K. K. Novel and future lipid-modulating therapies for the prevention of cardiovascular disease. Nat. Rev. Cardiol. 20(9), 600–616 (2023).
Li, R. et al. Associations of lipoprotein subclasses with risk of all-cause and cardiovascular disease mortality in individuals with type 2 diabetes: a prospective cohort study. Diabetes Obes. Metab. 25(11), 3259–3267 (2023).
Akyol, O. et al. Lipids and lipoproteins may play a role in the neuropathology of Alzheimer’s disease. Front. Neurosci. 17, 1275932 (2023).
Maran, L., Hamid, A. & Hamid, S. B. S. Lipoproteins as markers for monitoring cancer progression. J. Lipids. 2021, 8180424 (2021).
Setoh, K. & Matsuda, F. Cohort profile: the Nagahama prospective genome cohort for comprehensive human bioscience (the Nagahama Study). In Socio-Life Science and the COVID-19 Outbreak: Public Health and Public Policy (eds Yano, M. et al.) 127–143 (Springer Singapore, 2022).
Dona, A. C. et al. Precision high-throughput proton NMR spectroscopy of human urine, serum, and plasma for large-scale metabolic phenotyping. Anal. Chem. 86(19), 9887–9894 (2014).
Embade, N. et al. NMR-based newborn urine screening for optimized detection of inherited errors of metabolism. Sci. Rep. 9(1), 13067 (2019).
Aygen, S. et al. NMR-based screening for inborn errors of metabolism: initial results from a study on Turkish Neonates. JIMD Rep. 16, 101–111 (2014).
Verhoeven, A. Lipoprotein (Sub)fraction analysis on the Bruker B.I. LISA Platform. Methods Mol. Biol. 2855, 445–456 (2025).
Würtz, P. et al. Quantitative serum nuclear magnetic resonance metabolomics in large-scale epidemiology: a primer on -omic technologies. Am. J. Epidemiol. 186(9), 1084–1096 (2017).
Flote, V. G. et al. Lipoprotein subfractions by nuclear magnetic resonance are associated with tumor characteristics in breast cancer. Lipids Health Dis. 15, 56 (2016).
Masoodi, M. et al. Metabolomics and lipidomics in NAFLD: biomarkers and non-invasive diagnostic tests. Nat. Rev. Gastroenterol. Hepatol. 18(12), 835–856 (2021).
Berezhnoy, G., Laske, C. & Trautwein, C. Metabolomic profiling of CSF and blood serum elucidates general and sex-specific patterns for mild cognitive impairment and Alzheimer’s disease patients. Front. Aging Neurosci. 15, 1219718 (2023).
Tang, W. H. W. et al. Plasma trimethylamine N-oxide (TMAO) levels predict future risk of coronary artery disease in apparently healthy individuals in the EPIC-Norfolk prospective population study. Am. Heart J. 236, 80–86 (2021).
Lynch, C. J. & Adams, S. H. Branched-chain amino acids in metabolic signalling and insulin resistance. Nat. Rev. Endocrinol. 10(12), 723–736 (2014).
Aguilar-Ramirez, D. et al. Adiposity and NMR-measured lipid and metabolic biomarkers among 30,000 Mexican adults. Commun. Med. (Lond). 2(1), 143 (2022).
Duncan, M. S., Vasan, R. S. & Xanthakis, V. Trajectories of blood lipid concentrations over the adult life course and risk of cardiovascular disease and all-cause mortality: observations from the Framingham study over 35 years. J. Am. Heart Assoc. 8(11), e011433 (2019).
Masuda, R. et al. Plasma lipoprotein subclass variation in middle-aged and older adults: Sex-stratified distributions and associations with health status and cardiometabolic risk factors. J. Clin. Lipidol. 17(5), 677–687 (2023).
Streese, L. et al. Lipoprotein subclasses independently contribute to subclinical variance of microvascular and macrovascular health. Molecules 27, 15 (2022).
Chaudhary, R. et al. HDL. Cardiovasc. Revasc. Med. 20(11), 1001–1006 (2019).
Berezhnoy, G., Laske, C. & Trautwein, C. Quantitative NMR-based lipoprotein analysis identifies elevated HDL-4 and triglycerides in the serum of Alzheimer’s disease patients. Int. J. Mol. Sci. 23, 20 (2022).
Jin, D. et al. Lipoprotein characteristics and incident coronary heart disease: prospective cohort of nearly 90 000 individuals in UK Biobank. J. Am. Heart Assoc. 12(20), e029552 (2023).
Acknowledgements
We would like to thank all the organizing staff and the participants in the Nagahama Study.
Funding
HO is supported by the Kyoto-McGill International Collaborative Program in Genomic Medicine. FM and DG acknowledge financial support from the INSERM Projet de Recherche International DIABETOMARKERS.
Author information
Contributions
D.G., D.A., B.J.J.C., F.M. and M.L. conceived the project and designed the research. A.E.G. and S.K.D. performed NMR metabolomics. H.O. and F.B. carried out statistical analyses and data interpretation. K.S. and Y.T. provided assistance with metabolomic and phenotype data. S.K. and T.K. provided statistical consultation. D.G. wrote the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ou, H., Kawaguchi, S., Brial, F. et al. Quantitative 1H-NMR spectroscopy identifies metabolites and lipoprotein subclasses associated with intermediate phenotypes of chronic diseases in the Japanese Nagahama Study. Sci Rep 15, 31337 (2025). https://doi.org/10.1038/s41598-025-12305-y
DOI: https://doi.org/10.1038/s41598-025-12305-y