Transcripts with high distal heritability mediate genetic effects on complex metabolic traits

Tyler, Anna L.; Mahoney, J. Matthew; Keller, Mark P.; Baker, Candice N.; Gaca, Margaret; Srivastava, Anuj; Gyuricza, Isabela Gerdes; Braun, Madeleine J.; Rosenthal, Nadia A.; Attie, Alan D.; Churchill, Gary A.; Carter, Gregory W.

doi:10.1038/s41467-025-61228-9

Download PDF

Article
Open access
Published: 01 July 2025

Transcripts with high distal heritability mediate genetic effects on complex metabolic traits

Nature Communications volume 16, Article number: 5507 (2025) Cite this article

1654 Accesses
1 Altmetric
Metrics details

Subjects

Abstract

Although many genes are subject to local regulation, recent evidence suggests that complex distal regulation may be more important in mediating phenotypic variability. To assess the role of distal gene regulation in complex traits, we combine multi-tissue transcriptomes with physiological outcomes to model diet-induced obesity and metabolic disease in a population of Diversity Outbred mice. Using a novel high-dimensional mediation analysis, we identify a composite transcriptome signature that summarizes genetic effects on gene expression and explains 30% of the variation across all metabolic traits. The signature is heritable, interpretable in biological terms, and predicts obesity status from gene expression in an independently derived mouse cohort and multiple human studies. Transcripts contributing most strongly to this composite mediator frequently have complex, distal regulation distributed throughout the genome. These results suggest that trait-relevant variation in transcription is largely distally regulated, but is nonetheless identifiable, interpretable, and translatable across species.

An approach to identify gene-environment interactions and reveal new biological insight in complex traits

Article Open access 22 April 2024

Multivariate genomic analysis of 5 million people elucidates the genetic architecture of shared components of the metabolic syndrome

Article Open access 30 September 2024

Genetically regulated multi-omics study for symptom clusters of posttraumatic stress disorder highlights pleiotropy with hematologic and cardio-metabolic traits

Article 03 March 2022

Introduction

Evidence from genome-wide association studies (GWAS) suggests that most heritable variation in complex traits is mediated through regulation of gene expression. The majority of trait-associated variants lie in gene regulatory regions^{1,2,3,4,5,6,7}, suggesting a relatively simple causal model in which a variant alters the homeostatic expression level of a nearby (local) gene which, in turn, alters a trait. Statistical methods such as transcriptome-wide association studies (TWAS)^8,9,10,11 and summary data-based Mendelian randomization (SMR)¹⁰ have used this idea to identify genes associated with multiple disease traits^12,13,14,15. However, despite the great promise of these methods, explaining trait effects with local gene regulation has been more difficult than initially assumed^16,17. Although trait-associated variants typically lie in non-coding, regulatory regions, these variants often have no detectable effects on gene expression¹⁶ and tend not to co-localize with expression quantitative trait loci (eQTLs)^17,18. These observations suggest that the relationship among genetic variants, gene expression, and organism-level traits is more complex than the simple, local model.

In recent years the conversation around the genetic architecture of common disease traits has been addressing this complexity, and there is increased interest in more distant (distal) genetic effects as potential drivers of trait variation^{15,18,19,20,21}. In general, distal effects are defined as being greater than 4 or 5Mb away from the transcription start site of a given gene. We use the terms local and distal rather than cis and trans because cis and trans have specific biochemical meanings²², whereas local and distal are defined only by genomic position. The importance of distal genetic effects is proposed in the omnigenic model, which posits that trait-driving genes are cumulatively influenced by many distal variants. In this view, the heritable transcriptomic signatures driving clinical traits are an emergent state arising from the myriad molecular interactions defining and constraining gene expression. Consistent with this view, it has been suggested that part of the difficulty in explaining trait variation through local eQTLs may arise in part because gene expression is not measured in the appropriate cell types¹⁶, or cell states²³, and thus local eQTLs influencing traits cannot be detected in bulk tissue samples. This context dependence emphasizes the essential role of complex regulatory and tissue networks in mediating variant effects. The mechanistic dissection of complex traits in this model is more challenging because it requires addressing network-mediated effects that are weaker and greater in number. However, the comparative importance of distal effects over local effects is currently only conjectured and challenging to address in human populations.

To assess the role of wide-spread distal gene regulation in the genetic architecture of complex traits, we used genetically diverse mice as a model system. In mice we can obtain simultaneous measurements of the genome, transcriptome, and phenome in all individuals. We used diet-induced obesity and metabolic disease as an archetypal example of a complex trait. In humans, these phenotypes are genetically complex with hundreds of variants mapped through GWAS^24,25 that are known to act through multiple tissues^26,27. Likewise in mice, metabolic traits are also genetically complex²⁸ and synteny analysis implicates a high degree of concordance in the genetic architecture between species^12,28. Furthermore, in contrast to humans, in mice we have access to multiple disease-relevant tissues in the same individuals with sufficient numbers for adequate statistical power.

We generated two complementary data sets: a discovery data set in a large population of Diversity Outbred (DO) mice²⁹, and an independent validation data set derived by crossing inbred strains from the Collaborative Cross (CC) recombinant inbred lines³⁰ to form CC recombinant inbred intercross (CC-RIX) mice. Animals in populations were maintained on a high-fat, high-sugar diet to model diet-induced obesity and metabolic disease¹².

The DO population and CC-RIX were derived from the same eight inbred founder strains: five classical lab strains and three strains more recently derived from wild mice²⁹, representing three subspecies and capturing 90% of the known variation in laboratory mice³¹. The DO mice are maintained with a breeding scheme that ensures equal contributions from each founder across the genome, thus rendering almost the whole genome visible to genetic inquiry and maximizing power to detect eQTLs²⁹. The CC mice were initially intercrossed to recombine the genomes from all eight founders, and then inbred for at least 20 generations to create recombinant inbred lines^30,31,32. Because these two populations have common ancestral haplotypes but highly distinct kinship structure, we could directly and unambiguously compare the local genetic effects on gene expression at the whole-transcriptome level while varying the population structure driving distal regulation.

In the DO population, we paired clinically relevant metabolic traits, including body weight and plasma levels of insulin, glucose and lipids¹², with transcriptome-wide gene expression in four tissues related to metabolic disease: visceral adipose tissue (gonadal fat pad), pancreatic islets, liver, and skeletal muscle. We measured similar metabolic traits in a CC-RIX population and gene expression from three of the four tissues used in the DO: visceral adipose tissue (gonadal fat pad), liver, and skeletal muscle. Measuring gene expression in multiple tissues is critical to adequately assess the extent to which local gene regulation varies across the tissues and whether such variability might account for previous failed attempts to identify trait-relevant local eQTLs. Because the CC-RIX carry the same founder alleles as the DO, the local gene regulation is expected to match between the populations. However, because the alleles are recombined throughout the genome, distal effects are expected to vary from those in the DO, allowing us to directly assess the role of distal gene regulation in driving trait-associated transcript variation. To mechanistically dissect distal effects on metabolic disease, we developed a novel dimension reduction framework called high-dimensional mediation analysis (HDMA) to identify the heritable transcriptomic signatures driving trait variation, which we compared between mouse populations and to human data sets with measured adipose gene expression. Together, these data enable a comprehensive view into the genetic architecture of metabolic disease.

Results

Genetic variation contributed to wide phenotypic variation

Although the environment was consistent across the DO mice, the genetic diversity present in this population resulted in widely varying distributions across physiological measurements (Fig. 1). For example, body weights of adult individuals varied from less than the average adult C57BL/6J (B6) body weight to several times the body weight of a B6 adult in both sexes (Males: 18.5–69.1g, Females: 16.0–54.8g) (Fig. 1A). Fasting blood glucose (FBG) also varied considerably (Fig. 1B), although few of the animals had FBG levels that would indicate pre-diabetes (19 animals, 3.8%), or diabetes (7 animals, 1.4%) according to previously developed cutoffs (pre-diabetes: FBG ≥250 mg/dL, diabetes: FBG ≥300, mg/dL)³³. Males had higher FBG than females on average (Fig. 1C) as has been observed before suggesting either that males were more susceptible to metabolic disease on the high-fat, high-sugar (HFHS) diet, or that males and females may require different FBG thresholds for pre-diabetes and diabetes.

Body weight was strongly positively correlated with food consumption (Fig. 1D; Linear regression R² = 0.51; beta coefficient = 12.6 ± 0.57 standard error; t = 22.2; p < 2. 2⁻¹⁶) and FBG (Fig. 1E; Linear regression R² = 0.21; beta coefficient = 2.49 ± 0.22 standard error; t = 11.34; p < 2. 2⁻¹⁶) suggesting a link between behavioral factors and metabolic disease. However, the heritability of this trait and others (Fig. 1F) indicates that genetics contribute substantially to correlates of metabolic disease in this population.

The trait correlations (Fig. 1G) showed that most of the metabolic trait pairs were only modestly correlated, which, in conjunction with the trait decomposition (Supplementary Fig. 1), suggests complex relationships among the measured traits and a broad sampling of multiple heritable aspects of metabolic disease including overall body weight, glucose homeostasis, and pancreatic function.

Distal heritability correlated with phenotype relevance

It is widely assumed that variation in traits is mediated through local regulation of gene expression. To test this assumption, we measured transcriptome-wide gene expression in four tissues–adipose, liver, pancreatic islet, and skeletal muscle–in the DO cohort. (Basic results from a standard eQTL analysis³⁴ (Methods) are available in Supplementary Fig. 2). We estimated the local genetic contribution to each transcript as the variance explained by the haplotype probabilities at the genetic marker closest to the gene transcription start site. We estimated the distal heritability as the heritability of the residuals after local haplotype had been accounted for (Methods). Importantly, this estimate was not based on distal eQTL, but rather the unlocalized contribution of the genome after removing the local genetic effect.

Overall, local and distal genetic factors contributed approximately equally to transcript abundance. In all tissues, both local and distal factors explained between 8% and 17% of the variance in the median transcript (Fig. 2A). This 50% contribution of local genetic variation to transcript abundance contrasts with findings in humans in which local variants have been found to explain only 20–30% of total heritability, while distal effects explain the remaining 70–80%^35,36. This discrepancy may arise due to the high degree of linkage disequilibrium in the DO mice compared to human populations and to the high degree of confidence with which we can estimate ancestral haplotypes in this population. At each position in the mice we can estimate ancestral haplotype with a high degree of accuracy. Haplotype at any given genetic marker captures genomic information from a relatively large genomic region surrounding each marker. In contrast, there is a much higher degree of recombination in human populations and ancestral haplotypes are more numerous and more difficult to estimate than in the mice. Thus in the mice, each marker may capture more local regulatory variation than SNPs or estimated haplotypes capture in humans. It has been found that transcripts with muliple local eQTL have higher local heritability than transcripts with single local eQTL³⁷. Because of the high diversity in the DO and the high rates of linkage disequilibrium, it is possible that there are more local variants regulating transcription creating a proportionally larger effect of local regulation.

**Fig. 2: Transcript heritability and trait relevance.**

To assess the importance of genetic regulation of transcript levels to clinical traits, we compared the local and distal heritabilities of transcripts to their trait relevance. We defined trait relevance for a transcript as its maximum absolute Spearman correlation coefficient (ρ) across all traits (Methods). The local heritability of transcripts was negatively associated with their trait relevance (Fig. 2B), suggesting that the more local genotype influenced transcript abundance, the less effect this variation had on the measured traits. Conversely, the distal heritability of transcripts was positively associated with trait relevance (Fig. 2C). That is, transcripts that were more highly correlated with the measured traits tended to be distally, rather than locally, heritable. This pattern was consistent across all tissues. This finding is also consistent with previous observations that transcripts with low local heritability explain more expression-mediated disease heritability than transcripts with high local heritability¹⁹. However, the positive relationship between trait correlation and distal heritability demonstrated further that there are diffuse genetic effects throughout the genome converging on trait-related transcripts.

High-Dimensional Mediation Analysis identified a high-heritability composite trait that was mediated by a composite transcript

The above univariate analyses establish the importance of distal heritability for trait-relevant transcripts. However, the number of transcripts dramatically exceeds the number of phenotypes. Thus, we expect the heritable, trait-relevant transcripts to be highly correlated and organized according to coherent, biological processes representing the mediating endophenotypes driving clinical trait variation. To identify these endophenotypes in a theoretically principled way, we developed a novel dimension-reduction technique, high-dimension mediation analysis (HDMA), that uses the theory of causal graphical models to identify a transcriptomic signature that is simultaneously 1) highly heritable, 2) strongly correlated to the measured phenotypes, and 3) conforms to the causal mediation hypothesis (Fig. 3). In HDMA, we first use a linear mapping called kernelization to dimension-reduce the genome, transcriptome, and phenome to kernel matrices G_K, T_K and P_K respectively, which each have the dimensions n × n where n is the number of individuals (Methods). These kernel matrices describe the relationships among the individual mice in genome space, transcriptome space, and phenome space and ensure that these three omic spaces have the same dimensions, and thus the same weight in the analysis. If not dimension-reduced, the transcriptome would outweigh the phenome in the model. We then projected these n × n-dimensional kernel matrices onto one-dimensional scores–a composite genome score (G_C), a composite transcriptome score (T_C), and a composite phenome score (P_C)–and used the univariate theory of mediation to constrain these projections to satisfy the hypotheses of perfect mediation, namely that upon controlling for the transcriptomic score, the genome score is uncorrelated to the phenome score. A complete mathematical derivation and implementation details for HDMA are available in the Methods.

Using HDMA we identifed the major axis of variation in the transcriptome that was consistent with mediating the effects of the genome on metabolic traits (Fig. 3). Figure 3A shows the partial correlations (ρ) between the pairs of these composite vectors. The partial correlation between G_C and T_C was 0.43, and the partial correlation between T_C and P_C was 0.79. However, when the transcriptome was taken into account, the partial correlation between G_C and P_C was effectively zero (0.027). P_C captured 30% of the overall trait variance, and its estimated heritability was 0.71 ± 0.084, which was higher than any of the measured traits (Fig. 1F). Thus, HDMA identified a maximally heritable metabolic composite trait and a highly heritable component of the transcriptome that are correlated as expected in the perfect mediation model.

As discussed in the Methods, HDMA is related to a generalized form of canonical correlation analysis (CCA). Standard CCA is prone to over-fitting because in any two large matrices it can be trivial to identify highly correlated composite vectors³⁸. To assess whether our implementation of HDMA was similarly prone to over-fitting in a high-dimensional space, we performed permutation testing. We permuted the individual labels on the transcriptome matrix 10,000 times and recalculated the path coefficient, which is the correlation of G_C and T_C multiplied by the correlation of T_C and P_C. This represents the strength of the path from G_C to P_C that is putatively mediated through T_C. The permutations preserved the correlation between the genome and phenome, but broke the correlations between the genome and the transcriptome, as well as between the transcriptome and the phenome. We could thus test whether, given a random transcriptome, HDMA would overfit and identify apparently mediating transcriptomic signatures in random data. The null distribution of the path coefficient is shown in Fig. 3B, and the observed path coefficient from the original data is indicated by a red arrow. The observed path coefficient was well outside the null distribution generated by permutations (empirical p < 10⁻¹⁶). Figure 3C illustrates this observation in more detail. Although we identified high correlations between G_C and T_C, and modest correlations between T_C and P_C in the null data (Fig. 3C), these two values could not be maximized simultaneously in the null data. In contrast, the red dot shows that in the real data both the G_C-T_C correlation and the T_C-P_C correlation could be maximized simultaneously suggesting that the path from genotype to phenotype through the transcriptome is highly non-trivial and is identifiable in this case.

To test whether the presence of local eQTLs affected the result, we generated two additional transcriptomic kernel matrices. We generated a local-effects only kernel using only locally determined gene expression and a distal-effects only kernel using only distally determined gene expression, i.e., the effects of local haplotype were regressed out. The path coefficient identified using the local kernel was not significantly different from the null (Fig. 3B), suggesting that locally determined gene expression does not mediate the effects of the genome on the phenome. In contrast, the path coefficient identified using the distal kernel, was highly significant and indistinguishable from that identified using the full transcriptome.

Further, the G_C-T_C and T_C-P_C correlations derived from the distal kernal were indistinguishable from those derived from the original transcriptomic kernel. In contrast, the G_C-T_C correlation derived with the local kernel was high (Pearson correlation r = 0.92), reflecting the fact that the local transcriptomic kernel was derived directly from local haplotypes. The T_C-P_C correlation, however, was low (Pearson correlation r = 0.14), suggesting that the these locally derived transcripts were not highly related to phenotype. In other words, mice that shared many local eQTL were not highly similar in trait space. Taken together, these results suggest that composite vectors derived from the measured transcriptomic kernel represent genetically determined variation in phenotype that is mediated through genetically determined variation in transcription, and that this genetically deremined variation in transcription is largely driven by distal factors.

Body weight and insulin resistance were highly represented in the expression-mediated composite trait

Each composite score is a weighted combination of the measured variables. The magnitude and sign of the weights, called loadings, correspond to the relative importance and directionality of each variable in the composite score. The loadings of each measured trait onto P_C indicate how much each contributed to the composite phenotype. Body weight (Weight_Final) contributed the most (Fig. 4A), followed by homeostatic insulin resistance (HOMA_IR) and fasting plasma insulin levels (Insulin_Fasting). We can thus interpret P_C as an index of metabolic disease (Fig. 4B). Individuals with high values of P_C had a higher metabolic disease index (MDI) and greater metabolic disease, including higher body weight and higher insulin resistance. We refer to P_C as MDI going forward. Traits contributing the least to MDI were measures of cholesterol and pancreas composition. Thus, when we interpret the transcriptomic signature identified by HDMA, we are explaining primarily the putative transcriptional mediation of body weight and insulin resistance, as opposed to cholesterol measurements.

High-loading transcripts had low local heritability, high distal heritability, and were linked mechanistically to obesity

We interpreted large loadings onto transcripts as indicating strong mediation of the effect of genetics on MDI. Large positive loadings indicate that higher expression was associated with a higher MDI (i.e., higher risk of obesity and metabolic disease on the HFHS diet) (Fig. 4C–E). Conversely, large negative loadings indicate that high expression of these transcripts was associated with a lower MDI (i.e., lower risk of obesity and metabolic disease on the HFHS diet) (Fig. 4C–E). Figure 4D compares the observed transcript loading distributions to null distributions and indicates how many transcripts in each tissue had large positive and negative loadings. A direct comparison of the tissues can be seen in Supplementary Fig. 3. We used gene set enrichment analysis (GSEA)^39,40 to look for biological processes and pathways that were enriched at the top and bottom of this list (Methods).

In adipose tissue, both GO processes and KEGG pathway enrichments pointed to an axis of inflammation and metabolism (Supplementary Figs. 4 and 5). GO terms and KEGG pathways associated with inflammation were positively associated with MDI, indicating that increased expression in inflammatory pathways was associated with a higher burden of disease. It is well established that adipose tissue in obese individuals is inflamed and infiltrated by macrophages^{41,42,43,44,45}, and the results here suggest that this may be a dominant heritable component of metabolic disease.

The strongest negative enrichments in adipose tissue were related to mitochondial activity in general, and thermogenesis in particular (Supplementary Figs. 4 and 5). Genes in the KEGG oxidative phosphorylation pathway were almost universally negatively loaded in adipose tissue, suggesting that increased expression of these genes was associated with reduced MDI (Supplementary Fig. 6). Consistent with this observation, it has been shown previously that mouse strains with greater thermogenic potential are also less susceptible to obesity on an obesigenic diet⁴⁶.

Transcripts associated with the citric acid cycle as well as the catabolism of the branched-chain amino acids (valine, leuceine, and isoleucine) were strongly enriched with negative loadings in adipose tissue (Supplementary Figs. 5, 7 and 8). Expression of genes in both pathways (for which there is some overlap) has been previously associated with insulin sensitivity^12,47,48, suggesting that heritable variation in regulation of these pathways may influence risk of insulin resistance.

Looking at the 10 most positively and negatively loaded transcripts from each tissue, it is apparent that transcripts in the adipose tissue had the largest loadings, both positive and negative (Fig. 5A bar plot). This suggests that much of the effect of genetics on body weight and insulin reisistance is mediated through gene expression in adipose tissue. This finding does not speak to the relative importance of tissues not included in this study, such as brain, in which transcriptional variation may mediate a large portion of the genetic effect on obesity⁴⁹. The strongest loadings in liver and pancreas were comparable, and those in skeletal muscle were the weakest (Fig. 5A), suggesting that less of the genetic effects were mediated through transcription in skeletal muscle. Heritability analysis showed that transcripts with the largest loadings had higher distal heritability (31.8%) than local heritability (5%) (Fig. 5A) (two-sided Welch’s t-test t = 16.4; df = 100.2; difference 95% CI = 0.24 to 0.30; p < 2. 2⁻¹⁶). We also performed TWAS in this population by imputing transcript levels for each gene based on local genotype only and correlating the imputed transcript levels with each trait. In contrast to HDMA, the TWAS procedure (Fig. 5B) tended to nominate transcripts with lower loadings, higher local heritability (15%), and lower distal heritability (20%) (two-sided Welch’s t-test t = 1.9; df = 151.7; difference 95% CI = −0.002 to 0.1; p = 0.77). Finally, we focused on transcripts with the highest local heritability in each tissue (Fig. 5C). This procedure selected transcripts with low loadings on average, consistent with our findings above that high local heritability was associated with low trait correlations (Fig. 2B).

**Fig. 5: Transcripts with high loadings have high distal heritability and literature support.**

We performed a literature search for the genes in each of these groups along with the terms diabetes, obesity, and the name of the expressing tissue to determine whether any of these genes had previous tissue-specific associations with metabolic disease in the literature (Methods). Multiple genes in each group had been previously associated with obesity and diabetes in the represented tissue (Fig. 5 bolded gene names). Genes with high loadings were most highly enriched for previous literature support. They were 2.2 times more likely than TWAS hits and 4 times more likely than genes with high local heritability to be previously associated with obesity or diabetes.

Tissue-specific transcriptional programs were associated with metabolic traits

Clustering of transcripts with top loadings in each tissue showed tissue-specific functional modules associated with obesity and insulin resistance (Fig. 6A) (Methods). The clustering highlights the importance of immune activation, particularly in adipose tissue. The mitosis cluster had large positive loadings in three of the four tissues potentially suggesting system-wide proliferation of immune cells. Otherwise, all clusters were strongly loaded in only one or two tissues. For example, the lipid metabolism cluster was loaded most heavily in liver. The positive loadings suggest that high expression of these genes, particularly in the liver, was associated with increased metabolic disease. This cluster included the gene Pparg, whose primary role is in the adipose tissue where it is considered a master regulator of adipogenesis⁵⁰. Agonists of Pparg, such as thiazolidinediones, are FDA-approved to treat type II diabetes, and reduce inflammation and adipose hyptertrophy⁵⁰. Consistent with this role, the loading for Pparg in adipose tissue was negative, suggesting that higher expression was associated with leaner mice (Fig. 6B). In contrast, Pparg had a large positive loading in liver (Fig. 6B), where it is known to play a role in the development of hepatic steatosis, or fatty liver. Mice that lack Pparg specifically in the liver, are protected from developing steatosis and show reduced expression of lipogenic genes^51,52. Overexpression of Pparg in the livers of mice with a Ppara knockout, causes upregulation of genes involved in adipogenesis⁵³. In the livers of both mice and humans high Pparg expression is associated with hepatocytes that accumulate large lipid droplets and have gene expression profiles similar to that of adipocytes^54,55. The local and distal heritability of Pparg is low in adipose tissue suggesting its expression in this tissue is highly constrained in the population (Fig. 6B). However, the distal heritability of Pparg in liver is relatively high suggesting it is complexly regulated and has sufficient variation in this population to drive variation in phenotype. Both local and distal heribatility of Pparg in the islet are relatively high, but the loading is low, suggesting that variability of expression in the islet does not drive variation in MDI. These results highlight the importance of tissue context when investigating the role of heritable transcript variability in driving phenotype. Gene lists for all clusters with tissue-specific loadings are available in Supplementary Data 1.

**Fig. 6: Tissue-specific transcriptional programs are associated with obesity and insulin resistance.**

Gene expression, but not local eQTLs, predicted body weight in an independent population

To test whether the transcript loadings identified in the DO could be translated to another population, we tested whether they could predict metabolic phenotypes in an independent population of CC-RIX mice, which were F1 mice derived from multiple pairings of Collaborative Cross (CC)^32,56,57,58 strains (Fig. 7) (Methods). We asked whether the loadings identified in the DO mice were relevant to the relationship between the transcriptome and the phenome in the CC-RIX. We predicted body weight (a surrogate for MDI) in each CC-RIX individual using measured gene expression in each tissue and the transcript loadings identified in the DO (Methods). The predicted body weight and acutal body weight were highly correlated (Fig. 7B). The best prediction was achieved for adipose tissue, which supports the observation in the DO that adipose expression was the strongest mediator of the genetic effect on MDI. This result also confirms the validity and translatability of the transcript loadings and their relationship to metabolic disease.

**Fig. 7: Transcription, but not local genotype, predicts phenotype in the CC-RIX.**

We then investigated the source of the relevant variation in gene expression. If local regulation was the predominant factor influencing trait-relevant gene expression, we should be able to predict phenotype in the CC-RIX using transcripts imputed from local genotype (Fig. 7A). The DO and the CC-RIX were derived from the same eight founder strains and so carry the same alleles throughout the genome. We imputed gene expression in the CC-RIX using local genotype and were able to estimate variation in gene transcription robustly (Supplementary Fig. 9). However, these imputed values failed to predict body weight in the CC-RIX when weighted with the loadings from HDMA. (Fig. 7C). This result suggests that local regulation of gene expression is not the primary factor driving heritability of complex traits. It is also consistent with our findings in the DO population that distal heritability was a major driver of trait-relevant gene expression and that high-loading transcripts had comparatively high distal, and low local, heritability.

Distally heritable transcriptomic signatures suggested variation in composition of adipose tissue and islets

The interpretation of global genetic influences on gene expression and phenotype is potentially more challenging than the interpretation and translation of local genetic influences, as genetic effects cannot be localized to individual gene variants or transcripts. However, there are global patterns across the loadings that can inform mechanism. For example, heritable variation in cell type composition can be inferred from transcript loadings. We observed above that immune activation in the adipose tissue was a highly enriched process correlating with obesity in the DO population. In humans, it has been extensively observed that macrophage infiltration in adipose tissue is a marker of obesity and metabolic disease⁵⁹. To determine whether the immune activation reflected a heritable change in cell composition in adipose tissue in DO mice, we compared loadings of cell-type specific genes in adipose tissue (Methods). The mean loading of macrophage-specific genes was significantly greater than 0 (Holm-adjusted two-sided empirical p < 2 × 10⁻¹⁶) (Fig. 8A), indicating that obese mice were genetically predisposed to have high levels of macrophage infiltration in adipose tissue in response to the HFHS diet. Loadings for marker genes for other cell types were not statistically different from zero (Adipocytes: p = 0.08, Progenitors: p = 0.58, Leukocytes: p = 0.28; all Holm-adjusted two-sided empirical p), indicating that changes in the abundance of those cell types was not a mediator of MDI.

**Fig. 8: HDMA results translate to humans.**

We also compared loadings of cell-type specific transcripts in islet (Methods). The mean loadings for alpha-cell specific transcripts were significantly greater than 0 (Holm-adjusted two-sided empirical p = 0.002), while the mean loadings for transcripts specific to delta cells (Holm-adjusted two-sided empirial p < 2 × 10⁻¹⁶) and endothelial cells (Holm-adjusted two-sided empirical p = 0.01) were significantly less than 0 (Fig. 8B). These results suggest that mice with higher MDI inherited an altered cell composition that predisposed them to metabolic disease, or that these compositional changes were induced by the HFHS diet in a heritable way. In either case, these results support the hypothesis that alterations in islet composition drive variation in MDI. Notably, the mean loading for pancreatic beta cell marker transcripts was not significantly different from zero (Holm-adjusted two-sided empirical p = 0.95). We stress that this is not necessarily reflective of the function of the beta cells in the obese mice, but rather suggests that any variation in the number of beta cells in these mice was unrelated to obesity and insulin resistance, the major contributors to MDI. This is further consistent with the islet composition traits having small loadings in the phenome score (Fig. 4).

Heritable transcriptomic signatures translated to human disease

Ultimately, the heritable transcriptomic signatures that we identified in DO mice will be useful if they inform mechanism and treatment of human disease. To investigate the potential for translation of the gene signatures identified in DO mice, we compared them to transcriptional profiles in obese and non-obese human subjects (Methods). We limited our analysis to adipose tissue because the adipose tissue signature had the strongest relationship to obesity and insulin resistance in the DO.

We calculated a predicted MDI for each individual in the human studies based on their adipose tissue gene expression (Methods) and compared the predicted scores for obese and non-obese groups as well as diabetic and non-diabetic groups. In all cases, the predicted MDIs were higher on average for individuals in the obese and diabetic groups compared with the lean and non-diabetic groups (Fig. 8D). This indicates that the distally heritable signature of MDI identified in DO mice is relevant to obesity and diabetes in human subjects.

Existing therapies are predicted to target mediator gene signatures

Another application of the transcript loading landscape is in ranking potential drug candidates for the treatment of metabolic disease. Although high-loading transcripts may be good candidates for understanding specific biology related to obesity, the transcriptome overall is highly interconnected and redundant. The ConnectivityMap (CMAP) database^60,61 developed by the Broad Institute allows querying thousands of compounds that reverse or enhance the extreme ends of transcriptomic signatures in multiple different cell types. By identifying drugs that reverse pathogenic transcriptomic signatures, we can potentially identify compounds that have favorable effects on gene expression. To test this hypothesis, we queried the CMAP database through the CLUE online query tool (https://clue.io/query/, version 1.1.1.43) (Methods). We identified top anti-correlated hits across all cell types (Supplementary Figs. 10 and 11). To get more tissue-specific results, we also looked at top results in cell types that most closely resembled our tissues. We looked at results in adipocytes (ASC) as well as pancreatic tumor cells (YAPC) regardless of p value (Supplementary Figs. 12 and 13).

The CMAP database identified both known diabetes drugs (e.g., sulfonylureas), as well as drugs that target pathways known to be involved in diabetes pathogenesis (e.g., mTOR inhibitors). These findings help support the mediation model we fit here. Although the composite variables we identified here are consistent with mediation, they do not prove causality. However, the results from CMAP suggest that reversing the transcriptomic signatures we found also reverses metabolic disease phenotypes, which supports a causal role of the transcript levels in driving pathogenesis of metabolic disease. These results thus support the mediation model we identified here and its translation to therapies in human disease.

Discussion

Here we investigated the relative contributions of local and distal gene regulation in four tissues to heritable variation in traits related to metabolic disease in genetically diverse mice. We found that distal heritability was positively correlated with trait relatedness, whereas high local heritability was negatively correlated with trait relatedness. We used a novel high-dimensional mediation analysis (HDMA) to identify tissue-specific composite transcripts that are predicted to mediate the effect of genetic background on metabolic traits. The adipose-derived composite transcript robustly predicted body weight in an independent cohort of diverse mice with a disparate population structure. It also predicted MDI in four human cohorts. However, gene expression imputed from local haplotype failed to predict body weight in the second mouse population. Taken together, these results highlight the complexity of gene expression regulation in relation to trait heritability and suggest that heritable trait variation is mediated primarily through distal gene regulation.

Our result that distal regulation accounted for most trait-related gene expression differences is consistent with a complex model of genetic trait determination. It has frequently been assumed that gene regulation in cis is the primary driver of genetically associated trait variation, but attempts to use local gene regulation to explain phenotypic variation have had limited success^16,17. In recent years, evidence has mounted that distal gene regulation may be an important mediator of trait heritability^18,19,62,63. It has been observed that transcripts with high local heritability explain less expression-mediated disease heritability than those with low local heritability¹⁹. Consistent with this observation, genes located near GWAS hits tend to be complexly regulated¹⁸. They also tend to be enriched with functional annotations, in contrast to genes with simple local regulation, which tend to be depleted of functional annotations suggesting they are less likely to be directly involved in disease traits¹⁸. These observations are consistent with principles of robustness in complex systems in which simple regulation of important elements leads to fragility of the system^64,65,66. Our results are consistent, instead, with a more complex picture where genes whose expression can drive trait variation are buffered from local genetic variation but are extensively influenced indirectly by genetic variation in the regulatory networks converging on those genes.

Our results are also consistent with the recently proposed omnigenic model, which posits that complex traits are massively polygenic and that their heritability is spread out across the genome⁶⁷. In the omnigenic model, genes are classified either as core genes, which directly impinge on the trait, or peripheral genes, which are not directly trait-related, but influence core genes through the complex gene regulatory network. HDMA explicitly models a central proposal of the omnigenic model which posits that once the expression of the core genes (i.e., trait-mediating genes) is accounted for, there should be no residual correlation between the genome and the phenome. Here, we were able to fit this model and identified a composite transcript that, when taken into account, left no residual correlation between the composite genome and composite phenome scores (Figs. 3A, and 4E).

Unlike in the omnigenic model, we did not observe a clear demarcation between the core and peripheral genes in loading magnitude, but we do not necessarily expect a clear separation given the complexity of gene regulation and the genotype-phenotype map⁶⁸.

An extension of the omnigenic model proposed that most heritability of complex traits is driven by weak distal eQTLs that are potentially below the detection threshold in studies with feasible sample sizes⁶². This is consistent with what we observed here. For example, the gene Nucb2, had a high loading in islets and was also strongly distally regulated (65% distal heritability) (Fig. 5). This gene is expressed in pancreatic β cells and is involved in insulin and glucagon release^69,70,71. Although its transcription was highly heritable in islets, that regulation was distributed across the genome, with no clear distal eQTL (Supplementary Fig. 14). Thus, although distal regulation of some genes may be strong, this regulation is likely to be highly complex and not easily localized.

Individual high-loading transcripts also demonstrated biologically interpretable, tissue-specific patterns. We highlighted Pparg, which is known to be protective in adipose tissue⁵⁰ where it was negatively loaded, and harmful in the liver^{51,52,53,54,55}, where it was positively loaded. Such granular patterns may be useful in generating hypotheses for further testing, and prioritizing genes as therapeutic targets. The tissue-specific nature of the loadings also may provide clues to tissue-specific effects, or side effects, of targeting particular genes system-wide.

In addition to identifying individual transcripts of interest, the composite transcripts can be used as weighted vectors in multiple types of analysis, such as drug prioritization using gene set enrichment analysis (GSEA) and the CMAP database. In particular, the CMAP analysis identified drugs that have been demonstrated to reverse insulin resistance and other aspects of metabolic disease. This finding supports the hypothesis that HDMA identified transcripts that truly mediate genetic effects on traits. HDMA identifies transcriptional patterns that are consistent with a mediation model, but alone does not prove mediation. However, the finding that these drugs act both on the transcriptional patterns and on the desired traits support the mediation model and the hypothesis that these transcripts play a causal role in pathogenesis of metabolic disease.

Together, our results have shown that both tissue specificity and distal gene regulation are critically important to understanding the genetic architecture of complex traits. We identified important genes and gene signatures that were heritable, plausibly causal of disease, and translatable to other mouse populations and to humans. Finally, we have shown that by directly acknowledging the complexity of both gene regulation and the genotype-to-phenotype map, we can gain a new perspective on disease pathogenesis and develop actionable hypotheses about pathogenic mechanisms and potential treatments.

Methods

Statistics and reproducibility

In this study we used two populations of laboratory mice (Mus musculus): 1) a population of 482 diversity outbred mice (J:DO, JAX strain#:009376) (n = 240 females and n = 242 males), and 2) a population of 466 CC-RIX mice (n = 234 females, and n = 232 males), which were F1 mice derived from crosses of the following JAX strains of Collaborative Cross mice: CC043/GeniUncJ (strain#:023828), CC011/UncJ (strain#018854), CC030/GeniUncJ (strain#:025426), CC002/UncJ (strain#:021236), CC051/TauUncJ (strain#:021897), CC019/TauUncJ (strain#:CC019), CC027/GeniUncJ (strain#:025130), CC012/GeniUncJ (strain#:028409), CC024/GeniUncJ (strain#:021891), CC075/UncJ (strain#:027293), CC005/TauUncJ (strain#:020945), CC059/TauUncJ (strain#:025125), CC001/UncJ (strain#:021238), CC042/GeniUncJ (strain#:020947), CC040/TauUncJ (strain#:023831), CC004/TauUncJ (strain#:020944), CC009/UncJ (strain#:018856), CC013/GeniUncJ (strain#:021892), and CC060/UncJ (strain#:026427). Counts of male and female mice in each strain are available in the Supplementary Data 2. All DO mice were maintained on a high-fat, high-sugar diet. The CC-RIX mice were randomized to a high-fat or a low-fat diet. The diets were different colors, so blinding was not possible. CC-RIX mice were sacrificed in a 6-month cohort and a 12-month cohort. In the DO population, we used sex, DO generation, and DO wave as covariates in all statistical tests. In the CC-RIX population, we used diet, sex, and age as covariates in all statistical tests. We did not investiagate sex-specific effects on genetic architecture in order to maximize power and generalizability, as well as to simplify the overall analysis. CC-RIX animals were randomly assigned to housing and diets based on litters with multiple litters in each group. While undergoing metabolic phenotyping animals were randomly assigned an order for testing. Sample sizes were determined using statistical rules of thumb⁷². We included at least four biological replicates that were tested over multiple months to increase replication power. No data were excluded. Further experimental details and statistical testing are described individually in the following Methods sections.

Diversity outbred mice

Mice were maintained and treated in accordance with the guidelines approved by the Department of Biochemistry animal vivarium at the University of Wisconsin. Animal husbandry and in vivo phenotyping methods were previously published are are described briefly below^12,28.

A population of 482 diversity outbred mice (split evenly between male and female) from generations 18, 19, and 21, was placed on a high-fat (44.6% kcal fat), high-sugar (34% carbohydrate), adequate protein (17.3% protein) diet from Envigo Teklad (catalog number TD.08811) starting at four weeks of age as described previously¹². Individuals were assessed longitudinally for multiple metabolic measures including fasting glucose levels, glucose tolerance, insulin levels, body weight, and blood lipid levels.

When mice were harvested at 22 weeks of age, their pancreatic islets were isolated by hand. Insulin per islet was measured, and whole pancreas insulin content was calculated from the insulin per islet measure and the total numer of islets per pancreas¹². RNA was isolated from the whole islets and sent to The Jackson Laboratory for high-throughput sequencing¹².

Trait measurements

Trait measurements were described previously in Keller et al. 2018¹². Briefly, body weight was measured every two weeks, and 4-h fasting plasma samples were collected to measure insulin, glucose, and triglycerides (TG). At around 18 weeks of age, an oral glucose tolerance test (oGTT) was conducted on 4-hour fasted mice to assess changes in plasma insulin and glucose. Glucose (2 g/kg) was given via oral gavage. Blood samples were taken from a retro-orbital bleed before glucose administration, and at 5, 15, 30, 60, and 120 minutes afterward. The area under the curve (AUC) was calculated for glucose and insulin. Glucose was measured using the glucose oxidase method, and insulin was measured by radioimmunoassay.

HOMA-IR and HOMA-B, which are homeostatic model assessments of insulin resistance (IR) and pancreatic islet function (B), were calculated using fasting plasma glucose and insulin values at the start of the oGTT. HOMA-IR = (glucose × insulin) / 405 and HOMA-B = (360 × insulin) / (glucose - 63). Plasma glucose and insulin units are mg/dL and mU/L, respectively.

Genotyping

Genotypes at 143,259 markers was performed using the Mouse Universal Genotyping Array (GigaMUGA)⁷³ at Neogen (Lincoln, NE) as described previously^12,74. Genotypes were converted to founder strain-haplotype reconstructions using the R/DOQTL software⁷⁵ and interpolated onto a grid with 0.02-cM spacing to yield 69,005 pseudomarkers. Individual chromosome (Chr) haplotypes were reconstructed from RNA-seq data using a hidden Markov model⁷⁶ (GBRS, https://github.com/churchill-lab/gbrs). Using both methods to call haplotypes provided redundancy for quality control. Three mice had inconsistent calls between the two methods and were excluded from the analysis¹².

Trait selection in DO

We filtered the measured traits in this study to a set of relatively non-redundant measures that were well-represented in the population (having at least 80% of individuals measured). A complete description of trait filtering can be found at Figshare https://doi.org/10.6084/m9.figshare.27066979⁷⁷ in the file Documents > 1.DO > 1b.Trait_Selection.Rmd.

We took two approaches for traits with multiple redundant measurements, for example longitudinal body weights. In the case of longitudinal measurements, we used the final measurement, as this was the closest physiological measurement to the measurement of gene expression, which was done at the end of the experiment. The labels for these traits have the word Final appended to their name. For traits with multiple highly related measurements, such as cholesterol, we used the first principal component of the group of measurements. For example, we used the first principal component of all LDL measurements as the measurement of LDL. For each set of traits, we ensured the first principal component had the correct sign by correlating it with the average of the traits. For pearson correlation coefficients (r) less than 0, we multiplied the principal component by −1. The labels for these traits have the term PC1 appended to their name.

Heritability of each trait was estimates using the R package qtl2³⁴. Standard errors of heritability were estimated using a purpose-built function by Karl Broman available on GitHub [https://github.com/rqtl/qtl2/issues/193]. This method is based on Equation 2 in Visscher and Goddard (2015)⁷⁸.

Processed DO data

The DO data used in this study were generated in a previous study^12,28. We downloaded genotypes, phenotypes, and pancreatic islet gene expression data from Dryad https://doi.org/10.5061/dryad.pj105⁷⁹.

Collaborative cross recombinant inbred mice (CC-RIX)

CC-RIX mice were derived by crossing strains of Collaborative Cross (CC) mice to produce heterozygous F1 mice. Each strain included between 19 and 24 biological replicates. The strains used and the number of mice in each are reported in the supplemental file CC-RIX_strain_table.txt.

Mice were cared for and treated following the guidelines approved by the Association for Assessment and Accreditation of Laboratory Animal Care at The Jackson Laboratory. All animals were obtained from The Jackson Laboratory. The mice were kept in a pathogen-free room at a temperature ranging from 20 to 22 °C with a 12-h light/dark cycle. Starting at 6 weeks of age, they were fed either a custom-designed high-fat, high-sugar (HFHS) diet (Research Diets D19070208) or a control diet (Research Diets D19072203) ad libitum. Body weight was measured weekly until the mice were about 16 weeks old, after which measurements were taken every other week. Food intake measurements were collected at 14 weeks, 23 weeks (for 6-month cohorts), 26 weeks (for 12-month cohorts), 38 weeks, and 51 weeks by weighing the grain contents in the cage over a three-day period. Fasted serum was collected at 14 weeks, 28 weeks (for 6-month cohorts), 26 weeks (for 12-month cohorts), 38 weeks, and 56 weeks of age via retro-orbital or submental vein. Sex, diet, and age were used as covariates in all analyses.

CC-RIX genotypes

We used the most recent common ancestor (MRCA) genotypes for the Collaborative Cross (CC) mice available on the University of North Carolina Computational Systems Biology website: http://www.csbio.unc.edu/CCstatus/CCGenomes/.

To generate CC-RIX genotypes, we averaged the haplotype probabilities for the two parental strains at each locus.

Clinical chemistries

CC-RIX animals were fasted for four hours before serum collection via the retro-orbital or submental vein. Whole blood was left at room temperature for 30–60 min before being centrifuged for 5 minutes at 14,674 g. The serum was then tested for glucose (Beckman Coulter; OSR6121), cholesterol (Beckman Coulter; OSR6116), triglycerides (Beckman Coulter; OSR60118), insulin (MSD; K152BZC-1), or c-peptide (MSD; K1526JK-1).

Intraperitoneal glucose tolerance testing

After a fasting period of 4–6 h, baseline glucose measurements were taken from CC-RIX mice using an AlphaTrak2 glucometer and test strips (Zoetis) by making a small nick in the tail tip. A bolus intraperitoneal injection of 20% glucose (1g/kg) was then administered, and additional tail tip nicks were performed at 15, 30, 60, and 120 minutes post-injection to measure glucose levels.

Bulk tissue collection

At either 28 weeks of age (for the 6-month cohort) or 56 weeks of age (for the 12-month cohort), CC-RIX animals were humanely euthanized by cervical dislocation. Tissues, including visceral adipose (gonadal fat pad), skeletal muscle (gastrocnemius), and the left liver lobe, were harvested and flash-frozen in liquid nitrogen for RNA sequencing.

Whole pancreas insulin content

The animals were humanely euthanized at 16 weeks of age and the entire pancreas was removed, ensuring no excess fat or mesentery tissue was included. The pancreas tissue was placed in a pre-weighed 20 mL glass scintillation vial containing acid ethanol (75% HPLC grade ethanol (ThermoFisher; A995-4), 1.5% concentrated hydrochloric acid (ThermoFisher; A144-212) in distilled water). The weight of the pancreas was measured for normalization. Using curved scissors, the pancreas was chopped for four minutes, and the samples were stored at −20 °C until all animals were harvested. For insulin measurements, the contents of the scintillation vials were rinsed with 4 mL PBS (Roche; 1666789) with 1% BSA (Sigma; A7888), neutralized with 65 μL 10N NaOH (Fisher; SS255-1), and vortexed for 30 seconds. The samples were then centrifuged at 4 °C for 5 minutes at 452 g. The samples were diluted 5000X in PBS with 1% BSA, and insulin was measured (MSD; K152BZC-1).

RNA isolation and QC

RNA from both DO and CC-RIX adipose, gastrocnemius, and left liver lobe tissues was isolated using the MagMAX mirVana Total RNA Isolation Kit (ThermoFisher; A27828) and the KingFisher Flex purification system (ThermoFisher; 5400610). The frozen tissues were pulverized with a Bessman Tissue Pulverizer (Spectrum Chemical) and homogenized in TRIzol^™ Reagent (ThermoFisher; 15596026) using a gentleMACS dissociator (Miltenyi Biotec Inc). After adding chloroform to the TRIzol homogenate, the RNA-containing aqueous layer was extracted for RNA isolation, following the manufacturer’s protocol, starting with the RNA bead binding step using the RNeasy Mini kit (Qiagen; 74104). RNA concentrations and quality were assessed using the Nanodrop 8000 spectrophotometer (Thermo Scientific) and the RNA 6000 Pico or RNA ScreenTape assay (Agilent Technologies).

Library construction

Before library construction, 2 μL of diluted (1:1000) ERCC Spike-in Control Mix 1 (ThermoFisher; 4456740) was added to 100 ng of each RNA sample. Libraries were then constructed using the KAPA mRNA HyperPrep Kit (Roche Sequencing Store; KK8580) following the manufacturer’s protocol. The process involves isolating polyA-containing mRNA using oligo-dT magnetic beads, fragmenting the RNA, synthesizing the first and second strands of cDNA, ligating Illumina-specific adapters with unique barcode sequences for each library, and performing PCR amplification. The quality and concentration of the libraries were evaluated using the D5000 ScreenTape (Agilent Technologies) and the Qubit dsDNA HS Assay (ThermoFisher; Q32851), respectively, according to the manufacturers’ instructions.

Sequencing

Libraries were sequenced on an Illumina NovaSeq 6000 using the S4 Reagent Kit (Illumina; 20028312). All tissues underwent 100 bp paired-end sequencing, aiming for a target read depth of 30 million read pairs.

Processing of RNA sequencing data

We used the Expectation-Maximization algorithm for Allele Specific Expression (EMASE)^80,81 to quantify multi-parent allele-specific and total expression from RNA-seq data for each tissue. EMASE was performed by the Genotype by RNA-seq (GBRS) software package (https://gbrs.readthedocs.io/en/latest/). In the process, R1 and R2 FASTQ files were combined and aligned to a hybridized (8-way) transcriptome generated for the 8 DO founder strains as single-ended reads. GBRS was also used to reconstruct the mouse genotype probabilities along ~ 69K markers, which was used for confirming genotypes in the quality control (QC) process. For the QC process, we used a Euclidean distances method (developed by Greg Keele - Churchill Lab) to compare the GBRS genotype probabilities between the tissues and the genotype probabilities array for all mice. The counts matrix for each tissue was processed to filter out transcripts with less than one read for at least half of the samples. RNA-seq batch effects were removed by regressing out batch as a random effect and considering sex and generation as fixed effects using lme4 R package⁸². RNA-Seq counts were normalized relative to total read counts using the variance stabilizing transform (VST) as implemented in DESeq2⁸³ and using rank normal score.

eQTL analysis

We used the R package qtl2³⁴ to perform eQTL analysis. We used the rank normal score data and used sex, DO generation, and DO wave as additive covariates. We also used kinship as a random effect. We used permutations to find a LOD threshold of eight for significant QTLs which corresponded to a genome-wide p value of 0.01⁸⁴.

To assess whether eQTL were shared across tissues, we considered significant eQTLs within 4Mb of each other to be overlapping. We considered local and distal eQTLs separately. Local eQTL were defined as an eQTL within 4Mb of the transcription start site (TSS) of the encoding gene. Distal eQTL were defined as an eQTL greater than 4Mb from the TSS of the encoding gene.

Local and distal heritability of transcripts

To estimate local and distal heritability of each transcript, we scaled each normalized transcript to have a variance of 1. We then modeled this transcript with the local genotype using the fit1() function in the R package qtl2³⁴. We used the resulting model to predict the transcript values. The variance of the predicted transcript is its local heritability. We then estimated the heritability of the residual of the model fit. The variance of the residual multiplied by its heritability is the estimate of the distal heritability of the transcript.

We compared local and distal estimates of heritability to measures of trait relevance for each transcript. To calculate trait relevance of a given transcript, we adjusted normalized transcript values for sex, DO wave, and DO generation. We similarly adjusted traits by sex, DO wave, and DO generation. We then calculated all Spearman correlation coefficients (ρ) between adjusted traits and adjusted transcripts. The trait relevance of a given tanscript was the maximum absolute correlation coefficient across all traits.

High-dimensional mediation analysis

In this section we derive the objective function for high-dimensional mediation analysis (HDMA) and present an iterative algorithm to optimize this objective function. Our starting point is the univariate case, where we describe perfect mediation as a constraint on the covariance matrix among variables. We then leverage this constraint to define projections of multivariate data that are maximally consistent with perfect mediation (HDMA). Next, we demonstrate how to kernelize HDMA to limit dimensionality of the model and enable non-linear HDMA models.

Perfect mediation as a constraint on covariance matrices

Suppose we have three random variables x, m, and y. Assume they each have unit variance and that they satisfy the following structural equation model (SEM) such that m perfectly mediates the effect of x on y:

$$m=\alpha x+{\epsilon }_{m}$$

(1)

$$y=\beta m+{\epsilon }_{y}$$

(2)

From these structural equations, we have the model-implied covariance matrix, Σ, given by

$$\Sigma=\left[\begin{array}{ccc}1&\alpha &\alpha \beta \\ \alpha &1&\beta \\ \alpha \beta &\beta &1\end{array}\right]$$

(3)

Note that the assumption of perfect mediation forces the covariance between x and y to be αβ. In any finite data set, however, the observed covariance matrix, S = [S_ij], will not typically satisfy this constraint.

The general log-likelihood fitting function for an SEM is given by

$$L={{\rm{tr}}}\left(S{\Sigma }^{-1}\right)+\log \left\vert \Sigma \right\vert,$$

(4)

where ∣ ⋅ ∣ denotes the determinant of a matrix and ${{\rm{tr}}}(\cdot )$ denotes the trace⁸⁵. For the perfect-mediation model, these values are

$$\left\vert \Sigma \right\vert=(1-{\alpha }^{2})(1-{\beta }^{2})$$

(5)

$${\Sigma }^{-1}=\left[\begin{array}{ccc}1/(1-{\alpha }^{2})&\alpha /(1-{\alpha }^{2})&0\\ \alpha /(1-{\alpha }^{2})&(1-{\alpha }^{2}{\beta }^{2})/\left((1-{\alpha }^{2})(1-{\beta }^{2})\right)&\beta /(1-{\beta }^{2})\\ 0&\beta /(1-{\beta }^{2})&1/(1-{\beta }^{2})\end{array}\right]$$

(6)

Plugging these into the likelihood function, we get

$$\begin{array}{rc}L=&\log \left((1-{\alpha }^{2})(1-{\beta }^{2})\right)+\frac{3-{\alpha }^{2}-{\beta }^{2}-{\alpha }^{2}{\beta }^{2}}{(1-{\alpha }^{2})(1-{\beta }^{2})}+\frac{2\alpha }{1-{\alpha }^{2}}{S}_{12}+\frac{2\beta }{1-{\beta }^{2}}{S}_{23}\end{array}$$

(7)

To simplify notation, we define

$$F(\alpha,\beta )=\log \left((1-{\alpha }^{2})(1-{\beta }^{2})\right)+\frac{3-{\alpha }^{2}-{\beta }^{2}-{\alpha }^{2}{\beta }^{2}}{(1-{\alpha }^{2})(1-{\beta }^{2})},$$

(8)

so the likelihood function is now

$$L=F(\alpha,\beta )+\frac{2\alpha }{1-{\alpha }^{2}}{S}_{12}+\frac{2\beta }{1-{\beta }^{2}}{S}_{23}$$

(9)

Note that this likelihood is maximized by fitting regression coefficients α and β between x and m and m and y, respectively, but the log-likelihood formulation is useful for the multivariate extension below.

Projecting multivariate data to identify latent mediators

Suppose now that we have three data matrices, X, M, and Y (individuals by variables) that are mean centered by column. The central assumption of HDMA is that these multivariate data encode latent variables that are causally linked according to the perfect-mediation model, in a sense made precise as follows.

We use the log-likelihood function (Eqn. (7)) of the perfect mediation model as an objective function to identify latent variables, l_X, l_M, and l_Y, that are are correlated as closely as possible to the constraints of the perfect mediation model, Eqn. (3). We estimate these latent variables as linear combinations of the measured variables

$${l}_{X}=Xa$$

(10)

$${l}_{M}=Mb$$

(11)

$${l}_{Y}=Yc$$

(12)

The coefficient vectors a, b, and c, are called loadings, analogous to the terminology in PCA and CCA. Because the data matrices are mean centered, we have

$${{\rm{mean}}}({l}_{X})={{\rm{mean}}}({l}_{M})={{\rm{mean}}}({l}_{Y})=0,$$

(13)

and we assume the loadings are scaled so that each latent variable has unit variance

$${{\rm{var}}}({l}_{X})={{\rm{var}}}({l}_{M})={{\rm{var}}}({l}_{Y})=1.$$

(14)

Plugging these formulae into the objective function (Eqn. (9)), we have

$${S}_{12}={{\rm{corr}}}\left({l}_{X},{l}_{M}\right)$$

(15)

$${S}_{23}={{\rm{corr}}}\left({l}_{M},{l}_{Y}\right)$$

(16)

$$L(\alpha,\beta,a,b,c)=F(\alpha,\beta )+\frac{2\alpha }{1-{\alpha }^{2}}{{\rm{corr}}}\left({l}_{X},{l}_{M}\right)+\frac{2\beta }{1-{\beta }^{2}}{{\rm{corr}}}\left({l}_{M},{l}_{Y}\right)$$

(17)

$$=F(\alpha,\beta )+\frac{2\alpha }{1-{\alpha }^{2}}{{\rm{corr}}}\left(Xa,Mb\right)+\frac{2\beta }{1-{\beta }^{2}}{{\rm{corr}}}\left(Mb,Yc\right)$$

(18)

This yields an objective function of two sets of parameters: the structural parameters α and β that define the causal model among latent variables, and the loading vectors a, b, and c, that define the latent variables in terms of the measured variables. The goal of HDMA is to optimize L as a function of all parameters simultaneously. The form of the objective function, Eqn. (17), is effectively a weighted sum of correlation coefficients, connecting it to so-called sum-of-correlation, or SUMCOR, optimization problems⁸⁶, which we discuss further below.

An algorithm for HDMA

The global optimization of (17) is challenging because it is not a convex problem. However, the decomposition of the variables into structural and loading variables suggests an iterative algorithm, similar to the expectation-maximization algorithm, that converges at least to a stationary point. The overall idea is to use a block-coordinate-ascent strategy that iterates between optimizing a, b, and c, then optimizing α and β.

For fixed a, b, and c, the optimal α and β are simply given by regression coefficients between l_X and l_M and l_M and l_Y, respectively. Given these regression coefficients, α and β, we then optimize a, b, and c. For fixed α and β, the term F(α, β) is irrelevant, so maximizing the log-likelihood function reduces to maximizing the reduced function

$${L}_{red}(a,b,c)=\frac{2\alpha }{1-{\alpha }^{2}}{{\rm{corr}}}\left(Xa,Mb\right)+\frac{2\beta }{1-{\beta }^{2}}{{\rm{corr}}}\left(Mb,Yc\right),$$

(19)

which is a weighted sum of correlation coefficients. This is exactly a (weighted) SUMCOR optimization problem⁸⁶. These optimization problems are still not convex, but Tenenhaus et al. have recently proved convergence for iterative algorithms that optimize weighted SUMCOR problems^86,87,88. These algorithms only guarantee convergence to a stationary point not necessarily a maximum, as is common in other non-convex problems, but this can be overcome with multiple random restarts, if needed. Thus, we have a sub-routine wSUMCOR(X, M, Y, w₁, w₂) that solves the weighted SUMCOR problem

$${L}_{wSUMCOR}(a,b,c,{w}_{1},{w}_{2})={w}_{1}{{\rm{corr}}}\left(Xa,Mb\right)+{w}_{2}{{\rm{corr}}}\left(Mb,Yc\right).$$

(20)

Iterating between optimizing the structural parameters and loading parameters, we reduce the negative log-likelihood at each step and converge to a fixed point.

We summarize our optimization procedure in Algorithm 1.

Algorithm 1

High-dimensional mediation analysis

Intput: X, M, Y ▹ Data matrices

Output: α, β, a, b, c, l_X, l_M, l_Y ▹ Structural parameters, loadings, scores

α ← 0.5, β ← 0.5 ▹ Initialize structural parameters

While converge ≠ TRUE do

$d\leftarrow \frac{2\alpha }{1-{\alpha }^{2}}+\frac{2\alpha }{1-{\alpha }^{2}}$ ▹ Normalization constant for weights

${w}_{1}\leftarrow \frac{1}{d}\frac{2\alpha }{1-{\alpha }^{2}}$, ${w}_{2}\leftarrow \frac{1}{d}\frac{2\beta }{1-{\beta }^{2}}$ ▹ Set weights (sum to one)

(a, b, c) ← wSUMCOR(X, M, Y, w₁, w₂) ▹ Compute loadings

l_X ← Xa, l_M ← Mb, l_Y ← Yc ▹ Compute scores α ← corr(l_X, l_M), β ← corr(l_M, l_Y) ▹ Update structural parameters

end while

Kernel HDMA

For large data matrices X, M, and Y, especially with high correlation among variables, as is common for high-throughput biological assays (e.g., ~1M alleles for genotypes, ~20k transcripts), we can further reduce the dimensionality of the HDMA model by requiring that loading vectors lie in the span of the the measured individuals, namely

$$a={X}^{T}\tilde{a}$$

(21)

$$b={M}^{T}\tilde{b}$$

(22)

$$c={Y}^{T}\tilde{c}.$$

(23)

This replaces the full feature data, say X, with the covariances among individuals (aka, Gram matrices), C_X= XX^T, and reduces the dimensionality from the number of measured variables down to the number of individuals

$${l}_{x}=X{X}^{T}\tilde{a}={C}_{X}\tilde{a}$$

(24)

$${l}_{M}=M{M}^{T}\tilde{b}={C}_{M}\tilde{b}$$

(25)

$${l}_{M}=Y{Y}^{T}\tilde{c}={C}_{Y}\tilde{c}.$$

(26)

This reduction is called kernelization⁸⁸ and is widely applied to other linear models, including CCA, linear regression, and classification.

It is interesting to note that kernelization is often used to convert a linear model to a non-linear model by replacing the covariance matrices, e.g., C_X, with more complex kernel matrices K_X that encode similarity measures among individuals that are non-linear functions of the measured variables. non-linear model by replacing the covariance matrices, e.g., C_X, with more complex kernel matrices K_X that encode similarity measures among individuals that are non-linear functions of the measured variables. Promoting a linear model to a non-linear model in this way is called the kernel trick and is widely used in the machine learning field. The above considerations show that HDMA is kernelizable in the same way as other linear models, although the exploration of non-linear models is outside the scope of this study.

We generated kernel matrices for the genome, phenome, and transcriptome as described above. To test the effect of the presence of local eQTLs on mediation, we further generated two additional transcriptomic kernels. 1) A distal-only kernel was derived first by regressing out the effect of local haplotype on all transcripts as described above and generating the kernel matrix using the residual expression (distal-affected only). 2) A local-only kernel was derived by imputing transcription levels for each transcript as described above and then calculating the kernel with only these locally derived expression values. We replaced the original transcriptomic kernel with each of these additional kernels in turn and performed HDMA. We calculated the correlation between all pairs of latent variables and the path coefficient for each instance.

Implementation details

We have implemented HDMA (Algorithm 1) in the R programming language. Tenenhaus et al. have implemented their optimizers in the Regularized Generalized Canonical Correlation Analysis (RGCCA) R package⁸⁹, which we use as the subroutine wSUMCOR. As Tenenhaus et al. discuss optimizing the empirical correlation coefficient per se is numerically unstable due to the inversion of the covariance matrices of the measured variables (e.g., the transcript-transcript covariance matrix). To overcome this, the RGCCA package uses a regularized form of the covariance matrix developed by Schaeffer and Strimmer⁹⁰, which can be estimated rapidly using an analytic formula.

As a convergence criterion, we stop the iterations when both α and β change by less than 10⁻⁶ from their previous value in one iteration.

All code required to run HDMA is available at Figshare: https://figshare.com/https://doi.org/10.6084/m9.figshare.27066979⁷⁹.

Enrichment of biological terms

We performed gene set enrichment analysis (GSEA)⁴⁰ using the transcript loadings in each tissue as gene weights. GSEA determines enrichment of pathways based on where the contained genes appear in a ranked list of genes. If the genes in the pathway are more concentrated near the top (or the bottom) of the list than expected by chance, the pathway can be interpreted as being enriched with positively (negatively) loaded transcripts. We used the R package fgsea³⁹ to calculate normalized enrichment scores for all GO terms and all KEGG pathways.

We downloaded all KEGG⁹¹ pathways for Mus musculus using the R package clusterProfiler⁹¹. We then used fgsea to calculate enrichment scores in each tissue using the transcript loadings in each tissue as our ranked list of genes. We reported the normalized enrichment score (NES) for the 10 pathways with the largest positive NES and the 10 pathways with the largest negative NES.

We used the R package pathview⁹² to visualize the loadings from each tissue in interesting pathways. We scaled the loadings in each tissue by the maximum absolute value of loadings across all tissues to compare them across tissues.

We downloaded GO term annotations from Mouse Genome Informatics at the Jackson Laboratory⁹³ https://www.informatics.jax.org/downloads/reports/index.html. We removed gene-annotation pairs labeled with NOT, indicating that these genes were known not to be involved in these GO terms. We also limited our search to GO terms with between 80 and 3000 genes. We used the R package annotate⁹⁴ to identify the ontology of each term and the R package pRoloc⁹⁵ to convert between GO terms and names. As with the KEGG pathways, we used fgsea to calculate a normalized enrichment score for each GO term and collected loadings for the transcripts in each term to compare across tissues.

TWAS in DO mice

We performed a transcriptome-wide analysis (TWAS)^9,11 in the DO mice to compare to the results of high-dimensional mediation. To perform TWAS, we fit a linear model to explain variation in each transcript across the population using the genotype at the nearest marker to the gene transcription start site (TSS). We used kinship as a random effect and sex, diet, and DO generation as fixed effects. The predicted transcript from each of these models was the imputed transcript based only on the local genotype.

We correlated each imputed transcript with each of the metabolic phenotypes after adjusting phenotypes for sex, diet, DO generation, and DO wave. To calculate significance of these correlations, we performed permutation testing by shuffling labels of individual mice and recalculating correlation values. Significant correlations were those more extreme than any of the permuted values, corresponding to an empirical p-value of 0. These are transcripts whose locally encoded expression level was significantly correlated with one of the metabolic traits. This suggests an association between the genetically encoded transcript level and the trait but does not identify a direction of causation.

Literature support for genes

To determine whether each gene among those with large loadings or large heritability had a supported connection to obesity or diabetes in the literature, we used the R package easyPubMed⁹⁶. We searched for the terms (diabetes OR obesity) along with the tissue name (adipose, islet, liver, or muscle), and the gene name. We restricted the gene name to appear in the title or abstract as some short names appeared coincidentally in contact information. We checked each gene with apparent literature support by hand to verify that support, and we removed spurious associations. For example, FAU is used as an acronym for fatty acid uptake and CAD is used as an acronym for coronary artery disease. Both terms co-occur with the terms diabetes and obesity in a manner independent of the genes Fau and Cad. Other genes that co-occurred with diabetes and obesity, but not as a functional connection were similarly removed. For example, the gene Rpl27 is used as a reference gene for quantification of the expression of other genes, and co-occurrence with diabetes and obesity is a coincidence. We counted the abstracts associated with diabetes or obesity and each gene name and determined that a gene had literature support when it had at least two abstracts linking it to the terms diabetes or obesity in the respective tissue.

Tissue-specific clusters

To compare the top loading genes across tissues, we selected genes with a loading at least 2.5 standard deviations from the mean across all tissues. We made a matrix consisting of the union of these sets populated with the tissue-specific loading for each gene. We used the pam() function in the R package cluster⁹⁷ to cluster the loading profiles around k medoids. We tested k = 2 through 20 and used silhouette analysis to compare the separation of the clusters. The best separation was achieved with k = 12 clusters. For each cluster we used the R package gprofiler2⁹⁸ to identify enriched GO terms and KEGG pathways for the genes in each cluster.

Imputation of gene expression in CC-RIX

To impute gene expression in the CC-RIX, we performed the following steps for each transcript in each tissue (adipose, liver, and skeletal muscle):

1.
Calculate diploid CC-RIX genotype for all CC-RIX individuals at the marker nearest the transcription start site of the transcript.
2.
Multiply the genotype probabilities by the eQTL coefficients identified in the DO population.

To check the accuracy of the imputation, we correlated each imputed transcript with the measured transcript. The average Pearson correlation (r) was close to 0.5 for all three tissues (Supplementary Fig. 9A), and as expected, the correlation between the imputed transcript and the measured transcript was highly positively dependent on the local eQTL LOD score of the transcript (Supplementary Fig. 9B).

Prediction of CC-RIX traits

We used both measured expression and imputed expression combined with the results from HDMA in the to predict metabolic disease index (MDI) in the CC-RIX. The traits measured in the DO and the CC-RIX were not identical, so we limited our prediction to body weight, which was measured in both populations, and was the largest contributor to MDI in the DO.

For each CC-RIX individual, we multiplied the transcript abundances across the transcriptome by the loadings derived from the HDMA in the DO population. This resulted in a vector with n elements, where n is the number of transcripts in the trancriptome. Each element was a weighted value that combined the relative abundance of the transcript with how that abundance affected MDI. We averaged the values in this vector to calculate an overall predicted MDI for the individual CC-RIX animal.

After calculating this predicted MDI across all CC-RIX animals, we correlated the predicted values from each tissue with measured body weight (Supplementary Fig. 9B).

Cell type specificity

We investigated whether the loadings derived from HDMA reflected tissue composition changes in the DO mice prone to obesity on the high-fat diet. To do this, we acquired lists of cell-type specific transcripts from the literature. In adipose tissue, we looked at cell-type specific transcripts for macrophages, leukocytes, adipocyte progenitors, and adipocytes as defined in Ehrlund et al. (2017)⁹⁹. In pancreatic islets, we looked at cell-type specific transcripts for alpha cells, beta cells, delta cells, ductal cells, mast cells, macrophages, acinar cells, stellate cells, gamma and epsilon cells, and endothelial cells as defined by Elgamal et al. (2023)¹⁰⁰. Both studies defined cell-type specific transcripts based on human cell types. We collected the loadings for each set of cell-type specific transcripts in the respective tissue and asked whether the mean loading for the cell type differed significantly from 0. A significant positive loading for the cell type would suggest a genetic predisposition to have a higher proportion of that cell type in the tissue. To determine whether each mean loading differed significantly from 0, we performed permutation tests. We randomly sampled n genes outside of the cell-type specific, where n was the number of genes in the set. We compared the distribution of loading means over 10,000 random draws to that seen in the observed data. We used a significance threshold of 0.01.

Comparison of transcriptomic signatures to human transcriptomic signatures

To compare the transcriptomic signatures identified in the DO mice to those seen in human patients, we downloaded human gene expression data from the Gene Expression Omnibus (GEO)^101,102. We focused on adipose tissue because this had the strongest relationship to obesity and insulin resistance in the DO. We downloaded the following human gene expression data sets:

Accession number GSE152517 - Bulk RNA sequencing on visceral adipose tissue resected from seven diabetic and seven non-diabetic obese individuals.
Accession number GSE44000 - Agilent-014850 4X44K human whole genome platform arrays (GPL6480) measuring gene expression in purified adipocytes derived from the subcutaneous adipose tissue of seven obese (BMI>30) and seven lean (BMI<25) post-menopausal women.
Accession number GSE205668 - Subcutaneous adipose tissue was resected during elective surgery from 35 normal weight, and 26 obese children. Gene expression was measured by RNA sequencing with an Illumina HiSeq 2500.
Accession number GSE29231 - Visceral adipose biopsies were taken from three (four technical replicates each) female patients with type 2 diabetes, and three (four technical replicates each) non-diabetic female patients. Expression was measured with Illumina HumanHT-12 v3 Expression BeadChip arrays.

We downloaded each data set from GEO using the R package GEOquery¹⁰³. In each case, we verified that gene expression was log transformed and performed the transformation ourselves if it had not already been done. When covariates such as age and sex were available in the metadata files, we regressed out these variables (GSE205668—sex and age; GSE44000—none; GSE29231—age; GSE152517—none). We mean-centered and standardized gene expression across transcripts.

We matched the human gene expression to the mouse gene expression by pairing orthologs as defined in The Jackson Laboratory’s mouse genome informatics data base (MGI)¹⁰⁴. We multiplied each transcript in the human data by the adipose tissue loading of its ortholog in the DO mice. This resulted in a vector of weighted transcript values for each patient based on their own transcriptional profile and the obesity-related transcriptional signature from the DO analysis. The mean of this vector for an individual was the prediction of their obesity status (MDI). Higher values indicate a prediction of higher obesity or risk of metabolic disease based on adipose gene expression.

We then compared the values across groups, either obese and non-obese, or diabetic and non-diabetic depending on the groups in each study. For the three studies without technical replicates (accession numbers GSE152517, GSE44000, and GSE205668), we used Welch’s t-test to compare the means of the two groups. This is the default t-test in base R and assumes unequal variance across groups. For the study that included technical replicates (GSE29231) we fit a linear mixed model from the R packages lme4⁸² and lmerTest¹⁰⁵. We used subject ID as a random variable.

Connectivity map queries

We queried the transcript loading signatures from adipose tissue and pancreatic islets with the CMAP database. These tissues are the most related to metabolic disease and diabetes respectively.

The gene expression profiles in the Connectivity Map database are derived from human cell lines and human primary cultures and are indexed by Entrez gene IDs. To query the CMAP database, we identified the Entrez gene IDs for the human orthologs of the mouse genes expressed in each tissue. Each CMAP query takes the 150 most up-regulated and the 150 most down-regulated genes in a signature, however, not all human genes are included in their database. To ensure we had as many genes as possible in the query, we selected the top and bottom 200 genes with the most extreme positive and negative loadings respectively. We pasted these into the CLUE query application available at https://clue.io/query. These gene lists are available as Source Data.

We filtered the results in two ways: First, we looked at the most significantly anti-correlated ($-{\log }_{10}({{\rm{FDR}}}\,q) > 15$) hits across all cell types. Second, we looked at the most anti-correlated within the most related cell type to the query and considered hits regardless of $-{\log }_{10}({{\rm{FDR}}}\,q)$. For adipose tissue we looked in normal adipocytes, abbreviated ASC in the CMAP database, and for pancreatic islets we looked in pancreatic cancer cells, abbreviated YAPC in the CMAP database.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

DO mice: Genotypes, phenotypes, and pancreatic islet gene expression data were previously published¹². Gene expression for the other tissues can be found at the Gene Expression Omnibus with the following accession numbers: DO adipose tissue - GSE266549; DO liver tissue - GSE266569; DO skeletal muscle - GSE266567. Expression data with calculated eQTLs are available at Figshare. 10.6084/m9.figshare.27066979⁷⁷. CC-RIX mice: Gene expression can be found at the Gene Expression Omnibus with the following accession numbers: CC-RIX adipose tissue - GSE237737; CC-RIX liver tissue - GSE237743; CC-RIX skeletal muscle - GSE237747. Count matrices and phenotype data can be found on Figshare. 10.6084/m9.figshare.27066979⁷⁷. Source Data for figures in this manuscript are available on Figshare. 10.6084/m9.figshare.29247428¹⁰⁶. Data from previous publications: We downloaded genotypes, phenotypes, and pancreatic islet gene expression data from Dryad https://doi.org/10.5061/dryad.pj105⁷⁹. To predict human MDI from RNA-seq profiles, the following data were downloaded from the Gene Expression Omnibus: * Accession number GSE152517 Gene expression in visceral adipose tissue resected from seven diabetic and seven non-diabetic obese individuals. * Accession number GSE44000 Gene expression in purified adipocytes derived from the subcutaneous adipose tissue of seven obese (BMI > 30) and seven lean (BMI < 25) post-menopausal women. * Accession number GSE205668 Gene expression in subcutaneous adipose tissue from 35 normal weight, and 26 obese children. * Accession number GSE29231 Visceral adipose biopsies were measured in four technical replicates each in three female patients with type 2 diabetes, and three non-diabetic female patients.

Code availability

Code: All code used to run the analyses reported here are available at Figshare: https://figshare.com 10.6084/m9.figshare.27066979⁷⁷.

References

Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Article CAS PubMed PubMed Central Google Scholar
Farh, K. K. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
Article CAS PubMed Google Scholar
Pennisi, E. The biology of genomes. disease risk links to gene regulation. Science 332, 1031 (2011).
Article PubMed Google Scholar
Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. 106, 9362–9367 (2009).
Article CAS PubMed PubMed Central Google Scholar
Pickrell, J. K. Joint analysis of functional genomic data and genome-wide association studies of 18 human traits. Am. J. Hum. Genet. 94, 559–573 (2014).
Article CAS PubMed PubMed Central Google Scholar
Welter, D. et al. The NHGRI GWAS Catalog, a curated resource of SNP-trait associations. Nucleic Acids Res. 42, D1001–1006 (2014).
Article CAS PubMed Google Scholar
Li, Y. I. et al. RNA splicing is a primary link between genetic variation and disease. Science 352, 600–604 (2016).
Article CAS PubMed PubMed Central Google Scholar
Zhou, D. et al. A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. Nat. Genet. 52, 1239–1246 (2020).
Article CAS PubMed PubMed Central Google Scholar
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).
Article CAS PubMed PubMed Central Google Scholar
Zhu, Z. et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat. Genet. 48, 481–487 (2016).
Article CAS PubMed Google Scholar
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
Article CAS PubMed PubMed Central Google Scholar
Keller, M. P. et al. Genetic drivers of pancreatic islet function. Genetics 209, 335–356 (2018).
Article CAS PubMed PubMed Central Google Scholar
Crouse, W. L., Keele, G. R., Gastonguay, M. S., Churchill, G. A. & Valdar, W. A Bayesian model selection approach to mediation analysis. PLoS Genet. 18, e1010184 (2022).
Article CAS PubMed PubMed Central Google Scholar
Chick, J. M. et al. Defining the consequences of genetic variation on a proteome-wide scale. Nature 534, 500–505 (2016).
Article CAS PubMed PubMed Central Google Scholar
Wheeler, H. E. et al. Imputed gene associations identify replicable trans-acting genes enriched in transcription pathways and complex traits. Genet. Epidemiol. 43, 596–608 (2019).
Article PubMed PubMed Central Google Scholar
Umans, B. D., Battle, A. & Gilad, Y. Where are the disease-associated eQTLs? Trends Genet 37, 109–124 (2021).
Article CAS PubMed Google Scholar
Connally, N. J. et al. The missing link between genetic association and regulatory function. Elife 11, e74970 (2022).
Article CAS PubMed PubMed Central Google Scholar
Mostafavi, H., Spence, J. P., Naqvi, S. & Pritchard, J. K. Systematic differences in discovery of genetic effects on gene expression and complex traits. Nat. Genet. 55, 1866–1875 (2023).
Article CAS PubMed Google Scholar
Yao, D. W., O’Connor, L. J., Price, A. L. & Gusev, A. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat. Genet. 52, 626–633 (2020).
Article CAS PubMed PubMed Central Google Scholar
Liu, X. et al. GBAT: a gene-based association test for robust detection of trans-gene regulation. Genome Biol. 21, 211 (2020).
Article CAS PubMed PubMed Central Google Scholar
Westra, H. J. et al. Systematic identification of trans eQTLs as putative drivers of known disease associations. Nat. Genet. 45, 1238–1243 (2013).
Article CAS PubMed PubMed Central Google Scholar
Gilad, Y., Rifkin, S. A. & Pritchard, J. K. Revealing the architecture of gene regulation: the promise of eQTL studies. Trends Genet. 24, 408–415 (2008).
Article CAS PubMed PubMed Central Google Scholar
Nathan, A. et al. Single-cell eQTL models reveal dynamic T cell state dependence of disease loci. Nature 606, 120–128 (2022).
Article CAS PubMed PubMed Central Google Scholar
Sollis, E. et al. The NHGRI-EBI GWAS Catalog: knowledgebase and deposition resource. Nucleic Acids Res. 51, D977–D985 (2023).
Article CAS PubMed Google Scholar
Loos, R. J. F. & Yeo, G. S. H. The genetics of obesity: from discovery to biology. Nat. Rev. Genet. 23, 120–133 (2022).
Article CAS PubMed Google Scholar
Singh, R. K., Kumar, P. & Mahalingam, K. Molecular genetics of human obesity: a comprehensive review. C. R. Biol. 340, 87–108 (2017).
Article PubMed Google Scholar
Arner, P. Obesity–a genetic disease of adipose tissue? Br. J. Nutr. 83, 9–16 (2000).
Article Google Scholar
Keller, M. P. et al. Gene loci associated with insulin secretion in islets from nondiabetic mice. J. Clin. Investig. 129, 4419–4432 (2019).
Article PubMed PubMed Central Google Scholar
Churchill, G. A., Gatti, D. M., Munger, S. C. & Svenson, K. L. The diversity outbred mouse population. Mamm. Genome 23, 713–718 (2012).
Article PubMed PubMed Central Google Scholar
Chesler, E. J. et al. The collaborative cross at Oak Ridge National Laboratory: developing a powerful resource for systems genetics. Mamm. Genome 19, 382–389 (2008).
Article PubMed PubMed Central Google Scholar
Saul, M. C., Philip, V. M., Reinholdt, L. G. & Chesler, E. J. High-diversity mouse populations for complex traits. Trends Genet. 35, 501–514 (2019).
Article CAS PubMed PubMed Central Google Scholar
Threadgill, D. W., Miller, D. R., Churchill, G. A. & de Villena, F. P. The collaborative cross: a recombinant inbred mouse population for the systems genetic era. ILAR J. 52, 24–31 (2011).
Article CAS PubMed Google Scholar
Clee, S. M. & Attie, A. D. The genetic landscape of type 2 diabetes in mice. Endocr. Rev. 28, 48–83 (2007).
Article CAS PubMed Google Scholar
Broman, K. W. et al. R/qtl2: software for mapping quantitative trait loci with high-dimensional data and multiparent populations. Genetics 211, 495–502 (2019).
Article CAS PubMed PubMed Central Google Scholar
Ouwens, K. G. et al. A characterization of cis-and trans-heritability of RNA-seq-based gene expression. Eur. J. Hum. Genet. 28, 253–263 (2020).
Article CAS PubMed Google Scholar
Price, A. L. et al. Single-tissue and cross-tissue heritability of gene expression via identity-by-descent in related or unrelated individuals. PLoS Genet. 7, e1001317 (2011).
Article CAS PubMed PubMed Central Google Scholar
Bryois, J. et al. Cis and trans effects of human genomic variants on gene expression. PLoS Genet. 10, e1004461 (2014).
Article PubMed PubMed Central Google Scholar
Helmer, M. et al. On the stability of canonical correlation analysis and partial least squares with application to brain-behavior associations. Commun. Biol. 7, 217 (2024).
Article CAS PubMed PubMed Central Google Scholar
Korotkevich, G., Sukhov, V. and Sergushichev, A. Fast gene set enrichment analysis. Preprint at bioRxiv https://doi.org/10.1101/060012 (2019).
Subramanian, A. et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).
Article CAS PubMed PubMed Central Google Scholar
Subramanian, S. & Chait, A. The effect of dietary cholesterol on macrophage accumulation in adipose tissue: implications for systemic inflammation and atherosclerosis. Curr. Opin. Lipido. 20, 39–44 (2009).
Article CAS Google Scholar
Akoumianakis, I., Akawi, N. & Antoniades, C. Exploring the crosstalk between adipose tissue and the cardiovascular system. Korean Circ. J. 47, 670–685 (2017).
Article CAS PubMed PubMed Central Google Scholar
Stafeev, I. S., Vorotnikov, A. V., Ratner, E. I., Menshikov, M. Y. & Parfyonova, Y. V. Latent inflammation and insulin resistance in adipose tissue. Int J. Endocrinol. 2017, 5076732 (2017).
Article CAS PubMed PubMed Central Google Scholar
Fischer, I. P. et al. A history of obesity leaves an inflammatory fingerprint in liver and adipose tissue. Int J. Obes. (Lond.) 42, 507–517 (2018).
Article CAS PubMed Google Scholar
Chung, S. et al. Dietary cholesterol promotes adipocyte hypertrophy and adipose tissue inflammation in visceral, but not in subcutaneous, fat in monkeys. Arterioscler Thromb. Vasc. Biol. 34, 1880–1887 (2014).
Article CAS PubMed PubMed Central Google Scholar
Kus, V. et al. Induction of muscle thermogenesis by high-fat diet in mice: association with obesity-resistance. Am. J. Physiol. Endocrinol. Metab. 295, E356–367 (2008).
Article CAS PubMed Google Scholar
Newgard, C. B. Interplay between lipids and branched-chain amino acids in development of insulin resistance. Cell Metab. 15, 606–614 (2012).
Article CAS PubMed PubMed Central Google Scholar
Sears, D. D. et al. Mechanisms of human insulin resistance and thiazolidinedione-mediated insulin sensitization. Proc. Natl Acad. Sci. USA 106, 18745–18750 (2009).
Article CAS PubMed PubMed Central Google Scholar
Xu, Ren-ying, Wan, Yan-ping, Tang, Qing-ya, Wu, J. & Cai, W. The effects of high fat on central appetite genes in Wistar rats: a microarray analysis. Clin. Chim. Acta 397, 96–100 (2008).
Article CAS PubMed Google Scholar
Stienstra, R., Duval, C., ller, M. & Kersten, S. PPARs, obesity, and inflammation. PPAR Res. 2007, 95974 (2007).
Article PubMed Google Scholar
Gavrilova, O. et al. Liver peroxisome proliferator-activated receptor gamma contributes to hepatic steatosis, triglyceride clearance, and regulation of body fat mass. J. Biol. Chem. 278, 34268–34276 (2003).
Article CAS PubMed Google Scholar
Matsusue, K. et al. Liver-specific disruption of PPARgamma in leptin-deficient mice improves fatty liver but aggravates diabetic phenotypes. J. Clin. Invest. 111, 737–747 (2003).
Article CAS PubMed PubMed Central Google Scholar
Patsouris, D., Reddy, J. K., ller, M. & Kersten, S. Peroxisome proliferator-activated receptor alpha mediates the effects of high-fat diet on hepatic gene expression. Endocrinology 147, 1508–1516 (2006).
Article CAS PubMed Google Scholar
Schadinger, S. E., Bucher, N. L., Schreiber, B. M. & Farmer, S. R. PPARgamma2 regulates lipogenesis and lipid accumulation in steatotic hepatocytes. Am. J. Physiol. Endocrinol. Metab. 288, E1195–1205 (2005).
Article CAS PubMed Google Scholar
Motomura, W. et al. Up-regulation of ADRP in fatty liver in human and liver steatosis in mice fed with high fat diet. Biochem Biophys. Res. Commun. 340, 1111–1118 (2006).
Article CAS PubMed Google Scholar
Srivastava, A. et al. Genomes of the mouse collaborative cross. Genetics 206, 537–556 (2017).
Article CAS PubMed PubMed Central Google Scholar
Roberts, A., Pardo-Manuel de Villena, F., Wang, W., McMillan, L. & Threadgill, D. W. The polymorphism architecture of mouse genetic resources elucidated using genome-wide resequencing data: implications for QTL discovery and systems genetics. Mamm. Genome 18, 473–481 (2007).
Article CAS PubMed PubMed Central Google Scholar
Churchill, G. A. et al. The collaborative cross, a community resource for the genetic analysis of complex traits. Nat. Genet. 36, 1133–1137 (2004).
Article CAS PubMed Google Scholar
Huh, J. Y., Park, Y. J., Ham, M. & Kim, J. B. Crosstalk between adipocytes and immune cells in adipose tissue inflammation and metabolic dysregulation in obesity. Mol. Cells 37, 365–371 (2014).
Article PubMed PubMed Central Google Scholar
Lamb, J. et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 313, 1929–1935 (2006).
Article CAS PubMed Google Scholar
Subramanian, A. et al. A next generation connectivity map: L1000 platform and the first 1,000,000 profiles. Cell 171, 1437–1452 (2017).
Article CAS PubMed PubMed Central Google Scholar
Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034 (2019).
Article CAS PubMed PubMed Central Google Scholar
Vosa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
Article CAS PubMed PubMed Central Google Scholar
Hallgrimsson, B. et al. The developmental-genetics of canalization. Semin Cell Dev. Biol. 88, 67–79 (2019).
Article PubMed Google Scholar
Siegal, M. L. & Bergman, A. Waddington’s canalization revisited: developmental stability and evolution. Proc. Natl Acad. Sci. USA 99, 10528–10532 (2002).
Article CAS PubMed PubMed Central Google Scholar
Paaby, A. B. & Gibson, G. Cryptic genetic variation in evolutionary developmental genetics. Biol. (Basel) 5, 28 (2016).
Google Scholar
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Article CAS PubMed PubMed Central Google Scholar
Wray, N. R., Wijmenga, C., Sullivan, P. F., Yang, J. & Visscher, P. M. Common disease is more complex than implied by the core gene omnigenic model. Cell 173, 1573–1580 (2018).
Article CAS PubMed Google Scholar
Shimizu, H. & Osaki, A. Nesfatin/Nucleobindin-2 (NUCB2) and Glucose Homeostasis. Curr Hypertens Rev 9, 270–273 (2013).
Nakata, M. & Yada, T. Role of NUCB2/nesfatin-1 in glucose control: diverse functions in islets, adipocytes and brain. Curr. Pharm. Des. 19, 6960–6965 (2013).
Article CAS PubMed Google Scholar
Riva, M. et al. Nesfatin-1 stimulates glucagon and insulin secretion and beta cell NUCB2 is reduced in human type 2 diabetic subjects. Cell Tissue Res. 346, 393–405 (2011).
Article CAS PubMed Google Scholar
Van Belle, G. Statistical Rules of Thumb. 2nd edition (John Wiley & Sons, 2008).
Morgan, A. P. et al. The mouse universal genotyping array: from substrains to subspecies. G3 (Bethesda) 6, 263–279 (2015).
Article PubMed PubMed Central Google Scholar
Svenson, K. L. et al. High-resolution genetic mapping using the mouse diversity outbred population. Genetics 190, 437–447 (2012).
Article CAS PubMed PubMed Central Google Scholar
Gatti, D. M. et al. Quantitative trait locus mapping methods for diversity outbred mice. G3 (Bethesda) 4, 1623–1633 (2014).
Article PubMed Google Scholar
Choi, K. et al. Genotype-free individual genome reconstruction of multiparental population models by rna sequencing data. bioRxiv pages 2020–10 (2020).
Tyler, A.L. et al. Data and code for high-dimensional mediation analysis (HDMA) in diversity outbred mice. Figshare https://doi.org/10.6084/m9.figshare.27066979 (2024).
Visscher, P. M. & Goddard, M. E. A general unified framework to assess the sampling variance of heritability estimates using pedigree or marker-based relationships. Genetics 199, 223–32 (2015).
Article PubMed Google Scholar
Keller, M. P. et al. Data from: genetic drivers of pancreatic islet function. https://doi.org/10.5061/dryad.pj105 (2018).
Raghupathy, N. et al. Hierarchical analysis of RNA-seq reads improves the accuracy of allele-specific expression. Bioinformatics 34, 2177–2184 (2018).
Article CAS PubMed PubMed Central Google Scholar
Munger, S. C. et al. RNA-Seq alignment to individualized genomes improves transcript abundance estimates in multiparent populations. Genetics 198, 59–73 (2014).
Article PubMed PubMed Central Google Scholar
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. J. Stat. Softw. 67, 1–48 (2015).
Article Google Scholar
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with deseq2. Genome Biol. 15, 550 (2014).
Article PubMed PubMed Central Google Scholar
Churchill, G. A. & Doerge, R. W. Empirical threshold values for quantitative trait mapping. Genetics 138, 963–971 (1994).
Article CAS PubMed PubMed Central Google Scholar
Bollen, K. A. Structural Equations With Latent Variables. John Wiley & Sons, (2014).
Tenenhaus, A. & Tenenhaus, M. Regularized generalized canonical correlation analysis. Psychometrika 76, 257–284 (2011).
Article MathSciNet Google Scholar
Tenenhaus, M., Tenenhaus, A. & Groenen, Patrick J. F. Regularized generalized canonical correlation analysis: a framework for sequential multiblock component methods. Psychometrika 82, 737–777 (2017).
Article MathSciNet PubMed Google Scholar
Tenenhaus, A., Philippe, C. & Frouin, V. Kernel generalized canonical correlation analysis. Computational Stat. Data Anal. 90, 114–131 (2015).
Article MathSciNet Google Scholar
Girka, F. et al. Multiblock data analysis with the RGCCA package. J. Stat. Softw. 1–36 (2023).
Schäfer, J. & Strimmer, K. A shrinkage approach to large-scale covariance matrix estimation and implications for functional genomics. Stat. Appl. Genet. Mol. Biol. 4, 1 (2005).
Kanehisa, M., Furumichi, M., Sato, Y., Kawashima, M. & Ishiguro-Watanabe, M. KEGG for taxonomy-based analysis of pathways and genomes. Nucleic Acids Res. 51, D587–D592 (2023).
Article CAS PubMed Google Scholar
Luo, W. & Brouwer, C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics 29, 1830–1831 (2013).
Article CAS PubMed PubMed Central Google Scholar
Blake, J. A. et al. Mouse Genome Database (MGD): Knowledgebase for mouse-human comparative biology. Nucleic Acids Res. 49, D981–D987 (2021).
Article CAS PubMed Google Scholar
Gentry, J. annotate: Annotation for microarrays R package version 1.82.0 (2024).
Gatto, L., Breckels, L. M., Wieczorek, S., Burger, T. & Lilley, K. S. Mass-spectrometry-based spatial proteomics data analysis using pRoloc and pRolocdata. Bioinformatics 30, 1322–1324 (2014).
Article CAS PubMed PubMed Central Google Scholar
Fantini, D. easyPubMed: Search and Retrieve Scientific Publication Records from PubMed, R package version 2.13 (2019).
Maechler, M., Rousseeuw, P., Struyf, A., Hubert, M. & Hornik, K. cluster: Cluster Analysis Basics and Extensions. R package version 2.1.6 — For new features, see the ’NEWS’ and the ’Changelog’ file in the package source) (2023).
Kolberg, L., Raudvere, U., Kuzmin, I., Vilo, J. & Peterson, H. gprofiler2– an r package for gene list functional enrichment analysis and namespace conversion toolset g:profiler. F1000Research 9 (ELIXIR)(709), R package version 0.2.3 (2020).
Ehrlund, A. et al. The cell-type specific transcriptome in human adipose tissue and influence of obesity on adipocyte progenitors. Sci. Data 4, 170164 (2017).
Article CAS PubMed PubMed Central Google Scholar
Elgamal, R. M. et al. An integrated map of cell type-specific gene expression in pancreatic islets. Diabetes 72, 1719–1728 (2023).
Clough, E. et al. NCBI GEO: archive for gene expression and epigenomics data sets: 23-year update. Nucleic Acids Res. 52, D138–D144 (2024).
Article CAS PubMed Google Scholar
Edgar, R., Domrachev, M. & Lash, A. E. Gene expression omnibus: NCBI gene expression and hybridization array data repository. Nucleic Acids Res. 30, 207–210 (2002).
Article CAS PubMed PubMed Central Google Scholar
Davis, S. & Meltzer, P. Geoquery: a bridge between the gene expression omnibus (geo) and bioconductor. Bioinformatics 14, 1846–1847 (2007).
Article Google Scholar
Baldarelli, R. M. et al. Mouse genome informatics: an integrated knowledgebase system for the laboratory mouse. Genetics 227, iyae031 (2024).
Article PubMed PubMed Central Google Scholar
Kuznetsova, A., Brockhoff, P. B. & Christensen, Rune H. B. lmertest package: tests in linear mixed effects models. J. Stat. Softw. 82, 1–26 (2017).
Article Google Scholar
Tyler, A.L. et al. Data for figures in high-dimensional mediation analysis (HDMA) in diversity outbred mice. Figshare https://doi.org/10.6084/m9.figshare.29247428 (2025).

Download references

Acknowledgements

This project was supported by The Jackson Laboratory Cube Initiative, as well as grants from the National Institutes of Health (grant numbers R01DK101573, R01DK102948, and RC2DK125961 to A. D. A., R01GM115518 to G. W. C., and R01GM141309 to J.M.M- and by the University of Wisconsin-Madison, Department of Biochemistry and Office of the Vice Chancellor for Research and Graduate Education with funding from the Wisconsin Alumni Research Foundation (to M. P. K.). We thank the following scientific services at The Jackson Laboratory: Genome Technologies for the RNA sequencing, necropsy services for the tissue harvests, and the Center for Biometric Analysis for metabolic phenotyping.

Author information

These authors contributed equally: Anna L. Tyler, J. Matthew Mahoney.

Authors and Affiliations

The Jackson Laboratory, Bar Harbor, Maine, USA
Anna L. Tyler, J. Matthew Mahoney, Candice N. Baker, Margaret Gaca, Isabela Gerdes Gyuricza, Madeleine J. Braun, Nadia A. Rosenthal, Gary A. Churchill & Gregory W. Carter
University of Wisconsin-Madison, Biochemistry Department, Madison, WI, USA
Mark P. Keller & Alan D. Attie
The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
Anuj Srivastava
National Heart and Lung Institute, Imperial College, London, UK
Nadia A. Rosenthal

Authors

Anna L. Tyler
View author publications
Search author on:PubMed Google Scholar
J. Matthew Mahoney
View author publications
Search author on:PubMed Google Scholar
Mark P. Keller
View author publications
Search author on:PubMed Google Scholar
Candice N. Baker
View author publications
Search author on:PubMed Google Scholar
Margaret Gaca
View author publications
Search author on:PubMed Google Scholar
Anuj Srivastava
View author publications
Search author on:PubMed Google Scholar
Isabela Gerdes Gyuricza
View author publications
Search author on:PubMed Google Scholar
Madeleine J. Braun
View author publications
Search author on:PubMed Google Scholar
Nadia A. Rosenthal
View author publications
Search author on:PubMed Google Scholar
Alan D. Attie
View author publications
Search author on:PubMed Google Scholar
Gary A. Churchill
View author publications
Search author on:PubMed Google Scholar
Gregory W. Carter
View author publications
Search author on:PubMed Google Scholar

Contributions

Conceptualization: J.M.M., A.D.A., N.A.R., MJB, G.A.C., G.W.C. Methodology: G.A.C., A.D.A., J.M.M., C.N.B., M.P.K. Software: A.L.T., J.M.M., I.G.G., A.S. Formal Analysis: A.L.T., I.G.G., A.S. Investigation: C.N.B., M.G. Resources: A.D.A., M.P.K., C.N.B., NAR, MJB, G.A.C., G.W.C. Data Curation: C.N.B., A.L.T. Writing - Original Draft: A.L.T, J.M.M. Writing - Review and Editing: A.L.T., J.M.M, G.A.C, G.W.C., M.P.K., C.N.B. Visualization: A.L.T. Supervision: M.J.B., N.A.R., A.D.A., G.A.C., G.W.C. Project Administration: M.J.B., C.N.B. Funding acquisition: N.A.R., A.D.A., G.A.C., G.W.C.

Corresponding authors

Correspondence to Gary A. Churchill or Gregory W. Carter.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Richard Mott, Leah Woods and Dan Zhou for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Supplementary Data 2

Reporting Summary

Transparent Peer Review file

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Tyler, A.L., Mahoney, J.M., Keller, M.P. et al. Transcripts with high distal heritability mediate genetic effects on complex metabolic traits. Nat Commun 16, 5507 (2025). https://doi.org/10.1038/s41467-025-61228-9

Download citation

Received: 05 November 2024
Accepted: 17 June 2025
Published: 01 July 2025
DOI: https://doi.org/10.1038/s41467-025-61228-9

Subjects

Abstract

Similar content being viewed by others

Introduction

Results

Genetic variation contributed to wide phenotypic variation

Distal heritability correlated with phenotype relevance

High-Dimensional Mediation Analysis identified a high-heritability composite trait that was mediated by a composite transcript

Body weight and insulin resistance were highly represented in the expression-mediated composite trait

High-loading transcripts had low local heritability, high distal heritability, and were linked mechanistically to obesity

Tissue-specific transcriptional programs were associated with metabolic traits

Gene expression, but not local eQTLs, predicted body weight in an independent population

Distally heritable transcriptomic signatures suggested variation in composition of adipose tissue and islets

Heritable transcriptomic signatures translated to human disease

Existing therapies are predicted to target mediator gene signatures

Discussion

Methods

Statistics and reproducibility

Diversity outbred mice

Trait measurements

Genotyping

Trait selection in DO

Processed DO data

Collaborative cross recombinant inbred mice (CC-RIX)

CC-RIX genotypes

Clinical chemistries

Intraperitoneal glucose tolerance testing

Bulk tissue collection

Whole pancreas insulin content

RNA isolation and QC

Library construction

Sequencing

Processing of RNA sequencing data

eQTL analysis

Local and distal heritability of transcripts

High-dimensional mediation analysis

Perfect mediation as a constraint on covariance matrices

Projecting multivariate data to identify latent mediators

An algorithm for HDMA

Algorithm 1

Kernel HDMA

Implementation details

Enrichment of biological terms

TWAS in DO mice

Literature support for genes

Tissue-specific clusters

Imputation of gene expression in CC-RIX

Prediction of CC-RIX traits

Cell type specificity

Comparison of transcriptomic signatures to human transcriptomic signatures

Connectivity map queries

Reporting summary

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links