Fig. 2: The wheat core-, shell- and cloud- genome and homoeologous expression patterns.
From: De novo annotation reveals transcriptomic complexity across the hexaploid wheat pan-genome

A An UpSet plot showing intersects of orthogroup conservation between cultivars and the relation to their breeding programmes and sowing season. Locations are at the country/state level as cultivars are representative of national breeding programmes. B A representation of CDC Stanley chromosome 3B showing the positions of Canadian-specific genes (top bar), heatmaps showing coverage scores between genes in CDC Stanley and CDC Landmark (middle bar) and coverage scores between CDC Stanley and Norin 61 (bottom bar). Coverage scores are calculated by using k-mers from each CDC Stanley gene to search the genome of the other cultivar; they range from 0 to 1, with values closer to 1 indicating greater similarity. Regions of greater difference are represented in the heatmaps as darker bands. The plot shows a detailed view of the 0–50 Mb region of chromosome 3B (indicated by a red box). The mean of the coverage score between CDC Stanley genes in this region and genes in the non-Canadian lines is plotted. A cluster of four Canadian-specific genes (marked by a red dashed line) lies in a region that is noticeably different between CDC Stanley and the non-Canadian lines, potentially representing an introgression. C Percentage of genes belonging to the core-, shell- and cloud- orthologous groups across cultivars. D Violin plots of core, shell and cloud log2 average gene expression across all combined cultivars and tissues, for each subgenome. Internal box plots display the median (centre line), with boxes representing the 25th to 75th percentiles (interquartile range) and whiskers extending to 1.5× the interquartile range. Outliers are not displayed. Pairwise comparisons between categories (core vs shell vs cloud) were performed using a two-sided Dunn’s test for multiple comparisons following a Kruskal–Wallis test. Bonferroni correction was applied to adjust p-values for multiple testing. Exact p-values are shown above each comparison.
Higher mean expression was observed in core genes across all subgenomes. E Ternary plots of stable (left) and dynamic (right) 30-let (definition in main text) expression, where a homoeolog is present on each subgenome, combined across all tissues and cultivars, showing more balanced expression overall in stable 30-lets and unbalanced expression in dynamic 30-lets. Source data are provided either in an online repository (https://doi.org/10.5281/zenodo.16964999)78 or as a Source Data file (Fig. 2C).
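
As an illustration of the coverage score described for panel B, the following is a minimal sketch and not the authors' pipeline. It assumes the score is the fraction of a gene's distinct k-mers that also occur in the other cultivar's genome. The value of k, the function names and the toy sequences are hypothetical, and reverse-complement matches are ignored for brevity.

```python
def kmers(seq, k):
    """Return the set of distinct k-mers in a sequence (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def coverage_score(gene_seq, genome_kmers, k):
    """Fraction of the gene's distinct k-mers found in the target genome.

    Assumed definition: 1.0 means every k-mer of the gene is present in the
    other genome, 0.0 means none is.
    """
    gene_kmers = kmers(gene_seq, k)
    if not gene_kmers:  # gene shorter than k
        return 0.0
    return len(gene_kmers & genome_kmers) / len(gene_kmers)


# Toy data (hypothetical sequences; k = 4 chosen only to keep the example small).
K = 4
gene = "ACGTACGTAC"
similar_genome = kmers("TTACGTACGG", K)    # contains every k-mer of the gene
divergent_genome = kmers("TTACGTTTTT", K)  # shares only part of the gene

score_similar = coverage_score(gene, similar_genome, K)
score_divergent = coverage_score(gene, divergent_genome, K)
print(score_similar, score_divergent)
```

Under this assumed definition, a gene in a region that differs between two cultivars shares fewer k-mers with the other genome and so receives a lower score, which corresponds to the darker bands in the heatmaps.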