Fig. 4: S. robusta within-species variability using a gene-based pan-genome analysis. | Nature Communications


From: The Seminavis robusta genome provides insights into the evolutionary adaptations of benthic diatoms


a Representation of reference, core and pan genome sizes. The size of the pan genome increases with each added strain, up to 37,803 protein-coding genes, whereas the size of the core genome decreases to 28,120 protein-coding genes. The clade category color code refers to the population groups described in ref. 14. b Number of core and dispensable genes per S. robusta strain. The pie chart shows the total gene count, where core genes are present in all strains and dispensable genes are present in only a subset of strains. c Percentage of gene length covered by short reads for all pan genes in each strain. The x axis represents the S. robusta strains, whereas the y axis represents all protein-coding pan genes. The percentage of horizontal gene coverage is shown as a color gradient from white (0%) to dark purple (100%). Gene categories are labeled on the right side of the y axis following the color code of panel b, whereas clade categories are labeled along the top of the x axis following the color code of panel a. d Set of gene families that are significantly enriched in core genes. The x axis represents the percentage of protein-coding pan genes that are core or dispensable, following the color code of panel b, while the y axis represents gene families, with the total number of pan genes belonging to each family (reference and de novo genes) given in parentheses. Expansion, tandem enrichment, and age of each family are highlighted in different colors on the right side of the y axis. Superscript numbers refer to families annotated in the Source Data file of Fig. 3. Source data underlying Fig. 4a, b are provided as a Source Data file.
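Panels a and b rest on simple set operations: the pan genome is the union of the per-strain gene sets and the core genome is their intersection, so adding strains can only grow the former and shrink the latter. The sketch below illustrates this using a gene-by-strain coverage table like the one in panel c. It is a minimal illustration under stated assumptions, not the authors' pipeline: the presence rule (a gene counts as present when its length coverage reaches a threshold), the `PRESENCE_THRESHOLD` value of 50% and all function names are hypothetical, and the reference versus de novo distinction is ignored.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical presence rule: a gene counts as present in a strain when at
# least this percentage of its length is covered by short reads (assumption).
PRESENCE_THRESHOLD = 50.0


def presence_sets(coverage: Dict[str, Dict[str, float]],
                  threshold: float = PRESENCE_THRESHOLD) -> Dict[str, Set[str]]:
    """Map each strain to the set of genes called present in it."""
    strains = {s for per_strain in coverage.values() for s in per_strain}
    return {
        s: {g for g, per_strain in coverage.items()
            if per_strain.get(s, 0.0) >= threshold}
        for s in strains
    }


def accumulation_curve(present: Dict[str, Set[str]],
                       order: List[str]) -> List[Tuple[int, int]]:
    """Return (pan size, core size) after each strain is added in `order`."""
    pan: Set[str] = set()
    core: Set[str] = set()
    curve = []
    for i, strain in enumerate(order):
        genes = present[strain]
        pan |= genes                      # pan genome: union of gene sets
        core = set(genes) if i == 0 else core & genes  # core: intersection
        curve.append((len(pan), len(core)))
    return curve


def classify(present: Dict[str, Set[str]]) -> Dict[str, str]:
    """Label each gene 'core' (in all strains) or 'dispensable' (in a subset)."""
    n_strains = len(present)
    counts: Dict[str, int] = {}
    for genes in present.values():
        for g in genes:
            counts[g] = counts.get(g, 0) + 1
    return {g: "core" if c == n_strains else "dispensable"
            for g, c in counts.items()}


# Toy example with three hypothetical strains and genes.
coverage = {
    "geneA": {"S1": 100.0, "S2": 95.0, "S3": 80.0},
    "geneB": {"S1": 100.0, "S2": 10.0, "S3": 70.0},
    "geneC": {"S1": 0.0, "S2": 60.0, "S3": 0.0},
}
present = presence_sets(coverage)
print(accumulation_curve(present, ["S1", "S2", "S3"]))  # [(2, 2), (3, 1), (3, 1)]
print(classify(present))
```

In the toy data the pan size rises from 2 to 3 while the core size falls from 2 to 1, mirroring the opposite trends in panel a; only geneA is classified as core.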
