Extended Data Fig. 3: Pan and core genome analysis of 107 upland cotton accessions.

a, Number of gene families in the pan-genome and core genome as additional cotton genomes are included. b, Composition of the pan-genome. The histogram displays the number of gene families present at each frequency across the 107 accessions. The pie chart indicates the proportion of gene families in each category. c, Presence and absence of pan-genome gene families across the 107 cotton genomes. d, Number of genes in each category in individual genomes. e, Comparison of nucleotide diversity (π) across core, softcore and dispensable genes. π values were calculated in 5-kb windows. The number of bins in each category is shown below the x-axis. The P-values for Core_vs_Softcore, Core_vs_Dispensable and Softcore_vs_Dispensable are 0.0625, 1 × 10⁻¹⁵ and 1 × 10⁻¹⁵, respectively. f, Comparison of dN/dS ratios across core, softcore and dispensable genes, where dN represents the number of non-synonymous SNP mutations and dS represents the number of synonymous SNP mutations. The P-values for Core_vs_Softcore, Core_vs_Dispensable and Softcore_vs_Dispensable are 2.4 × 10⁻¹⁴, 1 × 10⁻¹⁵ and 1.9 × 10⁻¹⁴, respectively. g, Comparison of gene expression across core, softcore and dispensable genes; n indicates the number of genes in each category. The P-values for Core_vs_Softcore, Core_vs_Dispensable and Softcore_vs_Dispensable are 1 × 10⁻¹⁵, 1 × 10⁻¹⁵ and 1 × 10⁻¹⁵, respectively. For panels e–g, multiple comparisons were conducted using Tukey's honestly significant difference test with α = 0.05. Center line, median; box limits, first and third quartiles; whiskers, 1.5 × interquartile range.
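
The accumulation curves in panel a can be illustrated with a minimal sketch. It assumes each genome is represented as a set of gene family identifiers; the function name `pan_core_curve` and the toy data are hypothetical and are not taken from the study. The pan-genome size is the running union of families and the core genome size is the running intersection. Analyses of this kind commonly average over many random genome orderings, which this sketch does not do.

```python
from typing import List, Set, Tuple


def pan_core_curve(genomes: List[Set[str]]) -> List[Tuple[int, int]]:
    """Return (pan size, core size) after adding each genome in the given order."""
    pan: Set[str] = set()
    core: Set[str] = set()
    curve: List[Tuple[int, int]] = []
    for i, families in enumerate(genomes):
        # Pan-genome: every family seen in at least one genome so far.
        pan |= families
        # Core genome: families present in every genome so far.
        core = set(families) if i == 0 else core & families
        curve.append((len(pan), len(core)))
    return curve


# Toy example with three hypothetical genomes (not data from the study).
toy_genomes = [
    {"F1", "F2", "F3"},
    {"F1", "F2", "F4"},
    {"F1", "F3", "F5"},
]
curve = pan_core_curve(toy_genomes)
for n, (pan_size, core_size) in enumerate(curve, start=1):
    print(f"genomes={n} pan={pan_size} core={core_size}")
```

In the toy example the pan-genome grows from 3 to 5 families while the core genome shrinks from 3 to 1, which is the qualitative pattern such curves are used to show.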