Extended Data Fig. 1: Histone classification and evolution.
From: A phylogenetic and proteomic reconstruction of eukaryotic chromatin evolution

a, Primary and secondary alignments of histone-fold-containing proteins classified as canonical H2A, H2B, H3 and H4 based on identity to reference sequences in HistoneDB. Pie plots show the number of alignments to HistoneDB-annotated sequences for the entire dataset (prokaryotic, eukaryotic and viral sequences; large pie plots in the inset) and for the eukaryotic subset (smaller plots in the inset). For proteins that align to more than one canonical histone or major variant (macroH2A, H2A.Z or cenH3), the scatter plots show the relative identity of the primary alignment (horizontal axis) and the secondary alignment(s) (vertical axis). b, Aggregated counts of histone gene pairs, classified according to histone type and orientation. c, Presence of histone variants (left) and number of collinear pairs of histone-encoding genes (right) per species, classified according to histone type and relative orientation (head-to-head, hh; head-to-tail, ht; tail-to-tail, tt). Source data are available in Supplementary Data 2. Histone variant classification is based on the highest-scoring HMM profile from HistoneDB. Asterisk colors in the macroH2A column indicate species in which histone-less Macro domains orthologous to the macroH2A genes are found (see panel e). Lighter colors in the variant classification indicate ambiguously classified histones, i.e., cases in which the highest-scoring HMM profile had a low bitscore (defined as a probability below 0.05 in the profile-wise distribution function of scaled bitscores) or in which the ratio between the bitscores of the first and second highest-scoring profiles was below 1.01. d, Alignments of putatively conserved histone N-terminal tails in archaea. Conserved amino acids are color-coded according to their chemical properties. Dots next to species names are color-coded according to taxonomy (as in Fig. 2c).
e, Phylogenetic analysis of the Macro motif of macroH2A histones across eukaryotes, highlighting the macroH2A ortholog group (green) and, within this group, Macro-containing genes that lack histone domains (orange), together with their protein domain architectures.
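The orientation classes used in panels b and c (hh, ht, tt) follow from the strands of two adjacent genes. The sketch below is an illustration only and is not the authors' pipeline. It assumes the usual convention that a gene's "head" is its 5' end, and it treats a "collinear pair" as two consecutive histone genes on the same contig. The tuple layout and the optional `max_gap` distance cut-off are hypothetical choices made for this example.

```python
from collections import Counter

VALID_STRANDS = {"+", "-"}


def pair_orientation(left_strand: str, right_strand: str) -> str:
    """Classify two adjacent genes (ordered by coordinate) as hh, ht or tt.

    Assumes 'head' = 5' end: divergent genes (<- ->) are head-to-head,
    convergent genes (-> <-) are tail-to-tail, same-strand genes are head-to-tail.
    """
    if left_strand not in VALID_STRANDS or right_strand not in VALID_STRANDS:
        raise ValueError("strands must be '+' or '-'")
    if left_strand == right_strand:
        return "ht"  # tandem, same strand
    if left_strand == "-" and right_strand == "+":
        return "hh"  # divergent
    return "tt"      # left '+', right '-': convergent


def count_pairs(genes, max_gap=None):
    """Count adjacent histone gene pairs by (histone types, orientation).

    genes: iterable of (contig, start, end, strand, histone_type) tuples.
    max_gap: hypothetical optional limit on intergenic distance.
    """
    ordered = sorted(genes, key=lambda g: (g[0], g[1]))
    counts = Counter()
    for a, b in zip(ordered, ordered[1:]):
        if a[0] != b[0]:
            continue  # different contigs, not a collinear pair
        if max_gap is not None and b[1] - a[2] > max_gap:
            continue  # too far apart under the assumed cut-off
        types = tuple(sorted((a[4], b[4])))
        counts[(types, pair_orientation(a[3], b[3]))] += 1
    return counts
```

For example, an H2A gene on the minus strand followed by an H2B gene on the plus strand is counted as one divergent (hh) H2A–H2B pair, a common arrangement for canonical histone genes.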
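The ambiguity rule for the lighter colors in panel c combines two tests: a low score for the best profile, and a near-tie between the two best profiles. The following is a minimal sketch of that rule and is not the authors' code. It assumes that scaled bitscores have already been computed and that each profile's reference distribution is available as a list of scaled bitscores. It also assumes that "probability below 0.05" means the empirical CDF value of the query score within that distribution. The function name and input layout are hypothetical.

```python
from bisect import bisect_right


def classify_variant(scores, reference, p_min=0.05, ratio_min=1.01):
    """Return (best_profile, is_ambiguous) for one protein.

    scores: profile name -> scaled bitscore of this protein (assumed input).
    reference: profile name -> list of scaled bitscores forming that
        profile's distribution (assumed input).
    """
    if not scores:
        raise ValueError("no profile scores given")
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    best, best_score = ranked[0]

    # Empirical CDF of the best score within the best profile's distribution.
    ref = sorted(reference[best])
    ecdf = bisect_right(ref, best_score) / len(ref)
    low_score = ecdf < p_min

    # Near-tie: first-to-second score ratio below ratio_min.
    close_call = False
    if len(ranked) > 1 and ranked[1][1] > 0:
        close_call = best_score / ranked[1][1] < ratio_min

    return best, low_score or close_call
```

Only the best profile's distribution is consulted in the low-score test, which matches the caption's wording that the *highest-scoring* profile must have a low bitscore.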