Fig. 3: TCAF gene and isoform diversity among human haplotypes.

a Miropeats analysis reveals great consistency between Haplogroup 5 and the gapped 7q35 TCAF locus in the human reference genome (GRCh38). Colored arrows are annotated TCAF SDs, and lines connecting the sequences show regions of homology. The dark green rectangle in the sequence-resolved, gap-free Haplogroup 5 contig (bottom) represents the region missing from the GRCh38 sequence (dark green bar, top). Additional annotations include a schematic of the TCAF sequence structure and gene models/isoforms for nonhuman primates, the RefSeq gene track (GRCh38), segmental duplication (SegDup) tracks, and TCAF gene models and isoforms predicted from full-length non-chimeric transcripts from six human tissues (Methods). Note that the TCAF2A gene model in GRCh38 is incorrect because of the gap in the middle of the TCAF SD region. For illustration, all predicted gene models and isoforms are aligned to the Haplogroup 5 sequence for b TCAF2A and c TCAF2C, along with annotations for amino acid (aa) sequence lengths and the haplogroups in which they are observed. For simplicity, we skip sequences between TCAF2A and TCAF2C2 (dashed line). Detailed haplotype-specific and/or tissue-specific gene models and isoforms can be found in Supplementary Figs. 6–13 and Supplementary Data 5.