Extended Data Fig. 5: Sequence features for polymorphic MEIs annotated using SVAN.
From: Structural variation in 1,019 diverse humans based on long-read sequencing

a) The top panel shows a schematic of all possible sequence features of canonical L1 insertion conformations, represented as colored boxes. Features include poly-A tails (A(n)) and transductions (TD). Conformations are grouped according to their likely mechanism of origin: target-primed reverse transcription (TPRT) or twin priming (TP). The bottom-left panel shows the frequency of each canonical L1 insertion conformation, where each conformation is defined by a unique combination of the sequence features shown in the schematic. Insertions whose conformations are inconsistent with TPRT or TP (for example, those lacking a poly-A tail or containing multiple internal breakpoints) are classified as non-canonical. In the bottom-right panel, box plots show, for each L1 insertion conformation, the length distributions of the full insertions and of their individual sequence features. Box plots and data points are colored according to the inferred insertion mechanism. b) Stacked dot plots showing alignments of twin-priming insertions containing deletions (top) or duplications (bottom) at the internal inversion breakpoint. Alignments are colored by orientation, with magenta indicating the inverted L1 sequence. c) Schematic of the sequence features observed in SVA insertions, together with the frequencies of distinct SVA insertion conformations and the length distributions of individual SVA features, shown using the same conventions as for L1 insertions. d, e) Insertion conformations (using the L1 sequence-feature color code) and length distributions for Alu and processed pseudogene (PSD) insertions.
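As a rough illustration of how the conformation frequencies in panel a could be tallied, the sketch below assumes each insertion has already been annotated as an ordered tuple of feature labels. The labels (`L1`, `L1_inv`, `INV_BP`, `A(n)`, `TD`), the function names and the two-rule classification are hypothetical simplifications of the criteria stated in the legend, not SVAN's implementation or output format.

```python
from collections import Counter
from typing import Iterable, Optional, Sequence, Tuple

# Hypothetical feature labels; SVAN's real annotation vocabulary may differ.
POLY_A = "A(n)"          # poly-A tail
INV_BREAKPOINT = "INV_BP"  # internal inversion breakpoint (twin priming)


def classify(features: Sequence[str]) -> str:
    """Assign a mechanism label to one insertion (simplified, assumed rules).

    - no poly-A tail                      -> non-canonical
    - poly-A and no inversion breakpoint  -> TPRT
    - poly-A and exactly one breakpoint   -> TP (twin priming)
    - more than one internal breakpoint   -> non-canonical
    """
    if POLY_A not in features:
        return "non-canonical"
    n_breakpoints = list(features).count(INV_BREAKPOINT)
    if n_breakpoints == 0:
        return "TPRT"
    if n_breakpoints == 1:
        return "TP"
    return "non-canonical"


def conformation_frequencies(
    insertions: Iterable[Sequence[str]],
) -> Counter:
    """Count insertions per (mechanism, conformation) pair.

    Canonical insertions are keyed by their exact ordered feature combination;
    all non-canonical insertions are pooled under a conformation of None.
    """
    counts: Counter = Counter()
    for features in insertions:
        mechanism = classify(features)
        conformation: Optional[Tuple[str, ...]] = (
            tuple(features) if mechanism != "non-canonical" else None
        )
        counts[(mechanism, conformation)] += 1
    return counts
```

For example, a plain L1 followed by a poly-A tail and an L1 carrying a 3' transduction would be counted as two distinct TPRT conformations, whereas a 5'-inverted insertion with a single internal breakpoint would fall under TP. Keying on the ordered tuple mirrors the legend's definition of a conformation as a unique combination of features.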