Fig. 7: Variants predicted to disrupt poly(A)-tail lengthening are selected against in the human population and throughout vertebrates.

a Negative selection among sequenced vertebrate species of variants predicted to be more disruptive of tail lengthening. Shown are distributions of phyloP scores45,46, grouped by the averaged differences in PAL-AI-predicted tail-length changes for individual nucleotides when substituted to each of the three alternative nucleotides. Only positions within the last 100 nt of 3′ UTRs were included in this analysis. For the tail-length difference bins, parentheses indicate excluded endpoints, and square brackets indicate included endpoints. Boxes and whiskers indicate the 10th, 25th, 50th, 75th, and 90th percentiles. Statistical test results are reported in Supplementary Fig. 9a and Supplementary Data 5. b Predicted effects on the tail-length change for human variants. Shown is the difference between the PAL-AI-predicted tail-length change for each variant reported in the All of Us Research Program (exome callset v8) and that for the reference allele, plotted as a function of the variant position in its 3′ UTR. Colors indicate allele frequencies (key). c Depletion in human 3′ UTRs of alleles predicted to disrupt poly(A)-tail lengthening. Shown are relative frequencies of variants, grouped by differences in predicted tail-length change, among cohorts of variants with different allele frequencies (AF) reported in the All of Us Research Program (exome callset v8). Only variants within the last 100 nt of 3′ UTRs were included in this analysis. Binned P values (circles at the top) are from one-sided Fisher’s exact tests comparing each cohort with singletons. Statistical test results are reported in Supplementary Data 5. d Same as (c), but for variants reported in the gnomAD v4.1 dataset.