Fig. 1: Whole-exome sequencing data analysis strategy.
From: A novel autism-associated UBLCP1 mutation impacts proteasome regulation/activity

a Pedigree of family F2 and segregation of the UBLCP1 variant g.158,710,261CAAAG > C: open circles and squares represent unaffected females and males, respectively; the closed square represents the affected male homozygous for the variant; half-filled circles and squares denote unaffected heterozygous carriers. b WES data analysis and filtering strategy used to prioritize the variants identified in family F2. The workflow diagram shows the number of variants at each step (gradient) of the filtering process, assuming a homozygous recessive model of inheritance. A brief description of each step (steps 1–6) is shown to the left of the diagram. Briefly, after alignment, post-alignment processing, and removal of variants with missing genotype data or multiple alleles, the total number of variants across the sequenced exomes was determined. Step 1 retained variants with a minor allele frequency (MAF) ≤ 1% in the 1000 Genomes Project (1000G), the Genome Aggregation Database (gnomAD), and the Greater Middle East Variome Project (GME). Step 2 excluded all synonymous and UTR variants. Step 3 excluded polymorphic variants found in dbSNP. Step 4 selected variants with a global mean allele frequency ≤ 1% in the ExAC and Iranome databases. Step 5 confirmed the genotypes of all family members by Sanger sequencing and kept only variants that segregated properly within the family. Step 6 genotyped the candidate variants in a local database of 100 Lebanese controls and kept those with an MAF ≤ 1%. Steps 1–6 led to prioritization of the variant in the candidate gene UBLCP1.
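The six filtering steps in panel b can be read as a chain of predicates applied in order, with a count recorded after each one. The sketch below is illustrative only and is not the authors' pipeline: the `Variant` fields, the consequence labels, and the database keys are hypothetical names chosen for the example, and steps 5 and 6 (Sanger confirmation and control genotyping) are laboratory results that are simply represented here as precomputed values.

```python
from dataclasses import dataclass, field

MAF_CUTOFF = 0.01  # the 1% threshold named in the caption

# Hypothetical consequence labels; real annotation tools use their own terms.
EXCLUDED_CONSEQUENCES = {"synonymous", "5_prime_UTR", "3_prime_UTR"}


@dataclass
class Variant:
    gene: str
    consequence: str
    # Allele frequencies keyed by database name (assumed keys). A missing key
    # means the variant was not observed there and is treated as 0.0.
    freqs: dict = field(default_factory=dict)
    in_dbsnp_polymorphic: bool = False  # flagged as a polymorphism in dbSNP
    segregates: bool = False            # outcome of Sanger segregation (step 5)
    local_control_maf: float = 0.0      # MAF in the local controls (step 6)


def rare_in(variant, databases):
    """True if the variant is at or below the cutoff in every listed database."""
    return all(variant.freqs.get(db, 0.0) <= MAF_CUTOFF for db in databases)


# Each entry is (label, keep-predicate), in the order given in the caption.
STEPS = [
    ("1: MAF <= 1% in 1000G, gnomAD, GME",
     lambda v: rare_in(v, ("1000G", "gnomAD", "GME"))),
    ("2: drop synonymous and UTR variants",
     lambda v: v.consequence not in EXCLUDED_CONSEQUENCES),
    ("3: drop dbSNP polymorphisms",
     lambda v: not v.in_dbsnp_polymorphic),
    ("4: frequency <= 1% in ExAC, Iranome",
     lambda v: rare_in(v, ("ExAC", "Iranome"))),
    ("5: segregates in the family",
     lambda v: v.segregates),
    ("6: MAF <= 1% in local controls",
     lambda v: v.local_control_maf <= MAF_CUTOFF),
]


def prioritize(variants):
    """Apply the steps in order; return survivors and per-step counts."""
    remaining = list(variants)
    counts = []
    for label, keep in STEPS:
        remaining = [v for v in remaining if keep(v)]
        counts.append((label, len(remaining)))
    return remaining, counts
```

Calling `prioritize(variants)` on a list of annotated `Variant` objects returns the surviving candidates together with the number of variants left after each step, which mirrors the narrowing counts shown in the workflow diagram.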