Table 2 Tools for genotyping STRs.
From: Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities
STR genotyping tool | Algorithm description | Genotypes TRs longer than the read length? | Detects TRs not annotated in reference? | Other notes/features |
---|---|---|---|---|
HipSTR [62] | Learns a parametric model that captures each STR’s stutter noise profile. Using the genomic location of the repeat, it applies this profile and a hidden Markov model (HMM) to realign STR-containing reads to candidate haplotypes, mitigating the effects of PCR stutter | No | No | Reliability: multiple publications have used HipSTR as the sole genotyping tool, e.g., reports of case-status associations (in the ASD Simons Simplex Collection) or eQTL analyses (GTEx); can phase STRs |
lobSTR [63] | Signal processing approach that uses rapid entropy measurements to find informative STR reads followed by a Fast Fourier Transform to characterize the repeat sequence | No | No | High error rates noted for dinucleotide repeats |
STRetch [69] | Remaps reads anchored in the vicinity of a putative TRE to a synthetic decoy genome containing large expanded repeat arrays; treats reads that map preferentially to the decoy as the major criterion in its scoring algorithm | Yes | No | Uses a statistical outlier method to identify expansions |
gangSTR [70] | Relies on a statistical model incorporating multiple properties of paired-end reads into a single maximum likelihood framework capable of genotyping both normal length and expanded repeats | Yes | No | Uses an exhaustive grid search over all possible allele pairs and returns the maximum likelihood diploid genotype |
Expansion Hunter [71] | Sequence-graph-based realignment of reads that originate inside and around each target repeat. Genotypes the length of the repeat in each allele based on these graph alignments | Yes | No | Expansion bias: repeats with long motifs may gain evidence for expansion |
Expansion Hunter DeNovo [76] | Counts the number of anchored in-repeat reads (IRRs), which are read pairs in which the first read (the IRR) contains repetitive sequence and the second read (the anchor) contains non-repetitive sequence that can be uniquely mapped to the reference genome | Yes | Yes | TRE must be larger than the sequence read length (>100–150 bp) to be detected |
STRling [77] | Performs k-mer counting in DNA sequencing reads to efficiently recover reads that inform the presence and size of STR expansions | Yes | Yes | Pending replication studies |
exSTRa [72] | Generates empirical cumulative distribution functions (ECDFs) of repeat-motif distributions | Yes | No | May be advantageous in WES data |
Tredparse [74] | Probabilistic model for predicting STR lengths on the basis of evidence from spanning reads, partial reads, repeat-only reads, and spanning pairs | Yes | No | Does not detect expansions that exceed its detection threshold |
superSTR [75] | Uses a fast, compression-based estimator of the information complexity of individual reads to select only reads likely to harbor repeat expansions, then processes them with a linear-time maximal repetition detection algorithm | Yes | Yes | Does not require alignment of raw sequence data |
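
The anchored in-repeat read (IRR) definition given for Expansion Hunter DeNovo can be illustrated with a minimal Python sketch. This is a toy under stated assumptions, not the tool's implementation: the `ReadPair` record, the function names, the perfect-match repeat test, and the `min_anchor_mapq` threshold of 50 are all hypothetical choices made for illustration. It also ignores reverse complements and sequencing errors.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class ReadPair:
    # Hypothetical record: sequences and mapping qualities of both mates.
    first_seq: str
    second_seq: str
    first_mapq: int
    second_mapq: int


def is_in_repeat(seq: str, motif: str) -> bool:
    """Return True if seq is a perfect tandem run of motif, in any phase.

    Assumption: no mismatches are tolerated and at least two motif
    copies are required.
    """
    if not seq or not motif or len(seq) < 2 * len(motif):
        return False
    # A perfect repeat in any phase is a substring of a longer perfect repeat.
    template = motif * (len(seq) // len(motif) + 2)
    return seq in template


def count_anchored_irrs(pairs: Iterable[ReadPair], motif: str,
                        min_anchor_mapq: int = 50) -> int:
    """Count pairs with one in-repeat read and one confidently mapped,
    non-repetitive mate (the anchor)."""
    count = 0
    for p in pairs:
        candidates = (
            (p.first_seq, p.second_seq, p.second_mapq),
            (p.second_seq, p.first_seq, p.first_mapq),
        )
        for irr, anchor, anchor_mapq in candidates:
            if (is_in_repeat(irr, motif)
                    and not is_in_repeat(anchor, motif)
                    and anchor_mapq >= min_anchor_mapq):
                count += 1
                break  # count each pair at most once
    return count


example_pairs = [
    ReadPair("CAGCAGCAGCAG", "ACGTTGCATGCA", 0, 60),  # IRR + confident anchor
    ReadPair("AGCAGCAGCAGC", "CAGCAGCAGCAG", 0, 0),   # both mates repetitive
    ReadPair("ACGTTGCATGCA", "GGATCCATTGAC", 60, 60), # no repetitive mate
    ReadPair("GCAGCAGCAGCA", "TTGACCGTAAGC", 0, 10),  # anchor mapped ambiguously
]
anchored = count_anchored_irrs(example_pairs, "CAG")
```

In this toy input only the first pair qualifies. The second pair has no non-repetitive mate to anchor it, and the fourth pair's mate falls below the assumed mapping-quality threshold.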