Translational Psychiatry

Table 2 Tools for genotyping STRs.

From: Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities

STR genotyping tool	Algorithm description	Genotypes TRs that exceed the read length?	Detects TRs not annotated in reference?	Other notes/features
HipSTR [62]	Learns a parametric model that captures each STR’s stutter noise profile. Using the genomic location of the repeat, harnesses this profile and a hidden Markov model (HMM) to realign the STR-containing reads to candidate haplotypes, mitigating the effects of PCR stutter	No	No	Reliability: multiple publications have used HipSTR as the sole genotyping tool, e.g., for case-status association analyses (ASD Simons Simplex Collection) or eQTL analyses (GTEx); can phase STRs
lobSTR [63]	Signal processing approach that uses rapid entropy measurements to find informative STR reads followed by a Fast Fourier Transform to characterize the repeat sequence	No	No	High error rates noted for dinucleotide repeats
STRetch [69]	Remaps reads anchored in the vicinity of a putative TRE to a synthetic decoy genome containing large expanded repeat arrays; considers reads that map preferentially to synthetic decoy genomes as major criterion in scoring algorithm	Yes	No	Incorporates an outlier statistical method in identifying expansions
gangSTR [70]	Relies on a statistical model incorporating multiple properties of paired-end reads into a single maximum likelihood framework capable of genotyping both normal length and expanded repeats	Yes	No	Uses an exhaustive grid search over all possible allele pairs and returns the maximum likelihood diploid genotype
Expansion Hunter [71]	Sequence-graph-based realignment of reads that originate inside and around each target repeat. Genotypes the length of the repeat in each allele based on these graph alignments	Yes	No	Expansion bias: repeats with long motifs may gain evidence for expansion
Expansion Hunter DeNovo [76]	Counts the number of anchored in-repeat reads (IRRs), which are read pairs in which the first read (the IRR) contains repetitive sequence and the second read (the anchor) contains non-repetitive sequence that can be uniquely mapped to the reference genome	Yes	Yes	TRE must be larger than the sequence read length (>100–150 bp) to be detected
STRling [77]	Performs k-mer counting in DNA sequencing reads to efficiently recover reads that inform the presence and size of STR expansions	Yes	Yes	Pending replication studies
exSTRa [72]	Generates empirical cumulative distribution functions (ECDFs) of repeat-motif distributions	Yes	No	May be advantageous in WES data
Tredparse [74]	Probabilistic model for predicting STR lengths on the basis of evidence from spanning reads, partial reads, repeat-only reads, and spanning pairs	Yes	No	Does not detect expansions that exceed its detection threshold
superSTR [75]	Uses a fast, compression-based estimator of the information complexity of individual reads to select and process only reads likely to harbor repeat expansions for processing using the linear-time maximal repetition detection algorithm	Yes	Yes	Does not require alignment of raw sequence data

Several publicly available tools for genotyping STRs from whole-genome-sequence data are tabulated, along with notes on the underlying computational algorithm and key features. Each of the tools was developed to analyze short-read-based whole-genome sequence data.
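
As an illustration of the anchored in-repeat read (IRR) idea described in the table, the following is a minimal Python sketch, not the implementation used by any of the listed tools. The `ReadPair` record, the function names, and both thresholds (`min_fraction`, `min_anchor_mapq`) are hypothetical and chosen only for the example; the sketch also ignores the reverse-complement strand and sequencing quality.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """Hypothetical minimal read-pair record (not any tool's real data model)."""
    read_seq: str   # sequence of the first read (the candidate IRR)
    mate_mapq: int  # mapping quality of the second read (the candidate anchor)


def repeat_fraction(seq: str, motif: str) -> float:
    """Best fraction of bases in seq that match a periodic tiling of motif,
    taken over all phase offsets of the motif."""
    if not seq or not motif:
        return 0.0
    k = len(motif)
    best = 0
    for phase in range(k):
        matches = sum(
            1 for i, base in enumerate(seq) if base == motif[(i + phase) % k]
        )
        best = max(best, matches)
    return best / len(seq)


def count_anchored_irrs(pairs, motif, min_fraction=0.9, min_anchor_mapq=50):
    """Count pairs whose first read is almost entirely repeat sequence and
    whose mate maps confidently (assumed here to mean high mapping quality)."""
    return sum(
        1
        for p in pairs
        if repeat_fraction(p.read_seq, motif) >= min_fraction
        and p.mate_mapq >= min_anchor_mapq
    )


if __name__ == "__main__":
    pairs = [
        ReadPair("CAG" * 10, 60),                        # IRR with a confident anchor
        ReadPair("CAG" * 10, 0),                         # IRR, but the mate is not an anchor
        ReadPair("ACGTTGCAAGCTTAGGCTAACGTTAGCCAT", 60),  # non-repetitive read
        ReadPair("AGC" * 10, 60),                        # same repeat, shifted phase
    ]
    print(count_anchored_irrs(pairs, "CAG"))
```

Scanning every phase offset lets a read that starts mid-motif (for example `AGCAGC...` for a `CAG` repeat) still count as an in-repeat read. A real caller would additionally need to handle both strands and group anchored IRRs by the anchor's genomic position.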
