Table 2 Tools for genotyping STRs.

From: Rediscovering tandem repeat variation in schizophrenia: challenges and opportunities

| STR genotyping tool | Algorithm description | Genotypes TRs longer than the read length? | Detects TRs not annotated in the reference? | Other notes/features |
|---|---|---|---|---|
| HipSTR [62] | Learns a parametric model of each STR's stutter noise profile. Using the genomic location of the repeat, it combines this profile with a hidden Markov model (HMM) to realign STR-containing reads to candidate haplotypes, mitigating the effects of PCR stutter | No | No | Reliability: multiple publications have used HipSTR as the sole genotyping tool, e.g., for case-status association analyses (autism spectrum disorder, Simons Simplex Collection) and eQTL analyses (GTEx); can phase STRs |
| lobSTR [63] | Signal-processing approach that uses rapid entropy measurements to find informative STR reads, followed by a fast Fourier transform to characterize the repeat sequence | No | No | High error rates noted for dinucleotide repeats |
| STRetch [69] | Remaps reads anchored near a putative TRE to a synthetic decoy genome containing large expanded repeat arrays; reads that map preferentially to the decoy genome are a major criterion in its scoring algorithm | Yes | No | Uses an outlier-based statistical method to identify expansions |
| GangSTR [70] | Uses a statistical model that incorporates multiple properties of paired-end reads into a single maximum-likelihood framework capable of genotyping both normal-length and expanded repeats | Yes | No | Performs an exhaustive grid search over all possible allele pairs and returns the maximum-likelihood diploid genotype |
| ExpansionHunter [71] | Sequence-graph-based realignment of reads that originate inside and around each target repeat; genotypes the repeat length of each allele from these graph alignments | Yes | No | Expansion bias: repeats with long motifs may gain evidence for expansion |
| ExpansionHunter Denovo (102) [76] | Counts anchored in-repeat reads (IRRs): read pairs in which one mate (the IRR) consists of repetitive sequence and the other mate (the anchor) contains non-repetitive sequence that maps uniquely to the reference genome | Yes | Yes | The TRE must be longer than the sequencing read length (>100–150 bp) to be detected |
| STRling [77] | Performs k-mer counting on DNA sequencing reads to efficiently recover reads that inform the presence and size of STR expansions | Yes | Yes | Pending replication studies |
| exSTRa [72] | Generates empirical cumulative distribution functions (ECDFs) of repeat-motif distributions | Yes | No | May be advantageous for WES data |
| Tredparse [74] | Probabilistic model that predicts STR lengths from the evidence of spanning reads, partial reads, repeat-only reads, and spanning pairs | Yes | No | Does not detect expansions that exceed its detection threshold |
| superSTR [75] | Uses a fast, compression-based estimator of the information complexity of individual reads to select only reads likely to harbor repeat expansions, which are then processed with a linear-time maximal-repetition detection algorithm | Yes | Yes | Does not require alignment of raw sequence data |

Several publicly available tools for genotyping STRs from whole-genome sequencing data are listed, with notes on each tool's underlying algorithm and key features. All of the tools were developed to analyze short-read whole-genome sequencing data.
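The entropy-based read screen described for lobSTR can be illustrated with a minimal Python sketch: it computes the Shannon entropy of a read's k-mer composition and flags low-entropy (repeat-rich) reads as candidate STR reads. The k-mer size (`k=2`), the 2-bit cutoff, and the helper names `kmer_entropy` and `is_candidate_str_read` are illustrative assumptions, not lobSTR's actual windowing scheme or thresholds, and the subsequent Fourier-transform step is omitted.

```python
import math
from collections import Counter


def kmer_entropy(seq: str, k: int = 2) -> float:
    """Shannon entropy (bits) of the k-mer composition of a read.

    Low values mean few distinct k-mers dominate, as in tandem repeats.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    n = len(kmers)
    counts = Counter(kmers)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def is_candidate_str_read(seq: str, k: int = 2, max_entropy: float = 2.0) -> bool:
    """Hypothetical filter: keep reads whose k-mer entropy falls below a cutoff."""
    return kmer_entropy(seq, k) < max_entropy
```

A dinucleotide repeat such as `ACACACACAC` uses only two distinct 2-mers and scores about 1 bit, whereas a mixed-sequence read of similar length spreads its 2-mers across many values and exceeds the cutoff.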
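The exhaustive grid search noted for GangSTR amounts to scoring every unordered allele pair and keeping the best one. The sketch below assumes a toy stutter model (`read_likelihood`, with a hypothetical 5% stutter rate) and observed repeat counts from spanning reads only; GangSTR's real likelihood combines several classes of paired-end evidence, so this illustrates the search strategy, not its model.

```python
import itertools
import math


def read_likelihood(obs: int, allele: int, stutter: float = 0.05, eps: float = 1e-6) -> float:
    """Toy error model (assumption): P(observed repeat count | true allele length)."""
    diff = abs(obs - allele)
    if diff == 0:
        return 1.0 - stutter
    if diff == 1:
        return stutter / 2.0  # one-unit stutter gain or loss
    return eps  # anything else is treated as very unlikely


def ml_diploid_genotype(observed, max_len: int):
    """Grid search over all unordered allele pairs (a1 <= a2) up to max_len.

    Each read is assumed to come from either allele with probability 1/2.
    Returns the best pair and its log-likelihood.
    """
    best, best_ll = None, -math.inf
    for a1, a2 in itertools.combinations_with_replacement(range(1, max_len + 1), 2):
        ll = sum(
            math.log(0.5 * read_likelihood(o, a1) + 0.5 * read_likelihood(o, a2))
            for o in observed
        )
        if ll > best_ll:
            best, best_ll = (a1, a2), ll
    return best, best_ll
```

For example, `ml_diploid_genotype([10, 10, 10, 11, 15, 15, 15, 14], max_len=20)` picks the pair `(10, 15)`, because any other pair leaves one cluster of reads poorly explained. The number of pairs grows quadratically with `max_len`, which is why a bounded search range matters in practice.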
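The anchored in-repeat read (IRR) count used by ExpansionHunter Denovo can be sketched in Python under simplifying assumptions. Here a read counts as repetitive if some period of at most 6 bp explains at least 90% of its bases, and a pair is anchored when its non-repetitive mate has a mapping quality (MAPQ) of at least 50. The `ReadPair` class, both thresholds, the 6 bp period limit, and the period test itself are hypothetical stand-ins; ExpansionHunter Denovo's own read classification and parameters may differ.

```python
from dataclasses import dataclass
from typing import Optional


def repeat_period(seq: str, max_period: int = 6, min_match: float = 0.9) -> Optional[int]:
    """Smallest period p with seq[i] == seq[i - p] for at least min_match of positions.

    Returns None when no period up to max_period explains the read.
    """
    for p in range(1, max_period + 1):
        if len(seq) < 2 * p:  # need at least two copies of the motif
            break
        matches = sum(seq[i] == seq[i - p] for i in range(p, len(seq)))
        if matches / (len(seq) - p) >= min_match:
            return p
    return None


@dataclass
class ReadPair:
    """Hypothetical in-memory read pair: sequences and MAPQs of both mates."""
    seq1: str
    mapq1: int
    seq2: str
    mapq2: int


def count_anchored_irrs(pairs, min_anchor_mapq: int = 50) -> int:
    """Count pairs where one mate is repetitive and the other is a confidently mapped anchor."""
    count = 0
    for pair in pairs:
        mates = ((pair.seq1, pair.mapq1), (pair.seq2, pair.mapq2))
        # Try both orientations: mate 1 as IRR, then mate 2 as IRR.
        for (irr_seq, _), (anchor_seq, anchor_mapq) in (mates, mates[::-1]):
            if (repeat_period(irr_seq) is not None
                    and repeat_period(anchor_seq) is None
                    and anchor_mapq >= min_anchor_mapq):
                count += 1
                break  # count each pair at most once
    return count
```

Because the motif period is inferred from the read itself rather than looked up in a catalog, this style of counting does not depend on the repeat being annotated in the reference, which matches the "Yes" in the corresponding table column.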
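The ECDF comparison listed for exSTRa can be illustrated as follows: score each read by its repeat content, build one empirical cumulative distribution function per sample, and measure how far one sample's curve sits from another's. The per-read score (`repeat_content`, a count of non-overlapping motif copies) and the Kolmogorov-Smirnov-style maximum distance are assumptions chosen for brevity; exSTRa defines its own repeat score and test statistic.

```python
from bisect import bisect_right


def repeat_content(read: str, motif: str) -> int:
    """Hypothetical per-read score: number of non-overlapping motif copies."""
    return read.count(motif)


def ecdf(values):
    """Return F(t) = fraction of values <= t."""
    xs = sorted(values)
    n = len(xs)
    return lambda t: bisect_right(xs, t) / n


def ecdf_max_distance(sample_a, sample_b) -> float:
    """Largest vertical gap between two ECDFs (a KS-style statistic)."""
    fa, fb = ecdf(sample_a), ecdf(sample_b)
    points = sorted(set(sample_a) | set(sample_b))
    return max(abs(fa(t) - fb(t)) for t in points)
```

A sample carrying an expansion contributes more reads with high repeat content, so its ECDF rises later than a control sample's and the distance between the two curves grows.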
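The compression-based prefilter described for superSTR can be approximated with Python's standard-library `zlib` module: repetitive reads compress to a small fraction of their length, so a low compressed-to-raw size ratio marks reads worth passing to a more expensive repeat detector. `zlib` and the 0.25 cutoff are illustrative choices, not superSTR's actual estimator or threshold, and the downstream maximal-repetition step is not reproduced here.

```python
import zlib


def compression_ratio(seq: str) -> float:
    """Compressed size / raw size; repetitive reads compress well and score low."""
    raw = seq.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


def select_repeat_reads(reads, max_ratio: float = 0.25):
    """Keep only low-complexity reads for downstream (costlier) repeat analysis."""
    return [r for r in reads if compression_ratio(r) <= max_ratio]
```

zlib adds a fixed header and checksum to every output, so on very short reads the ratio is inflated and the cutoff would need retuning. The appeal of this design is that the filter needs no alignment: each read is judged on its own sequence alone.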