Table 1 Technologies for tandem repeat (TR) sequence analysis based on NGS data

From: STRaM: A genetic framework for improved cell product provenance for research and clinical translations

| Software | Bioinformatics pipeline | Features | Ref. |
| --- | --- | --- | --- |
| lobSTR | Flags reads containing STRs, then maps their flanking regions to the reference to determine STR position and length. | A rapid and accurate algorithm for STR profiling. | 73 |
| RepeatSeq | Maps reads to the reference sequence and discards reads that do not span the repeat. | A comprehensive genotyping software package for calling microsatellite repeat genotypes. | 74 |
| STRait Razor | Uses the AGREP function, an approximate string search tool, to count the number of nucleotides in each repeat sequence; each flanking-region query is 12 bases long. | A Perl script for identifying alleles at forensic STR loci. | 21 |
| PacmonSTR | Aligns reads to the reference genome. Estimates TRs with a pair hidden Markov model, and predicts genotypes and boundaries for compound structural variants within or around a TR interval. | A reference-based probabilistic approach to identify TR regions and estimate the number of TR elements. | 48 |
| STRViper | Aligns reads to a reference sequence to determine STR length; requires paired-end reads aligned to the flanking regions. | A Bayesian method to estimate repeat-length variations. | 75 |
| TRhist | Retrieves STRs with an approximate algorithm, then maps to a unique position to locate them in the genome. | An ab initio procedure for sensing, locating and sequencing significantly expanded STRs. | 76 |
| VNTRseek | Maps reads to reference TRs for calling, then maps to reference flanking sequences for confirmation. | Software that identifies internal copy number variation at minisatellite TR loci. | 77 |
| TSSV | Aligns a pair of flanking markers at predefined loci via semiglobal alignment (25 bp). | An efficient and sensitive tool to specifically profile all allelic variants present in targeted STR loci. | 78 |
| STR-FM | Uses a string comparison algorithm for STR analysis, with 20 bp flanking sequences mapped to the reference genome. | A computational pipeline that detects the full spectrum of STR alleles. | 19 |
| CoalescentSTR | Aligns reads to the reference genome to determine STR length; reads must span the STR region. | A new statistical model that estimates repeat numbers. | 79 |
| STR-realigner | Realigns reads to the reference genome with a new dynamic programming-based method; reads must span the STR region. | A new realignment method for STR regions. | 40 |
| STRinNGS | Compares reads against the reference sequences for allele calling and for detecting variations in flanking regions. | A Python script for the analysis of STR regions. | 43 |
| PopSTR | Maps reads to the reference genome, with flanking regions aligned to the reference genome (> 4 bases). | A method capable of studying microsatellite (STR) variation. | 44 |
| STRait Razor V3.0 | Performs fuzzy string matching for motif length. | A novel indexing strategy used to perform fuzzy string matching of anchor sequences. | 20 |
| HipSTR | Selects variation in STRs and identifies sequence variations. | A novel haplotype-based method for genotyping and phasing STRs. | 11 |
| TREDPARSE | Builds a series of STR-region references and aligns reads to the reference to determine repeat size. Requires at least 9 bp when matching flanking sequences. | A software package that incorporates various cues from read alignment and paired-end distance distribution, as well as a sequence stutter model, in a probabilistic framework to infer repeat sizes for genetic loci. | 80 |
| ExpansionHunter | Determines repeat size from spanning reads and identifies in-repeat reads (IRRs) and off-target regions. The flanking sequences are aligned to the reference. | A software package to genotype STRs. | 41 |
| STRetch | Generates a custom reference genome to determine STR length and positions. | A new genome-wide method to scan for STR expansions. | 42 |
| exSTRa | Maps reads to reference loci. | A method to identify repeat expansions. | 81 |
| toaSTR | Uses a fast k-mer-based fuzzy search for clustered observations. Reads must span the complete repeat region plus a minimum of 30 nucleotides upstream and downstream of it. | A web application to help forensic experts work with MPS data in a simple and efficient way. | 46 |
| GangSTR | Uses Tandem Repeats Finder to establish a reference STR library. Determines STR length and off-target reads for reads fully enclosing the TR plus a minimum of 20 bp on either end. | A novel algorithm for genome-wide genotyping of both short and expanded TRs. | 45 |
| STRinNGS v2.0 | Reads the reference file to obtain information on all loci. Extracts the reference flank sequences for variant(s). | An updated version (2.0) of the STR analysis tool. | 82 |
| SuperSTR | Uses a fast, compression-based estimator to identify reads with motifs. | An ultrafast, alignment-free method for efficient screening and identification of known and potential disease-associated STRs. | 83 |
| STRling | Scans candidate reads for k-mer content. When one read of a pair maps well to the reference genome and its mate has high STR content, the mapping position of the well-mapped read is used to reposition the STR read. | Uses k-mer counting to detect STR expansions. | 49 |
| SNiPSTR | Aligns reads primarily to a reference sequence for length and to the reference genome for flanking variants. After flanking alignment, obtains STR motifs. | A cost-efficient, shallow-sequence-output NGS assay combined with a dedicated bioinformatics pipeline. | 47 |
| STRaM (current work) | First pipeline: STRs are recognized by STR-FM to obtain genomic coordinates, length, read counts, etc. Second pipeline: reads are mapped to a reference sequence to obtain genomic coordinates, length, read counts, etc. The STR information from both pipelines is compared for error checking, and a third pipeline performs target analysis. | An integrated and cross-checked workflow for STR analysis and targeted sequences combined with an evaluation system for sample monitoring. | |
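
Many of the tools in Table 1 share one core idea: anchor a read on the sequence flanking a locus, keep the read only if it spans the whole repeat, and measure the repeat between the two flanks. The Python sketch below illustrates that idea only; it is not the implementation of any listed tool. The function names, the exact (rather than approximate) flank matching, the `min_reads` threshold, and the toy sequences are all assumptions made for illustration.

```python
import re
from collections import Counter


def call_repeat_units(read, left_flank, right_flank, motif):
    """Count motif copies between two flanks in a single read.

    Returns None when the read does not span the repeat, i.e. when
    either flank is missing. Flanks are matched exactly here; real
    tools typically tolerate mismatches (an assumption of this sketch).
    """
    start = read.find(left_flank)
    if start == -1:
        return None
    start += len(left_flank)
    end = read.find(right_flank, start)
    if end == -1:
        return None
    repeat_region = read[start:end]
    # Count only the uninterrupted run of the motif that begins
    # immediately after the left flank.
    run = re.match(f"(?:{re.escape(motif)})*", repeat_region).group(0)
    return len(run) // len(motif)


def genotype(reads, left_flank, right_flank, motif, min_reads=2):
    """Report up to two best-supported allele lengths (in repeat units)."""
    counts = Counter()
    for read in reads:
        units = call_repeat_units(read, left_flank, right_flank, motif)
        if units is not None:
            counts[units] += 1
    supported = [(a, n) for a, n in counts.most_common() if n >= min_reads]
    return sorted(allele for allele, _ in supported[:2])


# Toy example with hypothetical flanks and a GATA motif.
left, right, motif = "ACGTTGCA", "GGATCCTA", "GATA"
reads = (
    [left + motif * 5 + right] * 3   # three reads supporting 5 units
    + [left + motif * 7 + right] * 2  # two reads supporting 7 units
    + [left + motif * 5]              # truncated read, does not span
)
print(genotype(reads, left, right, motif))  # [5, 7]
```

The truncated read is discarded because it lacks the right flank, which mirrors the "reads must span the repeat" requirement stated for several tools above. It also shows why purely spanning-read approaches cannot size repeats longer than the read length.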