
GangSTR—New Algorithm Applied to Genome-Wide Genotyping of Short Tandem Repeat (STR) Expansions, Such As Those Implicated in Huntington’s Disease, Fragile X Syndrome, & Myotonic Dystrophy

(BY SALLY G. PASION, PhD, Associate Professor of Biology, San Francisco State University). On October 18, at the 2018 American Society of Human Genetics (ASHG) Annual Meeting in San Diego, California (October 16-20), software engineer Nima Mousavi, PhD (@nmmsv), of the Electrical and Computer Engineering Department, University of California San Diego (UCSD), working in the laboratory of Dr. Melissa Gymrek, highlighted GangSTR, a novel algorithm for genome-wide profiling of both normal and expanded tandem repeats (TRs). GangSTR provides a new way to genotype short tandem repeats (STRs) from next-generation sequencing (NGS) data. Dr. Mousavi’s presentation was one of six delivered in a late-morning session (#51) titled “What Are We Missing? Identification of Previously Underappreciated Mendelian Variants.” His presentation (#188) was titled “GangSTR: Genome-Wide Genotyping Short Tandem Repeat Expansions.”
STRs are sequences of 1-6 base pairs (bp) repeated in tandem in the genome. They mutate at a higher rate than insertion-deletions (indels) or single nucleotide polymorphisms (SNPs), and they make up about three percent of the human genome. A repeat in a coding region may alter the protein sequence, while one in a non-coding region may affect gene expression. Some STRs are implicated in trinucleotide repeat diseases such as Huntington’s disease (HD), fragile X syndrome, Friedreich ataxia, spinocerebellar ataxia, and myotonic dystrophy, and STR analysis has therefore typically been targeted to these known pathogenic loci. However, in the era of whole genome sequencing, how do we identify potentially pathogenic STRs in an unbiased fashion? A key challenge is that expanded STR loci are often longer than the reads in typical NGS datasets, so no single read spans the entire repeat.
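To make the read-length problem concrete, here is a minimal Python sketch under simplifying assumptions: it counts a perfect (uninterrupted) repeat in a reference string, computes how many copies a single read can cover while keeping some unique flanking sequence on both sides, and sorts aligned reads into illustrative categories. The function names, the 150 bp read length, the flank thresholds, and the category labels are hypothetical choices for illustration; this is not GangSTR’s code or API, only a picture of the partial evidence an STR genotyper has to work with when no read encloses an expansion.

```python
from dataclasses import dataclass


@dataclass
class TandemRepeat:
    start: int   # 0-based start of the repeat tract in the reference
    motif: str   # repeat unit (1-6 bp for an STR)
    copies: int  # number of complete, uninterrupted copies

    @property
    def length(self) -> int:
        return len(self.motif) * self.copies

    @property
    def end(self) -> int:
        # Exclusive end coordinate of the repeat tract
        return self.start + self.length


def find_tandem_repeat(seq: str, start: int, motif_len: int) -> TandemRepeat:
    """Count perfect, back-to-back copies of the motif beginning at `start`."""
    if not 1 <= motif_len <= 6:
        raise ValueError("STR motifs are 1-6 bp long")
    motif = seq[start:start + motif_len]
    if len(motif) < motif_len:
        return TandemRepeat(start, motif, 0)  # not enough sequence left
    copies, pos = 0, start
    while seq[pos:pos + motif_len] == motif:
        copies += 1
        pos += motif_len
    return TandemRepeat(start, motif, copies)


def max_enclosable_copies(read_len: int, motif_len: int, min_flank: int) -> int:
    """Largest repeat (in copies) one read can cover while keeping
    at least `min_flank` bp of non-repeat sequence on each side."""
    return max(0, (read_len - 2 * min_flank) // motif_len)


def classify_read(read_start: int, read_len: int, rep: TandemRepeat,
                  min_flank: int = 1) -> str:
    """Hypothetical, simplified classification of an aligned read."""
    read_end = read_start + read_len
    if read_end <= rep.start or read_start >= rep.end:
        return "outside"           # no overlap with the repeat
    if read_start <= rep.start - min_flank and read_end >= rep.end + min_flank:
        return "enclosing"         # both flanks seen: length measurable directly
    if read_start >= rep.start and read_end <= rep.end:
        return "fully_repetitive"  # read lies entirely inside the repeat
    return "flanking"              # one side anchored (or too little flank on the other)


# Toy reference: a 180 bp CAG tract (60 copies) between toy flanking sequence
ref = "ACGT" * 5 + "CAG" * 60 + "TTGCA" * 5
rep = find_tandem_repeat(ref, start=20, motif_len=3)
print(rep.copies, rep.length)                        # 60 180
print(max_enclosable_copies(150, 3, min_flank=10))  # 43
print(classify_read(10, 150, rep, min_flank=5))      # flanking
print(classify_read(40, 150, rep))                   # fully_repetitive
```

With a 150 bp read and 10 bp required on each side, at most 43 CAG copies (129 bp) fit inside one read, so the 60-copy toy tract can only be observed through flanking or fully repetitive reads, and its length has to be inferred rather than read off directly.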
GangSTR provides another tool for investigating STRs in whole exome sequencing (WES) or whole genome sequencing (WGS) data. Dr. Mousavi reported comparing the performance of GangSTR with that of existing STR detection tools on simulated datasets. Of particular interest, he and colleagues also analyzed real datasets from individuals with validated HD expansions and from “healthy” individuals. On more than 200 whole exome datasets validated for HD expansions, GangSTR had a lower error rate than other STR detection tools in identifying the pathogenic STRs. In a trio sequenced to 30X whole genome coverage, GangSTR analyzed an average of nearly 490,000 STRs, and 98.7% of the calls were consistent with Mendelian inheritance. Finally, in an analysis of 150 genomes from a “healthy” cohort of European, Asian, or African ancestry, GangSTR identified AAAG and AAAGG as the most common STRs and identified 51.9 loci longer than 100 bp and 6 loci longer than 150 bp. These observations are consistent with previous reports that A(n)G(m) repeats can promote expansions. One limitation of the analysis is that it excludes homopolymer tracts (runs of a single repeated nucleotide). Nonetheless, GangSTR opens the door to new discoveries: What is the normal variation of STRs in the human genome? Can this tool predict pathogenic STRs? Can it help elucidate a role for STRs in complex traits?
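The trio result rests on a simple rule: at each locus, a child should carry one allele from the mother and one from the father. A minimal Python sketch of that check is below, assuming each call is a pair of allele lengths and requiring exact matches. The function names and genotypes are hypothetical toy values, not data or code from the study, and the sketch ignores call quality and genuine de novo mutations, both of which a real analysis would have to consider.

```python
from typing import Iterable, Tuple

# A genotype at one STR locus: the two allele lengths (e.g., repeat copy numbers)
Genotype = Tuple[int, int]


def is_mendelian_consistent(child: Genotype, mother: Genotype,
                            father: Genotype) -> bool:
    """True if the child could have inherited one allele from each parent."""
    a, b = child
    return (a in mother and b in father) or (b in mother and a in father)


def mendelian_consistency_rate(
        trio_calls: Iterable[Tuple[Genotype, Genotype, Genotype]]) -> float:
    """Fraction of loci whose (child, mother, father) calls are consistent."""
    calls = list(trio_calls)
    if not calls:
        raise ValueError("no trio calls supplied")
    consistent = sum(is_mendelian_consistent(c, m, f) for c, m, f in calls)
    return consistent / len(calls)


# Toy calls (child, mother, father) at four hypothetical loci
calls = [
    ((12, 15), (12, 12), (15, 17)),  # 12 from mother, 15 from father
    ((10, 10), (10, 11), (9, 10)),   # 10 from each parent
    ((20, 22), (20, 21), (18, 19)),  # 22 is in neither parent: inconsistent
    ((7, 8), (8, 9), (7, 7)),        # 8 from mother, 7 from father
]
print(mendelian_consistency_rate(calls))  # 0.75
```

Three of the four toy loci pass, giving 0.75; the 98.7% figure reported for the trio is the same kind of ratio, computed over hundreds of thousands of real loci.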
