Detecting pathogenic STR expansions in next-gen sequencing data — ASN Events

Detecting pathogenic STR expansions in next-gen sequencing data (#111)

Harriet Dashnow (1), Alicia Oshlack (1)
  1. Murdoch Childrens Research Institute, Parkville, VIC, Australia

Short tandem repeats (STRs) are DNA sequences in which a motif of 1-6 bp is repeated in tandem. STR expansions are known to cause more than 25 Mendelian diseases, most notably Huntington’s disease, the spinocerebellar ataxias, and the fragile X disorders. Current methods for detecting pathogenic STR expansions rely on PCR amplification and electrophoresis, which require a specific assay to be designed for each locus. These methods do not scale to the whole genome, so they cannot be used to identify new pathogenic STR loci.

As exome and whole-genome sequencing become increasingly common in the search for the genetic causes of human disease, we need methods that can detect pathogenic STR expansions genome-wide. Several tools genotype STR variation in such data, most notably lobSTR and RepeatSeq, but they can only detect STR alleles shorter than the read length. Most pathogenic expansions, however, involve a large increase in length that far exceeds current Illumina read lengths. Tools that genotype STRs longer than the read length still require the expanded allele to fall within the insert size, and they depend on tight insert-size distributions, which are increasingly rare with recent Illumina library protocols.
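To make the read-length limitation concrete, here is a minimal sketch under a simplified model that is an assumption, not something stated in the abstract: a read can genotype an STR allele only if it covers the whole repeat plus a minimum stretch of unique flanking sequence on both sides. The 150 bp read length, the 10 bp flank and the function names are illustrative and hypothetical.

```python
def max_spannable_units(read_length: int, motif_length: int, min_flank: int) -> int:
    """Largest number of repeat units one read can cover while still leaving
    `min_flank` bp of unique sequence on each side (simplified model)."""
    usable = read_length - 2 * min_flank
    return max(usable, 0) // motif_length


def is_spannable(read_length: int, motif_length: int, n_units: int, min_flank: int) -> bool:
    """True if an allele of `n_units` repeats fits inside a single read with flanks."""
    return n_units * motif_length + 2 * min_flank <= read_length


if __name__ == "__main__":
    # Hypothetical 150 bp reads with 10 bp anchors on each side.
    print(max_spannable_units(150, 3, 10))  # 43 trinucleotide units
    print(is_spannable(150, 3, 30, 10))     # 30 units = 90 bp -> True
    print(is_spannable(150, 3, 200, 10))    # 200 units = 600 bp -> False
```

Under these assumptions a 150 bp read covers at most 43 trinucleotide units, while a fragile X full mutation exceeds 200 CGG repeats (over 600 bp), so no single read can span it.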

Here we describe a method to detect pathogenic STR expansions at all known STR loci in the genome from next-generation sequencing data. The method adds decoy sequences to the reference genome to capture reads originating from a large STR expansion, then uses the alignment of each read's mate to identify the locus the expansion came from. We have validated the method on simulated exome data and applied it to assess STR variation in an autism cohort.
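The decoy-plus-mate idea can be illustrated with a toy sketch. This is not the authors' implementation, and several details are assumptions: decoy contigs are named `decoy_<motif>` (a hypothetical convention), motifs are collapsed to a canonical form so that rotations and reverse complements (CAG, AGC, CTG) share one decoy, alignments are plain records rather than BAM input, and each informative pair is assigned to the nearest known STR locus with a matching motif within an assumed 500 bp window of the anchored mate.

```python
from collections import Counter
from dataclasses import dataclass

DECOY_PREFIX = "decoy_"  # hypothetical naming convention for decoy contigs
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_motif(motif: str) -> str:
    """Smallest rotation of the motif or its reverse complement (CAG, AGC, CTG -> AGC)."""
    rc = motif.translate(_COMPLEMENT)[::-1]
    return min(s[i:] + s[:i] for s in (motif, rc) for i in range(len(s)))


def make_decoys(motifs, length=300):
    """Build one pure-repeat decoy contig per canonical motif."""
    decoys = {}
    for m in motifs:
        c = canonical_motif(m)
        decoys[DECOY_PREFIX + c] = (c * (length // len(c) + 1))[:length]
    return decoys


@dataclass(frozen=True)
class StrLocus:
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    motif: str


@dataclass
class ReadPair:
    contig1: str
    pos1: int
    contig2: str
    pos2: int


def _distance(pos: int, locus: StrLocus) -> int:
    """Distance from a mate's position to the repeat (0 if inside it)."""
    if locus.start <= pos < locus.end:
        return 0
    return min(abs(pos - locus.start), abs(pos - (locus.end - 1)))


def localize(pairs, loci, window=500):
    """Count decoy-anchored read pairs per known STR locus.

    A pair is informative when exactly one mate aligned to a decoy; the other
    mate's genomic position selects the nearest locus with the same canonical
    motif within `window` bp.
    """
    counts = Counter()
    for p in pairs:
        on_decoy1 = p.contig1.startswith(DECOY_PREFIX)
        on_decoy2 = p.contig2.startswith(DECOY_PREFIX)
        if on_decoy1 == on_decoy2:
            continue  # neither or both mates in a repeat: no positional anchor
        if on_decoy1:
            decoy, chrom, pos = p.contig1, p.contig2, p.pos2
        else:
            decoy, chrom, pos = p.contig2, p.contig1, p.pos1
        motif = decoy[len(DECOY_PREFIX):]
        candidates = [
            loc for loc in loci
            if loc.chrom == chrom
            and canonical_motif(loc.motif) == motif
            and _distance(pos, loc) <= window
        ]
        if candidates:
            counts[min(candidates, key=lambda loc: _distance(pos, loc))] += 1
    return counts
```

Canonicalizing motifs keeps the number of decoys small. Requiring exactly one mate on a decoy reflects the abstract's use of paired information: in this simplified model, pairs with both mates inside the repeat carry no positional information.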

#LorneGenome