Structural Variant Detection in Individuals and Populations with the PacBio Sequel System — ASN Events

Structural Variant Detection in Individuals and Populations with the PacBio Sequel System (#146)

Luke Hickey¹, Aaron Wenger¹, Jason Chin¹, Jonas Korlach¹
  1. PacBio, Menlo Park, CA, USA

The past quarter century has brought amazing progress in the technology for detecting genetic variants, but intermediate-sized structural variants (50 bp to 50 kb) have remained a challenge. Such variants are too small to detect with cytogenetic methods, but too large to reliably discover with short-read DNA sequencing. Recent de novo assemblies of human genomes have demonstrated the power of PacBio Single Molecule, Real-Time (SMRT) Sequencing to fill this technology gap and sensitively identify structural variants.

While de novo assembly is the ideal method to identify variants, it requires a high depth of coverage. An alternative approach, low-fold coverage re-sequencing, has been applied successfully in many studies of single nucleotide variants. Here we show that the same approach works for structural variants.

To evaluate low-fold coverage re-sequencing in an individual, the human sample NA12878 was sequenced to 10-fold coverage on the PacBio Sequel System. Reads were mapped to the reference genome with NGM-LR and structural variants were called with PBHoney.

The structural variant call set derived from 10-fold coverage sequencing of NA12878 recalls the vast majority of the Genome in a Bottle truth set and identifies thousands of novel variants. This approach also detects the vast majority of the structural variants found by de novo assembly, demonstrating the power of low-fold coverage sequencing of an individual.

Looking beyond the individual to population-scale studies of common structural variants, we present a model of discovery power as a function of variant frequency, cohort size, and depth of sequencing. The model supports the study design pioneered by the 1000 Genomes Project: 5- to 10-fold coverage per genome across a large cohort.
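
The abstract does not spell out the model, so the following is only a minimal sketch of one plausible form, written to show how variant frequency, cohort size, and depth can interact. Everything in it is an assumption made for illustration: read coverage is taken to be Poisson-distributed, genotypes follow Hardy-Weinberg proportions, a variant counts as discovered when at least one carrier shows `min_reads` or more supporting reads, and each genome is called independently. The function names and the `min_reads=2` threshold are hypothetical and are not taken from the authors' work.

```python
import math


def poisson_at_least(k_min: int, mean: float) -> float:
    """Return P(X >= k_min) for X ~ Poisson(mean)."""
    below = sum(math.exp(-mean) * mean**k / math.factorial(k) for k in range(k_min))
    return 1.0 - below


def carrier_detection_prob(depth: float, copies: int = 1, min_reads: int = 2) -> float:
    """Chance that a carrier shows at least `min_reads` supporting reads.

    Assumption: each of the two haplotypes receives depth / 2 coverage on
    average, so a heterozygous carrier (copies=1) has a mean of depth / 2
    supporting reads and a homozygous carrier (copies=2) has a mean of depth.
    """
    return poisson_at_least(min_reads, depth * copies / 2.0)


def discovery_power(allele_freq: float, cohort_size: int, depth: float,
                    min_reads: int = 2) -> float:
    """Chance that a variant is detected in at least one cohort member."""
    # Hardy-Weinberg genotype frequencies (assumed, not from the abstract).
    p_het = 2.0 * allele_freq * (1.0 - allele_freq)
    p_hom = allele_freq ** 2
    per_individual = (
        p_het * carrier_detection_prob(depth, 1, min_reads)
        + p_hom * carrier_detection_prob(depth, 2, min_reads)
    )
    # The variant is missed only if every individual fails to reveal it.
    return 1.0 - (1.0 - per_individual) ** cohort_size


if __name__ == "__main__":
    # Three ways to spend the same total budget of 1000 genome-fold coverage.
    for depth, n in [(50, 20), (10, 100), (5, 200)]:
        power = discovery_power(allele_freq=0.01, cohort_size=n, depth=depth)
        print(f"{depth:>2}-fold x {n:>3} genomes: power = {power:.3f}")
```

Under these assumptions, spreading a fixed sequencing budget over more genomes at lower depth raises the chance of discovering a variant present at 1% allele frequency, which matches the direction of the study design described above. A joint caller that pools evidence across genomes would behave differently, so the numbers are illustrative only.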

In summary, low-fold coverage PacBio Sequencing offers an affordable and effective approach to study the extensive structural variation in individuals and populations.

  1. Chaisson MJ, et al. (2015) Nature, 517(7536):608-11.
  2. Shi L, et al. (2016) Nat Commun, 7:12065.
  3. English AC, et al. (2014) BMC Bioinformatics, 15:180.
  4. Sudmant PH, et al. (2015) Nature, 526(7571):75-81.
  5. Parikh H, et al. (2016) BMC Genomics, 17:64.
  6. Chin CS, et al. (2016) Nature Methods, accepted.
#LorneGenome