Duplex sequencing

Duplex sequencing overview: Duplex-tagged libraries containing sequencing adapters are amplified, yielding two types of products, each originating from a single strand of DNA. After the PCR products are sequenced, the resulting reads are divided into tag families based on genomic position, duplex tags, and the neighboring sequencing adapter. Sequence tag α is the reverse complement of sequence tag β and vice versa.

Duplex sequencing is a library preparation and analysis method for next-generation sequencing (NGS) platforms that employs random tagging of double-stranded DNA to detect mutations with higher accuracy and lower error rates than conventional NGS methods.

This method uses degenerate molecular tags in addition to sequencing adapters to recognize reads originating from each strand of DNA. As the two strands are complementary, true mutations are found at the same position in both strands. In contrast, PCR or sequencing errors produce mutations in only one strand and can thus be discounted as technical errors. Duplex sequencing can theoretically detect mutations at frequencies as low as 5 × 10⁻⁸, an accuracy more than 10,000 times higher than that of conventional next-generation sequencing methods.[1][2]
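
The strand-comparison logic can be illustrated with a minimal Python sketch. It assumes the reads have already been grouped into tag families, aligned to equal length, and oriented to the same reference strand. The function names and the 0.7 agreement threshold are illustrative assumptions and are not taken from the published pipeline.

```python
from collections import Counter


def single_strand_consensus(reads, min_fraction=0.7):
    """Collapse the reads of one tag family into a consensus sequence.

    A position becomes 'N' when no base reaches min_fraction of the reads.
    Assumes all reads are aligned and of equal length (illustrative only).
    """
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        consensus.append(base if count / len(column) >= min_fraction else "N")
    return "".join(consensus)


def duplex_consensus(top_reads, bottom_reads, min_fraction=0.7):
    """Keep a base only where the consensus of both strands agrees."""
    top = single_strand_consensus(top_reads, min_fraction)
    bottom = single_strand_consensus(bottom_reads, min_fraction)
    return "".join(a if a == b and a != "N" else "N" for a, b in zip(top, bottom))


# Hypothetical families for a molecule whose reference sequence is ACGTAA.
# Index 4 (A>T) appears in both strands: a candidate true mutation.
# Index 1 (C>G) appears only in the top strand: an early PCR error.
# The last read of the top family also carries an isolated sequencing error.
top_reads = ["AGGTTA", "AGGTTA", "AGGTTA", "AGGTTC"]
bottom_reads = ["ACGTTA", "ACGTTA", "ACGTTA"]

print(duplex_consensus(top_reads, bottom_reads))  # prints ANGTTA
```

In this toy input, the isolated sequencing error is removed within the top-strand family, the strand-specific PCR error is masked as `N` because the strands disagree, and only the change present in both strands survives.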

The estimated error rate of standard next-generation sequencing platforms is 10⁻² to 10⁻³ per base call. At this error rate, the billions of base calls produced by an NGS run will contain millions of errors. These errors are introduced during sample preparation and sequencing, for example during polymerase chain reaction amplification, the sequencing reaction, and image analysis. While this error rate is acceptable in some applications, such as the detection of clonal variants, it is a major limitation for applications that require higher accuracy to detect low-frequency variants, such as intra-organismal mosaicism, subclonal variants in genetically heterogeneous cancers, or circulating tumor DNA.[3][4][5]
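
The arithmetic behind these figures can be checked with a short Python sketch. The run size, sequencing depth, and variant frequency below are hypothetical values chosen for illustration, and errors are treated as independent events, which is a simplification.

```python
def expected_errors(base_calls, error_rate):
    """Expected number of erroneous base calls, treating errors as independent."""
    return base_calls * error_rate


# Hypothetical run size: one billion base calls.
run_base_calls = 1_000_000_000
for rate in (1e-2, 1e-3):
    errors = expected_errors(run_base_calls, rate)
    print(f"error rate {rate:g}: ~{errors:,.0f} erroneous calls")

# Hypothetical single site: 10,000 reads deep, with a true variant
# present in 1 of every 10,000 molecules.
depth = 10_000
variant_frequency = 1e-4
true_variant_reads = depth * variant_frequency   # about 1 read
erroneous_reads = expected_errors(depth, 1e-3)   # about 10 reads
print(f"variant reads: ~{true_variant_reads:.0f}, erroneous reads: ~{erroneous_reads:.0f}")
```

Under these assumptions, even the lower error rate yields roughly ten times more erroneous reads than true variant reads at the site, which is why variants at such frequencies are hard to call from raw reads alone.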

Several library preparation strategies have been developed to increase the accuracy of NGS platforms, such as molecular barcoding and circular consensus sequencing.[6][7][8][9] As with standard NGS, the data generated by these methods originate from a single strand of DNA, and therefore errors introduced during PCR amplification, tissue processing, DNA extraction, hybridization capture (where used), or DNA sequencing itself can still be mistaken for true variants. The duplex sequencing method addresses this problem by taking advantage of the complementary nature of the two strands of DNA and confirming only variants that are present in both strands. Because the probability of two complementary errors arising at the same location in both strands is exceedingly low, duplex sequencing increases the accuracy of sequencing significantly.[1][6][8][10]
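
Why matching errors on both strands are so unlikely can be seen in a toy probability model, sketched here in Python. It assumes that errors arise independently on each strand at the same rate and are spread evenly over the three incorrect bases. These are simplifying assumptions made for illustration, and the model is not the source of the detection limits reported in the literature.

```python
def complementary_error_probability(p_strand, n_alternatives=3):
    """Chance that both strands carry matching errors at one position.

    Toy model: one strand errs with probability p_strand, and the other
    strand independently errs to the single complementary base with
    probability p_strand / n_alternatives.
    """
    return p_strand * p_strand / n_alternatives


for p in (1e-2, 1e-3):
    both = complementary_error_probability(p)
    print(f"per-strand error {p:g} -> matching error on both strands {both:.1e}")
```

In this model the probability of a false duplex call falls with the square of the per-strand error rate, so any reduction in single-strand error is amplified when both strands are required to agree.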

  1. M. W. Schmitt, S. R. Kennedy, J. J. Salk, et al. “Detection of ultra-rare mutations by next-generation sequencing”. Proc. Natl. Acad. Sci., vol. 109, no. 36, 2012. PMID 22853953.
  2. S. R. Kennedy, M. W. Schmitt, E. J. Fox, B. F. Kohrn, et al. “Detecting ultra low-frequency mutations by Duplex Sequencing”. Nature Protoc., vol. 9, no. 11, pp. 2586–2606, 2014. PMID 25299156.
  3. T. E. Druley, F. L. M. Vallania, D. J. Wegner, et al. “Quantification of rare allelic variants from pooled genomic DNA”. Nature Methods, vol. 6, no. 4, pp. 263–265, 2009. PMID 19252504.
  4. N. McGranahan and C. Swanton. “Biological and Therapeutic Impact of Intratumor Heterogeneity in Cancer Evolution”. Cancer Cell, vol. 27, no. 1, pp. 15–26, 2015. PMID 25584892.
  5. C. Bettegowda, M. Sausen, R. J. Leary, et al. “Detection of Circulating Tumor DNA in Early- and Late-Stage Human Malignancies”. Sci Transl Med, vol. 6, no. 224, p. 224ra24, 2014. PMID 24553385.
  6. B. E. Miner, R. J. Stöger, A. F. Burden, et al. “Molecular barcodes detect redundancy and contamination in hairpin-bisulfite PCR”. Nucleic Acids Res, vol. 32, no. 17, p. e135, 2004. PMID 15459281.
  7. M. L. McCloskey, R. Stoger, R. S. Hansen, et al. “Encoding PCR products with batch-stamps and barcodes”. Biochem. Genet., vol. 45, no. 11–12, pp. 761–767, 2007. PMID 17955361.
  8. D. I. Lou, J. A. Hussmann, R. M. Mcbee, et al. “High-throughput DNA sequencing errors are reduced by orders of magnitude using circle sequencing”. Proc Natl Acad Sci U S A, vol. 110, no. 49, pp. 19872–19877, 2013. PMID 24243955.
  9. A. Y. Maslov, W. Quispe-Tintaya, T. Gorbacheva, R. R. White, and J. Vijg. “High-throughput sequencing in mutation detection: A new generation of genotoxicity tests?”. Mutat. Res., vol. 776, pp. 136–143, 2015. PMID 25934519.
  10. E. J. Fox, K. S. Reid-Bayliss, M. J. Emond, et al. “Accuracy of Next Generation Sequencing Platforms”. Next Gener Seq Appl., pp. 1–9, 2015. PMID 25699289.