Sequencing errors are key confounding factors for detecting low-frequency genetic variants that are important for cancer molecular diagnosis, treatment, and surveillance using deep next-generation sequencing (NGS). However, there is a lack of comprehensive understanding of errors introduced at various steps of a conventional NGS workflow, such as sample handling, library preparation, PCR enrichment, and sequencing. In this study, we use current NGS technology to systematically investigate these questions.

Results

By evaluating read-specific error distributions, we discover that the substitution error rate can be computationally suppressed to 10⁻⁵ to 10⁻⁴, which is 10- to 100-fold lower than generally considered achievable (10⁻³) in the current literature. We then quantify substitution errors attributable to sample handling, library preparation, enrichment PCR, and sequencing by using multiple deep sequencing datasets. We find that error rates differ by nucleotide substitution types, ranging from 10⁻⁵ for A>C/T>G, C>A/G>T, and C>G/G>C changes to 10⁻⁴ for A>G/T>C changes. Furthermore, C>T/G>A errors exhibit strong sequence context dependency, sample-specific effects dominate elevated C>A/G>T errors, and target-enrichment PCR led to a ~6-fold increase of overall error rate. We also find that more than 70% of hotspot variants can be detected at 0.1 to 0.01% frequency with the current NGS technology by applying in silico error suppression.

We present the first comprehensive analysis of sequencing error sources in conventional NGS workflows. The error profiles revealed by our study highlight new directions for further improving NGS analysis accuracy both experimentally and computationally, ultimately enhancing the precision of deep sequencing.

Detecting somatic mutations present at a low frequency through deep sequencing is important for cancer genomic profiling. Typical applications include detecting subclonal pathogenic mutations in driver genes such as NRAS/KRAS in leukemias that frequently seed relapse, mosaic cancer predisposition mutations, age-related clonal hematopoiesis that increases cancer risk, and liquid biopsy for non-invasive diagnosis and disease monitoring. Errors acquired during next-generation sequencing (NGS) are key confounding factors of sensitive detection of low-frequency variants by deep sequencing. The substitution error rate by conventional NGS was first reported to be > 0.1% in 2011 and was similar in later reports and in a recent review. This presumed high error rate (> 0.1%) constrains further exploration of ways to improve sensitivity of low-frequency variant detection. For example, the FDA-authorized MSKCC-IMPACT study reported a detection limit of 0.02 mutant allele fraction (MAF) for hotspot mutations and 0.05 for non-hotspot mutations at a read-depth of 500–1000X. With the rapid progress in sequencing technology and dramatic reductions in sequencing cost, there is a great need to systematically evaluate sequencing errors at various steps of a conventional NGS workflow, as this knowledge will help improve low-level variant detection by deep sequencing.

In this study, we performed a comprehensive analysis of the substitution errors in deep sequencing data using the conventional NGS technology. We focused on substitution variants because they are the most abundant mutation type in both adult (97%) and pediatric cancers (93%). We first explored error profiles by performing a paired cancer-normal dilution experiment followed by deep sequencing and discovered that the substitution error rate can be suppressed computationally to 10⁻⁵ to 10⁻⁴, which is 10- to 100-fold lower than the current reports. We next analyzed distinct error profiles that can be attributed to different steps of NGS workflows, including sample handling, polymerase errors, and PCR enrichment steps. These results provide important insights for future improvements of sequencing accuracy.

In this study, we systematically investigated substitution error profiles by analyzing multiple sequencing datasets from five DNA sequencing providers: three deep sequencing datasets generated by St. Jude, HudsonAlpha Institute of Biotechnology (HAIB), and WuXiNextCode and whole-exome sequencing datasets generated by Broad Institute (BI) and Baylor College of Medicine (BCM) on five different Illumina sequencing platforms (Additional file 1: Table S1).

To determine the lowest frequency at which a true somatic mutation can be distinguished from a sequencing error and to determine site-specific sequencing error rates, we performed a dilution experiment using a matched cancer/normal cell line pair, COLO829/COLO829BL (ATCC CRL-1974 and ATCC CRL-1980), both of which were established from the same patient: COLO829 was from malignant melanoma and COLO829BL was from the matching normal lymphoblastoid.
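To make the link between a per-site substitution error rate and the lowest distinguishable variant frequency concrete, here is a minimal sketch. It is not the method used in the study: it assumes errors at a site are independent and binomially distributed, and the function names (`binom_tail`, `min_alt_reads`, `min_detectable_fraction`) and the false-positive threshold `alpha` are hypothetical choices made for illustration only.

```python
from math import comb


def binom_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    # 1 minus the cumulative probability of seeing fewer than k events.
    cdf = sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))
    return 1.0 - cdf


def min_alt_reads(depth, error_rate, alpha=1e-6):
    """Smallest number of variant-supporting reads that sequencing error
    alone would produce with probability <= alpha (assumed threshold)."""
    k = 0
    while binom_tail(depth, k, error_rate) > alpha:
        k += 1
    return k


def min_detectable_fraction(depth, error_rate, alpha=1e-6):
    """Allele fraction corresponding to min_alt_reads at the given depth."""
    return min_alt_reads(depth, error_rate, alpha) / depth


if __name__ == "__main__":
    # Compare the commonly assumed error rate (1e-3) with suppressed
    # rates (1e-4, 1e-5) at an illustrative depth of 10,000X.
    for rate in (1e-3, 1e-4, 1e-5):
        frac = min_detectable_fraction(10_000, rate)
        print(f"error rate {rate:.0e}: minimum allele fraction {frac:.4%}")
```

Under these assumptions, a lower error rate lowers the read count needed to rule out error, so the same depth supports a lower allele fraction. Real data would also need site-specific error rates and context effects, which this sketch ignores.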