May 12, 2016

Sequencing the Snark

Comparison of Whole-Genome Sequencing Strategies for the Detection of Somatic Mutations in Cancer

The Hunting of the Snark (An Agony, in Eight Fits) by Lewis Carroll [Wikicommons / Henry Holiday's cover of the first edition of The Hunting of the Snark]

  • The diagnostic workup of most malignant tumors is not complete without genomic characterization. At the time of diagnosis, this may consist of a panel of prognostic and/or predictive analyses performed using cytogenetics, fluorescence in situ hybridization, standard molecular technologies, and next-generation sequencing (NGS). Whereas genomic interrogation of early-stage tumors using NGS typically takes the form of a multigene panel designed for the particular malignancy, the genomic analysis of advanced-stage cancers may involve a more aggressive examination using a large panel of genes, only some of which encode targets for molecularly directed therapies.

    Citing the significant advances in NGS technologies and their decreasing costs, the technocrati have advocated increasingly broad sequencing panels to characterize the mutational landscape of recurrent tumors and rare types of malignancies, as illustrated by the NCI-Molecular Analysis for Therapy Choice (MATCH) trial. The seduction of comprehensive genomic analysis has led to enthusiasm for performing whole-exome and whole-genome sequencing in oncology, as promoted by NantHealth. Whole-genome sequencing (WGS) is also coming into its own as a front-line diagnostic tool for neurogenetic disorders, as convincingly discussed in another article in Clinical OMICs (http://www.clinicalomics.com/articles/exome-sequencing-beginning-to-displace-common-genetic-tests-in-clinic/603).

    While NGS technology for high-throughput sequencing is advancing rapidly, the bioinformatics support required for precision interpretation has lagged behind. Potential challenges in data interpretation are illustrated in two recent publications, one focusing on the assessment of somatic mutations in oncology specimens analyzed by WGS, and the second on disparities in results of “recreational genetic studies” in the direct-to-consumer market.

    Alioto et al. compared the WGS findings from a medulloblastoma sample sequenced by eight laboratories in the International Cancer Genome Consortium, all using the same sequencing platform, HiSeq (Alioto TS, et al. Nature Communications 2015;6:10001). The initial observation was that the quality of the libraries, which were prepared using different protocols (with or without PCR amplification and with varied reagent suppliers), had a significant impact on average coverage depth and evenness of coverage. Two of the eight sites did not meet the minimal requirement (30X) for average coverage depth, and their data were excluded. Even among the remaining six sites, the percentage of the genome sequenced with at least 25X coverage depth ranged from approximately 75% down to less than 50%. In the latter case, this could significantly limit the identification of mutations.
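    The distinction between average depth and evenness of coverage can be made concrete with a minimal sketch. The function name, the toy depth values, and the data layout below are illustrative assumptions and are not taken from the study; only the 30X and 25X thresholds come from the text above.

```python
from statistics import mean


def coverage_summary(depths, min_mean=30, breadth_threshold=25):
    """Summarize a list of per-position read depths for one library.

    min_mean and breadth_threshold mirror the 30X and 25X figures quoted
    in the article; everything else here is a hypothetical illustration.
    """
    if not depths:
        raise ValueError("depths must not be empty")
    avg = mean(depths)
    # Fraction of positions covered by at least `breadth_threshold` reads.
    breadth = sum(1 for d in depths if d >= breadth_threshold) / len(depths)
    return {
        "mean_depth": avg,
        "passes_mean_depth": avg >= min_mean,
        "breadth_at_threshold": breadth,
    }


# Two toy libraries with the same mean depth but different evenness.
even_library = [30] * 10
uneven_library = [60] * 5 + [0] * 5

even_stats = coverage_summary(even_library)
uneven_stats = coverage_summary(uneven_library)
```

    Both toy libraries pass a mean-depth check, yet the uneven one leaves half of its positions below the 25X threshold, which is why the two metrics are worth reporting separately.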

    To compare bioinformatic approaches, a gold standard dataset was distributed to the Consortium; 18 laboratories submitted analyses calling somatic single-base mutations (SSMs), and 16 submitted results for somatic insertion/deletion mutations (SIMs). Fewer than a quarter of the total SSMs and only 1 of the 347 SIMs in the gold standard benchmark results were called by all laboratories.
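    A "called by all laboratories" figure of this kind is a set intersection measured against the benchmark. The sketch below assumes each mutation is represented as a (chromosome, position, reference, alternate) tuple; the laboratory names and the calls themselves are invented toy data, not results from the study.

```python
def fraction_called_by_all(benchmark, submissions):
    """Return the fraction of benchmark mutations present in every submission.

    benchmark:   iterable of hashable mutation records
    submissions: dict mapping a (hypothetical) laboratory name to its calls
    """
    benchmark = set(benchmark)
    if not benchmark or not submissions:
        raise ValueError("need a non-empty benchmark and at least one submission")
    shared = set(benchmark)
    for calls in submissions.values():
        shared &= set(calls)  # keep only mutations this laboratory also called
    return len(shared) / len(benchmark)


# Toy benchmark of four mutations and three invented submissions.
toy_benchmark = [
    ("chr1", 100, "A", "T"),
    ("chr2", 200, "G", "C"),
    ("chr3", 300, "C", "A"),
    ("chr4", 400, "T", "G"),
]
toy_submissions = {
    "lab_a": [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr3", 300, "C", "A")],
    "lab_b": [("chr1", 100, "A", "T"), ("chr4", 400, "T", "G")],
    "lab_c": [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C"), ("chr9", 900, "A", "G")],
}

toy_fraction = fraction_called_by_all(toy_benchmark, toy_submissions)
```

    In the toy data only one of the four benchmark mutations survives the intersection, so the agreement shrinks quickly as submissions are added even though each laboratory individually recovers at least half of the benchmark.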

    Four patterns of data analysis were observed:

    1. balanced distribution of mutations across chromosomes with predominance of true positives and few false positives;
    2. balanced distribution with many false negatives;
    3. clusters of false positives near centromeres; and
    4. high false positive rate with clustering of mutations.

    The authors concluded that alignment/mapping and primary mutation calling tools also have a significant impact on the accuracy of mutation identification. The relative content of tumor cells in the dataset was also found to have a significant effect on mutation detection. At tumor content of 83% and 50%, mutation calls were 95% and 85% of the benchmark values, respectively, at 100X depth of sequencing. The detection rates fell to 92% and 68%, respectively, at 30X depth.
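    Why tumor content and depth interact can be illustrated with a simple binomial model. This is a deliberately idealized sketch, not the method used in the study, and it is not expected to reproduce the percentages quoted above. It assumes a clonal heterozygous mutation in a copy-neutral diploid region (so the expected variant allele fraction is half the tumor content), exactly `depth` reads at the site, no sequencing error, and a hypothetical caller that requires at least `min_alt_reads` supporting reads.

```python
from math import comb


def detection_probability(tumor_content, depth, min_alt_reads=5):
    """Probability of observing at least `min_alt_reads` variant reads.

    Idealized assumptions (all illustrative): heterozygous clonal mutation,
    diploid copy-neutral locus, binomial sampling of reads, no errors.
    """
    if not 0.0 <= tumor_content <= 1.0:
        raise ValueError("tumor_content must be between 0 and 1")
    vaf = tumor_content / 2  # expected variant allele fraction
    # Probability of seeing fewer than min_alt_reads variant reads.
    p_miss = sum(
        comb(depth, k) * vaf**k * (1 - vaf) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1 - p_miss


# The four tumor content / depth combinations mentioned in the article.
scenarios = {
    (content, depth): detection_probability(content, depth)
    for content in (0.83, 0.50)
    for depth in (100, 30)
}
```

    Under these assumptions the model moves in the same direction as the reported results: lowering either the tumor content or the depth reduces the chance that enough variant reads are sampled, and the loss is larger when both are low.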

    One take-home message from this work is that WGS is a difficult, error-prone technology, even when carried out by highly experienced laboratories. This study provides valuable insights into critical factors for developing whole-genome or whole-exome sequencing workflows that meet the standards for clinical laboratories under the guidelines of CLIA and the College of American Pathologists.

    The authors made the following recommendations:

    1. library preparation without PCR amplification,
    2. depth of coverage for tumor specimens greater than 100X,
    3. germline control coverage depth similar to that for tumor,
    4. optimization of the combination of aligner and variant caller software,
    5. use of multiple mutation callers,
    6. allowance for mutations in proximity to repeat regions, and
    7. filtering by sequence quality, including mapping quality and strand bias.

    In addition, the use of a reference genome as a control was recommended.
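    Recommendations 5 and 7 can be pictured as a post-calling filter. The record layout, the caller names, and every threshold in this sketch are hypothetical choices made for illustration; the article does not specify cutoffs, and real pipelines would tune them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CandidateCall:
    """One candidate somatic mutation (hypothetical record layout)."""
    name: str
    mapping_quality: float      # e.g. mean mapping quality of supporting reads
    alt_forward: int            # variant reads on the forward strand
    alt_reverse: int            # variant reads on the reverse strand
    callers: frozenset = frozenset()  # callers that reported this mutation


def passes_filters(call, min_mapq=30, min_minor_strand_fraction=0.1, min_callers=2):
    """Apply illustrative mapping-quality, strand-bias, and multi-caller filters."""
    alt_total = call.alt_forward + call.alt_reverse
    if alt_total == 0:
        return False
    # Strand bias check: share of variant reads on the less-represented strand.
    minor_fraction = min(call.alt_forward, call.alt_reverse) / alt_total
    return (
        call.mapping_quality >= min_mapq
        and minor_fraction >= min_minor_strand_fraction
        and len(call.callers) >= min_callers
    )


candidates = [
    CandidateCall("well_supported", 60, 7, 6, frozenset({"caller_a", "caller_b"})),
    CandidateCall("strand_biased", 60, 12, 0, frozenset({"caller_a", "caller_b"})),
    CandidateCall("poorly_mapped", 10, 5, 5, frozenset({"caller_a", "caller_b"})),
    CandidateCall("single_caller", 60, 6, 6, frozenset({"caller_a"})),
]
kept = [c.name for c in candidates if passes_filters(c)]
```

    Each rejected toy candidate fails exactly one criterion, which makes it easy to see what each filter contributes on its own.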

    By comparison, the findings on the quality of “recreational genetic studies” from direct-to-consumer providers are more provocative (Corpas M, et al. BMC Genomics 2015;16:910). Disparate results were obtained from four whole-exome analyses of a Spanish family consisting of a mother, father, daughter, and son. Each of the four companies reported four to nine significant variants that were present in at least one member of the family. However, there was no overlap among the top results from the four companies. Three of the companies reported multiple variants that were present in three or four family members, while one company reported only variants that were each present in a single family member. Two analyses detected different variants in the son that were not carried by either parent, raising the possibility that a mistaken genome-analysis report could be misinterpreted as suggesting non-paternity.

    The one success of the analysis was the finding of a genetic explanation for the distribution of red hair in the family. The comparison, however, reveals that the results of whole-exome analysis by direct-to-consumer providers may differ significantly, which reflects both the lack of tools for standardizing sequencing and data analysis and the lack of consensus on reportable findings. While this study was crowd-funded and the son is a bioinformatician, it raises significant questions regarding the accuracy of this class of “recreational genetics” and the current value of such studies in the trajectory of innovative medical strategies.
