Highly Accurate Variant Detection
Optimized to provide the maximum number of biomedically relevant, high-value variants.
- Clinical Coding Regions
While accurate sequencing is necessary, it is not sufficient to deliver accurate identification of variants. Advanced tools for alignment, an accurate reference genome, sophisticated algorithms to discern different variant types, and identification of systematic biases inherent in sequencing technologies must all be integrated to deliver variant calls representative of the true nature of the sample.
FIGURE 1: ACE Exome has higher sensitivity with roughly half the errors of standard exome.
Personalis Experience and Approach
The Personalis team has published extensively on issues of DNA sequencing accuracy (Lam et al., 2011; Dewey et al., 2011; Clark et al., 2011). We find that current methods of sequencing, alignment, and variant calling, when used with the standard human reference, can miss numerous medically important variants. The same genome, sequenced on two different platforms, can give substantially different results. Even on the same platform, raw results from the same genome sequenced twice can differ dramatically. Filtering approaches can mask some of these differences, but often by trading false positives for false negatives. Our relentless focus on solving these accuracy issues has led to the development of our alignment and variant calling approach. Leveraging our experience at Personalis and our earlier involvement in other large genomics projects (e.g., 1000 Genomes, ENCODE, Genome Reference Consortium), we harness both proprietary methods and established algorithms from genome centers around the world.
Advanced Human Reference Sequence
The ACE Platform utilizes an advanced human reference sequence, which addresses numerous issues found in the public reference. The current public reference (GRCh37) contains a minor allele at over a million positions, thousands of which are associated with disease (e.g., Factor V Leiden, rs6025). The distinction of major vs. minor allele also differs between ethnicities, sometimes dramatically. This causes several issues, including:
- Individuals homozygous for minor alleles present in the reference sequence are not reported as variant at these loci. In an average genome, over 50,000 such variants remain unrecognized in this way. Considering the potential functional influence of homozygous minor alleles, this is a significant oversight.
- Individuals homozygous for major alleles absent in the reference are reported as variant at such loci. In an average genome, about half of all homozygous SNPs reported (700,000) are actually the major allele. Thus, the ratio of reported homozygous to heterozygous SNPs in a genome is almost double what it would be if the reference were more representative.
- Variant calls can be ethnically biased.
To address these issues, Personalis has developed a set of enhanced human reference sequences in which the major allele has been substituted at every position. Distinct versions have been created for three major ethnicities, and we have also created a version specifically for use with admixed genomes. For most new projects, we recommend variant calling using these enhanced human reference sequences.
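The effect of substituting the major allele into the reference can be illustrated with a toy example. The sketch below is not the Personalis implementation; the `Site` record, the function names, and the six-base sequence are all hypothetical, chosen only to show how the same genotype is reported differently against a public-style reference and a major-allele reference.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Site:
    """Hypothetical record for a position where the reference carries a minor allele."""
    pos: int    # 0-based position in the reference string
    ref: str    # allele present in the public reference
    major: str  # most common allele in the population of interest


def build_major_allele_reference(reference: str, sites: list[Site]) -> str:
    """Return a copy of the reference with the major allele substituted at each site."""
    seq = list(reference)
    for site in sites:
        if seq[site.pos] != site.ref:
            raise ValueError(f"reference mismatch at position {site.pos}")
        seq[site.pos] = site.major
    return "".join(seq)


def homozygous_calls(reference: str, genotypes: dict[int, tuple[str, str]]) -> list[int]:
    """Positions where both sample alleles agree with each other but differ from the reference."""
    return sorted(
        pos for pos, (a, b) in genotypes.items()
        if a == b and a != reference[pos]
    )


# Toy data (assumed for illustration): the public reference carries minor allele G at position 2.
public_ref = "ACGTAC"
enhanced_ref = build_major_allele_reference(public_ref, [Site(pos=2, ref="G", major="T")])

hom_major = {2: ("T", "T")}  # sample homozygous for the major allele
hom_minor = {2: ("G", "G")}  # sample homozygous for the minor allele

print(homozygous_calls(public_ref, hom_major))    # [2]: major allele reported as a variant
print(homozygous_calls(enhanced_ref, hom_major))  # []: no longer reported
print(homozygous_calls(public_ref, hom_minor))    # []: minor allele goes unrecognized
print(homozygous_calls(enhanced_ref, hom_minor))  # [2]: minor allele is now reported
```

Against the public-style reference, the homozygous-major sample is reported as variant while the homozygous-minor sample is missed; against the major-allele reference, both outcomes are reversed, which matches the first two issues listed above.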
Additional Features of the ACE Platform
We have quantitatively assessed our pipeline’s performance using “golden” genomes, developed in-house using multiple next-gen sequencing and genotyping technologies.
- Integration of multiple orthogonal algorithms for variant calling
- Proprietary methods for structural variant calling and genotyping
- In-house structural variant (SV) breakpoint database for accurate SV calling and annotation of SV population frequencies
- In-house database of problematic regions in the human genome
- Genetic analyses, such as detailed blood typing, for sample chain-of-custody crosschecking
- Visual QC summaries that enable quick identification of potential sequencing, alignment or variant calling problems over large numbers of samples
The ACE Platform framework is based on a MapReduce approach and runs in a highly parallel computational environment, making it efficient and scalable. This permits rapid turnaround and a focus on quality and accuracy of results.
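As a rough illustration of the MapReduce pattern in this setting, the sketch below summarizes variant calls per genomic region (map) and then merges the partial summaries (reduce). It is a generic single-process example under assumed names and toy data, not a description of the ACE Platform's internals; in a parallel environment each map call could run on a separate worker because regions are processed independently.

```python
from collections import Counter
from functools import reduce


def map_region(region_calls):
    """Map step: count variant types within one genomic region."""
    _region, calls = region_calls
    return Counter(variant_type for _pos, variant_type in calls)


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def count_variant_types(calls_by_region):
    """Apply the map step to every region, then fold the partial results together."""
    partials = map(map_region, calls_by_region.items())
    return reduce(reduce_counts, partials, Counter())


# Toy input (assumed for illustration): region label -> list of (position, variant type).
calls_by_region = {
    "chr1:1-1000": [(101, "SNV"), (250, "indel")],
    "chr2:1-1000": [(40, "SNV")],
}

totals = count_variant_types(calls_by_region)
print(totals["SNV"], totals["indel"])  # 2 1
```

Because the map step depends only on a single region's data, adding workers scales the map phase without changing the reduce logic.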