Erin Newburn, MS, PhD
Senior Manager, Field Applications Scientist

Refining neoantigen load

Tumor mutational burden (TMB) has now been established as a positive predictive biomarker of response to immune checkpoint inhibitors, with a growing number of studies demonstrating correlations between the overall mutational landscape and clinical benefit (Snyder et al. 2014, Van Allen et al. 2015). In the past few years, investigators have also begun to interrogate a subset of those mutations within the coding region thought to be more immunogenic, referred to as the neoantigen load of the tumor. Like TMB, neoantigen load has been examined in correlative studies that reveal prognostic potential (Rizvi et al. 2015, Shukla et al. 2016). As we continue to decide which cancer patients should receive these powerful immunotherapies, it’s likely that a multiplexed biomarker signature will provide improved utility compared to a single biomarker on its own (Blank et al. 2016). Yet the underlying metrics used within a multi-pronged approach, such as TMB and neoantigen load, must each be accurate in their own right.

Within current neoantigen load predictions, most analyses to date have focused only on somatic point mutations, or single nucleotide variants (SNVs). Yet insertions and deletions (indels) could prove to be a rich source of neoantigens, specifically frameshift indels that produce a novel open reading frame. Turajlic et al. (2017) recently highlighted the need for further characterization of the impact of indel mutations in anticancer immunity. With a robust analysis of almost 6,000 solid tumor patient samples from 19 different cancers, they demonstrated that renal cell carcinoma patients have a higher proportion of indels compared to other cancer types within the TCGA (The Cancer Genome Atlas) cohort. Interestingly, when they looked at HLA binding affinities of the frameshift mutations across samples, frameshift indels resulted in three times as many high-affinity binders (IC50<50) as non-synonymous single nucleotide variants. Moreover, they assessed the association of indels with clinical response to checkpoint inhibitors and detected a significant association in melanoma patients.
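To make the two filters in this paragraph concrete, here is a minimal Python sketch. It is an illustration only, not the method used by Turajlic et al. or by any particular pipeline. It assumes variants are given as reference and alternate allele strings, that an indel is a frameshift when its net length change is not a multiple of three, and that predicted IC50 values are in nM. The function names, the record layout, and the example values are all hypothetical.

```python
def is_frameshift(ref: str, alt: str) -> bool:
    """Return True when a coding indel shifts the reading frame.

    Assumption: the variant lies in a coding region and ref/alt are
    VCF-style allele strings. A net length change that is not a
    multiple of 3 alters every downstream codon.
    """
    delta = len(alt) - len(ref)
    return delta != 0 and delta % 3 != 0


def strong_binders(predictions, threshold=50.0):
    """Keep peptides whose predicted IC50 falls below the threshold.

    Assumption: each prediction is a dict with an 'ic50' value in nM.
    """
    return [p for p in predictions if p["ic50"] < threshold]


# Hypothetical example variants.
variants = [
    {"ref": "A", "alt": "AT"},     # 1-bp insertion: frameshift
    {"ref": "ACGT", "alt": "A"},   # 3-bp deletion: in-frame
    {"ref": "G", "alt": "C"},      # SNV: not an indel
]
frameshifts = [v for v in variants if is_frameshift(v["ref"], v["alt"])]

# Hypothetical binding predictions for two peptides.
predictions = [
    {"peptide": "PEPTIDE_1", "ic50": 12.0},
    {"peptide": "PEPTIDE_2", "ic50": 480.0},
]
print(len(frameshifts), [p["peptide"] for p in strong_binders(predictions)])
```

In practice the binding predictions would come from a dedicated MHC binding predictor; the sketch only shows where the IC50<50 cutoff would be applied.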

Could including indels in tumor mutational burden and neoantigen load make these metrics more potent predictors of immunotherapy response? The answer may be found within the growing body of translational data being examined across ongoing clinical trials as investigators strive to predict the likelihood of clinical benefit. As we consider which variant types must be encompassed within the mutational landscape, we must first step back and consider the initial challenges of the bioinformatics analyses at hand.

The Critical First Step: Accurate Somatic Variant Calling of SNVs and Indels

Somatic variant calling is the bioinformatic foundation of downstream tumor neoantigen prediction. Accordingly, accurately detecting somatic variants is essential for personalized genomic approaches. Performance comparisons of publicly released bioinformatics tools have revealed a general disagreement between somatic variant callers (Xu et al. 2014, Kroigard et al. 2016). Often there is a trade-off: a tool may achieve high sensitivity, but that same tool may also produce a high false positive rate. Additionally, the inherently complex nature of tumor samples makes somatic calling challenging. These samples are very heterogeneous, consisting of a mix of cancerous cells and non-tumoral tissue (e.g. samples high in stromal content). To add to this complexity, tumor samples also harbor clonal and subclonal populations of malignant cells.

As suggested above, neoantigen predictions need accurate input for not just somatic SNVs but also indels. Accurate indel calls depend heavily on accurate alignment. Calling somatic variants, both SNVs and indels, across a range of allelic fractions (AF) is also precarious, especially the delicate task of accurately detecting low-AF variants, such as those below 10% or 5% AF. Yet from a biological perspective, we know these may be critical variant calls, as they can be useful for understanding minimal residual disease, mechanisms of resistance, and tumor evolution/escape.
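A short Python sketch shows why low-AF calls are delicate. It assumes the allelic fraction is simply the number of variant-supporting reads divided by the total reads at the site, and it sorts calls into the <5%, 5-10%, and >=10% ranges discussed above. The function names and the read counts are hypothetical, and real callers apply additional quality and strand filters that are not shown.

```python
def allelic_fraction(alt_reads: int, total_reads: int) -> float:
    """Fraction of reads at a site that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return alt_reads / total_reads


def af_bin(af: float) -> str:
    """Assign an allelic fraction to one of three illustrative ranges."""
    if af < 0.05:
        return "<5%"
    if af < 0.10:
        return "5-10%"
    return ">=10%"


# Hypothetical (alt_reads, total_reads) pairs at 200 reads of depth.
sites = [(8, 200), (14, 200), (60, 200)]
for alt, total in sites:
    af = allelic_fraction(alt, total)
    print(f"{alt}/{total} reads -> AF {af:.2f} -> {af_bin(af)}")
```

At 200 reads of depth, a 5% AF variant is supported by only about 10 reads, so a handful of sequencing or alignment errors can create or erase a call.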

High Sensitivity and Specificity for all Variant Types

At Personalis, we recently performed a rigorous analytical validation of small variants, both SNVs and indels, to evaluate somatic variant calling in the augmented ACE Cancer Exome for neoantigen detection. Tumor and matched normal samples were sequenced at an average depth of 200x coverage on a 69.4 Mb genomic footprint assay. Both positive percent agreement and positive predictive value were calculated over a range of minor allele frequencies (MAF). To overcome the obstacle of obtaining “Gold Standard” samples, well-characterized cancer cell lines were procured to generate reference materials representing variant types at a range of frequencies. These cancer cell lines also had corresponding matched normal cell lines available, and the sample pairs were run through the Personalis somatic analysis pipeline. The resulting variants were compared to a distinct set of GOLD standards. The GOLD SET was defined as variants identified by the COSMIC Cell Line Project (release November 25, 2015) and also identified by two other independent sources: 1) the Cancer Cell Line Encyclopedia (CCLE) and NCI60 or 2) confirmed by Sanger sequencing. To further investigate the limits of detection for these small variants, cell line pairs (i.e. tumor + normal) were mixed in dilutions simulating 5 to 80% tumor purity. For example, mixing 5% of the tumor cell line with 95% of the normal cell line simulates 5% tumor purity. From this robust analysis, overall analytical sensitivity for SNVs and indels was 98% and 95%, respectively. Further, at a 10% MAF, sensitivity for SNVs and indels was 97% and 87%, respectively, with 98% specificity. This augmented exome approach demonstrates greater sensitivity and specificity for somatic variant calling of both SNVs and indels than other conventional exome methodologies.
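The two agreement metrics and the dilution arithmetic described above can be sketched in a few lines of Python. This is not the Personalis pipeline. It assumes variants are keyed as (chromosome, position, ref, alt) tuples, that any call absent from the gold set counts as a false positive (which only holds where the gold set is complete), and that tumor and normal cells contribute equal amounts of DNA so the expected allele fraction scales linearly with purity. All names and example variants are hypothetical.

```python
def ppa_ppv(called: set, gold: set):
    """Positive percent agreement and positive predictive value.

    PPA = TP / (TP + FN): share of gold-set variants that were called.
    PPV = TP / (TP + FP): share of calls that are in the gold set.
    """
    tp = len(called & gold)
    fn = len(gold - called)
    fp = len(called - gold)
    ppa = tp / (tp + fn) if gold else float("nan")
    ppv = tp / (tp + fp) if called else float("nan")
    return ppa, ppv


def expected_af(purity: float, tumor_af: float) -> float:
    """Expected allele fraction after diluting tumor into normal.

    Assumption: the variant is absent from the normal cell line and
    both cell lines contribute equal DNA per cell.
    """
    return purity * tumor_af


# Hypothetical variant keys: (chrom, pos, ref, alt).
gold = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "GA"), ("chr3", 300, "C", "G")}
called = {("chr1", 100, "A", "T"), ("chr2", 200, "G", "GA"), ("chr4", 400, "T", "C")}

ppa, ppv = ppa_ppv(called, gold)
print(f"PPA={ppa:.2f} PPV={ppv:.2f}")
print(f"Expected AF at 5% purity: {expected_af(0.05, 0.5):.3f}")
```

Under these assumptions, a variant that is heterozygous (AF 0.5) in the pure tumor cell line would be expected at roughly 2.5% AF in the 5% purity mixture, which is why the dilution series probes the limits of detection.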

Clearly, inaccurate somatic variant calling limits the downstream bioinformatics and hinders accurate prediction of putative neoepitopes. The quality of the somatic calls fed into subsequent analytics, including epitope prediction, in silico HLA typing, and MHC binding prediction, can affect each of those steps. Further, as recently shown in the literature (Turajlic et al. 2017), future neoantigen predictions may become more precise by ensuring all variant types, both SNVs and indels, are accounted for within this metric.