HLA binding is currently the best-established criterion for ranking neoantigen candidates. Recent advances in training data generated from mass spectrometry provide a larger dataset of peptide binders and non-binders for individual HLA alleles. These new binding data also take into account two additional components, cleavage and transport, which are critically important for assessing presentation.

We leveraged this advance to develop the Systematic HLA Epitope Ranking Pan Algorithm (SHERPA), our pan-predictive machine learning model for predicting MHC class I presentation.

SHERPA relies on a proprietary, high-quality, and unambiguous training dataset generated by performing immunopeptidomics on a robust set of MHC class I alleles using monoallelic cell lines (Figure 5).

Figure 5: Overview of SHERPA machine learning algorithm

Multiple modeling strategies were combined to accurately predict neoantigens for all known alleles. The SHERPA-Binding algorithm uses both the peptide and binding pocket information to predict a binding rank. The SHERPA-Presentation algorithm incorporates additional, critical features such as peptide processing properties and gene expression data to predict a more comprehensive presentation rank (Figure 6).

Figure 6: SHERPA Models
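As a rough illustration of what a "rank" output can mean, the sketch below converts a raw model score into a percentile rank against a background of scores from random peptides, a convention used by several public binding predictors. The source does not describe how SHERPA computes its ranks, so the function name `percentile_rank`, the integer toy scores, and the "lower rank is stronger" convention are all assumptions made for illustration.

```python
from bisect import bisect_left


def percentile_rank(score, background_scores):
    """Return the percentage of background scores that are >= `score`.

    Assumed convention (hypothetical, not taken from SHERPA): a higher raw
    score means a stronger candidate, so a lower percentile rank is better.
    """
    if not background_scores:
        raise ValueError("background_scores must not be empty")
    sorted_bg = sorted(background_scores)
    # Number of background scores strictly below the query score.
    n_below = bisect_left(sorted_bg, score)
    n_at_or_above = len(sorted_bg) - n_below
    return 100.0 * n_at_or_above / len(sorted_bg)


# Toy background: 1,000 made-up scores for random peptides against one allele.
background = list(range(1000))

# A candidate scoring 995 is matched or beaten by only 5 of 1,000 peptides.
print(percentile_rank(995, background))  # 0.5
```

Expressing scores as ranks makes candidates comparable across alleles whose raw score distributions differ, which is one common reason predictors report ranks rather than raw scores.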

The performance of SHERPA was evaluated using 10% of the monoallelic immunopeptidomics data (which had been held out from the training dataset) mixed with synthetic negative examples in a 1:999 ratio (a commonly assumed prevalence). SHERPA models have higher precision across all recall values than NetMHCPan-4.0, the most commonly used publicly available tool (Figure 7A), and significantly higher positive predictive values among the top 0.1% of peptides in the test data (Figure 7B).
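The top-0.1% metric can be sketched in a few lines. The example below is a minimal illustration, not the evaluation code used for SHERPA: the scores are randomly generated toy values, and `ppv_at_top_fraction` and `build_toy_test_set` are hypothetical helper names. It builds a test set with one positive per 999 negatives, ranks all peptides by score, and reports the fraction of true positives among the top 0.1%.

```python
import random


def ppv_at_top_fraction(scores, labels, fraction=0.001):
    """Positive predictive value among the top-scoring `fraction` of peptides.

    Assumes a higher score means "more likely presented"; labels are 1 for
    observed (positive) peptides and 0 for synthetic negatives.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n_top = max(1, round(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    top_labels = [label for _, label in ranked[:n_top]]
    return sum(top_labels) / n_top


def build_toy_test_set(n_positives=20, negatives_per_positive=999, seed=0):
    """Toy data only: positives score higher on average than negatives."""
    rng = random.Random(seed)
    scores, labels = [], []
    for _ in range(n_positives):
        scores.append(rng.gauss(3.0, 1.0))
        labels.append(1)
    for _ in range(n_positives * negatives_per_positive):
        scores.append(rng.gauss(0.0, 1.0))
        labels.append(0)
    return scores, labels


scores, labels = build_toy_test_set()
ppv = ppv_at_top_fraction(scores, labels)
print(f"PPV among top 0.1% of {len(scores)} peptides: {ppv:.2f}")
```

With a 1:999 ratio, the top 0.1% contains exactly as many peptides as there are positives, so a perfect ranker would reach a PPV of 1.0 and a random one about 0.001.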

Figure 7A and B: SHERPA Enables Superior Neoantigen Presentation Prediction



In addition to its status as an emerging biomarker of interest in the era of cancer immunotherapy, HLA genotyping is also an essential component of the neoantigen prediction process. Personalis’ HLA typing tool, HLA-Map, has been integrated into the NeoantigenID analytics engine, enabling highly accurate in silico typing of all HLA Class I and Class II loci, which is critical for ensuring the precision of downstream peptide-MHC binding predictions.

To confirm the accuracy of HLA-Map, we performed a comprehensive analytical validation study on a total of 15 proficiency testing samples with known but blinded HLA genotype profiles. Ten of these samples were sourced from the American Society for Histocompatibility and Immunogenetics (ASHI), and five additional samples were obtained from the College of American Pathologists (CAP). Each sample had previously been independently genotyped via various orthogonal clinical tests, and our own results were compared against those results. As the table below demonstrates, HLA-Map performed exceptionally well in accurately genotyping not only the HLA Class I loci, but also the more challenging HLA Class II loci.

HLA Loci                  Number of Alleles   Number of Correct Calls   HLAHM Concordance
All Class I               90                  90                        100.0%
All Class II              180                 177                       98.3%
All Class I + Class II    270                 267                       98.9%

Table 1: HLA-Map’s HLA genotyping performance for both HLA Class I and Class II loci.
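The concordance figures in Table 1 follow directly from the allele counts. As a quick check, the short sketch below recomputes them; the counts are taken from the table, while the `concordance` helper and the variable names are illustrative only.

```python
def concordance(correct_calls, total_alleles):
    """Percentage of alleles whose call matched the reference genotype."""
    if total_alleles <= 0:
        raise ValueError("total_alleles must be positive")
    return 100.0 * correct_calls / total_alleles


# (correct calls, total alleles) per locus class, as reported in Table 1.
results = {"All Class I": (90, 90), "All Class II": (177, 180)}

# The combined row is the sum of the two classes, not an average of percentages.
combined_correct = sum(correct for correct, _ in results.values())
combined_total = sum(total for _, total in results.values())
results["All Class I + Class II"] = (combined_correct, combined_total)

for loci, (correct, total) in results.items():
    print(f"{loci}: {correct}/{total} = {concordance(correct, total):.1f}%")
# All Class I: 90/90 = 100.0%
# All Class II: 177/180 = 98.3%
# All Class I + Class II: 267/270 = 98.9%
```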

DASH: Deletion of Allele-Specific HLAs

The success of immune checkpoint blockade (ICB) has revolutionized cancer treatment. However, the majority of cancer patients do not respond favorably to such immunotherapies, which has led to an explosion of research efforts to identify new biomarkers of response and/or resistance to this new class of cancer therapeutics.

Given that the mechanism of action of these therapies is contingent on the dynamic interplay between the tumor and the host’s immune system, the role of the antigen processing machinery (APM) in ensuring that tumor-specific neoantigens are successfully presented to the adaptive immune cells has garnered increasing attention in the search for more effective biomarkers. More specifically, loss of heterozygosity (LOH) impacting the HLA Class I genes has emerged as a means by which solid tumors can evade immunosurveillance by reducing the repertoire of neoantigens that can be presented to the immune system, and this phenomenon is now recognized as a key resistance mechanism to ICB (McGranahan et al., 2017; Tran et al., 2016).

In line with our goal to provide our partners with the most comprehensive cancer immunogenomics platform, Personalis has endeavored to enable the accurate assessment of HLA LOH with NeXT. Through DASH (Deletion of Allele-Specific HLAs), we have created a machine-learning-based tool to capture the unique features associated with each individual HLA Class I region which, when combined with the ACE-augmented sequencing data generated by the NeXT assays, enables us to accurately assess HLA LOH using a novel NGS-based approach.

To validate our performance, we assessed the limit of detection (LOD) of DASH using a paired tumor-normal lymphoblastic cell line sample that had HLA LOH in both the HLA-B and HLA-C genes. We deeply sequenced the tumor and normal pair with the NeXT Exome, then sub-sampled the tumor sequencing data and mixed it with the corresponding normal sequencing data to simulate various tumor purity and clonality levels. The sensitivity and specificity of the tool are shown in the heatmaps in Figure 7 below. DASH retained 100% sensitivity until the mixture of reads was less than 18% tumor (as observed in several tumor purity and HLA LOH clonality combinations: 20% purity and 100% clonality; 50% purity and 30% clonality; 100% purity and 20% clonality). Additionally, DASH achieved 100% specificity until less than 21% of the read mixture was derived from the tumor, highlighting DASH’s accuracy and low LOD.
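The in silico mixing step can be sketched as follows. This is a simplified illustration under stated assumptions, not the pipeline used in the validation: reads are represented by plain string identifiers, `mix_reads` is a hypothetical helper, and the mixture is defined only by the fraction of reads drawn from the tumor sample. How a given purity and clonality combination maps to that fraction is not specified in the text and is not modeled here.

```python
import random


def mix_reads(tumor_reads, normal_reads, tumor_fraction, total_reads, seed=0):
    """Sample reads from tumor and normal pools to hit a target tumor fraction."""
    if not 0.0 <= tumor_fraction <= 1.0:
        raise ValueError("tumor_fraction must be between 0 and 1")
    n_tumor = round(total_reads * tumor_fraction)
    n_normal = total_reads - n_tumor
    if n_tumor > len(tumor_reads) or n_normal > len(normal_reads):
        raise ValueError("not enough reads available to build the mixture")
    rng = random.Random(seed)
    # Sample without replacement from each pool, then shuffle the combined set.
    mixed = rng.sample(tumor_reads, n_tumor) + rng.sample(normal_reads, n_normal)
    rng.shuffle(mixed)
    return mixed


# Toy read identifiers standing in for deeply sequenced tumor and normal data.
tumor_reads = [f"tumor_{i}" for i in range(1000)]
normal_reads = [f"normal_{i}" for i in range(1000)]

mixed = mix_reads(tumor_reads, normal_reads, tumor_fraction=0.18, total_reads=500)
n_from_tumor = sum(read.startswith("tumor_") for read in mixed)
print(f"{n_from_tumor} of {len(mixed)} reads come from the tumor sample")
```

Repeating this over a grid of tumor fractions, and running the LOH caller on each mixture, is one way to produce sensitivity and specificity heatmaps of the kind described above.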

Additional validation studies utilizing several novel, orthogonal methods are underway and the results of these studies will be shared once available.