Precision Neoantigen Discovery Using Large-scale Immunopeptidomes and Composite Modeling of MHC Peptide Presentation

Immunopeptidomics Special Issue of the journal Molecular & Cellular Proteomics

January 2021

Volume 20, 100111

The Personalis authors created the Systematic HLA Epitope Ranking Pan Algorithm (SHERPA™), a novel pan-allelic machine learning algorithm for predicting MHC-peptide binding and presentation that demonstrates significantly improved performance over currently available prediction tools. To improve performance and generalizability, SHERPA was trained on immunopeptidomics data from newly engineered HLA mono-allelic cell lines, combined with other publicly available datasets. SHERPA was also designed to capture epitope binding and presentation features more comprehensively, further enhancing the predictive power of the algorithm. Using a composite model constructed with gradient boosting decision trees, multi-allelic deconvolution, and 2.15 million peptides encompassing 167 unique human HLA alleles, SHERPA achieved a 1.44-fold improvement in positive predictive value over existing tools when evaluated on independent mono-allelic datasets. Since publication, Personalis has expanded its in-house immunopeptidomics training dataset to a total of ~70 mono-allelic cell lines, resulting in a new version of SHERPA with further enhanced performance.
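To make the headline metric concrete, the sketch below shows one common way a fold improvement in positive predictive value (PPV) can be computed: rank candidate peptides by predicted score, take the top N, and measure the fraction that were actually observed as presented. This is an illustrative assumption, not the evaluation protocol from the paper; the function names, the choice of N, and all scores and labels are hypothetical toy values.

```python
from typing import Sequence


def positive_predictive_value(
    scores: Sequence[float], labels: Sequence[int], top_n: int
) -> float:
    """Fraction of truly presented peptides among the top_n highest-scoring ones.

    labels use 1 for a peptide observed as presented and 0 for a decoy.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not 0 < top_n <= len(scores):
        raise ValueError("top_n must be between 1 and the number of peptides")
    # Rank peptides from highest to lowest predicted score.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = sum(label for _, label in ranked[:top_n])
    return hits / top_n


def fold_improvement(ppv_new: float, ppv_baseline: float) -> float:
    """Ratio of the new tool's PPV to the baseline tool's PPV."""
    if ppv_baseline <= 0:
        raise ValueError("baseline PPV must be positive")
    return ppv_new / ppv_baseline


# Hypothetical toy data: 4 presented peptides (1) and 6 decoys (0).
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
# Made-up scores from two imaginary predictors for the same 10 peptides.
new_tool_scores = [0.95, 0.90, 0.85, 0.20, 0.80, 0.15, 0.10, 0.05, 0.04, 0.03]
baseline_scores = [0.95, 0.30, 0.25, 0.20, 0.90, 0.85, 0.80, 0.05, 0.04, 0.03]

# Evaluate both predictors on the same top-N cutoff.
ppv_new = positive_predictive_value(new_tool_scores, toy_labels, top_n=4)
ppv_base = positive_predictive_value(baseline_scores, toy_labels, top_n=4)
toy_fold = fold_improvement(ppv_new, ppv_base)
```

Under this reading, a 1.44-fold improvement means the new tool's PPV is 1.44 times the baseline's PPV on the same evaluation set and cutoff. The toy numbers above are chosen only to exercise the functions and do not reproduce any reported result.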