Erin Newburn, MS, PhD
Senior Manager, Field Applications Scientist

Sean Michael Boyle, MS, PhD
Manager, Bioinformatics Applications

Neoantigen discovery: Prediction challenges

Part One of a two-part series on the development of the Personalis Neoantigen Discovery Report. Part II, Overcoming the Challenges of Neoantigen Prediction, will be published in two weeks.

The promise of personalized cancer vaccines

As we kick off 2017, we reflect upon the remarkable advancements being made in immunotherapy research. There has been significant, demonstrated success with checkpoint inhibitors including Keytruda, Opdivo, Yervoy, and others. Progress has also been made in the realm of adoptive cell therapies like CAR-T. Yet, despite so much positive momentum, there are still patients who do not respond or even develop resistance to these treatments.

With the pace of genomics innovation now at full speed, the stage has been set for more personalized immuno-oncology (I-O) therapies using neoantigen cancer vaccines. In fact, at the end of last year, we saw one of the first ever Phase I clinical trials for Neon’s neoantigen-based cancer vaccines in combination with Nivolumab (NEO-PV-01) begin recruitment of melanoma, bladder, and lung cancer patients. With this sensational “rebirth” and revolution in the I-O field comes a greater demand for strong bioinformatics tools to assist in cancer vaccine development.

Unfortunately, using computational methods to identify neoantigens and to predict which ones may be optimal for immunotherapy requires consideration of numerous technical factors, and this is not a trivial task. Here we discuss several key challenges to accurate neoantigen identification and the importance of assessing tumor escape mechanisms.

Neoantigen prediction challenges

High data quality: To start, the downstream pipelines used for neoantigen prediction require extremely high-quality next-generation sequencing (NGS) data. Conventional assays like whole exome sequencing often leave some genomic regions poorly covered due to difficulties in the chemistry and capture. If these coverage “gaps” harbor a variant that results in an immunogenic epitope, that variant would remain undetected. To avoid missing such variants, the assay must meet requirements for read depth, sensitivity, and specificity. Ideally, the sequencing assay has been robustly analytically validated, so that its performance expectations are understood.

Paired tumor and normal samples: Neoantigen prediction greatly benefits from having both a tumor and matched normal sample available to eliminate germline variants, ensuring true somatic mutation detection.
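The germline subtraction described above reduces, at its simplest, to a set difference. The sketch below is a minimal illustration under assumptions: variants are represented as hypothetical (chromosome, position, ref, alt) tuples, and the function name is invented for this example. Real somatic callers model the tumor and normal reads jointly rather than subtracting finished call sets.

```python
def somatic_only(tumor_calls, normal_calls):
    """Keep variants seen in the tumor but absent from the matched normal.

    Each variant is a (chrom, pos, ref, alt) tuple (an assumed representation).
    """
    germline = set(normal_calls)
    # Anything also present in the normal sample is treated as germline.
    return sorted(v for v in set(tumor_calls) if v not in germline)


tumor = [("chr1", 100, "A", "T"), ("chr2", 200, "G", "C")]
normal = [("chr2", 200, "G", "C")]
print(somatic_only(tumor, normal))
```

Without the matched normal, the second variant in this toy input would be carried forward as a candidate even though it is inherited.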

DNA and RNA sequencing: Importantly, DNA sequencing alone is no longer enough: research has shown that expressed variants are more likely to be immunogenic. Having a comprehensive dataset of both genomic and transcriptomic information helps determine if variants are indeed expressed, and provides confidence by cross-referencing the call sets. Access to RNA empowers us to interrogate expression at both the gene-level and variant-level.

HLA-typing: An understanding of a patient’s HLA haplotype is necessary to define the ability of neopeptides to bind and be presented on MHC. The complexity and diversity of the HLA genes make it quite challenging to correctly align short sequencing reads to the reference. As most bioinformatics scientists will tell you, HLA is tricky!

Determining the correct peptides: There are many classes of somatic mutations, including SNVs, InDels, and fusions, each producing different protein products. While SNVs result in single amino acid changes, InDels and fusions can create frame-shifted or out-of-frame protein products. As each gene can have many different transcripts, a single somatic mutation can result in a potentially very large number of protein products. For MHC class I, each of these proteins will generally be broken down into peptides of 8-11 amino acids, each of which then has a chance to be presented on the MHC for immune recognition. Additionally, phasing of both somatic mutations and private germline mutations is critically important, because nearby variants on the same allele change the resulting peptide sequence. Together, proper transcript selection, which can be achieved by evaluating the tumor RNA, and phasing should result in increased specificity and improved peptide processing.
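To make the peptide enumeration concrete, here is a minimal sketch for the simplest case, a single amino acid substitution in one transcript. The function name, the 0-based position convention, and the toy protein sequence are assumptions for illustration; InDels, fusions, multiple transcripts, and phasing are not handled.

```python
def mutant_peptides(protein, pos, alt, lengths=range(8, 12)):
    """Return every peptide of the given lengths that spans the mutated residue.

    protein: wild-type amino acid sequence for one transcript
    pos:     0-based index of the substituted residue (assumed convention)
    alt:     mutant amino acid
    """
    mutant = protein[:pos] + alt + protein[pos + 1:]
    peptides = set()
    for k in lengths:
        # A window [start, start + k) must contain `pos` and fit in the protein.
        first = max(0, pos - k + 1)
        last = min(pos, len(mutant) - k)
        for start in range(first, last + 1):
            peptides.add(mutant[start:start + k])
    return sorted(peptides, key=lambda p: (len(p), p))


toy_protein = "ACDEFGHIKLMNPQRSTVYACDEFGHIKLM"  # made-up sequence, 30 residues
candidates = mutant_peptides(toy_protein, pos=15, alt="W")
print(len(candidates))
```

For a substitution far enough from either end of the protein, each length k contributes k windows, so a single SNV yields 8 + 9 + 10 + 11 = 38 peptides per transcript before any HLA allele is considered.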

MHC binding prediction and immunogenicity: Determining the correct peptides from somatic variants and typing a patient’s HLA region are only the first steps in a very complex process. Each distinct peptide sequence has the potential to be processed by the proteasome, transported for MHC loading, bound for presentation, and ultimately recognized by the immune system. Many peptides do not make it through this process. For example, depending on a patient’s HLA type, peptides from only certain mutations can be correctly processed and loaded into the MHC. While binding preferences for some of the HLA-A alleles are well understood, there are many which are not. HLA-A is only one of three class I genes, and the other two are still poorly defined due to limited binding data. Class II binding preferences are even less understood. The good news is that, with improvements in detection technology, we are working hard not only to refine known binding motifs for these challenging genes, but also to improve peptide processing and build better algorithms for binding prediction.

While the MHC presents bound peptides, the immune system needs to be able to recognize them, ultimately leading to an immune response. Better TCR/BCR typing, as well as a growing knowledge base of which peptides these TCR and BCR sequences are able to detect, will greatly improve the immuno-oncology field. However, while these features are being explored, we do have enough understanding of TCR recognition to predict immunogenicity for MHC class I presentation, which has been shown to improve neoantigen selection.

Selecting the correct approach: There are a variety of tools in the bioinformatics community which can perform tasks like in silico HLA-typing and determining MHC binding affinity. However, it is critically important to determine which are the most accurate, and where improvements can be made through development of new approaches. Also critical is verifying that the selected or developed tools remain highly accurate when assembled into an analysis pipeline built on the framework of a comprehensive assay platform, an accurate annotation system, and a well-designed sample processing and tracking system.

There are also amazing databases to take advantage of, including IEDB and information on Class I immunogenicity signatures. Are these putative neo-peptides something the body has seen before? It’s crucial to employ knowledge from accessible sources like these to determine similarity to known antigens, similarity to self-antigens, and other critical parameters.

Ranking neoantigens: After identifying a pool of neoantigens, what criteria are most useful in prioritizing these candidates? In a tumor sample, tens to thousands of non-synonymous mutations might be found. Each mutation can generate many peptide candidates, resulting in hundreds to tens of thousands of candidate neoantigens per tumor. Binding affinity, variant allele frequency, immunogenicity, gene expression, and several other considerations should be at the forefront when ranking the list.
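One simple way to combine the criteria above is to filter on hard cutoffs and then sort. The sketch below is only an illustration: the field names, the 500 nM binding cutoff, the expression cutoff, and the sort order are assumptions chosen for the example, not the ranking scheme of any particular pipeline.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    peptide: str
    ic50_nm: float         # predicted MHC binding affinity; lower binds more strongly
    vaf: float             # variant allele frequency in tumor DNA, 0 to 1
    tpm: float             # gene expression from tumor RNA
    immunogenicity: float  # hypothetical score; higher is better


def rank_candidates(cands, max_ic50_nm=500.0, min_tpm=1.0):
    """Drop weak binders and unexpressed genes, then order what remains."""
    kept = [c for c in cands if c.ic50_nm <= max_ic50_nm and c.tpm >= min_tpm]
    # Illustrative priority: binding first, then clonality, then immunogenicity.
    return sorted(kept, key=lambda c: (c.ic50_nm, -c.vaf, -c.immunogenicity))


pool = [
    Candidate("PEPTIDEAA", ic50_nm=40.0, vaf=0.10, tpm=25.0, immunogenicity=0.3),
    Candidate("PEPTIDEBB", ic50_nm=40.0, vaf=0.45, tpm=12.0, immunogenicity=0.2),
    Candidate("PEPTIDECC", ic50_nm=900.0, vaf=0.50, tpm=80.0, immunogenicity=0.9),
    Candidate("PEPTIDEDD", ic50_nm=15.0, vaf=0.30, tpm=0.0, immunogenicity=0.8),
]
for c in rank_candidates(pool):
    print(c.peptide, c.ic50_nm, c.vaf)
```

A strict sort like this lets the first key dominate; a weighted score across all criteria is an equally reasonable design, and which works better is an empirical question.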

Tumor microenvironment: Tumors do not exist in a vacuum or in a well-controlled lab environment. They grow and expand in a complex cellular environment surrounded by healthy tissue and the immune system, which attempts to infiltrate and attack growing tumors. Tumors which have increased infiltration of certain immune cell types are more likely to respond positively to therapy. Immune cell types can be detected and classified from within a tumor through expression signatures. However, identification of the ideal gene signature for each cell type and how best to apply that signature to assess immune infiltration is still contested. Development of improved gene sets and machine learning approaches will allow for better prediction of therapeutic response through more accurate classification and quantification.
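As a rough illustration of scoring immune infiltration from expression signatures, the sketch below averages log-transformed expression over a small marker gene set per cell type. The two gene sets are toy examples picked for this sketch, not validated signatures, and the mean of log2(TPM + 1) is just one of several scoring choices in use.

```python
import math

# Toy marker sets for illustration only; real signatures are larger and curated.
SIGNATURES = {
    "cd8_t_cell": ["CD8A", "CD8B"],
    "b_cell": ["CD19", "MS4A1"],
}


def signature_scores(tpm_by_gene, signatures=SIGNATURES):
    """Score each cell type as the mean log2(TPM + 1) of its marker genes."""
    scores = {}
    for cell_type, genes in signatures.items():
        # Genes missing from the expression table are treated as unexpressed.
        values = [math.log2(tpm_by_gene.get(g, 0.0) + 1.0) for g in genes]
        scores[cell_type] = sum(values) / len(values)
    return scores


tumor_tpm = {"CD8A": 3.0, "CD8B": 15.0, "CD19": 0.0}
print(signature_scores(tumor_tpm))
```

Scores like these are only comparable across samples processed the same way, which is part of why the choice of signature and scoring method remains contested.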

Tumor escape: Just as the immune system is attempting to recognize and eliminate tumors, tumors have ways to avoid detection. Tumors have many mechanisms for immune evasion, including mutations in the antigen-presenting machinery and release of immunosuppressive modulators. Identification of these mechanisms in patients would provide a more complete tumor characterization and assessment of potential therapeutic responses.