ProTECT—prediction of T-cell epitopes for cancer therapy

AA Rao, AA Madejska, J Pfeil, B Paten, SR Salama, D Haussler
Frontiers in Immunology, 2020 - frontiersin.org
Somatic mutations in cancers affecting protein coding genes can give rise to potentially therapeutic neoepitopes. These neoepitopes can guide Adoptive Cell Therapies and Peptide- and RNA-based Neoepitope Vaccines to selectively target tumor cells using autologous patient cytotoxic T-cells. Currently, researchers have to independently align their data, call somatic mutations and haplotype the patient’s HLA to use existing neoepitope prediction tools. We present ProTECT, a fully automated, reproducible, scalable, and efficient end-to-end analysis pipeline to identify and rank therapeutically relevant tumor neoepitopes in terms of potential immunogenicity starting directly from raw patient sequencing data, or from pre-processed data. The ProTECT pipeline encompasses alignment, HLA haplotyping, mutation calling (single nucleotide variants, short insertions and deletions, and gene fusions), peptide:MHC binding prediction, and ranking of final candidates. We demonstrate the scalability, efficiency, and utility of ProTECT on 326 samples from the TCGA Prostate Adenocarcinoma cohort, identifying recurrent potential neoepitopes from TMPRSS2-ERG fusions, and from SNVs in SPOP. We also compare ProTECT with results from published tools. ProTECT can be run on a standalone computer, a local cluster, or on a compute cloud using a Mesos backend. ProTECT is highly scalable and can process TCGA data in under 30 min per sample (on average) when run in large batches. ProTECT is freely available at https://www.github.com/BD2KGenomics/protect.
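To make the step between mutation calling and peptide:MHC binding prediction concrete, here is a minimal sketch of how candidate peptides spanning a single amino acid substitution might be enumerated. It is an illustration under stated assumptions, not ProTECT's actual implementation: the function name `mutant_peptides`, the peptide lengths of 9 and 10 residues, and the toy protein sequence are all hypothetical and do not come from the abstract.

```python
def mutant_peptides(protein, pos, alt, lengths=(9, 10)):
    """Return every peptide of the given lengths that spans a substituted residue.

    Hypothetical helper for illustration only (not part of ProTECT).
    protein: wild-type amino acid sequence as a string
    pos:     0-based index of the substituted residue
    alt:     mutant amino acid (single letter)
    lengths: peptide lengths to enumerate (assumed values)
    """
    if not 0 <= pos < len(protein):
        raise ValueError("pos is outside the protein sequence")

    # Apply the substitution to obtain the mutant protein sequence.
    mutant = protein[:pos] + alt + protein[pos + 1:]

    peptides = []
    for k in lengths:
        # Only window starts that keep the mutated residue inside the k-mer
        # and keep the window inside the protein are valid.
        first = max(0, pos - k + 1)
        last = min(pos, len(mutant) - k)
        for start in range(first, last + 1):
            peptides.append(mutant[start:start + k])
    return peptides


if __name__ == "__main__":
    # Toy 20-residue sequence, not a real protein.
    toy_protein = "ACDEFGHIKLMNPQRSTVWY"
    for peptide in mutant_peptides(toy_protein, pos=10, alt="W", lengths=(9,)):
        print(peptide)
```

In a real workflow, each enumerated peptide would then be scored against the patient's HLA alleles by a binding predictor and ranked; those steps are omitted here because the abstract does not specify how they are performed.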