Posted on 2024-06-12, 16:20. Authored by Madison Darmofal, Shalabh Suman, Gurnit Atwal, Michael Toomey, Jie-Fu Chen, Jason C. Chang, Efsevia Vakiani, Anna M. Varghese, Anoop Balakrishnan Rema, Aijazuddin Syed, Nikolaus Schultz, Michael F. Berger, Quaid Morris
For each broad category of features in our training set, we trained an individual model using the same training regime as GDD-ENS. The accuracy of each single-category model is shown as a circle. We then iteratively combined categories and retrained, adding the feature groups in decreasing order of individual-model accuracy; the resulting accuracies are shown as X marks. The X at a given category on the x-axis gives the accuracy of the model trained on that category together with all categories to its left. The model trained on all features has the highest overall accuracy on held-out data. CN, copy number.
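The procedure in this caption can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function `incremental_group_accuracy`, the `evaluate` callback (standing in for "train with the GDD-ENS regime and score on held-out data"), the group names other than CN, and every number in `toy_evaluate` are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def incremental_group_accuracy(
    groups: Sequence[str],
    evaluate: Callable[[Tuple[str, ...]], float],
) -> Tuple[Dict[str, float], List[Tuple[str, float]]]:
    """Return single-group accuracies (circles) and cumulative accuracies (X marks)."""
    # Circles: one model per feature group, scored on held-out data.
    individual = {g: evaluate((g,)) for g in groups}

    # Order the groups by decreasing accuracy of their individual model.
    ordered = sorted(groups, key=lambda g: individual[g], reverse=True)

    # X marks: retrain after adding each group to those already included.
    cumulative: List[Tuple[str, float]] = []
    included: List[str] = []
    for g in ordered:
        included.append(g)
        cumulative.append((g, evaluate(tuple(included))))
    return individual, cumulative


def toy_evaluate(selected: Tuple[str, ...]) -> float:
    """Hypothetical stand-in for training and scoring; NOT results from the study."""
    gain = {"mutations": 0.5, "CN": 0.3, "signatures": 0.1}
    miss = 1.0
    for g in selected:
        miss *= 1.0 - gain[g]
    return 1.0 - miss


if __name__ == "__main__":
    circles, x_marks = incremental_group_accuracy(
        ["CN", "mutations", "signatures"], toy_evaluate
    )
    for group, acc in x_marks:
        print(f"{group}: single={circles[group]:.3f} cumulative={acc:.3f}")
```

In practice `evaluate` would wrap the actual training and held-out evaluation, which is the expensive step; the ordering and accumulation logic stays the same.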
Funding
National Cancer Institute (NCI)
United States Department of Health and Human Services
Tumor type guides clinical treatment decisions in cancer, but histology-based diagnosis remains challenging. Genomic alterations are highly diagnostic of tumor type, and tumor-type classifiers trained on genomic features have been explored, but the most accurate methods are not clinically feasible, relying on features derived from whole-genome sequencing (WGS), or predicting across limited cancer types. We use genomic features from a data set of 39,787 solid tumors sequenced using a clinically targeted cancer gene panel to develop Genome-Derived-Diagnosis Ensemble (GDD-ENS): a hyperparameter ensemble for classifying tumor type using deep neural networks. GDD-ENS achieves 93% accuracy for high-confidence predictions across 38 cancer types, rivaling the performance of WGS-based methods. GDD-ENS can also guide diagnoses of rare types and cancers of unknown primary and incorporate patient-specific clinical information for improved predictions. Overall, integrating GDD-ENS into prospective clinical sequencing workflows could provide clinically relevant tumor-type predictions to guide treatment decisions in real time.
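As a rough illustration of how an ensemble of classifiers can yield a single prediction with a confidence flag, the sketch below averages the class probabilities of several members and marks the result as high confidence when the top averaged probability clears a cutoff. The abstract does not state how GDD-ENS combines its members or what cutoff defines "high confidence", so the averaging rule, the function `ensemble_predict`, the `threshold` value, and the labels are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def ensemble_predict(
    member_probs: Sequence[Sequence[float]],
    labels: Sequence[str],
    threshold: float = 0.75,  # placeholder cutoff, not taken from the source
) -> Tuple[str, float, bool]:
    """Average per-class probabilities across members and flag high confidence."""
    n_members = len(member_probs)
    # Mean probability per class across all ensemble members.
    mean = [
        sum(probs[i] for probs in member_probs) / n_members
        for i in range(len(labels))
    ]
    # Predicted class is the one with the highest averaged probability.
    best = max(range(len(labels)), key=lambda i: mean[i])
    return labels[best], mean[best], mean[best] >= threshold


if __name__ == "__main__":
    # Two hypothetical members scoring two hypothetical tumor types.
    label, confidence, is_high = ensemble_predict(
        [[0.9, 0.1], [0.7, 0.3]], ["Type A", "Type B"]
    )
    print(label, round(confidence, 3), is_high)
```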
We describe a highly accurate tumor-type prediction model, designed specifically for clinical implementation. Our model relies only on widely used cancer gene panel sequencing data, predicts across 38 distinct cancer types, and supports integration of patient-specific nongenomic information for enhanced decision support in challenging diagnostic situations. See related commentary by Garg, p. 906. This article is featured in Selected Articles from This Issue, p. 897.