The DEEP neural network architecture and the workflow of the DEEP score calculation for simple somatic mutations in prostate cancer.
Layer 1: convolution layer, 320 kernels, window size 8, step size 1.
Layer 2: convolution layer, 320 kernels, window size 8, step size 1.
Layer 3: pooling layer, kernel size 4, step size 4.
Layer 4: convolution layer, 480 kernels, window size 8, step size 1.
Layer 5: convolution layer, 480 kernels, window size 8, step size 1.
Layer 6: pooling layer, kernel size 4, step size 4.
Layer 7: convolution layer, 640 kernels, window size 8, step size 1.
Layer 8: convolution layer, 640 kernels, window size 8, step size 1.
Layers 9 and 10: fully connected layers.
Layer 11: sigmoid output layer.
All convolution layers were ReLU activated, and 20% dropout was applied before each pooling layer. The weight (W) is a function that decays as the distance between a mutation and the 200 bp window increases. We used k=5 as a magnifier. Finally, the absolute value of the summed scores from the two strands was taken as the DEEP score.
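To make the layer stack in this caption easier to follow, the sketch below traces how the sequence length and channel count change through Layers 1 to 8. The kernel counts, window sizes, and step sizes are taken from the caption. Everything else is an assumption made for illustration only: a hypothetical 1000 bp input, a four-channel one-hot DNA encoding, and unpadded convolutions. None of these is stated in the caption, and the names `Layer`, `LAYERS`, and `trace_shapes` are invented here. ReLU, dropout, and the fully connected and sigmoid layers are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layer:
    kind: str      # "conv" or "pool"
    kernels: int   # output channels for conv; unused (0) for pool
    window: int    # window (kernel) size
    step: int      # step (stride) size


# Layers 1-8 as listed in the caption.
LAYERS = [
    Layer("conv", 320, 8, 1),
    Layer("conv", 320, 8, 1),
    Layer("pool", 0, 4, 4),
    Layer("conv", 480, 8, 1),
    Layer("conv", 480, 8, 1),
    Layer("pool", 0, 4, 4),
    Layer("conv", 640, 8, 1),
    Layer("conv", 640, 8, 1),
]


def trace_shapes(input_length, input_channels=4):
    """Return the (channels, length) shape after each of Layers 1-8.

    Assumes unpadded convolutions and pooling, and a one-hot input
    with `input_channels` channels; both are assumptions, not details
    given in the caption.
    """
    shapes = []
    channels, length = input_channels, input_length
    for layer in LAYERS:
        if length < layer.window:
            raise ValueError("input is too short for this layer stack")
        # Standard output-length formula for an unpadded sliding window.
        length = (length - layer.window) // layer.step + 1
        if layer.kind == "conv":
            channels = layer.kernels
        shapes.append((channels, length))
    return shapes


# Hypothetical 1000 bp input; the caption does not give the input length.
shapes = trace_shapes(1000)
for number, (channels, length) in enumerate(shapes, start=1):
    print(f"Layer {number}: {channels} channels x {length} positions")

final_channels, final_length = shapes[-1]
flattened_features = final_channels * final_length
print("Features passed to the fully connected layers:", flattened_features)
```

Under these assumptions, Layer 8 yields 640 channels over 44 positions, so 28,160 features would reach the first fully connected layer. A different input length or padding scheme would change these numbers.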
ARTICLE ABSTRACT
Our understanding of noncoding mutations in cancer genomes has been derived primarily from mutational recurrence analysis by aggregating clinical samples on a large scale. These cohort-based approaches cannot directly identify individual pathogenic noncoding mutations from personal cancer genomes. Therefore, although most somatic mutations are localized in the noncoding cancer genome, their effects on driving tumorigenesis and progression have not been systematically explored and noncoding somatic alleles have not been leveraged in current clinical practice to guide personalized screening, diagnosis, and treatment. Here, we present a deep learning framework to capture pathogenic noncoding mutations in personal cancer genomes, which perturb gene regulation by altering chromatin architecture. We deployed the system specifically for localized prostate cancer by integrating large-scale prostate cancer genomes and the prostate-specific epigenome. We exhaustively evaluated somatic mutations in each patient's genome and agnostically identified thousands of somatic alleles altering the prostate epigenome. Functional genomic analyses subsequently demonstrated that affected genes displayed differential expression in prostate tumor samples, were vulnerable to expression alterations, and were convergent onto androgen receptor–mediated signaling pathways. Accumulation of pathogenic regulatory mutations in these affected genes was predictive of clinical observations, suggesting potential clinical utility of this approach. Overall, the deep learning framework has significantly expanded our view of somatic mutations in the vast noncoding genome, uncovered novel genes in localized prostate cancer, and will foster the development of personalized screening and therapeutic strategies for prostate cancer.
This study's characterization of the noncoding genome in prostate cancer reveals mutational signatures predictive of clinical observations, which may serve as a powerful prognostic tool in this disease.