A Polycomb Repression Signature in Metastatic Prostate Cancer Predicts Cancer Outcome

The Polycomb Group (PcG) protein EZH2 is a critical component of a multiprotein complex that methylates Lys27 of histone 3 (H3K27), which consequently leads to the repression of target gene expression. We have previously reported that EZH2 is overexpressed in metastatic prostate cancer and is a marker of aggressive disease in clinically localized solid tumors. However, the global set of genes directly regulated by PcG in tumors is largely unknown, and thus how PcG mediates tumor progression remains unclear. Herein we mapped genome-wide H3K27 methylation in aggressive, disseminated human prostate cancer tissues. Integrative analysis revealed that a significant subset of the H3K27-methylated genes are also targets of PcG in embryonic stem cells, and their repression in tumors is associated with poor prognosis. By stepwise cross-validation, we developed a "Polycomb repression signature" composed of 14 direct targets of PcG in metastatic tumors. Notably, solid tumor subtypes in which this gene signature is repressed show poor clinical outcome in multiple microarray data sets of tumors including breast and prostate cancer. Taken together, our results show a fingerprint of PcG-mediated transcriptional repression in metastatic prostate cancer that is reminiscent of stem cells and associated with cancer progression. Therefore, PcG proteins play a central role in the epigenetic silencing of target genes and functionally link stem cells, metastasis, and cancer survival. [Cancer Res 2007;67(22):10657–63]


Introduction
Polycomb group (PcG) proteins are transcriptional repressors with important roles in preserving cellular identity. The PcG proteins, EZH2 (enhancer of zeste 2), SUZ12 (suppressor of zeste 12), and EED (embryonic ectoderm development), form the Polycomb Repressive Complex 2 (PRC2) and specifically trimethylate H3K27 on target gene promoters (1). This histone mark is part of a preprogrammed cellular memory system that is inheritable through mitotic cell divisions and thus preserves cellular identity. PcG proteins have recently been implicated in the maintenance of stem cells (2). Genome-wide location analysis revealed that Polycomb represses a special set of developmental regulators and signaling molecules, thus maintaining the pluripotency of human (3) and murine (4) embryonic stem cells. Dysregulation of PcG proteins may lead to maldifferentiation, a hallmark of cancer.
Prostate cancer is a leading cause of cancer-related death in American men. New prognostic biomarkers are required to enhance risk assessment and individualize medicine for cancer patients because a high percentage of patients incur disease recurrence (5). Histone modification patterns have been found to predict risk of prostate cancer recurrence (6), indicating a role of epigenetic mechanisms in cancer progression. Concordantly, EZH2 and SUZ12 are frequently overexpressed in aggressive tumors including prostate and breast cancers (7). In addition, previous studies have suggested that stem cell Polycomb target genes are predisposed to DNA hypermethylation, and thus repression, in cancer (8), linking Polycomb-mediated transcriptional repression to cancer. PcG target genes in human tumors, however, remain largely unidentified and, consequently, their roles in cancer development are unknown. In this study, we mapped PRC2 target genes in aggressive prostate tumors and investigated their association with cancer outcome.

Experimental Procedures
Cells and human tissues. LNCaP and PC3 cells were cultured in RPMI supplemented with 10% fetal bovine serum (Invitrogen). Prostate cancer tissues were collected from the Rapid Autopsy Program, University of Michigan Prostate Cancer Specialized Program of Research Excellence Tissue Core, with informed consent of the patients and prior institutional review board approval.
Chromatin immunoprecipitation and genome-wide location analysis. Chromatin immunoprecipitation (ChIP) on chip was done using the Agilent proximal promoter arrays according to the manufacturer's protocols (Supplementary Methods). Antibodies (5 µg) used for ChIP include monoclonal anti-EZH2 (BD), polyclonal anti-SUZ12 (Upstate), and polyclonal anti-H3K27me3 (Upstate) antibodies.
Data sets. All expression microarray data sets were collected from Oncomine (9) and contained solely primary tumors (breast or prostate cancer), except for the Yu et al. data set (10), which also contained benign, adjacent-to-cancer, and metastatic tissues. To evaluate the prognostic power of the gene signature, only the primary tumor samples were used.
Statistical analysis. To compare primary to metastatic prostate cancer, differential gene expression analysis was done using the Cyber-T statistic. The enrichment of PRC2-occupied genes in our prostate cancer profiling data was assessed by

Results
Genome-wide mapping of H3K27me3 in metastatic prostate cancer tissues. To investigate the mechanism of PcG proteins in regulating cancer progression, we mapped genomic sites occupied by PRC2 in late-stage, aggressive prostate cancer tissues by combining ChIP with promoter arrays (Fig. 1). By integrating genome-wide location data with cancer expression profiling data, we identified genes directly repressed by PRC2, termed the "Polycomb repression signature," in cancer. As Polycomb contributes to maintaining the undifferentiated state of stem cells, we hypothesized that its target genes may be important for cancer progression. We thus investigated their association with cancer outcome by Kaplan-Meier survival analysis of multiple cancer microarray data sets as illustrated in Fig. 1.
Figure 1. A flowchart to develop a Polycomb repression signature by genome-wide location and expression analysis of aggressive human tumors. A, genome-wide location analysis of H3K27 trimethylation (H3K27me3) was done in human metastatic prostate cancer tissues. B, H3K27me3-occupied genes were evaluated for repression in metastasis using prostate cancer expression profiling data sets. C, a Polycomb repression signature was developed from the analysis of metastatic prostate cancer specimens to predict patient survival in multiple cancer microarray data sets as outlined in D.
9 http://www.r-project.org
We carried out genome-wide location analysis of PRC2 and H3K27me3 in the LNCaP human prostate cancer cell line as well as in three metastatic prostate cancer tissues (two metastatic to the liver and one to the lung) from independent patients. As described in Supplementary Discussion and Supplementary Fig. S1, we observed strong overlap between replicate H3K27me3 ChIP-on-chip experiments as well as between SUZ12 and H3K27me3 ChIP-on-chip in both the in vitro cell line model and in vivo human tumors. In addition, metastatic prostate cancer tissues from different patients and metastatic sites share a common set of H3K27me3-marked genes.
H3K27me3-marked genes link metastatic prostate cancer to stem cells. To provide functional relevance for the cancer H3K27me3-occupied gene sets, we compared them to molecular correlates in the Oncomine Molecular Concepts Map (MCM; ref. 9), a resource containing ~15,000 molecular concepts or biologically related gene sets, for enrichment by disproportionate overlap using Fisher's exact test. MCM analysis of 1,165 H3K27me3-occupied genes with >5-fold enrichment in metastatic prostate cancer tissue (to the liver) revealed an intriguing enrichment network (Fig. 2A and Supplementary Table S1). The most enriched gene expression concepts (P < 1.0 × 10⁻⁷) are "genes down-regulated in prostate, breast, and lung cancers." This observation is consistent with the expected repression of PcG target genes, as the transcriptional repressor EZH2 is up-regulated in cancer. In addition, a significant (P < 1.1 × 10⁻⁵) portion of our gene set is located at chromosomes 1q, 17q, 19q, and 20q, all of which have previously been associated with prostate cancer (11).
Interestingly, the most enriched literature concepts (P = 1.2 × 10⁻¹⁰⁰) are "H3K27me3-, SUZ12-, or EED-occupied in embryonic stem cells or embryonic fibroblasts," revealing a novel link between Polycomb targets in cancer and those in stem cells. Importantly, the enrichment of our concept among the embryonic stem concepts is comparable to that between embryonic stem concepts. For example, the "H3K27me3-occupied in embryonic stem cell" gene set is enriched by our gene set with an odds ratio (OR) of 5.69 and P = 1.2 × 10⁻¹⁰⁰ and by the "H3K27me3-occupied in embryonic fibroblasts" concept with a comparable OR of 10.6 and P = 1.1 × 10⁻¹⁰⁰. In addition, the most enriched Gene Ontology concepts (P < 1.6 × 10⁻⁷) include developmental regulators, homeobox proteins, and transcription factors, consistent with previous reports of PcG target genes in embryonic stem cells (3). As Polycomb-mediated repression is known to control stem cell pluripotency and differentiation (3), we hypothesized that it may be critical for cancer progression. Notably, MCM analysis showed significant links (P < 1.0 × 10⁻⁷) to gene sets down-regulated in recurrent prostate and breast cancers.
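The overlap-based enrichment described above can be illustrated with a small sketch. This is not the Oncomine MCM implementation; it is a minimal one-sided Fisher's exact test (a hypergeometric tail probability) with a sample odds ratio, written in Python with the standard library only. The function name, its arguments, and the toy numbers are hypothetical and are not taken from the study.

```python
from math import comb

def overlap_enrichment(n_universe, n_set_a, n_set_b, n_overlap):
    """One-sided Fisher's exact test for the overlap of two gene sets.

    Returns (odds_ratio, p_value), where p_value is the probability of an
    overlap at least as large as observed if set B were drawn at random.
    """
    # 2x2 table: rows = in A / not in A, columns = in B / not in B
    a = n_overlap
    b = n_set_a - n_overlap
    c = n_set_b - n_overlap
    d = n_universe - n_set_a - n_set_b + n_overlap
    if min(a, b, c, d) < 0:
        raise ValueError("inconsistent set sizes")
    total = comb(n_universe, n_set_b)
    max_overlap = min(n_set_a, n_set_b)
    # Hypergeometric upper tail: P(overlap >= observed)
    tail = sum(
        comb(n_set_a, k) * comb(n_universe - n_set_a, n_set_b - k)
        for k in range(a, max_overlap + 1)
    )
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return odds_ratio, tail / total

# Toy example (hypothetical numbers): 20 genes in total, two sets of 5, 3 shared
odds_ratio, p_value = overlap_enrichment(20, 5, 5, 3)
print(odds_ratio, p_value)
```

With realistic set sizes, the choice of the gene universe (for example, all genes represented on the promoter array) strongly affects both the odds ratio and the P value, so it should be fixed before any comparison between concepts.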
We thus sought to develop a Polycomb repression signature in tumors. We selected a common set of 336 H3K27me3-occupied genes from the two prostate tumors metastatic to the liver. Gene Set Enrichment Analysis of these genes in a microarray profiling data set of five benign, six clinically localized, and five metastatic prostate cancers (12) indicated a significant enrichment with down-regulated expression in metastasis (P = 0.01; false discovery rate, 0.009). A set of 87 PcG-occupied genes with the strongest repression during metastasis (P < 0.05 by Cyber-T statistic) was selected and defined as the Polycomb repression signature in metastatic prostate cancer (Fig. 2B and Supplementary Table S2). MCM analysis of these 87 genes preserved the significant links identified above (Supplementary Fig. S2).
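To make the idea behind the Gene Set Enrichment Analysis step concrete, the following Python sketch computes an unweighted running-sum enrichment score, a simplified Kolmogorov-Smirnov-like variant. It is an illustration only: the published GSEA software uses a weighted statistic and permutation-based significance, neither of which is reproduced here, and the gene names are hypothetical.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    `ranked_genes` is ordered from most up-regulated to most down-regulated
    (for example, metastatic versus localized tumors). A gene set that is
    concentrated at the bottom of the list yields a negative score.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(hits)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must contain some, but not all, ranked genes")
    running, best = 0.0, 0.0
    for is_hit in hits:
        # Step up on a gene in the set, step down otherwise
        running += 1.0 / n_hits if is_hit else -1.0 / n_miss
        if abs(running) > abs(best):
            best = running          # keep the largest deviation from zero
    return best

# Hypothetical ranked list: the two set members sit at the repressed end
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(enrichment_score(ranked, {"g5", "g6"}))
```

A significance estimate would normally come from recomputing the score after permuting sample labels or gene ranks many times, which is omitted here for brevity.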
The Polycomb repression signature predicts survival of cancer patients. To evaluate the predictive value of the Polycomb repression signature, we examined an independent prostate cancer data set (13). Primary prostate cancer (n = 61) samples were classified into two prognostic groups based on the expression patterns of signature genes. Kaplan-Meier analysis revealed that the two resulting clusters differed significantly in clinical outcome [P = 0.03; hazard ratio, 2.6; 95% confidence interval (95% CI), 1.04-6.54; Fig. 2C]. We thus defined the cluster with favorable outcome as the "low-risk" group and the other as the "high-risk" group. To validate our signature, we used k-nearest neighbor classification (k = 5) to assign samples in an independent prostate cancer data set (14) to either the high-risk or the low-risk group. Interestingly, the two resulting groups showed a significant difference in clinical outcome (P = 0.0008; hazard ratio, 3.09; 95% CI, 1.54-6.18; Fig. 2C).
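The validation step, in which each new sample is assigned to the risk group of its nearest training samples, can be sketched as follows. This is a minimal k-nearest neighbor classifier in Python and not the authors' pipeline: the Euclidean distance metric is an assumption (the text does not state the metric), and the two-gene expression profiles are hypothetical toy values.

```python
from collections import Counter
from math import dist

def knn_predict(train_profiles, train_labels, sample, k=5):
    """Majority vote among the k training profiles closest to `sample`."""
    # Rank training samples by Euclidean distance (assumed metric)
    order = sorted(range(len(train_profiles)),
                   key=lambda i: dist(train_profiles[i], sample))
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical training set: signature genes expressed (low-risk) or repressed (high-risk)
train = [[2.0, 2.1], [1.9, 2.0], [2.2, 1.8], [0.1, 0.2], [0.0, 0.1], [0.3, 0.0]]
labels = ["low-risk"] * 3 + ["high-risk"] * 3
print(knn_predict(train, labels, [0.2, 0.1]))
```

With two classes, an odd k such as the k = 5 used in the text avoids tied votes.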
Because MCM analysis also linked PcG-occupied genes to breast cancer survival (Fig. 2A; P = 9.7 × 10⁻⁸), we evaluated the prognostic value of the Polycomb repression signature in breast cancer. An approach analogous to the prostate cancer outcome analysis above was taken to cluster samples of the Wang et al. estrogen receptor-positive breast cancer training data set (15) and to predict samples of the Pawitan et al. and the Miller et al. validation data sets (16,17). Importantly, the low-risk and high-risk groups from both the training and the validation data sets showed a significant difference in patient relapse (Fig. 2D).
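The outcome comparisons between low-risk and high-risk groups rest on the log-rank test. The sketch below is a minimal two-group log-rank test in Python using only the standard library; it is meant to show how the statistic is built from observed and expected events at each event time, not to replace a vetted survival analysis package. The function name and the input layout (parallel lists of follow-up times and event indicators) are assumptions made for illustration.

```python
from math import erfc, sqrt

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, P value) with 1 df.

    `events_*` hold 1 for an observed event (e.g., relapse) and 0 for censoring.
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        at_risk = [g for (u, _, g) in data if u >= t]       # still under follow-up
        n = len(at_risk)
        n_a = at_risk.count(0)
        d = sum(1 for (u, e, _) in data if u == t and e)    # events at time t
        d_a = sum(1 for (u, e, g) in data if u == t and e and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:
            # Hypergeometric variance of the event count in group A
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("test is undefined: no informative event times")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Survival function of a chi-square variable with 1 degree of freedom
    return chi2, erfc(sqrt(chi2 / 2))
```

Calling `logrank_test(low_times, low_events, high_times, high_events)` with hypothetical follow-up data for the two risk groups returns the statistic and P value that would accompany a pair of Kaplan-Meier curves.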
Because molecular classifiers composed of a small number of genes are especially useful in clinical practice, we attempted to refine our Polycomb repression signature. We adopted a strategy of cross-validation with stepwise decrement of the number of genes used for classification, and identified a 14-gene signature (Supplementary Table S3) that minimized the cross-validation errors in the Wang et al. breast cancer data set (15). Importantly, as described in the Supplementary Discussion and Supplementary Fig. S3, this 14-gene signature is able to predict patient survival with high significance in six breast and two prostate cancer data sets, and with marginal significance in several glioma and lung adenocarcinoma data sets. A comparison to previously reported molecular classifiers found that our signature overall outperformed the others (Supplementary Discussion and Supplementary Table S4).
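One way to picture cross-validation with a stepwise decrement of the gene number is the Python sketch below. Several choices here are assumptions that the text does not specify: genes are ranked by the absolute difference between class means, the classifier is nearest-centroid, and errors are counted by leave-one-out cross-validation. The authors' actual ranking criterion and classifier may differ, and all names are hypothetical.

```python
from math import dist

def centroid(rows):
    """Column-wise mean of a list of expression profiles."""
    return [sum(col) / len(rows) for col in zip(*rows)]

def loocv_errors(profiles, labels, genes):
    """Leave-one-out errors of a nearest-centroid classifier restricted to `genes`."""
    errors = 0
    for i in range(len(profiles)):
        groups = {}
        for j, (row, lab) in enumerate(zip(profiles, labels)):
            if j != i:                                   # hold out sample i
                groups.setdefault(lab, []).append([row[g] for g in genes])
        held_out = [profiles[i][g] for g in genes]
        predicted = min(groups, key=lambda lab: dist(centroid(groups[lab]), held_out))
        errors += predicted != labels[i]
    return errors

def class_separation(profiles, labels, gene):
    """Absolute difference between the two class means for one gene."""
    first, second = sorted(set(labels))
    def mean(lab):
        values = [row[gene] for row, l in zip(profiles, labels) if l == lab]
        return sum(values) / len(values)
    return abs(mean(first) - mean(second))

def stepwise_decrement(profiles, labels):
    """Drop the least discriminative gene at each step; return (genes, errors)."""
    genes = sorted(range(len(profiles[0])),
                   key=lambda g: class_separation(profiles, labels, g), reverse=True)
    best_genes, best_err = list(genes), loocv_errors(profiles, labels, genes)
    while len(genes) > 1:
        genes = genes[:-1]                               # remove the weakest remaining gene
        err = loocv_errors(profiles, labels, genes)
        if err <= best_err:                              # ties favor the smaller gene set
            best_genes, best_err = list(genes), err
    return best_genes, best_err
```

In this sketch the gene ranking is computed once on all samples, which can bias the error estimate optimistically; a stricter design would repeat the ranking inside each cross-validation fold.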
Interestingly, multivariate Cox proportional hazards regression analysis of our signature in the independent van de Vijver et al. breast cancer data set (18) revealed significant association with both relapse-free survival (P = 0.007; hazard ratio, 1.93; 95% CI, 1.20-3.11) and overall survival (P = 0.002; hazard ratio, 3.15; 95% CI, 1.54-6.43). This association is independent of established clinical and pathologic variables, such as tumor grade and node status, and is of greater statistical significance than those variables (Table 1). Therefore, our signature provides additional prognostic information beyond standard clinical and pathologic variables.
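As a quick arithmetic check on how the reported hazard ratios, confidence intervals, and P values relate, the Python sketch below recovers an approximate two-sided Wald P value from a hazard ratio and its 95% CI. It assumes that the intervals are symmetric on the log scale (standard for Cox models); this is an assumption, not something stated in the text, and the helper name is hypothetical.

```python
from math import erfc, log, sqrt

def wald_p_from_ci(hazard_ratio, ci_low, ci_high, z_crit=1.959964):
    """Two-sided Wald P value implied by a hazard ratio and its 95% CI."""
    se = (log(ci_high) - log(ci_low)) / (2 * z_crit)    # standard error of log(HR)
    z = log(hazard_ratio) / se
    return erfc(abs(z) / sqrt(2))                       # equals 2 * (1 - Phi(|z|))

# Values quoted in the text for the van de Vijver et al. data set
relapse_p = wald_p_from_ci(1.93, 1.20, 3.11)
overall_p = wald_p_from_ci(3.15, 1.54, 6.43)
print(f"relapse-free survival: P ~ {relapse_p:.3f}")
print(f"overall survival:      P ~ {overall_p:.3f}")
```

Under that assumption, both approximations round to the reported values (P = 0.007 and P = 0.002), which suggests the quoted intervals and P values are mutually consistent.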
Polycomb repression signature genes are epigenetically repressed in aggressive tumors. We next sought to confirm, at the individual gene level, the epigenetic repression of our signature genes in aggressive tumors. Genome-wide location analysis of H3K27me3 showed high enrichment ratios for the promoters of six randomly selected genes (Fig. 3A and Supplementary Fig. S4). By ChIP-PCR, we confirmed that these gene promoters contain the H3K27me3 mark in three metastatic prostate cancer tissues. Interestingly, no apparent enrichment of H3K27me3 was observed in localized prostate cancer for five of the six genes, whereas a positive control gene, KCNA1 (1), was enriched in all cancer samples tested (Fig. 3B). In the PC3 prostate cancer cell line, we confirmed that EZH2, SUZ12, and H3K27me3 co-occupy the promoters of all six genes. Importantly, quantitative reverse transcription-PCR (RT-PCR) analysis of three benign, five localized, and seven metastatic prostate cancer tissues showed marked repression of these genes in metastatic samples (Fig. 3C). For example, WNT2, CXCL12, and KRT17 are >100-fold down-regulated.
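For readers unfamiliar with how fold repression is typically derived from quantitative RT-PCR, the Python sketch below normalizes a target gene to the mean threshold cycle (Ct) of two reference genes and compares two samples. The comparative Ct approach and the assumption of 100% amplification efficiency are illustrative choices, not details given in the text, and all Ct values are hypothetical.

```python
def relative_expression(ct_target, ct_references):
    """Target expression relative to the reference genes (e.g., GAPDH and HMBS)."""
    # Averaging Ct values corresponds to a geometric mean of reference quantities
    ct_ref = sum(ct_references) / len(ct_references)
    return 2.0 ** (ct_ref - ct_target)                  # assumes 100% PCR efficiency

def fold_change(sample, control):
    """Ratio of normalized expression, sample over control."""
    return relative_expression(*sample) / relative_expression(*control)

# Hypothetical Ct values: (target Ct, (reference Ct 1, reference Ct 2))
benign = (24.0, (18.0, 22.0))
metastatic = (31.0, (18.0, 22.0))
print(fold_change(metastatic, benign))
```

In this toy case the target crosses threshold seven cycles later in the metastatic sample at equal reference levels, which corresponds to a 128-fold reduction.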

Discussion
Prostate cancer is, in general, a slowly progressing cancer that varies greatly in clinical outcome, depending on the aggressiveness of an individual tumor. Currently, the most important clinical prognostic indicators of disease outcome are pretherapy prostate-specific antigen and Gleason score. Nevertheless, many patients incur disease recurrence (5), and thus additional prognostic biomarkers are needed to provide better risk assessment and therapy selection. PRC2 complex proteins are histone methyltransferases that are frequently up-regulated in aggressive tumors, yet their downstream targets and underlying mechanisms remain largely unknown. Although genome-wide location analyses of PRC2 have been done in cell lines, a similar study has not been carried out directly in human tissues. Several lines of evidence support the success of our ChIP-on-chip analysis of PRC2 in prostate cancer. Similar to previous reports (19), we observed a highly significant overlap between PRC2- and H3K27me3-occupied genes. A large number of these genes are down-regulated in cancer and associated with poor patient survival, consistent with EZH2 up-regulation in aggressive cancer. In addition, we have confirmed the epigenetic silencing of a randomly selected subset in cancer. Cancer PRC2-occupied genes largely overlap with stem cell PRC2 targets and include previously identified functional categories such as transcription factors, developmental regulators, and genes involved in receptor activities (3,19), suggesting a Polycomb-mediated transcriptional fingerprint in cancer.
The overrepresentation of cancer PRC2-occupied genes in stem cell concepts may have important implications for the recent model of cancer stem cells. In embryonic stem cells, PRC2-mediated epigenetic silencing maintains the pluripotent stem cell identity (3,4). An exciting theme is emerging that PcG target genes are predisposed to DNA hypermethylation in cancer (8) and that these epigenetic changes may convey heritable gene expression patterns critical for neoplastic initiation and cancer progression (20). Our results, in addition, show a Polycomb-mediated epigenetic program in metastatic cancer cells that is associated with cancer outcome. Because PcG proteins are often expressed at a very low level in differentiated cell types (3), it is unlikely that the stem cell-like chromatin structure preexists in normal adult tissues. Aggressive tumor cells may have acquired this signature during cancer progression, either through dedifferentiation of mature cells or through mutation of adult stem cells (Supplementary Fig. S5).
Our Polycomb repression signature predicts clinical outcome of multiple solid tumors and is different from currently used tumor biomarkers. Unlike prostate-specific antigen for prostate cancer, which is the product of differentiated tumor cells, Polycomb repression signature genes are likely products of EZH2-expressing cancer stem cells or their immediate descendants. These genes encode biomarkers that may capture early abnormality in stem cell-initiated diseases and lead to cancer detection at earlier stages of carcinogenesis. Owing to its low gene number and strong association with cancer outcome in multiple patient cohorts, our 14-gene signature is well suited for development into prognostic assays for clinical use. In addition, the signature genes may be important tumor suppressor genes and may facilitate the understanding of PcG-mediated tumorigenesis. Taken together, PcG proteins play a unifying role in regulating stem cells, metastasis, and cancer survival by epigenetic silencing of key target genes.

Figure 2. The Polycomb repression signature predicts clinical outcome of prostate and breast cancer patients. A, network view of the molecular concept analysis of our "H3K27me3-occupied genes in metastatic prostate cancer" concept (orange node with black ring). Each node represents a molecular concept or set of biologically related genes. The node size is proportional to the number of genes in each concept. Each edge represents a statistically significant enrichment (P < 1 × 10⁻⁵; Supplementary Table S1). Enrichments with "stem cell concepts" are indicated by orange edges, enrichments with "cancer survival" concepts by red edges, and enrichments for genes down-regulated in metastatic prostate cancer concepts by blue edges. B, heatmap of 87 Polycomb repression signature genes that are occupied by H3K27me3 and repressed in metastatic (MET) relative to clinically localized (PCA) prostate cancer and benign prostate tissue. Rows represent genes and columns represent samples. C, two main clusters derived from the Yu et al. prostate cancer training data set (13), and the predicted high-risk and low-risk groups for the Glinsky et al. validation data set (14) differed on patients' 10-year metastasis-free survival. Hierarchical clustering of the training data set and k-nearest neighbor classifier prediction of the validation data set were evaluated in the space of 87 Polycomb repression signature genes, thus defining a low-risk group (green line) and a high-risk group (red line). The survival difference was assessed by Kaplan-Meier analysis and the P value was reported by log-rank test. Dashed line, median survival time of applicable groups. For each data set the number of primary tumors is indicated. D, two clusters derived from the Wang et al. estrogen receptor-positive breast cancer training data set (15) and the predicted high-risk and low-risk groups for the validation Pawitan et al. (16) and Miller et al. (17) breast cancer data sets have significantly different patient outcomes. Green lines, low-risk groups; red lines, high-risk groups.

Figure 3. Polycomb repression signature genes are occupied and silenced by PRC2 in metastatic, but not benign or localized, prostate cancer. A, the H3K27me3 ChIP signals from ChIP-on-chip analysis of metastatic prostate cancer tissues. The plots show unprocessed enrichment ratios (ChIP-enriched versus input DNA) for all probes within a genomic region. Chromosome positions are from National Center for Biotechnology Information build 35 of the human genome. Genes are shown below plots with exons represented by black bars. The start and direction of transcription are noted by arrows. B, analyses of gene promoters for PRC2 and H3K27me3 occupancy in one localized (PCA) and three metastatic (MET) prostate cancer tissues and the PC3 prostate cancer cell line. ChIP-PCR was done in triplicate using amplified ChIP products for enrichment of ChIP over whole-cell extract chromatin. C, quantitative RT-PCR analysis of gene expression in three benign, five prostate cancer, and seven metastatic tissues. The expression was normalized to the levels of glyceraldehyde-3-phosphate dehydrogenase (GAPDH) and hydroxymethylbilane synthase (HMBS).

Table 1. Multivariate Cox proportional hazards analysis of the Polycomb repression signature for estrogen receptor-positive tumors (n = 226) in the van de Vijver et al. breast cancer data set
Cancer Res 2007;67(22). November 15, 2007