Identification of an Integrated SV40 T/t-antigen Cancer Signature in Aggressive Human Breast, Prostate, and Lung Carcinomas with Poor Prognosis

Understanding the genetic architecture of cancer pathways that distinguishes subsets of human cancer is critical to developing new therapies that better target tumors based on their molecular expression profiles. In this study, we identify an integrated gene signature from multiple transgenic models of epithelial cancers intrinsic to the functions of the Simian virus 40 T/t-antigens that is associated with the biological behavior and prognosis for several human epithelial tumors. This genetic signature, composed primarily of genes regulating cell replication, proliferation, DNA repair, and apoptosis, is not a general cancer signature. Rather, it is activated primarily in tumors with aberrant p53, Rb, or BRCA1 expression but not in tumors initiated through the overexpression of myc, ras, her2/neu, or polyoma middle T oncogenes. Importantly, human breast, lung, and prostate tumors expressing this set of genes represent subsets of tumors with the most aggressive phenotype and with poor prognosis. The T/t-antigen signature is highly predictive of human breast cancer prognosis. Because this class of epithelial tumors is generally intractable to currently existing standard therapies, this genetic signature identifies potential targets for novel therapies directed against these lethal forms of cancer. Because these genetic targets have been discovered using mammary, prostate, and lung T/t-antigen mouse cancer models, these models are rational candidates for use in preclinical testing of therapies focused on these biologically important targets.


Introduction
The misexpression or activation of oncogenes and loss of tumor suppressor gene function are major determinants of the phenotypic behavior of tumors that significantly influence patient morbidity and mortality. Understanding the genetic networks through which these genes operate is critical to comprehensively defining mechanisms of tumorigenesis and identifying critical genetic nodes that could be important new targets for therapies.
The directed expression of SV40 T/t-antigens has led to the development of several important transgenic models with spontaneous epithelial tumor formation, including carcinomas of pancreatic β-islet cells (1) and acinar cells, ovary (2), choroid plexus, lung (3), mammary (4), and prostate glands (4,5). The oncogenic relevance of SV40 T/t-antigen to human cancer is that the T-antigen oncoprotein binds to and functionally inactivates two major tumor suppressor genes, Rb and p53 (6,7), which are often involved in many human tumors, whereas t-antigen dysregulates the protein phosphatase 2A family of serine/threonine phosphatases (8). Unlike many transgenic cancer models induced by the overexpression of oncogenes, such as ras and myc, T/t-antigen induces significant genomic instability leading to aneuploidy and chromosomal gains and losses that are often seen in human solid tumors (9). Additionally, T/t-antigen can function as one of three minimal cooperating elements along with hTERT and oncogenic H-ras to fully transform epithelial cells (10).
Unlike cancer arising in the human population, tumors in genetically engineered mouse (GEM) models arise in mice with well-defined genetic backgrounds, where genetic variability can be minimized. This offers significant advantages for studying tumor pathogenesis and molecular mechanisms of oncogenesis caused by a single initiating oncogenic event introduced through the mouse germline (11).
To comprehensively define an intrinsic SV40 T/t-antigen signature embedded in multiple epithelial tumors in vivo and determine its relevance to human cancers, we did an extensive analysis of tumor gene expression profiles from mouse models of the three most prevalent human epithelial cancers, the breast [C3(1)/Tag transgenic mice; ref. 4], lung (Clara cell secretory protein CC10-Tag mice; ref. 3), and prostate [probasin/Tag [transgenic adenocarcinoma of mouse prostate (TRAMP)] mice; ref. 5]. Although the expression of thousands of genes is altered in these tumors compared with their normal tissue counterparts, we have determined that the three mouse models share an "intrinsic" SV40 T/t-antigen gene signature composed of approximately 150 genes, many connected to functional nodes related to p53, pRb, E2F, myc, DNA damage/repair, replication, and apoptosis pathways.
The intrinsic SV40 T/t-antigen gene signature defines a specific genetic signature that is quite distinctive compared with the expression patterns of tumors arising through the overexpression of other oncogenes, including Ras, Her2/neu, Myc, and PyMT. Approximately half of the named genes in the T/t-antigen signature are involved in cell cycle, DNA repair/binding/metabolism, chromosome assembly, and cytoskeletal/microtubule assembly. Although many of the genes identified in the T/t-antigen signature have also been found in the wound-related (12) and proliferation gene (13) signatures that predict tumor outcome, a substantial number of additional genes related to replicative function, metabolism, and other processes are contained in the T/t-antigen signature. Importantly, when used to cluster human breast, prostate, or lung tumors based on their expression profiles, the intrinsic T/t-antigen gene signature identifies the most aggressive forms of human breast, prostate, and lung cancers with poor prognosis. This gene set should provide further biological insights into human tumors with an aggressive phenotype and poor prognosis.
We propose that the T/t-antigen signature may be informative in classifying patients with poor prognosis, provides additional potential targets for anticancer therapies for epithelial tumors with poor prognosis, and credentials particular SV40 Tag transgenic cancer models for preclinical testing of such targeted therapies.

Animals
C3(1)/Tag transgenic mice (4) and Clara cell secretory protein CC10-Tag mice (3) were carried in the FVB background strain. Probasin/Tag (TRAMP) mice (5), obtained from The Jackson Laboratory, were in the C57BL/6 background. Animals were housed and cared for in accordance with NIH guidelines and palpated for tumors twice weekly. Mice were euthanized by CO2 narcosis. Normal tissue and tumors from at least four to five mice were analyzed for each mouse tumor model. Tumors (0.6-0.8 cm) were removed, with portions fixed in 4% (w/v) paraformaldehyde for histologic analyses, and the remainder was snap frozen in liquid nitrogen. Normal dorsal, lateral, and ventral lobes of the prostate gland were microdissected and pooled from the same C57BL/6 mouse. Twenty mammary glands at various stages of estrous cycle and lung tissues were collected from randomly selected wild-type FVB female mice and used for the normal tissue reference sample. During the course of this study, TRAMP mice in the C57BL/6 strain were found to develop seminal vesicle phyllodes-like tumors, which were also collected for microarray analysis. Universal Reference RNA for arrays was purchased from Stratagene.

RNA Extraction for Microarray Data Analysis
Total RNA from normal prostate and mammary glands, lung tissues, and tumor samples was obtained by a two-step extraction method using Trizol reagent (Invitrogen) and RNeasy Mini protocol for RNA cleanup by Qiagen. Briefly, 30 mg of tissues were homogenized in 1 mL Trizol reagent using a polytron power homogenizer. After 5-min dissociation of homogenate at room temperature, phase separation of RNA was done by the addition of 200 μL chloroform, vigorous mixing of the sample, and incubation for 10 min before centrifugation at 14,000 × g for 15 min. The aqueous phase containing the RNA was precipitated with 500 μL isopropanol, mixed, and incubated for 5 min at room temperature before centrifugation at 14,000 × g for 10 min. The glossy white RNA pellet was washed with 1 mL of 70% ethanol and spun for an additional 5 min at 14,000 × g. After the air-dried pellet was hydrated with 100 μL diethylpyrocarbonate-water, the RNeasy Mini kit (Qiagen) was used to clean up RNA according to the manufacturer's protocol. High-quality total RNA was obtained by elution with 30 μL diethylpyrocarbonate-water. Twenty micrograms of total RNA from each sample were labeled and hybridized as described (11,14) using the Incyte GEMII 10K cDNA array spotted on poly-L-lysine-coated glass slides provided by the National Cancer Institute (NCI) Advanced Technology Center.

Statistical Analysis
Microarray data preprocessing and normalization
SV40 T/t-antigen mouse models. cDNA microarray data from four or five independent tumor samples and their respective normal tissue samples were collected using the Incyte GEMII 10K cDNA array (9,984 features). Microarray image analysis and calculation of the average foreground signal adjusted for the local median background were done using GenePix software. Low-quality spots flagged by GenePix were excluded, but low-intensity signals were processed as follows. If the signal was less than 100 in both channels, the spot was flagged as not reliable. If the signal was less than 100 in one channel but greater than 500 in the other channel, the signal value in the low channel was set to 100. This quality filtering excluded only a small portion of all spots on an array (on average, 0.5%). Feature data were used if present in at least two specimens in each tumor or normal group. In total, 9,574 features were available for analysis. For each spot, the log base 2 ratio of the green (target RNA) and red (Stratagene Universal RNA) signals was calculated. Within each array, the log ratios were normalized with an intensity-dependent procedure (Lowess smoothing).
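The low-intensity rules above can be illustrated with a minimal Python sketch. It is not the authors' S-Plus code: the function names are hypothetical, the handling of non-positive intensities is an added assumption, and the Lowess normalization step is not reproduced here.

```python
import math

LOW, HIGH = 100.0, 500.0  # intensity thresholds quoted in the text


def spot_log_ratio(green, red):
    """Return log2(green/red) for one spot, or None if the spot is unreliable.

    green = target RNA channel, red = Universal Reference RNA channel.
    """
    if green < LOW and red < LOW:
        return None  # both channels weak: flagged as not reliable
    # One channel weak and the other strong: raise the weak channel to 100.
    if green < LOW and red > HIGH:
        green = LOW
    elif red < LOW and green > HIGH:
        red = LOW
    if green <= 0 or red <= 0:
        return None  # assumption: a non-positive signal cannot be log-transformed
    return math.log2(green / red)


def feature_usable(log_ratios, min_present=2):
    """A feature is kept for a group if at least `min_present` specimens have a value."""
    return sum(v is not None for v in log_ratios) >= min_present
```

For example, `spot_log_ratio(50, 800)` treats the green channel as 100 before taking the ratio, whereas `spot_log_ratio(50, 80)` is discarded.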
Other transgenic mouse models. Labeled probe from four to five tumor specimens from each transgenic model of breast cancer was cohybridized with labeled probe generated from normal mammary tissue from the same background strain, using the Incyte 10k GEMII chips and GenePix software to extract the raw channel intensities. The preprocessing and normalization steps were done as described above. For each array, feature log base 2 ratios of the intensity from the tumor (red label) and normal (green label) tissue were calculated. An array feature was excluded if there were fewer than three samples with nonmissing log ratios for that array feature in any tumor model, leaving 7,948 array features for the analysis.
All mouse array data are available at the National Center for Biotechnology Information array database.⁹
Human cancer microarray data sets. Human data sets were selected based on their having a high representation of the T/t-antigen gene set, relatively large sample size, and information related to metastases or clinical outcome. The publicly available data sets of breast cancer patients using Agilent oligonucleotide microarrays (NKI/Rosetta Inpharmatics-Merck) were obtained from Chang et al. (15). The downloaded raw ratios (individual tumor tissue cRNA versus all patient pool cRNA) were globally normalized by shifting each array's median log base 2 ratio to 0. The array slide includes over 24,000 features. Three published Affymetrix probe-level data sets, which profile the prostate (16), lung (17), and mammary (18) cancer patients, were processed with the Robust Multiarray Average algorithm and quantile normalization to obtain log base 2 gene summary measures (19). Both the prostate and lung specimens were hybridized to HG-U95 GeneChips, providing over 12,000 oligonucleotide probe sets for the analysis. The mammary samples were assessed on HG-U133A GeneChip with over 22,000 probe sets (16,17,19,20).
Mouse and human gene mapping. Human orthologues of the mouse SV40 T/t-antigen-specific gene signature were identified using Entrez Gene ID and JAX homology as provided in the NCI/Center for Cancer Research (CCR) mAdb database.¹⁰ Entrez IDs with multiple array probes were represented by the average expression from the two most correlated probes if the pairwise Pearson correlation coefficient was >0.5.
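The probe-collapsing rule above can be sketched in Python as follows. The function names are hypothetical, and two behaviors are assumptions because the text does not specify them: a gene with a single probe is passed through unchanged, and a gene whose best probe pair does not exceed the 0.5 threshold yields `None`.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (assumes neither is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def summarize_gene(probes, min_r=0.5):
    """Collapse several probe profiles for one Entrez ID into one profile.

    probes: list of expression profiles, one list of values per probe,
    all measured over the same samples.
    """
    if len(probes) == 1:
        return list(probes[0])  # assumption: single probes are used as-is
    # Find the pair of probes with the highest pairwise correlation.
    best = max(combinations(probes, 2), key=lambda pair: pearson(*pair))
    if pearson(*best) <= min_r:
        return None  # assumption: no sufficiently correlated pair, no summary
    return [(a + b) / 2 for a, b in zip(*best)]
```

With three probes of which two track each other closely, only those two contribute to the averaged profile.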

Analysis of Variance
Gene expression profiles from the SV40 T/t-antigen mouse models were compared with respect to specimen type (normal versus tumor tissue), location of tumor (mammary, lung, prostate, and seminal vesicle), and background strain of mice (FVB versus C57BL/6). A three-way ANOVA model with one interaction effect (type × location) was fitted. Cancer genes that differ among all four tumor locations were identified as those that had a significant interaction effect at the 0.001 level and showed at least a 2-fold change between the maximal and minimal mean tumor/normal ratio over the different locations (2,638 cDNA probes). Based on the ANOVA model, differentially expressed genes between normal and tumor specimens within each tumor location were also identified as those genes whose expression was significant at the 0.001 level and was at least 2-fold different compared with the mean expression ratio. To limit the number of false-positive findings, a stringent criterion of 0.001 was used as a cutoff point for the unadjusted P values. We expect that the average number of false-positive findings will be 10 or less. In our analysis, the number of genes reaching this significance level far exceeds the expected false positives. Using the procedure of Benjamini and Hochberg (21), the false discovery rate (FDR) was calculated for each ANOVA testing. The FDRs were all <0.5% (see also Supplementary Table S1). Among the genes that were significant at the 0.001 level, we only reported genes that showed at least a 2-fold change for each comparison considered. Under the conservative assumption that the chance of a false-positive finding is the same for a gene with at least 2-fold change and a gene with less than 2-fold change, the FDRs would be no greater than those reported (see Supplementary Table S1).
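The multiple-testing arithmetic in this section can be sketched in Python. The expected number of false positives under the null is simply the number of tests times the P value cutoff; applying that to the 9,574 analyzable features gives about 9.6, which is one plausible reading of the "10 or less" statement, although the text does not spell out the calculation. The Benjamini-Hochberg function below is a generic implementation of the step-up adjustment, not the authors' S-Plus code, and its name is hypothetical.

```python
def expected_false_positives(n_tests, alpha):
    """Expected count of false positives if every null hypothesis were true."""
    return n_tests * alpha


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest P first
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotone adjusted values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A gene list selected at an adjusted value below 0.005 corresponds to an estimated FDR under 0.5%, the level reported above.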
Further selection was applied based on identification of differentially expressed genes between normal and tumor tissue for the three epithelial tumors (mammary, lung, and prostate). The SV40 T/t-antigen oncogene-specific signature included genes similarly differentially expressed in each epithelial tumor (153 cDNA clones). In contrast, genes were included in a tissue-specific SV40 T/t-antigen tumor signature if they were found to be differentially expressed between the tumor and normal samples exclusively for one location. Two hundred and eighty-three, 220, and 999 cDNA clones were identified as specifically dysregulated in mammary, lung, and prostate tumors, respectively. Overall, 3,004 unique array features were selected using ANOVA.

Unsupervised Learning
We used agglomerative hierarchical clustering and multidimensional scaling to inspect the global grouping of expression profiles from tumors and normal tissue for all four SV40 T/t-antigen mouse models. Both of these analyses used median-centered gene expression and the distance metric of one-minus the Pearson correlation coefficient. We applied two-way agglomerative hierarchical clustering with image display to visualize expression patterns of SV40 T/t-antigen-specific and tissue-specific tumor gene signatures in three SV40 mouse models (mammary, lung, and prostate) and other independent data sets of mouse and human cancers. The one-minus Pearson correlation coefficient distance metric and average linkage algorithm were used for each clustering analysis. Before clustering, for each gene and tumor sample in the SV40 T/t-antigen mouse models, the mean of the respective normal samples was subtracted. For the other mouse models, the normalized log2 ratio of the tumor/normal gene expression is displayed. For the clustering of human data, expression of each gene was median centered across all specimens.
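The distance metric and linkage rule named above can be written out in a short Python sketch. It covers only the building blocks (median centering, one-minus Pearson distance, and the average-linkage distance between two clusters), not a full clustering routine, and all function names are hypothetical.

```python
import math
from statistics import median


def median_center(values):
    """Subtract the median of a gene's expression values across specimens."""
    m = median(values)
    return [v - m for v in values]


def one_minus_pearson(x, y):
    """Distance = 1 - Pearson r; 0 for perfectly correlated profiles, 2 for anticorrelated."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 1.0 - sxy / math.sqrt(sxx * syy)


def average_linkage(cluster_a, cluster_b):
    """Average-linkage distance: mean pairwise distance between members of two clusters."""
    pairs = [(a, b) for a in cluster_a for b in cluster_b]
    return sum(one_minus_pearson(a, b) for a, b in pairs) / len(pairs)
```

In practice, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` on a precomputed distance matrix would be a common way to run the full agglomeration; that choice is a suggestion, not something the text specifies.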

Survival Analysis
The end point for the study was overall survival defined as the time until death from any cause. Patients alive at the date of last follow-up were censored at that date. To examine the prognostic value of the SV40 T/t-antigen-specific signature, we conducted both unsupervised and supervised methods for predicting survival. Unsupervised methods involved classifying the breast (15) and lung (17) cancer data set samples by hierarchical clustering analysis followed by relating the cluster identity to survival. The survival difference between the two clusters of samples was assessed by the log-rank test.
Two complementary supervised methods were used to establish whether the SV40 T/t-antigen-specific signature predicted survival. First, a leave-one-out cross-validation procedure was used to assess the ability of gene expression to predict survival using the data of 295 breast cancer patients from Chang et al. (15). Each training set was obtained by omitting one patient who was treated as the test patient. For each training set, genes in the SV40 T/t-antigen-specific signature, which were associated with survival, were identified by the Cox proportional hazards model. Using these genes, a classifier was constructed with the linear compound covariate algorithm (22), where the risk score was defined as a linear combination of gene expression values weighted by their estimated Cox model regression coefficients. If the risk score of the test patient was higher than the median of risk scores obtained in a training set, the test patient was classified to the high-risk group, otherwise to the low-risk group. The procedure was repeated for every patient. A permutation log-rank test was used to assess the significance of prediction.
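The leave-one-out classification loop described above can be sketched in Python. The Cox model fitting itself is not implemented: `fit_coefficients` is a hypothetical callable, supplied by the caller, that is assumed to select survival-associated genes in a training set and return their regression coefficients. The patient record layout (a dictionary with an `"expr"` entry) is likewise an illustrative assumption.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Compound covariate: sum of expression values weighted by Cox coefficients."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def loo_classify(patients, fit_coefficients):
    """Leave-one-out risk classification.

    patients: list of dicts, each with an "expr" mapping gene -> expression.
    fit_coefficients: hypothetical callable(training_patients) -> {gene: coefficient},
        standing in for per-gene Cox proportional hazards fitting and selection.
    """
    labels = []
    for i, test in enumerate(patients):
        training = patients[:i] + patients[i + 1:]  # omit the test patient
        coefs = fit_coefficients(training)          # refit on every training set
        cutoff = median(risk_score(p["expr"], coefs) for p in training)
        score = risk_score(test["expr"], coefs)
        labels.append("high" if score > cutoff else "low")
    return labels
```

Refitting the coefficients inside the loop is the point of the design: the test patient never influences gene selection or the cutoff used to classify that patient.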
To further show the predictive ability of the SV40 T/t-antigen-specific signature, a predictor was built by the entire data of 295 breast cancer patients from Chang et al. (15) and validated by the gene expression profiles of 159 breast cancer patients reported by Pawitan et al. (18). Genes in the SV40 T/t-antigen-specific signature, which were associated with the survival in the training set, were used to build the classifier, and each of the 159 test patients was classified by this classifier into the high- or low-risk group. The Supplementary Materials and Methods provides variables needed to predict the prognosis for a new patient. The survival difference between the two risk groups in the test set was assessed by the regular log-rank test. Because the training and test sets were generated with different platforms under different designs (single versus dual channel), to make the data comparable with each other, the expression measurements were standardized to z score (i.e., each gene expression was converted to zero mean and unit variance).
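The z-score standardization mentioned above amounts to the following Python sketch, applied gene by gene within each data set. The function names are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text only states zero mean and unit variance.

```python
from statistics import mean, stdev


def zscore(values):
    """Standardize one gene's expression across patients to zero mean, unit variance."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]


def standardize_dataset(expression_by_gene):
    """Apply z-scoring to every gene in a {gene: [values across patients]} mapping."""
    return {gene: zscore(vals) for gene, vals in expression_by_gene.items()}
```

Standardizing the training and test sets separately puts single-channel and dual-channel measurements on a common scale before the classifier is applied.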

Computational Software
Preprocessing of cDNA microarray data and ANOVA were done using Insightful S-Plus 6.0 (Insightful Corp.). The Affymetrix data were preprocessed with R 2.3.1¹¹ and Bioconductor Affy project.¹² Hierarchical clustering and image plots were done using R 2.3.1 and custom Microarr package. Multidimensional scaling was carried out in BRB ArrayTools,¹³ developed by Dr. Richard Simon and Amy Peng Lam. Survival analysis was done using R 2.3.1 and Survival package.

Pathway Analysis
Ingenuity Pathway Analysis (IPA) version 2.0 was used to identify patterns of biological interactions and relationships between the genes of interest identified by the microarray analyses. This algorithm identifies known gene, protein, and regulatory interactions based on existing reports in the literature and schematically depicts the interactions with relative expression levels collected from the microarray analyses. A detailed description of IPA can be found online.¹⁴ Data sets containing Gene ID identifiers and their corresponding expression log2 fold changes (at least 2-fold mean difference of tumor/normal ratio) from three-way ANOVA analysis were uploaded into IPA. A total of 3,004 genes that showed at least a 2-fold mean difference in the tumor/normal expression ratio (P < 0.001) were used for the IPA analysis.

Validation of Observed Gene Expression Profiles by Semiquantitative Reverse Transcription-PCR
To confirm the microarray results, semiquantitative reverse transcription-PCR (RT-PCR) was done on a subset of genes to verify gene expression changes (selected from Table 1). Total RNA (1 μg) was reverse transcribed according to the manufacturer's instructions with oligo(dT) using SuperScript RT III (Invitrogen Life Technologies). One microliter of the cDNA was used for each PCR. The PCR primer information and thermocycling conditions for each gene are summarized in Supplementary Table S2. The PCRs were done using Platinum Taq polymerase according to the manufacturer's protocol (Invitrogen Life Technologies) with varying concentrations of MgCl2 for specific primer sets (specified in Supplementary Table S2). The PCR thermocycling variables consisted of denaturation at 94°C for 30 s, followed by annealing at either 55°C or 58°C for 1.5 min for 24 to 35 cycles depending on the gene (indicated in Supplementary Table S2), and extension at 72°C for 1 min. PCR products were resolved and visualized on a 2.0% agarose gel containing 0.1 μg/mL ethidium bromide.

Immunohistochemical Staining for Ki67
Ki67 was identified using a rabbit polyclonal anti-Ki67 antibody (Novocastra) and the avidin-biotin-peroxidase (avidin-biotin complex) technique using the Vectastain ABC Rabbit Elite kit (Vector Laboratories, Inc.). Unrelated mouse immunoglobulin was used as the negative control at the same concentration as the specific antibody.

Results
Initial analyses of microarray data from tumor samples from the TRAMP model clustered the tumors into two very distinct groups. Histologic examination revealed that in the C57BL/6 background, phyllodes (epithelial-stromal) tumors originating in the seminal vesicles frequently developed in the TRAMP mice (data not shown) in addition to adenocarcinomas of the prostate glands. Because the seminal vesicle phyllodes tumors are composed primarily of Tag-positive mesenchymal cells with T-antigen-negative epithelial cells, cluster separately from the epithelial tumors as visualized by hierarchical clustering and multidimensional scaling (Fig. 1), and are a rare human tumor type, they were excluded to identify a T/t-antigen signature of epithelial tumors.
To further identify relations between the four SV40 T/t-antigen tumor types, two exploratory analyses (unsupervised hierarchical clustering and multidimensional scaling) were done using microarray gene expression from 2,638 cDNA clones, for which a statistically significant difference between the tumor versus normal tissue and the tumor location was identified by means of the multifactor ANOVA model (see Materials and Methods). This gene set represents cancer transformation-related genes that distinguish the four tumor types. As shown in Fig. 1, epithelial tumors of the lung, prostate, and mammary glands cluster separately from their respective normal tissues, with the largest separation observed between prostate tumors and normal prostate. Phyllodes seminal vesicle tumors and normal seminal vesicle samples group relatively close to each other but cluster independently from tissues from other locations. Because the cellular composition, histologic features, and gene expression pattern of the seminal vesicle phyllodes tumors are quite distinct from the epithelial tumors, this provided additional justification to focus the remainder of the study on only the epithelial breast, lung, and prostate tumors. Of the 3,004 genes identified by the full ANOVA analysis (see the Materials and Methods and Supplementary Table S3), global functional analysis could be done on 1,601 genes using IPA. The majority of these genes were classified as cancer related in the disease and disorders functional category with the most significant functional classes, including cell cycle, DNA replication, recombination and repair, cellular growth and proliferation, cell death, and cellular assembly and organization (data not shown).
Identification of an SV40 T/t-antigen intrinsic signature. Array data from transgenic mammary, prostate, and lung tumors induced by the same SV40 viral oncoproteins were analyzed to identify genes that were similarly dysregulated in all of the tumor types. These genes represent an intrinsic set of genes whose expression is highly conserved in all three tumor types, as depicted in a representative heat-map in Supplementary Fig. S1. The SV40 T/t-antigen intrinsic gene signature includes a total of 153 genes of which 117 (76%) genes were up-regulated and 34 (23%) genes were down-regulated. Only twenty-nine of these genes were similarly dysregulated in the phyllodes seminal vesicle tumors, further showing that the phyllodes tumors are quite distinct from the epithelial tumors (data not shown).
The intrinsic SV40 T/t-antigen signature is specific. Our previous work showed that the gene expression signature for T/t-antigen-induced mammary tumors was quite distinct from gene signatures identified for other transgenic mammary tumors (11). However, to further address whether the newly identified T/t-antigen signature is specific for SV40 T/t-antigen-induced epithelial tumors or whether similar changes are observed in tumors initiated by other oncogenic pathways, we analyzed microarray expression data from other mammary tumor models induced through the loss of p53 (Wap-cre;p53ko flox/flox mammary mouse tumor model; ref. 23), loss of BRCA1 in combination with  S2, the SV40 T/t-antigen signature is characteristic of the C3(1)-T/t-antigen tumors and is also highly represented in the p53−/− and BRCA1−/−;p53+/− compound mutant tumors that cluster together. However, the expression signature is not characteristic of tumors induced by myc, ras, her2/neu, or PyMT.
Relatively few genes in the SV40 T/t-antigen proliferation signature shared similar expression patterns in all of the tumor models. These included caveolin 1 (Cav1), enhancer of zeste homologue 2 (Ezh2), Cdca3, topoisomerase IIa (Top2a), and a few of the chromosome maintenance factors, suggesting that these genes may be more ubiquitously involved in transformation and not oncogene specific.
The SV40 T/t-antigen intrinsic signature represents an integrated genetic network. The 120 named genes comprising the SV40 T/t-antigen intrinsic gene set are enriched for genes involved in several important cellular pathways that have been implicated in many human cancers (Table 1), including cell cycle, DNA replication/repair/metabolism/maintenance, cytokinesis and cytoskeletal structure, metabolism, transcription, and signal transduction pathways. This set of genes was queried for known functional interactions using IPA (Fig. 2). This algorithm identifies biological relationships between proteins produced by the genes that directly interact with each other at various functional levels, including physical binding and interaction, protein activation, enzyme substrate association, and transcriptional regulation. The interactions are graphically displayed as networks composed of nodes (individual proteins) and edges (biological relationship between nodes).
The overexpression of these genes as well as the proliferation markers proliferating cell nuclear antigen (Pcna) and Ki67 clearly shows that SV40 T/t-antigen expression results in enhanced progression through cell cycle checkpoints leading to rapid cell division in these tumors. Caspase-2 and caspase-3, regulators of apoptosis, were also overexpressed, consistent with the paradoxical increased rate of apoptosis observed in the tumors (29). A few genes involved in cellular metabolism of aldehydes by aldehyde dehydrogenase 2 (Aldh2) and glutathione by glutathione S-transferase (GST), μ1 (Gstm1) were down-regulated. Genes involved in cell signaling, adhesion, and cytoskeleton-related proteins (Cav1, cxcl12, gelsolin, and villin) were also down-regulated in the tumors, suggesting a loss in specific intracellular signaling pathways and cell-cell adhesion during the tumorigenic process in these models.
Figure 2. An intrinsic biological network associated with expression of the SV40 T/t-antigen oncoproteins in the GEM models. Of the 120 known genes in the SV40 T/t-antigen gene cluster, 85 of the genes formed biological networks that could be related to the tumor suppressor genes p53 and pRB nodes. Gene symbol and description, LocusLink ID, and expression values for individual nodes for each tumor are listed in Table 1. Up-regulated (red) and down-regulated (green) genes. Genes highlighted in blue are proliferation markers, Ki67 and PCNA, and those that are highlighted in purple are potential chemotherapeutic targets.
Less than half of the genes identified in the T/t-antigen signature are represented in the 70-gene (30), metastasis (31), wound response (12), proliferation (13), and invasive gene (32) signatures that have been reported for classification and prognosis of human tumors (Supplementary Table S1).
Validation of the T/t-antigen expression profiles. Semiquantitative RT-PCRs were done for several genes of interest identified by our microarray studies and discussed above. The majority of the genes that were in the intrinsic SV40 T/t-antigen signature were confirmed to be similarly up-regulated or down-regulated by RT-PCR analysis (Fig. 3A). Ki67 immunohistochemical staining correlated with the gene microarray data across the different GEM mammary tumors (Fig. 3B). Strong positive Ki67 nuclear immunoreactivity is present in mammary tumors from the C3(1)/Tag model (Fig. 3B, b). Weak and multifocal immunoreactivity of Ki67 is found in sections from BRCA1 tumors (Fig. 3B, c) and moderate diffuse staining in sections of her2/neu tumors (Fig. 3B, d), further showing that the expression pattern of genes related to cellular proliferation correlates with in vivo tumor proliferation that is highest for the SV40 T/t-antigen-induced tumors and not as pronounced in the tumors initiated by other oncogenic pathways (Supplementary Fig. S2).
The intrinsic SV40 T/t-antigen oncogene signature identifies highly aggressive forms of human cancers with poor prognosis. Because several transgenic cancer models induced by T/t-antigen expression share important similarities to subsets of human tumors and are quite aggressive, we examined whether the intrinsic T/t-antigen signature was represented in specific classes of human tumors and whether it correlated with biological behavior and prognosis. Hierarchical clustering using the 120 named genes from the T/t-antigen signature was done using global expression profiles from human breast (15), prostate (16), and lung cancers (17).
One hundred eleven genes (~93% of the intrinsic T/t-antigen gene signature) were available in the data set from Chang et al. (15). Notably, the basal-like and majority of luminal type B subgroups of breast cancers share a remarkably similar expression pattern to the intrinsic T/t-antigen gene profile (Fig. 4A). Additional analyses done with data from Sorlie et al. (20) indicated that the T/t-antigen genes are consistently overexpressed in the basal and luminal type C breast tumors (however, less than half of the T/t-antigen genes were available in this human data set; data not shown). Furthermore, as displayed in Fig. 4A, the majority of breast tumors classified as luminal type-A and normal-like type exhibit lower expression of the T/t-antigen-specific signature. The basal-like subtype is associated with a higher frequency of TP53 mutations and the worst prognosis, whereas luminal types B and C tumors are associated with considerably poorer overall and relapse-free survival than luminal subtype A as reported previously (20). The ERBB2+ tumors do not seem to segregate into a specific cluster based on the T/t-antigen signature.
Figure 3. Validation of SV40 T/t-antigen signature genes in mammary, lung, prostate, and seminal vesicle (Sem. Ves.) tissues and tumors. A, semiquantitative RT-PCRs were done to confirm changes in transcript expression for FoxM1, Brca1, Chk1, Ect2, Ezh2, Cav1, and Stmn1 as described in Materials and Methods. The lanes listed on top represent normal tissue (N) and tumor sample (T) from various tissue types. Gene names are listed on the left. All genes were elevated in transcript expression for tumor sample when compared with respective normal tissue. Cyclophilin A served as internal control. B, Ki67 immunohistochemical staining of GEM tumors correlates with gene microarray data. a, control tissue (small intestine) showing positive nuclear immunoreactivity to Ki67 in crypt epithelia. Strong positive nuclear immunoreactivity to Ki67 is present in mammary tumors from the C3(1)/Tag model (b), whereas there is weak and multifocal immunoreactivity in sections from BRCA1 tumors (c) and moderate diffuse staining in sections of her2/neu tumors (d). Anti-Ki67 antibody, Mayer's hematoxylin counterstain. Magnification, ×20.
Eighty-seven gene orthologues (~72% of the 120 known genes from the intrinsic T/t-antigen gene signature) were identified in the gene expression data set representing 23 primary and 9 metastatic human prostate carcinomas from LaTulippe et al. (16). As depicted in Fig. 4B, metastatic prostate carcinomas cluster separately from primary prostate tumors based on the T/t-antigen signature.
Eighty-three orthologues of the T/t-antigen signature were found in the microarray data of the 186 human lung carcinomas reported by Bhattacharjee et al. (17). Hierarchical clustering showed that all of the small cell and squamous cell lung carcinomas and a subset of adenocarcinomas harbored the intrinsic T/t-antigen signature (Fig. 4C).
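The sample clustering used in these comparisons can be illustrated with a minimal sketch. It assumes agglomerative clustering with average linkage (the linkage named in the Fig. 1 legend) and a distance of 1 minus the Pearson correlation; the distance metric, the function names, and the toy expression values are illustrative assumptions and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression vectors.

    Assumes neither vector is constant (non-zero variance).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def average_linkage(profiles, n_clusters=2):
    """Merge samples bottom-up until n_clusters remain.

    profiles: dict mapping sample name -> list of expression values
    (one value per signature gene, same gene order for every sample).
    """
    names = list(profiles)
    # Pairwise sample distances: 1 - r (assumed metric).
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }
    clusters = [[name] for name in names]
    while len(clusters) > n_clusters:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average linkage: mean distance over all cross-cluster pairs.
            pairs = [dist[frozenset((a, b))]
                     for a in clusters[i] for b in clusters[j]]
            d = sum(pairs) / len(pairs)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [sorted(c) for c in clusters]


# Hypothetical log-ratio values for four signature genes in four tumors.
toy_profiles = {
    "tumor_A": [2.1, 1.8, 2.4, -0.3],
    "tumor_B": [1.9, 2.0, 2.2, -0.1],
    "tumor_C": [-1.0, -0.8, -1.2, 0.9],
    "tumor_D": [-0.9, -1.1, -0.7, 1.0],
}
toy_clusters = average_linkage(toy_profiles, n_clusters=2)
print(toy_clusters)
```

Cutting the tree at two clusters mirrors the way patients are later divided into groups that do or do not share the signature pattern; a real analysis would more likely rely on an established clustering package than on this hand-written loop.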
When the information on patients' survival was available with the data sets, we calculated the Kaplan-Meier survival curves for those classified by hierarchical clustering into two groups whose expression profile does or does not share the T/t-antigen-specific pattern. The comparison of overall survival between the two groups shows that breast cancer patients (Fig. 5A) with tumors expressing the T/t-antigen profile have a very significantly worse prognosis (P = 7e-10). The comparison of the lung cancer patient survival was available only for the adenocarcinoma subtype. As shown in Fig. 5B, the adenocarcinoma patients expressing the T/t-antigen signature also have a significantly poorer overall survival (P = 0.0479).

Cancer Research
Cancer Res 2007; 67: (17). September 1, 2007
These findings strongly suggest that the intrinsic T/t-antigen signature reflects critical molecular derangements that contribute to aggressive clinical behaviors of several types of human epithelial cancers and is associated with poor prognosis.
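The survival comparison between the two cluster-defined groups can be sketched as follows. The Kaplan-Meier estimator is the method named in the text; the use of a log-rank test to obtain the P value is an assumption (the test is not named in this section), and the function names and follow-up times are hypothetical.

```python
from math import erfc, sqrt


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct death time.

    events: 1 = death observed, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Gather all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


def logrank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, P value) with 1 df."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    death_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in death_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2.0))


# Hypothetical follow-up in months for patients with / without the signature.
sig_times, sig_events = [5, 8, 12, 20, 24], [1, 1, 1, 1, 0]
non_times, non_events = [30, 45, 60, 60, 72], [0, 1, 0, 0, 0]

sig_curve = kaplan_meier(sig_times, sig_events)
chi2, p_value = logrank(sig_times, sig_events, non_times, non_events)
print(sig_curve, chi2, p_value)
```

With real data, the two input groups would be the patient clusters obtained from the hierarchical clustering of the signature genes.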
The SV40 T/t-antigen oncogene signature is highly predictive of survival in breast cancer patients. In the cohort of breast cancer patients analyzed, our clustering analysis indicated a strong association between the T/t-antigen expression profile and overall survival. Therefore, we examined whether the T/t-antigen signature would be useful in predicting the clinical outcome. For this purpose, a supervised analysis was done using two independent population-based data sets of expression profiles of breast cancer patients [i.e., the group of 295 patients from Chang et al. (15) and another 159 patients from Pawitan et al. (18)]. Ninety-eight orthologues (85%) of the T/t-antigen signature were available in both of the microarray data sets. Using a univariate Cox proportional hazards model, we identified a subset of 61 genes significantly associated with overall survival (P < 0.05) in the Chang data set. The leave-one-out cross-validated Kaplan-Meier survival curves (Supplementary Fig. S3) for the predicted low- and high-risk groups in the Chang data set (15) exhibited a very clear separation of the groups (permutation P < 0.001). The classifier built on the 61 genes from the Chang data set was then applied to the external set of 159 breast cancer patients (28). The expression pattern of the 61 genes in the external test set ordered by increasing risk score is shown in Fig. 5C. The resulting Kaplan-Meier curves (Fig. 5C) confirm that the T/t-antigen-specific expression profile is highly associated with a poor clinical outcome (P = 0.00218).
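The train-then-apply step of the supervised analysis can be outlined with a small sketch of a compound covariate risk score (the algorithm named in the Fig. 5 legend). The per-gene weights would come from the univariate Cox models fitted on the training set; here they are simply made-up numbers. The median cutoff on training-set scores, the function names, and all expression values are assumptions for illustration only.

```python
from statistics import median


def risk_score(expression, weights):
    """Compound covariate: weighted sum of expression over the selected genes."""
    return sum(weights[gene] * expression[gene] for gene in weights)


def classify(train, test, weights):
    """Label each test patient high or low risk.

    The cutoff is the median risk score of the training patients
    (an assumed threshold rule), so the test set never influences it.
    """
    cutoff = median(risk_score(profile, weights) for profile in train.values())
    return {
        patient: "high" if risk_score(profile, weights) > cutoff else "low"
        for patient, profile in test.items()
    }


# Hypothetical weights standing in for univariate Cox statistics:
# positive = higher expression goes with shorter survival.
toy_weights = {"FOXM1": 0.8, "EZH2": 0.5, "CAV1": -0.6}

toy_train = {
    "train_1": {"FOXM1": 1.0, "EZH2": 0.5, "CAV1": -1.0},
    "train_2": {"FOXM1": -1.0, "EZH2": -0.5, "CAV1": 1.0},
    "train_3": {"FOXM1": 0.2, "EZH2": 0.0, "CAV1": 0.1},
}
toy_test = {
    "test_1": {"FOXM1": 1.5, "EZH2": 1.0, "CAV1": -0.5},
    "test_2": {"FOXM1": -0.5, "EZH2": -1.0, "CAV1": 0.5},
}

toy_labels = classify(toy_train, toy_test, toy_weights)
print(toy_labels)
```

Keeping the weights and the cutoff fixed from the training cohort is what makes the second cohort an external test: nothing in the classifier is re-estimated on the patients being predicted.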

Discussion
In this study, we have discovered a conserved gene transcriptional signature in multiple epithelial tumors induced by the inactivation of p53 and Rb through the targeted expression of the early region of SV40. Although T/t-antigens are not etiologically involved in human epithelial tumors, the fact that a significant portion of human tumors contain dysfunctional p53 or Rb suggests that T/t-antigen may induce similar oncogenic alterations. The expression pattern identified in this study intrinsic to the expression of the SV40 T/t-antigens in transgenic mouse mammary, prostate, and lung tumors is significant for several reasons: (a) it shows that a core oncogenic mechanism is operative in all of these epithelial tumor types involving the dysregulation of genes controlling the cell cycle, cell proliferation, replication, DNA damage responses, and apoptosis; (b) the intrinsic signature is quite specific for SV40 T/t-antigen-induced tumors when compared with tumors arising from other (ras, PyMT, and her2/neu) oncogenic pathways and, therefore, is not simply a general signature for cancer; (c) the SV40 T/t-antigen signature is composed of genes that form a highly integrated genetic network based on known functional and physical interactions; and (d) the T/t-antigen signature is predictive of poor prognosis for human breast, prostate, and lung tumors. Therefore, it is likely that this signature contains genes whose dysregulated function might be altered through therapeutic targeting to improve outcome.
Approximately 3,000 genes were found to be differentially expressed between the tumor types compared with their respective normal tissues. Genes expressed in a tumor type-specific manner were primarily related to cell-cell signaling/interaction, extracellular matrix components, growth factors and receptors, tissue structure, and cellular metabolism, suggesting that each tissue type has different requirements for tumorigenesis, especially related to the stromal environments of the breast, lung, and prostate.¹⁵ However, only about 120 named genes were found to be similarly expressed in all three tumor types, suggesting that this core intrinsic T/t-antigen set of genes is likely critical to the transforming function of T/t-antigen.
Importantly, the intrinsic SV40 T/t-antigen gene signature is not a feature of tumors initiated by other oncogenes or inactivation of suppressor genes but is most specific to tumors induced by T/t-antigen. By interrogating the gene expression profile of various mouse mammary tumor models carrying the transgenes MMTV-myc, MMTV-ras, MMTV-her2/neu, MMTV-PyMT, and C3(1)-Tag, and knockout of tumor suppressor genes Wap-cre;p53fp/fp or Brca1Co/Co, and MMTV-Cre;p53+/−, we found that these tumors did not fully express the intrinsic SV40 T/t-antigen proliferation signature, although various degrees of alterations in the expression of some of these genes were present in tumors induced by other oncogenic processes. Gene expression of Wap-cre;Brca1Co/Co tumors showed the greatest similarity to that of the C3(1)-Tag tumors followed by the two targeted p53 knockout models (Wap-cre;p53fp/fp and MMTV-Cre;p53+/−). However, tumors arising from transgenic overexpression of the MMTV-myc, MMTV-ras, MMTV-her2/neu, and MMTV-PyMT oncogenes were most dissimilar to the SV40 T/t-antigen proliferation cluster. This analysis shows that the SV40 T/t-antigen proliferation signature represents a distinct state of gene expression related to cell cycle and DNA damage response that is not typically observed in tumors that have been transformed by other molecular mechanisms.
When analyzed in the context of gene interactive networks, 85 of the 120 named genes contained within the SV40 T/t-antigen signature are related through several nodes, including Rb and p53, as expected, because it has been well documented that T-antigen binds to and functionally inactivates both of these critical tumor suppressor genes. Based on the disruption of Rb function in the Tag-transformed cells, the identification of an E2F node and genes related to its function is also consistent with the regulation of E2F by the Rb pathway (6).
Interestingly, analysis of the SV40 T/t-antigen signature also revealed that the tumor suppressor Brca1 was overexpressed in conjunction with a network of genes related to Brca1 function in all three epithelial tumor types, not only mammary tumors. Dysregulation of Brca1 pathways by SV40 T/t-antigen has not been reported previously. Brca1 was only overexpressed in the three T/t-antigen transgenic mouse models and not in the other mammary tumor models studied. The BRCA1 tumor suppressor gene has many functional roles, including DNA damage response, DNA repair and recombination, cell cycle checkpoint control, protein ubiquitylation, and chromatin remodeling (33). We found that Brca1-interacting genes, such as Chek1, Nek2, Rbbp7, and Msh6, fall within the Brca1 cluster, suggesting an integrative role for Brca1 in regulating G2-M and G1-S checkpoints. However, it is not clear whether expression of SV40 T/t-antigen oncoprotein directly affects Brca1 overexpression or whether this is a secondary result related to enhanced cell proliferation resulting in an increase in S and G2 phase genes that could up-regulate Brca1 expression.
The intrinsic SV40 T/t-antigen signature also includes many other genes associated with DNA replication, damage repair, cytokinesis, and chromosome maintenance. Our analysis revealed the overexpression of a set of genes that included replication-initiation complex minichromosome maintenance proteins (Mcm's) and inhibitor of preinitiation complex assembly, geminin (Gmnn), suggesting that their dysregulation leads to inappropriate binding of these proteins to chromatin during the cell cycle (particularly at S, G2, and early mitosis phases) leading to aberrant DNA replication through illegitimate origin firing. These effects could lead to significant genomic instability, which is characteristic of SV40 T/t-antigen transgenic tumors (9). Meta-analyses have shown a strong association between the overexpression of Mcm's and Gmnn with aggressive epithelial tumor behavior and poor cancer prognosis, further suggesting that these cell cycle biomarkers may be potentially important predictors for routine clinical screening.
Contained within the T/t-antigen signature are many overexpressed genes related to proliferation, such as Ki67 and Pcna. Immunohistochemical staining confirmed that Ki67 expression was highest in the T/t-antigen tumors compared with tumors arising in other tumor models. Additionally, many other genes related to replication and proliferation were identified in the T/t-antigen signature, including the cell cycle-regulated genes Cdc28 protein kinase 2, Chek1, Cdc2a, Ccnd2, Ccna2, and Ccnf (genes that encode cyclin D2, cyclin A2, and cyclin F, respectively); chromosomal instability genes, such as polo-like kinase (Plk4), serine/threonine kinase 12 (Stk12), Bub1b, and Mad2 mitotic arrest deficient-like 1 (Mad2l1); and transcription regulators FoxM1 and Ezh2. Elevated levels of the FoxM1 transcription factor have been shown to accelerate development and progression of prostate carcinomas in TRAMP/Rosa26-FoxM1b double transgenic mice compared with TRAMP mice (34). Furthermore, the T/t-antigen proliferation cluster includes Top2a, dihydrofolate reductase (Dhfr), thymidylate synthase (Tyms), and ribonucleotide reductase M1 polypeptide (Rrm1; Fig. 2, purple), potential targets for drug therapies.
The importance of genes related to cell cycle and proliferation in identifying tumors with poor prognosis has been recognized previously (13,20,35). However, only about one third of the genes contained in the intrinsic SV40 T/t-antigen signature have been identified previously as part of a "proliferation cluster" expressed in the basal epithelial-like subgroup of breast carcinomas (13; reviewed in ref. 36). Therefore, the T/t-antigen signature identifies a significantly larger gene set that can be integrated into a highly robust genetic network based on known functions. Examples of genes previously not identified include cyclin regulators cyclin kinase subunit 1 and cdc7/Dbf4, claspin, a regulator of DNA damage repair, and aurora kinase B, essential for chromosome segregation and cytokinesis.
Other gene signatures have been reported that predict disease outcome. Thirty-seven genes identified in the wound response signature are also contained in the T/t-antigen signature, many of which are related to cell cycle and DNA replication. Eighteen genes in the T/t-antigen signature are found in common with the wound, cell cycle, and proliferation signatures, suggesting that these genes may be most often dysregulated in the most aggressive tumors. Interestingly, only three genes (Ect2, Mcm6, and Cenpa) identified in the 70-gene prognostic signature (30), one gene (Ki67) identified in the BMI-1 prognostic signature (37), and one gene (Tubb) found in the invasive gene signature (32) were also contained in the T/t-antigen signature. It is not clear exactly why these differences in gene content between the signatures exist, although each signature was identified based on different biological questions.
This study shows an important value of applying genetically engineered mouse models for the study of oncogenic pathways. Variance in gene expression, which is often a limitation of studies using relatively small cohorts of human samples, can be greatly minimized in studies using animals with similar genetic backgrounds (11).¹⁶ This provides a means of enhancing the identification of more robust gene expression signatures that may not be apparent in human data sets due to variations in gene expression resulting from heterogeneous genetic backgrounds. Previous studies have also used mouse models to identify important gene signatures related to particular oncogenic pathways relevant to human cancers, including prostate (37,38), liver (39), lung (40), and breast (41).
We also did a reverse analysis by clustering tumors from the various mouse mammary cancer models using the "proliferation cluster" reported by Perou et al. (13) based on 49 available genes. This similarly shows that the expression patterns of the C3(1)-T/t-antigen and BRCA1−/−;p53+/− compound tumors were the most similar to the expression pattern characteristic of the basal epithelial-like subgroup of human breast cancers (data not shown).
Interestingly, tumors from the T/t-antigen mouse models expressed lower levels of Gstm1, a member of a family of enzymes that play a major role in the cellular detoxification system and likely protect cells against reactive oxygen metabolites or carcinogens. Lack of Gstm1 or GST polymorphisms have been reported to correlate with an increased susceptibility to lung cancer (42), breast cancer (43), and prostate cancer (44). Similarities shared across tumors from the different oncogene models and p53-knockout models included up-regulation of Cdc3a and H2A histone family member z (H2afz) and down-regulation of genes, such as lipoprotein lipase (Lpl), complement component 3 (C3), crystallin mu (Crym), and Cav1.
Crym and Cav1 were consistently decreased in all the tumor types as well as in other mouse models of breast cancer. Reduced Crym expression has been associated with hormone-refractory prostate cancer and thus may play an important role in clinical progression of the disease (45). These results suggest that the T/t-antigen models may be useful for deciphering the functional properties of the Crym gene for growth and survival of tumor cells.
Recently, Cav1 has emerged as an important gene in oncogenic transformation and may negatively regulate cell proliferation and metastasis in human cancers and mouse models of cancer (reviewed in ref. 46). Because it has been reported that transcription of Cav1 is regulated by p53 (47), the targeted inactivation of p53 by T-antigen may result in the reduced Cav1 expression in the multiple tumor models. Cav1 expression negatively regulates cell cycle progression by inducing G(0)/G(1) arrest via a p53/p21(WAF1/Cip1)-dependent mechanism.
Because the T/t-antigen signature contains many genes controlling key cellular processes related to replication, proliferation, cell cycle regulation, DNA damage repair, and apoptosis, we wished to determine the relevance of this signature to human breast, prostate, and lung cancers. When applied to large expression profile data sets of these human tumors, hierarchical clustering using the T/t-antigen signature clearly distinguished aggressive tumors with poor prognosis from less aggressive tumors. Kaplan-Meier analyses of tumors clustered into groups either exhibiting or not exhibiting the T/t-antigen signature showed a highly significant difference in patient survival. This finding indicates that the genes and integrated network discovered in this study are extremely important in understanding molecular features that distinguish tumors with favorable versus unfavorable prognosis.
The T/t-antigen signature was sufficient to categorize human breast tumors into the basal or luminal types B/C categories (20), which generally are unresponsive to hormone and chemotherapies and have poor prognosis. Similarly, the T/t-antigen signature distinguished metastatic from nonmetastatic prostate tumors. Therefore, because the genetic network represented by this signature is operative in tumors often intractable to existing therapies with the worst prognosis, therapies that target key nodes within this network may be critical to improving survival in patients with tumors expressing the T/t-antigen signature.
The study of various SV40 T/t-antigen transgenic mouse models of cancer has allowed us to identify both common and tissue-specific oncogenic responses to these viral oncoproteins with several important similarities to the corresponding human cancers. This is the first study to show that the SV40 T/t-antigen viral oncoproteins can cause an intrinsic gene expression profile that recapitulates the phenotypes of aggressive human cancers. Whereas some variation in the complement of genes in the proliferation signature is likely to be found in different tumors, increased expression of the core set of proliferation genes in tumors is often associated with aggressive cancers and poor prognosis in patients with breast cancer (13,30,48), lung carcinoma (17,49), and prostate cancer (16,50). Genes from the intrinsic SV40 T/t-antigen signature clustered commonly with the aggressive basal-like tumors of the breast, metastatic prostate carcinomas, and small cell and squamous lung carcinomas, further supporting that the intrinsic SV40 T/t-antigen gene cluster is representative of human epithelial tumors with aggressive clinical pathologies and poor prognosis; thus, these models may be highly relevant for studying those subtypes of human cancers and developing therapeutic agents that could target this proliferation network. Furthermore, identifying the expression changes during the course of tumor evolution in these models could significantly enhance our knowledge of genetic changes during tumor progression to identify potential biomarkers and therapeutic targets.

Figure 1. Comparative cDNA microarray analysis of SV40 T/t-antigen mouse models of human cancers carried out for 2,638 array probes selected by the ANOVA interaction as described in Materials and Methods. A, average linkage dendrogram from the hierarchical clustering. B, three-dimensional representation of the gene expression profiles. Note that the microarray data sets from each group clustered closely to each other, indicating that all the replicates showed similar gene expression patterns and that the tumor groups are quite dissimilar from the normal tissues. The phyllodes tumor (orange) and normal tissue (yellow) of the seminal vesicle cluster together and are quite dissimilar to the epithelial tumors and normal tissues, respectively.

Figure 4. Gene expression patterns of the intrinsic SV40 T/t-antigen signature in human cancer patients. Two-way hierarchical clustering and image plots using available human orthologues of the SV40 oncogene-specific signature in the published human cancer microarray data sets as described in Materials and Methods. In addition, image plots of the SV40 T/t-antigen mouse models are displayed with gene ordering based on the hierarchical clustering of the respective human data set. A, 111 orthologues were found for the subclasses of human breast carcinoma reported by Chang et al. (15).

Figure 4 Continued. B, 87 orthologues were available for the expression profiles of primary and metastatic prostate carcinomas published by LaTulippe et al. (16).

Figure 4 Continued. C, 83 orthologues of the SV40 T/t-antigen gene signature are shown in the subtypes of human lung carcinomas reported by Bhattacharjee et al. (17).

Figure 5. Survival and the T/t-antigen signature in human breast and lung cancer patients. A, Kaplan-Meier overall survival curves were calculated from every patient grouped in cluster 1 or 2 from the Chang breast cancer data set depicted in Fig. 4A. B, Kaplan-Meier overall survival curves were calculated only for the lung adenocarcinoma patients from the Bhattacharjee data set grouped in cluster 1 or 2 (survival information is not available for the other lung cancer subtypes) as depicted in Fig. 4C. C, prediction of overall survival using the SV40 T/t-antigen signature. The image plot was generated with the external test set of 159 breast cancer patients from Pawitan et al. (18) and 61 survival-associated genes selected in the training set as described in Materials and Methods. The samples are ordered by increasing risk scores computed with the compound covariate algorithm. Black bars, patient death. The Kaplan-Meier overall survival curves are shown for the high- and low-risk groups identified with the prognostic classifier.