A color analogy illustrating the advantages of spectra variables for modeling. a) Individual observations of color. b) Dimension reduction (additive color theory): all colors can be represented using 3 quantitative RGB variables. c) Standard use: hierarchical clustering identifies 3 groups, and each sample is categorized into one group (A, B, or C). Subsequent modeling uses a single categorical variable. d) Multivariable modeling with quantitative spectra variables: multiple spectra are integrated directly into a multivariable analysis, and each uncorrelated variable can be assessed separately for its predictive value for an outcome. This implementation retains higher resolution because the variables are quantitative and remain faithful to the initial data. Note that lower-resolution versions of xB and xG can be obtained from the hierarchical groups, but the loss of quantification may reduce power. xR cannot be captured by any group ordering, so associations with this spectrum would be lost using hierarchical groups. Three dimensions are shown for visualization purposes; transcriptome data are likely to harbor more dimensions.
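The contrast between panels c and d can be sketched numerically. The example below is a toy illustration, not the SPECTRA method. The simulated colors, the outcome, and the threshold rule that stands in for hierarchical clustering are all hypothetical assumptions. The groups are deliberately built from the green and blue variables only, mirroring the caption's case in which xR is not captured by any grouping.

```python
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def variance_explained_by_groups(groups, y):
    """Fraction of the variance in y explained by group means (eta squared)."""
    n = len(y)
    grand_mean = sum(y) / n
    total = sum((v - grand_mean) ** 2 for v in y)
    members = {}
    for g, v in zip(groups, y):
        members.setdefault(g, []).append(v)
    between = sum(
        len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in members.values()
    )
    return between / total


def assign_group(sample):
    """Hypothetical stand-in for clustering: uses green and blue only."""
    _, green, blue = sample
    if blue > 0.5:
        return "A"
    return "B" if green > 0.5 else "C"


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical samples: three independent quantitative variables (xR, xG, xB).
samples = [(rng.random(), rng.random(), rng.random()) for _ in range(600)]
x_r = [s[0] for s in samples]

# Hypothetical outcome driven by the red variable plus Gaussian noise.
outcome = [r + rng.gauss(0, 0.1) for r in x_r]

groups = [assign_group(s) for s in samples]

# Variance in the outcome explained by the quantitative variable xR
# versus by the single categorical group variable.
r2_quant = pearson(x_r, outcome) ** 2
r2_group = variance_explained_by_groups(groups, outcome)

print(f"variance explained by quantitative xR: {r2_quant:.3f}")
print(f"variance explained by group (A/B/C):   {r2_group:.3f}")
```

Under these assumptions the quantitative variable explains most of the outcome variance, while the categorical group variable explains almost none, because group membership carries no information about xR.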
ARTICLE ABSTRACT
Transcriptome studies are gaining momentum in genomic epidemiology, and the need to incorporate these data in multivariable models alongside other risk factors brings demands for new approaches.
Here we describe SPECTRA, an approach to derive quantitative variables that capture the intrinsic variation in gene expression of a tissue type. We applied the SPECTRA approach to bulk RNA sequencing from malignant cells (CD138+) in patients from the Multiple Myeloma Research Foundation CoMMpass study.
A set of 39 spectra variables was derived to represent multiple myeloma cells. We used these variables in predictive modeling to determine spectra-based risk scores for overall survival, progression-free survival, and time to treatment failure. Risk scores added predictive value beyond known clinical and expression risk factors and replicated in an external dataset. Spectrum variable S5, a significant predictor for all three outcomes, showed pre-ranked gene set enrichment for the unfolded protein response, a mechanism targeted by proteasome inhibitors, which are a common first-line agent in multiple myeloma treatment. We further used the 39 spectra variables in descriptive modeling and found significant associations with tumor cytogenetics, race, gender, and age at diagnosis, all factors known to influence multiple myeloma incidence or progression.
Quantitative variables from the SPECTRA approach can predict clinical outcomes in multiple myeloma and provide a new avenue for insight into tumor differences by demographic groups.
The SPECTRA approach provides a set of quantitative phenotypes that deeply profile a tissue and allows for more comprehensive modeling of gene expression with other risk factors.