American Association for Cancer Research

Supplementary Data from Gaussian Mixture Models for Probabilistic Classification of Breast Cancer

journal contribution
Posted on 2023-03-31, 02:25; authored by Indira Prabakaran, Zhengdong Wu, Changgun Lee, Brian Tong, Samantha Steeman, Gabriel Koo, Paul J. Zhang, Marina A. Guvakova

Figure S1. Bland-Altman plot showing an acceptable mean difference (bias = -0.049; SD = 0.1038) between two image-based methods of measuring IGF-IR expression levels in IHC-stained tissue: ImageJ and COMPAi.
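
The bias and SD quoted in the caption are the two summary statistics of a Bland-Altman analysis: the mean and the standard deviation of the paired differences between methods. The sketch below shows how such values could be computed. The function name, the paired scores, the direction of the difference (first method minus second), and the conventional 1.96 SD limits of agreement are all assumptions for illustration and are not taken from the supplementary file.

```python
from statistics import mean, stdev

def bland_altman(method_a, method_b):
    """Return (bias, sd, lower_limit, upper_limit) for paired measurements.

    bias is the mean of the paired differences (a - b), sd is their sample
    standard deviation, and the limits of agreement use the conventional
    bias +/- 1.96 * sd (an assumption; the caption does not state limits).
    """
    if len(method_a) != len(method_b):
        raise ValueError("paired measurements must have equal length")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)
    return bias, sd, bias - 1.96 * sd, bias + 1.96 * sd

# Hypothetical paired expression scores for the same four tissue regions.
imagej_scores = [1.0, 2.0, 3.0, 4.0]
compai_scores = [1.1, 1.9, 3.2, 4.0]

bias, sd, lower, upper = bland_altman(imagej_scores, compai_scores)
print(f"bias={bias:.3f} sd={sd:.3f} limits=({lower:.3f}, {upper:.3f})")
```

Under the same 1.96 SD convention, the caption's values (bias = -0.049, SD = 0.1038) would correspond to limits of agreement of roughly -0.25 and 0.15; the figure itself may use a different convention.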



In the era of omics-driven research, stratifying individual patients based on the molecular characteristics of their tumors remains a common dilemma. To improve molecular stratification of patients with breast cancer, we developed a Gaussian mixture model (GMM)–based classifier. This probabilistic classifier was built on mRNA expression data from more than 300 clinical samples of breast cancer and healthy tissue and was validated on datasets of ESR1, PGR, and ERBB2, which encode standard clinical markers and therapeutic targets. To demonstrate how a GMM approach could be exploited for multiclass classification using data from a candidate marker, we analyzed the insulin-like growth factor I receptor (IGF1R), a promising target but a marker of uncertain importance in breast cancer. The GMM defined subclasses with downregulated (40%), unchanged (39%), upregulated (19%), and overexpressed (2%) IGF1R levels; inter- and intrapatient analyses of IGF1R transcript and protein levels supported these predictions. Overexpressed IGF1R was observed in a small percentage of tumors. Samples with unchanged and upregulated IGF1R were differentiated tumors, whereas downregulation of IGF1R correlated with poorly differentiated, high-risk hormone receptor–negative and HER2-positive tumors. A similar correlation was found in the independent cohort of carcinoma in situ, suggesting that loss or low expression of IGF1R is a marker of aggressiveness in subsets of preinvasive and invasive breast cancer. These results demonstrate the importance of probabilistic modeling that delves deeper into molecular data and aims to improve diagnostic classification, prognostic assessment, and treatment selection. A GMM classifier demonstrates potential use for clinical validation of markers and determination of target populations, particularly when availability of specimens for marker development is low.
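
To make the idea of probabilistic multiclass classification concrete, here is a minimal sketch of a one-dimensional Gaussian mixture fitted by plain expectation-maximization, which then assigns class probabilities to a new expression value. It is not the authors' implementation: the data are synthetic, the function names are hypothetical, the initialization is a simple heuristic, and only three components are used for brevity, whereas the abstract reports four IGF1R subclasses.

```python
import math
import random

def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def posteriors(x, weights, means, variances):
    """Posterior probability of each mixture component given the value x."""
    dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
    total = sum(dens)
    if total == 0.0:  # numerical underflow far from every component
        return [1.0 / len(dens)] * len(dens)
    return [d / total for d in dens]

def fit_gmm_1d(xs, k, n_iter=200, var_floor=1e-6):
    """Fit a k-component 1-D Gaussian mixture with EM; components sorted by mean."""
    n = len(xs)
    ordered = sorted(xs)
    # Heuristic start: means at evenly spaced quantiles, equal weights, and a
    # shared variance scaled down from the overall variance of the data.
    means = [ordered[int((j + 0.5) * n / k)] for j in range(k)]
    grand_mean = sum(xs) / n
    overall_var = sum((x - grand_mean) ** 2 for x in xs) / n
    variances = [max(overall_var / k ** 2, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        resp = [posteriors(x, weights, means, variances) for x in xs]
        # M-step: re-estimate weight, mean, and variance of each component.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            spread = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(spread, var_floor)
    order = sorted(range(k), key=lambda j: means[j])
    return ([weights[j] for j in order],
            [means[j] for j in order],
            [variances[j] for j in order])

# Synthetic relative expression values (illustrative only, not study data).
rng = random.Random(0)
centers = [-2.0, 0.0, 2.0]
data = [rng.gauss(c, 0.3) for c in centers for _ in range(100)]

weights, means, variances = fit_gmm_1d(data, k=3)

# Hypothetical class labels, assigned to components in order of their means.
labels = ["downregulated", "unchanged", "upregulated"]
probs = dict(zip(labels, posteriors(1.9, weights, means, variances)))
print({name: round(p, 3) for name, p in probs.items()})
```

Unlike a hard cutoff, the fitted mixture returns a probability for every class, so a sample lying between two components is reported as ambiguous instead of being forced into one group.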

Cancer Research