American Association for Cancer Research

Figure S3 from Building Practical Risk Prediction Models for Nasopharyngeal Carcinoma Screening with Patient Graph Analysis and Machine Learning

journal contribution
posted on 2025-11-26, 13:25 authored by Anjun Chen, Roufeng Lu, Ruobing Han, Ran Huang, Guanjie Qin, Jian Wen, Qinghua Li, Zhiyong Zhang, Wei Jiang
Figure S3: Example topology of NPC patients resulting from a search on 5 lab test factors. Blue nodes represent patients and yellow nodes represent lab test factors. This topology was generated using Cypher query #4 listed in Table S1. Each line connects a patient to one of that patient's lab test factors. The topology shows how patients are distributed across the lab test factors.

Funding

Guilin Science and Technology Bureau (桂林市科学技术局)

Department of Science and Technology of Sichuan Province (SPDST)

National Natural Science Foundation of China (NSFC)

Natural Science Foundation of Guangxi Province (Guangxi Natural Science Foundation)


ARTICLE ABSTRACT

To expand nasopharyngeal carcinoma (NPC) screening to larger populations, more practical NPC risk prediction models that do not depend on Epstein–Barr virus (EBV) and other lab tests are necessary. Patient data recorded before NPC diagnosis were collected from hospital electronic medical records (EMR) and used to develop machine learning (ML) models for NPC risk prediction using XGBoost. NPC risk factor distributions were generated through connection delta ratio (CDR) analysis of patient graphs. By combining EMR-wide ML with patient graph analysis, the number of variables in these risk models was reduced, allowing for more practical NPC risk prediction ML models. Using data collected from 1,357 patients with NPC and 1,448 control patients, an optimal set of 100 variables (ov100) was determined for building NPC risk prediction ML models that had the following performance metrics: 0.93–0.96 recall, 0.80–0.92 precision, and 0.83–0.94 AUC. Aided by the analysis of top CDR-ranked risk factors, the models were further refined to contain only 20 practical variables (pv20), excluding EBV. The pv20 NPC risk XGBoost model achieved 0.79 recall, 0.94 precision, 0.96 specificity, and 0.87 AUC. This study demonstrated the feasibility of developing practical NPC risk prediction models using EMR-wide ML and patient graph CDR analysis, without requiring EBV data. These models could enable broader implementation of NPC risk evaluation and screening recommendations for larger populations in urban community health centers and rural clinics. They could also help increase the NPC screening rate and identify more patients with early-stage NPC.
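
For readers less familiar with the metrics reported in the abstract, the following is a minimal sketch of how recall, precision, and specificity are computed from a binary classifier's confusion-matrix counts. The function name `screening_metrics` and the counts in the usage lines are hypothetical, chosen only for illustration; they are not taken from the study.

```python
def screening_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute recall, precision, and specificity from confusion-matrix counts.

    tp: NPC patients flagged as at risk (true positives)
    fp: controls flagged as at risk (false positives)
    tn: controls not flagged (true negatives)
    fn: NPC patients not flagged (false negatives)
    """
    return {
        "recall": tp / (tp + fn),        # share of NPC patients the model catches
        "precision": tp / (tp + fp),     # share of flagged patients who have NPC
        "specificity": tn / (tn + fp),   # share of controls correctly not flagged
    }


# Hypothetical counts for illustration only (not data from the study).
metrics = screening_metrics(tp=80, fp=5, tn=95, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

A model with high precision and specificity but lower recall, as reported for pv20, raises few false alarms among controls while missing a larger share of true cases.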


    Cancer Epidemiology, Biomarkers & Prevention
