American Association for Cancer Research

TABLE 3 from Artificial Intelligence–Assisted Cancer Status Detection in Radiology Reports

dataset
posted on 2024-04-09, 15:20 authored by Ankur Arya, Andrew Niederhausern, Nadia Bahadur, Neil J. Shah, Chelsea Nichols, Avijit Chatterjee, John Philip

Classification metrics of the multiclass classifier model that predicts the presence of cancer, computed with a one-versus-rest approach on held-out test data.
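The following is a minimal sketch of how one-versus-rest metrics of this kind are typically computed. It is not the authors' code. Each class is treated in turn as the positive class, and every other class is grouped as "rest". The weighted F1 then averages the per-class F1 scores, weighting each by the number of true instances of that class. The label names ("yes", "no", "indeterminate") and the example predictions are hypothetical.

```python
from collections import Counter


def one_vs_rest_metrics(y_true, y_pred):
    """Per-class precision, recall, and F1, each class scored as positive vs. all others."""
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)  # number of true instances per class
    report = {}
    for label in labels:
        # Binarize the problem: `label` is the positive class, everything else is "rest".
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = {"precision": precision, "recall": recall,
                         "f1": f1, "support": support[label]}
    return report


def weighted_f1(report):
    """Average per-class F1, weighted by each class's true-instance support."""
    total = sum(m["support"] for m in report.values())
    return sum(m["f1"] * m["support"] for m in report.values()) / total


def accuracy(y_true, y_pred):
    """Fraction of reports whose predicted class matches the true class."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical held-out labels for "presence of cancer" (illustrative only).
y_true = ["yes", "yes", "no", "no", "indeterminate"]
y_pred = ["yes", "no", "no", "no", "indeterminate"]
report = one_vs_rest_metrics(y_true, y_pred)
print(report)
print(weighted_f1(report), accuracy(y_true, y_pred))
```

With these toy inputs, the "yes" class has perfect precision but only half recall, because one true "yes" was predicted as "no". That error also lowers the precision of the "no" class. Libraries such as scikit-learn provide equivalent per-class and weighted averages.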

Funding

HHS | National Institutes of Health (NIH)

ARTICLE ABSTRACT

Cancer research depends on accurate and relevant information about patients' medical journeys. Data in radiology reports are extremely valuable but lack the consistent structure needed for direct use in analytics. At Memorial Sloan Kettering Cancer Center (MSKCC), radiology reports are curated using the gold-standard approach of human annotation. However, manually curating a large volume of retrospective data slows the pace of cancer research. The manual curation process is sensitive to the volume of reports, the number of data elements, and the nature of the reports, and it demands an appropriate skill set. In this work, we explore state-of-the-art methods in artificial intelligence (AI) and implement an end-to-end pipeline for fast and accurate annotation of radiology reports. Language models (LMs) are trained on curated data by framing curation as a multiclass or multilabel classification problem. The classification tasks are to predict multiple imaging scan sites, the presence of cancer, and cancer status from the reports. The trained natural language processing (NLP) classifiers achieve a high weighted F1 score and accuracy. We propose and demonstrate the use of these models to assist the manual curation process. This assistance yields higher accuracy and F1 scores in less time and at lower cost, thereby advancing cancer research. Extracting structured data from radiology reports for cancer research through a manual process is laborious. Using AI, with NLP models assisting in the extraction of data elements, is faster and more accurate.
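The abstract frames curation as either a multiclass or a multilabel classification problem. The sketch below illustrates how the targets differ under that framing. A multiclass task such as presence of cancer assigns exactly one class per report. A multilabel task such as imaging scan sites allows any subset of labels per report. The category names, the helper functions, and the 0.5 decision threshold are all hypothetical. They are not taken from the study.

```python
# Hypothetical label sets; the study's actual categories are not listed here.
CANCER_PRESENCE = ["no", "yes", "indeterminate"]      # multiclass: exactly one per report
SCAN_SITES = ["chest", "abdomen", "pelvis", "head"]   # multilabel: any subset per report


def encode_multiclass(label, classes):
    """Map one label to a single class index (a target for a softmax head)."""
    return classes.index(label)


def encode_multilabel(labels, classes):
    """Map a set of labels to a 0/1 indicator vector (targets for per-label sigmoids)."""
    present = set(labels)
    return [1 if c in present else 0 for c in classes]


def decode_multilabel(probs, classes, threshold=0.5):
    """Keep every label whose independent probability meets the (hypothetical) threshold."""
    return [c for c, p in zip(classes, probs) if p >= threshold]


print(encode_multiclass("yes", CANCER_PRESENCE))
print(encode_multilabel(["chest", "pelvis"], SCAN_SITES))
print(decode_multilabel([0.9, 0.2, 0.7, 0.1], SCAN_SITES))
```

The design choice follows from the target shape. A single index suits mutually exclusive classes. An indicator vector lets the model score each scan site independently, so one report can be assigned several sites.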

Cancer Research Communications