Use of Natural Language Processing to Augment Real-World Data (RWD) and Identify Eligible Patients at Scale for Oncology Studies
Speaker(s)
Raju A1, Doko G1, Su Z2, Paulus J3, Robert N1
1Ontada, Boston, MA, USA, 2Ontada, Chestnut Hill, MA, USA, 3Ontada, Dedham, MA, USA
OBJECTIVES: Practical examples of the application of medical natural language processing (NLP) approaches are needed to assess their performance in generating fit-for-purpose real-world data (RWD) from unstructured data. We therefore evaluated the application of these technologies to support feasibility assessments of RWD studies and the implementation of complex inclusion and exclusion criteria to identify eligible patients for oncology studies.
METHODS: Pretrained healthcare NLP models were used to identify patients meeting complex clinical criteria. The primary data sources were large sets of clinical notes from iKnowMed™, an oncology-specific electronic health record (EHR) system. For each study, a randomly selected sample of the NLP results was validated by clinical abstractors to ensure accuracy and reliability.
RESULTS: Twenty feasibility studies with varying sample sizes were conducted across various cancer types, including rectal cancer (n=4), lymphoma (n=4), and renal cell carcinoma (n=3), among others. In 17 studies, NLP was used to expand the sample size beyond structured data counts, while in 3 studies, NLP was used to narrow the pool of potentially eligible patients requiring manual chart abstraction. When the goal was to increase the sample size, the mean increase was 3.5-fold (range: 1.1-7.4). When the goal was to reduce the sample size, the mean reduction in the number of patients was 80% (range: 69-89%).
The NLP results demonstrated high validity. For example, in a study of patients with renal cell carcinoma, the NLP model identified 315 patients who underwent nephrectomy, and 295 (93%) of them were confirmed to meet the inclusion criteria through manual chart abstraction.
CONCLUSIONS: The application of NLP effectively refined and augmented structured data to identify eligible patients for oncology studies across a range of cancer types. This offers a scalable solution for conducting RWD studies and supplements high-effort and costly activities like manual chart abstraction.
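The validation figure in the renal cell carcinoma example is a positive predictive value (PPV): the share of NLP-flagged patients confirmed by chart abstraction. The sketch below reproduces that arithmetic from the two counts reported in the abstract (295 confirmed of 315 flagged, a ratio of roughly 0.937, reported above as 93%). The Wilson score interval is an illustrative addition and is not reported by the authors; the function names `ppv` and `wilson_interval` are hypothetical helpers, not part of any Ontada tooling.

```python
from math import sqrt


def ppv(confirmed: int, flagged: int) -> float:
    """Positive predictive value: confirmed cases divided by NLP-flagged cases."""
    if flagged <= 0:
        raise ValueError("flagged must be a positive count")
    if not 0 <= confirmed <= flagged:
        raise ValueError("confirmed must lie between 0 and flagged")
    return confirmed / flagged


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    Assumption: z = 1.96 (about 95% coverage); the abstract reports no interval.
    """
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the renal cell carcinoma example in the abstract.
value = ppv(confirmed=295, flagged=315)
low, high = wilson_interval(successes=295, n=315)
print(f"PPV = {value:.3f}; approximate 95% interval = ({low:.3f}, {high:.3f})")
```

PPV alone says nothing about patients the model missed; estimating sensitivity would require abstracting a sample of patients the NLP did not flag, which the abstract does not describe.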
Code
MSR30
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
Oncology