THE APPLICATION OF NATURAL LANGUAGE PROCESSING (NLP) TECHNOLOGY TO ENRICH ELECTRONIC MEDICAL RECORDS (EMRS) FOR OUTCOMES RESEARCH IN ONCOLOGY

Author(s)

Hirst C¹, Hill J², Khosla S¹, Schweikert K², Senerchia C², Kitzmann K², Zhang Q¹
¹AstraZeneca, Macclesfield, UK; ²Humedica, Boston, MA, USA

OBJECTIVES: Many studies that use EMRs to evaluate oncology patients and practices carry caveats about partial or missing observations within patient records. We describe an approach to building a potentially richer oncology dataset by supplementing EMRs with case-note observations extracted through NLP, applied specifically to the capture of molecular data. METHODS: NLP concepts are identified and created based on broad topics such as medications, signs, disease and symptoms, measurements, and observations. The data are harvested from the notes fields within the deidentified EMRs (including inpatient, clinic, pathology, etc.) provided to Humedica by over 25 large health care systems throughout the United States. Each NLP concept included in the data is associated with a unique subject record and a date of observation, allowing longitudinal tracking of concepts such as molecular entities. Data from NLP are linked to patient EMRs to allow inclusion of the additional variables in further analyses. The method was applied to identify molecular testing data in a specific cancer type. RESULTS: Of the 18,068 included patients with valid clinical notes for interrogation, 1,027 had notes containing a defined observation of a molecular test specific for the target of interest. Of these, 46.3% (475) were deemed positive (i.e., indicating presence of the molecular target), 41.5% (426) negative, and 12.3% (126) of unknown status. CONCLUSIONS: Innovative algorithms, technical skills, and clinical knowledge are required in the generation and analysis of oncology disease data, and NLP can enrich such data with variables that are not included in the EMR, allowing a more detailed understanding of patient cohorts. We have described an approach deemed successful in identifying cohorts of oncology patients with researchable molecular characteristics. Further correlating evidence and cross-validation will determine the robustness and representativeness of the data generated with this approach.
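
The linkage described in METHODS and the breakdown reported in RESULTS can be illustrated with a minimal Python sketch. It is not the authors' or Humedica's implementation: the `ConceptObservation` structure, its field names, and the functions `link_to_emr` and `status_breakdown` are hypothetical, introduced here only to show how concepts keyed by subject record and observation date might be attached to EMR records and how the reported percentages follow from the reported counts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ConceptObservation:
    """One NLP-derived concept (hypothetical structure, not the study's schema)."""
    patient_id: str    # unique subject record the concept belongs to
    observed_on: date  # date of observation
    concept: str       # e.g. a molecular test name (placeholder values only)
    status: str        # "positive", "negative", or "unknown"


def link_to_emr(emr_records, observations):
    """Attach each patient's observations, in date order, to their EMR record.

    emr_records maps patient_id -> dict of structured EMR fields (assumed shape).
    Patients without any NLP observation get an empty list.
    """
    by_patient = {}
    for obs in sorted(observations, key=lambda o: o.observed_on):
        by_patient.setdefault(obs.patient_id, []).append(obs)
    return {
        pid: {**record, "nlp_observations": by_patient.get(pid, [])}
        for pid, record in emr_records.items()
    }


def status_breakdown(counts):
    """Convert status counts to percentages of their total, to one decimal."""
    total = sum(counts.values())
    return {status: round(100 * n / total, 1) for status, n in counts.items()}


# Counts as reported in RESULTS (1,027 patients with a defined observation).
reported = {"positive": 475, "negative": 426, "unknown": 126}
print(status_breakdown(reported))
# {'positive': 46.3, 'negative': 41.5, 'unknown': 12.3}
```

Sorting by `observed_on` before grouping keeps each patient's observations in chronological order, which is what makes the longitudinal tracking mentioned in METHODS possible in this sketch.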

Conference/Value in Health Info

2014-05, ISPOR 2014, Palais des Congres de Montreal

Value in Health, Vol. 17, No. 3 (May 2014)

Code

DB4

Topic

Real World Data & Information Systems, Study Approaches

Topic Subcategory

Reproducibility & Replicability

Disease

Oncology
