MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING TO POWER LITERATURE SEARCH FOR TIMELY, MEANINGFUL RESULTS
Author(s)
Wagner S¹, Kuiper J², Feinman T², Shalev E², Whittington C², del Aguila MA²
¹Bristol-Myers Squibb, Princeton, NJ, USA, ²Doctor Evidence, Santa Monica, CA, USA
OBJECTIVES: The number of medical publications is growing exponentially and can overwhelm researchers, health practitioners, patients and policy makers. To separate signal from noise, we engineered natural language processing and machine learning pipelines to read and process information in near real-time. We illustrate with a real example in which relevant information was needed for a presentation on analytic considerations in oncology trials applicable to regulatory and market access activities. METHODS: State-of-the-art machine learning (including GPU-enabled deep learning) performed syntactic and semantic parsing of medical language to annotate MEDLINE, ClinicalTrials.gov and 80+ RSS news feeds, totaling over 29 million unique articles. Metadata mined from ontologies such as MedDRA, RxNorm, SNOMED and MeSH, and from other proprietary data sources, provided billions of semantic relations across all sources. This natural language tool (DOC Search) was used to search for publications on violations of the proportional hazards assumption (non-proportional hazards, NPH) in clinical trials, and on how sponsors addressed the issue in regulatory and HTA submissions. RESULTS: The system indexes over a million concepts from the various ontologies and proprietary sources, manually curated to provide the most complete thesaurus of medical terms. Machine learning pipelines based on deep learning, Semantic Web and information retrieval techniques created annotations based on word syntax and semantics, concept information, and sentence context. Transitive resolution of article links identified all articles associated with the same underlying trial. For the search on NPH, DOC Search delivered within minutes a number of relevant articles that could be used without further refinement to craft a presentation. This saved several hours of search time on other databases. CONCLUSIONS: Machine learning is a rapidly developing field. Although the tool occasionally misses concepts or labels terms incorrectly, this example showed that DOC Search provided annotations faster and more reliably than could previously be achieved. This can provide significant value to researchers.
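The abstract does not describe how the transitive resolution of article links was implemented. As a rough illustration only, one common way to group records that are connected through any chain of links is a union-find (disjoint-set) structure. The sketch below assumes that links are available as pairs of identifiers; the function name `group_by_trial` and all identifiers in the example are hypothetical and do not come from the DOC Search system.

```python
from collections import defaultdict


def group_by_trial(links):
    """Group identifiers that are connected by any chain of links.

    `links` is an iterable of (id_a, id_b) pairs. This is an assumed input
    format, not the one used by the system described in the abstract.
    """
    parent = {}

    def find(x):
        # Register unseen identifiers as their own root.
        parent.setdefault(x, x)
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the two clusters joined by each link.
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Collect the members of each cluster under its root.
    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


# Hypothetical identifiers, for illustration only.
links = [
    ("article-A", "registry-1"),
    ("article-B", "registry-1"),
    ("article-C", "article-B"),
    ("article-D", "registry-2"),
]
groups = group_by_trial(links)
```

In this made-up example, article-C never cites registry-1 directly, yet it lands in the same group as article-A because it is linked to article-B, which is. That indirect grouping is what "transitive" resolution refers to.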
Conference/Value in Health Info
2018-11, ISPOR Europe 2018, Barcelona, Spain
Value in Health, Vol. 21, S3 (October 2018)
Code
PRM85
Topic
Real World Data & Information Systems
Topic Subcategory
Reproducibility & Replicability
Disease
Multiple Diseases