MACHINE LEARNING AND NATURAL LANGUAGE PROCESSING TO POWER LITERATURE SEARCH FOR TIMELY, MEANINGFUL RESULTS

Author(s)

Wagner S¹, Kuiper J², Feinman T², Shalev E², Whittington C², del Aguila MA²
¹Bristol-Myers Squibb, Princeton, NJ, USA, ²Doctor Evidence, Santa Monica, CA, USA

OBJECTIVES:

The number of medical publications is growing exponentially and can overwhelm researchers, health practitioners, patients, and policy makers. To separate signal from noise, we engineered natural language processing and machine learning pipelines to read and process information in near real-time. We illustrate the approach with a real example in which relevant information was needed for a presentation on analytic considerations in oncology trials applicable to regulatory and market access activities.

METHODS:

State-of-the-art machine learning (including GPU-enabled deep learning) performed syntactic and semantic parsing of medical language to annotate MEDLINE, ClinicalTrials.gov, and 80+ RSS news feeds, totaling over 29 million unique articles. Metadata mined from ontologies such as MedDRA, RxNorm, SNOMED, and MeSH, together with proprietary data sources, provided billions of semantic relations across all sources. We used this natural language tool to search for publications on violations of the proportional hazards assumption (non-proportional hazards, NPH) in clinical trials and on how sponsors addressed the issue in regulatory and HTA submissions.
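The abstract does not describe how concepts are attached to text, so the following is only a minimal sketch of one simple form of annotation: a longest-match thesaurus lookup. It is not the deep learning pipeline described above. The thesaurus entries, the `annotate` function, and the example sentence are all hypothetical, and real ontology identifiers are deliberately left out.

```python
import re

# Hypothetical mini-thesaurus mapping a surface form to a concept label.
# These entries are illustrative assumptions, not content from any real ontology.
THESAURUS = {
    "non-proportional hazards": "Non-proportional hazards",
    "nph": "Non-proportional hazards",
    "hazard ratio": "Hazard ratio",
    "overall survival": "Overall survival",
}


def annotate(text, thesaurus=THESAURUS):
    """Return (start, end, matched_text, concept) spans, sorted by position.

    Longer terms are tried first so that a long term wins over any shorter
    term that would overlap the same characters.
    """
    annotations = []
    taken = []  # character ranges already claimed by a longer match
    for term in sorted(thesaurus, key=len, reverse=True):
        # Match the term only when it is not embedded in a longer word.
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        for match in pattern.finditer(text):
            overlaps = any(match.start() < end and start < match.end()
                           for start, end in taken)
            if overlaps:
                continue
            taken.append((match.start(), match.end()))
            annotations.append(
                (match.start(), match.end(), match.group(0), thesaurus[term])
            )
    return sorted(annotations)


if __name__ == "__main__":
    sentence = "NPH was observed; the hazard ratio for overall survival varied."
    for start, end, surface, concept in annotate(sentence):
        print(start, end, surface, "->", concept)
```

Mapping both the abbreviation and the full term to one concept label is what lets a search for either form retrieve the same articles.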

RESULTS:

The system indexes over a million concepts from the various ontologies and proprietary sources, manually curated into a comprehensive thesaurus of medical terms. Machine learning pipelines based on deep learning, Semantic Web, and information retrieval techniques created annotations based on word syntax and semantics, concept information, and sentence context. Transitive resolution of article links identified all articles associated with the same underlying trial. For the NPH search, DOC Search delivered, within minutes, a number of relevant articles that could be used without further refinement to craft a presentation. This saved several hours of search time that would otherwise have been spent on other databases.
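The abstract does not state how the transitive resolution of article links is computed. One common way to group records connected by any chain of links is a union-find (disjoint-set) structure, sketched below under that assumption. The function name `group_by_trial` and the identifiers in the example are hypothetical placeholders, not real records.

```python
from collections import defaultdict


def group_by_trial(links):
    """Group record IDs into clusters connected by any chain of links.

    `links` is an iterable of (id_a, id_b) pairs, for example a publication
    that cites a registry entry. Two records land in the same cluster when a
    path of links joins them, even if they are never linked directly.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    for a, b in links:
        union(a, b)

    clusters = defaultdict(set)
    for node in list(parent):
        clusters[find(node)].add(node)
    # Sort for a deterministic, readable result.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: c[0])


if __name__ == "__main__":
    # Hypothetical placeholder identifiers.
    example_links = [
        ("PMID:A", "NCT:X"),  # article A reports on trial X
        ("NCT:X", "PMID:B"),  # trial X is also linked to article B
        ("PMID:C", "NCT:Y"),  # article C belongs to a different trial
    ]
    for cluster in group_by_trial(example_links):
        print(cluster)
```

In this example, articles A and B are never linked to each other directly, yet they end up in the same cluster because both link to trial X.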

CONCLUSIONS:

Machine learning is a rapidly developing field. Although the tool occasionally misses concepts or incorrectly labels terms, this example showed that DOC Search provided annotations faster and more reliably than could previously have been achieved. This can provide significant value to researchers.

Conference/Value in Health Info

2018-11, ISPOR Europe 2018, Barcelona, Spain

Value in Health, Vol. 21, S3 (October 2018)

Code

PRM85

Topic

Real World Data & Information Systems

Topic Subcategory

Reproducibility & Replicability

Disease

Multiple Diseases
