WITHDRAWN: Structuring Published Clinical Trial Data With Natural Language Processing to Accelerate Research
Author(s)
ABSTRACT WITHDRAWN
OBJECTIVES: Most clinical trial results are found in published research, which is in an unstructured format. Conducting systematic reviews, or even quicker targeted searches, is today a very expensive manual process that involves reading through hundreds or thousands of abstracts and extracting information from them. We show how natural language processing (NLP) extractors can help structure millions of published records and unlock new ways of analysing this data that are not possible today because of its sheer volume.
METHODS: Using a combination of weak rules and machine learning (ML), we show how attributes such as study type or eligibility criteria can be extracted from text and saved in a structured format. Weak rules encode domain expertise about how this data is typically presented, and are an inexpensive way of building a large training dataset for NLP models. The result is a span detector that identifies phrases in abstracts referring to specific data points of interest and normalises them to a known ontology. This data can in turn be used for targeted searches, for speeding up the screening stage of systematic reviews, or even for running predictive analytics models.
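As an illustration of the weak-rule step only, the following is a minimal sketch and not the withdrawn abstract's actual library. It assumes weak rules can be written as regular expressions over abstract text; the rule patterns, the label names standing in for ontology terms, and the names `Span`, `STUDY_TYPE_RULES` and `weak_label` are all hypothetical. In a weak-supervision setup, spans produced this way would serve as noisy training labels for an ML span detector, which is not shown here.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    """A labelled phrase found in an abstract (hypothetical structure)."""
    start: int   # character offset where the phrase begins
    end: int     # character offset just past the phrase
    text: str    # the matched surface text
    label: str   # normalised label (stand-in for an ontology term)


# Hypothetical weak rules: each pairs a surface pattern with a normalised label.
# Real rules would be written by domain experts and be far more numerous.
STUDY_TYPE_RULES = [
    (re.compile(r"\brandomi[sz]ed\b[^.]{0,60}?\b(?:trial|study)\b", re.I),
     "randomised_trial"),
    (re.compile(r"\bcohort study\b", re.I), "cohort_study"),
    (re.compile(r"\bcase[- ]control study\b", re.I), "case_control_study"),
    (re.compile(r"\bsystematic review\b", re.I), "systematic_review"),
]


def weak_label(text, rules=STUDY_TYPE_RULES):
    """Apply every weak rule to the text and return labelled spans by position."""
    spans = []
    for pattern, label in rules:
        for match in pattern.finditer(text):
            spans.append(Span(match.start(), match.end(), match.group(0), label))
    return sorted(spans, key=lambda s: s.start)


if __name__ == "__main__":
    abstract = ("We conducted a randomised, double-blind, "
                "placebo-controlled trial in 120 adults.")
    for span in weak_label(abstract):
        print(span.label, "<-", repr(span.text))
```

Keeping the character offsets alongside the normalised label means the same output can be used both as structured data for search and as span-level training examples.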
RESULTS: We show how the techniques described above were applied to a large sample of over 600,000 PubMed abstracts, with coverage of up to 90% on some of the attributes. The library containing the functions used to extract the data has been made open source.
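The abstract does not define coverage. One plausible reading, assumed here, is the share of abstracts for which at least one value was extracted for a given attribute. The sketch below computes that quantity; the function name `attribute_coverage`, the record layout and the toy records are hypothetical and are not data from the study.

```python
from collections import Counter


def attribute_coverage(records, attributes):
    """Return, per attribute, the fraction of records with a non-empty value.

    Assumes each record is a dict mapping attribute names to extracted values,
    where a missing key, None, or an empty list means nothing was extracted.
    """
    total = len(records)
    if total == 0:
        return {attr: 0.0 for attr in attributes}
    hits = Counter()
    for record in records:
        for attr in attributes:
            if record.get(attr):
                hits[attr] += 1
    return {attr: hits[attr] / total for attr in attributes}


if __name__ == "__main__":
    # Toy records for illustration only.
    toy_records = [
        {"study_type": ["randomised_trial"], "eligibility": ["adults aged 18-65"]},
        {"study_type": ["cohort_study"], "eligibility": []},
        {"study_type": ["systematic_review"]},
        {"study_type": [], "eligibility": None},
    ]
    print(attribute_coverage(toy_records, ["study_type", "eligibility"]))
```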
CONCLUSIONS: Having a single, up-to-date view of all clinical trial results at various stages of the lifecycle, in a structured format, is the holy grail of medical research. This is made possible by the latest advances in ML, such as weak supervision and deep learning models. This new structured clinical trial data can unlock higher-level analytics as well as research on personalised medicine.
Conference/Value in Health Info
Value in Health, Volume 25, Issue 12S (December 2022)
Code
MSR138
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Clinical Trials, Literature Review & Synthesis, Meta-Analysis & Indirect Comparisons
Disease
No Additional Disease & Conditions/Specialized Treatment Areas, STA: Personalized & Precision Medicine