Machine Learning Can Facilitate More Efficient Health Economic Literature Synthesis by Accurately Extracting Economic Data from Published Abstracts
Author(s)
Islam K1, Chow T1, Wang S1, Michelson M1, Pashos C2
1Genesis Research, Hoboken, NJ, USA, 2Genesis Research, Winchester, MA, USA
OBJECTIVES Although machine learning (ML) has been demonstrated to extract clinical outcomes for literature synthesis, its use to extract economic data has not been shown. We therefore tested whether ML could extract economic costs for possible application in health economic modeling.
METHODS We modified an existing ML platform that typically extracts clinical outcomes from the medical literature to instead extract economic results by refocusing its attention on economic variables. To test our approach, three people labeled the same 95 rows of extractions as “perfect,” “almost perfect” (containing or missing non-key words), “wrong,” or “missing.” Each extraction row contained columns for the economic value, currency unit, patient group, and economic measurement (e.g., “annual cost”). Kappa agreement among the labelers was fair (max = 0.35, avg = 0.30), but it increased substantially (max = 0.60, avg = 0.49) when only the clearer “wrong” and “perfect” labels were considered, which indicates how difficult the intermediate labels were to assign consistently. We defined the “true” label as the majority’s. If no majority existed, we selected the most conservative label (e.g., preferring “wrong”). We then computed the estimated recall (percentage found correctly), precision (i.e., accuracy), and perfect precision (accuracy considering only perfect extractions as correct).
RESULTS For the value column, precision was 97.8% (perfect precision 96.0%). For the unit column, precision was 79.5% (perfect precision 56.4%). For the group column, precision was 97.8% (perfect precision 90.3%), and for the measurement column, precision was 98.9% (perfect precision 84.9%). Estimated recall was 100% in all columns except unit, for which it was 43.3%. An error analysis for the unit column determined that in nearly all cases the model had placed the unit in the measurement column (e.g., “total cost, $” instead of “$” in the unit column), which was easily corrected.
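The label resolution and precision calculations described in the methods can be illustrated with a minimal Python sketch. This is not the authors' code. The ordering used to break ties (beyond preferring “wrong”), the exclusion of “missing” rows from the precision denominator, and the names `resolve_label` and `precision` are all assumptions made for illustration, and the votes are toy data rather than the study's.

```python
from collections import Counter

# Assumed ordering, most conservative first. The abstract only states that
# "wrong" is preferred when no majority exists; the rest is a guess.
CONSERVATIVE_ORDER = ["wrong", "missing", "almost perfect", "perfect"]


def resolve_label(votes):
    """Return the majority label, or the most conservative vote if none."""
    label, count = Counter(votes).most_common(1)[0]
    if count > len(votes) / 2:
        return label
    return min(votes, key=CONSERVATIVE_ORDER.index)


def precision(labels, perfect_only=False):
    """Share of extracted (non-missing) rows counted as correct.

    Assumption: "missing" rows are left out of the denominator.
    """
    correct = {"perfect"} if perfect_only else {"perfect", "almost perfect"}
    extracted = [label for label in labels if label != "missing"]
    if not extracted:
        return float("nan")
    return sum(label in correct for label in extracted) / len(extracted)


# Toy votes from three labelers for one column (not the study's data).
votes_per_row = [
    ("perfect", "perfect", "almost perfect"),
    ("wrong", "almost perfect", "perfect"),  # no majority, falls back to "wrong"
    ("almost perfect", "almost perfect", "perfect"),
    ("perfect", "perfect", "perfect"),
]
true_labels = [resolve_label(votes) for votes in votes_per_row]
overall = precision(true_labels)
perfect = precision(true_labels, perfect_only=True)
```

Under these assumptions, perfect precision can never exceed precision, which matches the pattern reported for every column.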
CONCLUSIONS This study demonstrated that ML can accurately extract economic data, potentially improving the efficiency of HEOR research.
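The abstract does not say how the misplaced units were corrected. One plausible post-processing step, sketched here in Python under stated assumptions, moves a trailing currency symbol or code out of the measurement column when the unit column is empty. The pattern, the dictionary keys, and the function name `move_unit` are hypothetical.

```python
import re

# Hypothetical pattern: a measurement ending in a comma followed by a
# currency symbol or three-letter code, e.g. "total cost, $".
TRAILING_UNIT = re.compile(
    r"^(?P<measure>.*?)\s*,\s*(?P<unit>[$€£¥]|[A-Z]{3})\s*$"
)


def move_unit(row):
    """Return a copy of row with a trailing unit moved out of 'measurement'."""
    fixed = dict(row)
    if fixed.get("unit"):
        return fixed  # a unit was already extracted; leave the row alone
    match = TRAILING_UNIT.match(fixed.get("measurement", ""))
    if match:
        fixed["measurement"] = match.group("measure")
        fixed["unit"] = match.group("unit")
    return fixed


# Toy row mirroring the example in the error analysis (values are invented).
row = {"value": "1,200", "unit": "", "group": "all patients",
       "measurement": "total cost, $"}
fixed = move_unit(row)
```

A rule this simple only covers the comma-separated form quoted in the error analysis; other layouts would need their own patterns.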
Conference/Value in Health Info
2021-11, ISPOR Europe 2021, Copenhagen, Denmark
Value in Health, Volume 24, Issue 12, S2 (December 2021)
Code
POSB113
Topic
Economic Evaluation, Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Cost-comparison, Effectiveness, Utility, Benefit Analysis
Disease
No Specific Disease