Machine Learning Can Facilitate More Efficient Health Economic Literature Synthesis by Accurately Extracting Economic Data from Published Abstracts

Author(s)

Islam K¹, Chow T¹, Wang S¹, Michelson M¹, Pashos C²
¹Genesis Research, Hoboken, NJ, USA, ²Genesis Research, Winchester, MA, USA

OBJECTIVES

Although machine learning (ML) has been demonstrated to extract clinical outcomes for literature synthesis, its use to extract economic data has not been shown. We therefore tested whether ML could extract economic costs for possible application in health economic modeling.

METHODS

We modified an existing ML platform that typically extracts clinical outcomes from the medical literature so that it instead extracts economic results, by refocusing its attention on economic variables.

To test our approach, three people labeled the same 95 rows of extractions as “perfect,” “almost perfect” (containing or missing non-key words), “wrong,” or “missing.” Each extraction row contained columns for the economic value, currency unit, patient group, and economic measurement (e.g., “annual cost”). Kappa agreement among the labelers was only fair (max = 0.35, avg = 0.30). When restricted to the clearer “wrong” and “perfect” labels, Kappa increased substantially (max = 0.60, avg = 0.49), which illustrates how difficult the labeling task was. We defined the “true” label as the majority label. If no majority existed, we selected the most conservative label (e.g., preferring “wrong”).
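The labeling procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code. Two points are assumptions: the abstract does not say which Kappa statistic was used (the max/avg reporting suggests one value per pair of labelers, so the sketch uses Cohen's kappa), and it names only “wrong” as the conservative choice, so the full ordering in `CONSERVATIVE_ORDER` is hypothetical. The function names are illustrative as well.

```python
from collections import Counter

# Assumed ordering, most to least conservative. The abstract only states
# that "wrong" is preferred when no majority exists.
CONSERVATIVE_ORDER = ["wrong", "missing", "almost perfect", "perfect"]


def cohen_kappa(a, b):
    """Cohen's kappa for two labelers' label sequences of equal length."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from each labeler's marginal label frequencies.
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1:
        return 1.0  # both labelers used a single identical label throughout
    return (observed - expected) / (1 - expected)


def resolve_label(votes):
    """Return the majority label, or the most conservative vote if none exists."""
    label, count = Counter(votes).most_common(1)[0]
    if count > len(votes) / 2:
        return label
    return min(votes, key=CONSERVATIVE_ORDER.index)
```

With three labelers, `cohen_kappa` would be called once per pair (three pairs) and the maximum and mean reported, while `resolve_label` would be applied to each row's three votes.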

We then computed the estimated recall (percentage found correctly), precision (i.e., accuracy), and perfect precision (accuracy counting only perfect extractions as correct).
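One way these three metrics could be computed from the resolved labels of a single column is sketched below. The abstract does not give exact formulas, so the denominators are assumptions: precision is taken over rows where something was extracted (“perfect,” “almost perfect,” or “wrong”), and recall over rows that were either extracted correctly or marked “missing.” The function name `column_metrics` and the toy labels are hypothetical and are not the study's data.

```python
from collections import Counter


def column_metrics(labels):
    """Precision, perfect precision, and recall for one column's labels."""
    counts = Counter(labels)
    correct = counts["perfect"] + counts["almost perfect"]
    extracted = correct + counts["wrong"]      # assumed precision denominator
    findable = correct + counts["missing"]     # assumed recall denominator

    def ratio(numerator, denominator):
        # Undefined when nothing falls into the denominator.
        return numerator / denominator if denominator else None

    return {
        "precision": ratio(correct, extracted),
        "perfect_precision": ratio(counts["perfect"], extracted),
        "recall": ratio(correct, findable),
    }


# Toy labels for illustration only.
toy = ["perfect"] * 6 + ["almost perfect"] * 2 + ["wrong"] * 2 + ["missing"] * 2
metrics = column_metrics(toy)
```

Under these assumptions, perfect precision can never exceed precision, which matches the pattern in the reported results.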

RESULTS

For the value column, precision was 97.8% (perfect precision 96.0%). For the unit column, precision was 79.5% (perfect precision 56.4%). For the group column, precision was 97.8% (perfect precision 90.3%), and for the measurement column, precision was 98.9% (perfect precision 84.9%). Estimated recall was 100% in all columns except unit, where it was 43.3%. An error analysis for the unit column determined that in nearly all cases the model had placed the unit in the measurement column (e.g., “total cost, $” instead of placing “$” into unit), an error that was easily corrected.

CONCLUSIONS

This study demonstrated that ML can accurately extract economic data from published abstracts, potentially improving the efficiency of HEOR research.

Conference/Value in Health Info

2021-11, ISPOR Europe 2021, Copenhagen, Denmark

Value in Health, Volume 24, Issue 12, S2 (December 2021)

Code

POSB113

Topic

Economic Evaluation, Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Cost-comparison, Effectiveness, Utility, Benefit Analysis

Disease

No Specific Disease
