EXTRACTING DOSAGE PER DAY FROM FREE-TEXT MEDICATION PRESCRIPTIONS

Author(s)

Törnblom M¹, Bergman G², Jørgensen L³, Fackle-Fornius E¹, Rosenlund M²
¹Stockholm University, Stockholm, Sweden; ²IMS Health, Solna, Sweden; ³Pygargus/IMS Health, Solna, Sweden

OBJECTIVES: The Swedish Prescribed Drug Register contains dose instructions as free text written by the physician. A challenge is to convert this text into a number of doses per day, which can then be used to calculate, for example, duration of treatment. The objective of this study was to compare named entity recognition algorithms for extracting dosage per day.

METHODS: Two sequence models, a Hidden Markov Model (HMM) and Conditional Random Fields (CRF), were used to predict label sequences. The models were compared on four measures of prediction: precision, recall, F-score and accuracy. We also evaluated how prediction was affected by including more labels and features: the two CRF models both used 12 labels, with 2 and 11 feature types respectively, and the three HMM models used 12, 15 and 18 labels respectively. A rule-based algorithm then used the predicted labels to predict dosage per day, and this prediction was evaluated using accuracy.

RESULTS: Label prediction: As expected, increasing the number of labels or features increased the F-score. The CRF model with 11 feature types had an F-score of 0.989, compared with 0.972 using two feature types. The HMM models with 15 and 18 labels both achieved an F-score of 0.986, compared with 0.966 using 12 labels. In terms of precision and recall, the performance of the CRF and HMM varied. Dosage prediction: The CRF model with 11 feature types achieved 97.2% accuracy. The HMM with 15 labels achieved slightly higher accuracy than with 18 labels (95.7% versus 95.5%).

CONCLUSIONS: The CRF had the highest accuracy in both label prediction and dosage-per-day prediction. The HMM also achieved high accuracy, though generally lower than the CRF. We recommend CRF over HMM for named entity recognition on prescription text; it is time efficient and predicts dosage per day with high accuracy.
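
ILLUSTRATION: The rule-based step described under METHODS, which turns a predicted label sequence into a dosage per day, might look roughly like the minimal Python sketch below. This is not the authors' algorithm. The label names (QTY, FREQ, PERIOD, O), the word lists, the English example tokens and the function names are all hypothetical, since the abstract does not list the labels or rules actually used and the register text is Swedish.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical label set (assumption): the abstract does not name its 12 to 18 labels.
QTY, FREQ, PERIOD, OTHER = "QTY", "FREQ", "PERIOD", "O"

# Hypothetical lookup tables (assumption), in English for readability.
NUMBER_WORDS = {"half": 0.5, "one": 1.0, "two": 2.0, "three": 3.0, "four": 4.0,
                "once": 1.0, "twice": 2.0}
PERIOD_DAYS = {"day": 1.0, "daily": 1.0, "week": 7.0, "weekly": 7.0}


def to_number(token: str) -> Optional[float]:
    """Map a token such as '2', '0,5' or 'twice' to a float, else None."""
    token = token.lower().replace(",", ".")
    if token in NUMBER_WORDS:
        return NUMBER_WORDS[token]
    if re.fullmatch(r"\d+(\.\d+)?", token):
        return float(token)
    return None


def dosage_per_day(tagged: List[Tuple[str, str]]) -> Optional[float]:
    """Combine labeled tokens into units per day: quantity * frequency / period."""
    quantity: Optional[float] = None
    frequency: Optional[float] = None
    period_days = 1.0  # assume "per day" when no period token is labeled
    for token, label in tagged:
        if label == QTY and quantity is None:
            quantity = to_number(token)
        elif label == FREQ and frequency is None:
            frequency = to_number(token)
        elif label == PERIOD:
            period_days = PERIOD_DAYS.get(token.lower(), period_days)
    if quantity is None:
        return None  # no usable quantity, e.g. "as needed"
    if frequency is None:
        frequency = 1.0  # assume one administration per period
    return quantity * frequency / period_days


# "1 tablet 2 times daily", as a sequence model might label it
example = [("1", QTY), ("tablet", OTHER), ("2", FREQ), ("times", OTHER), ("daily", PERIOD)]
print(dosage_per_day(example))
```

In this sketch the sequence model (HMM or CRF) only has to decide which tokens carry the quantity, frequency and period; the arithmetic is kept in a separate deterministic step, so an instruction with no labeled quantity yields no dosage instead of a guess.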

Conference/Value in Health Info

2016-10, ISPOR Europe 2016, Vienna, Austria

Value in Health, Vol. 19, No. 7 (November 2016)

Code

PRM198

Topic

Methodological & Statistical Research

Topic Subcategory

Confounding, Selection Bias Correction, Causal Inference

Disease

Multiple Diseases
