Categorizing Telemedicine Visits Using Natural Language Processing and Machine Learning
Sudaria T1, Overcash J2, Nguyen N2, Oguntuga A3
1Veradigm, San Francisco, CA, USA, 2Veradigm, Raleigh, NC, USA, 3Accenture, Houston, TX, USA
OBJECTIVES: Telemedicine visits have increased recently due to the COVID-19 pandemic; however, a discrete telemedicine indicator is not present in some Electronic Health Record (EHR) data. Natural Language Processing (NLP) and Machine Learning (ML) were used to build a model that categorizes patient visits to better understand telemedicine utilization.
METHODS: Initially, encounter type, note type, chief complaint, and appointment type were the features used to categorize 389,315,647 visits spanning the last 14 years in an ambulatory EHR dataset. Each feature was filtered based on a list of 21 inclusion and 29 exclusion words or word chunks, as well as 7 CPT codes, 23 SNOMED codes, and 9 HCPCS codes. A clinician tagged each feature as indicating telemedicine or not. A predictive ML model was then trained. Data were preprocessed by removing identifying features and punctuation, correcting spelling, flagging negated words, and lemmatizing. Each feature was converted into unigrams, bigrams, and trigrams and transformed with a TF-IDF vectorizer. The model was fit using XGBoost and tagged each feature as either 0 (not telemedicine) or 1 (telemedicine). Visits could have multiple conflicting features. To determine whether the whole visit was telemedicine, visit-level tie-breaking features were added if documented: vitals, labs, and medications prescribed. A rules-based model was created and applied to categorize the whole visit as telemedicine, not telemedicine, or not enough information. The visits were then re-evaluated by clinicians to determine overall model fitness.
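The text preparation and visit-level rules described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the negation cue list, the function names to_ngrams and categorize_visit, and in particular the direction of each tie-breaking rule are assumptions made for illustration, since the abstract does not specify them. Spelling correction, lemmatization, TF-IDF weighting, and the XGBoost step are omitted.

```python
import re
from typing import Dict, List, Optional

# Hypothetical negation cues; the abstract does not list the ones actually used.
NEGATION_CUES = {"no", "not", "without", "denies"}


def to_ngrams(text: str, max_n: int = 3) -> List[str]:
    """Lowercase, strip punctuation, flag the token that follows a
    negation cue, and return all unigrams up to max_n-grams."""
    tokens = re.sub(r"[^\w\s]", " ", text.lower()).split()
    flagged: List[str] = []
    negate = False
    for tok in tokens:
        if tok in NEGATION_CUES:
            negate = True  # the cue itself is dropped; the next token is flagged
            continue
        flagged.append("NOT_" + tok if negate else tok)
        negate = False
    return [
        " ".join(flagged[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(flagged) - n + 1)
    ]


def categorize_visit(
    feature_labels: Dict[str, Optional[int]],
    has_vitals: bool,
    has_labs: bool,
    has_meds: bool,
) -> str:
    """Combine per-feature tags (1 = telemedicine, 0 = not, None = missing)
    into one visit-level category."""
    labels = [v for v in feature_labels.values() if v is not None]
    if not labels:
        return "not enough information"
    if all(v == 1 for v in labels):
        return "telemedicine"
    if all(v == 0 for v in labels):
        return "not telemedicine"
    # Conflicting tags. ASSUMED rule: recorded vitals or labs suggest
    # the patient was physically present.
    if has_vitals or has_labs:
        return "not telemedicine"
    # ASSUMED rule: a prescription with no vitals or labs suggests a remote visit.
    if has_meds:
        return "telemedicine"
    return "not enough information"
```

For example, to_ngrams("No video visit.") yields the unigrams "NOT_video" and "visit" plus the bigram "NOT_video visit", and a visit whose note type is tagged 1 but whose chief complaint is tagged 0 falls to the tie-breaking rules rather than being decided by either tag alone.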
RESULTS: Overall, the model achieved an accuracy of 99.86% and an F1 score of 96.99%. There were 577% (3,450,561) more telemedicine visits since the start of the COVID-19 pandemic than all previous telemedicine visits combined (597,410).
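Accuracy and F1 can differ noticeably when one class is rare, as telemedicine visits are in this dataset. The short sketch below shows how both metrics follow from confusion-matrix counts; the function name accuracy_and_f1 and the counts in the usage note are hypothetical and are not the study's data.

```python
from typing import Tuple


def accuracy_and_f1(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (accuracy, F1) for the positive class from confusion counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)
```

With illustrative counts of 90 true positives, 5 false positives, 5 false negatives, and 9,900 true negatives, accuracy is 99.9% while F1 is about 94.7%, because the many true negatives raise accuracy but do not enter the F1 calculation.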
CONCLUSIONS: A combination of ML and rules-based methods successfully categorized visits as telemedicine visits. Further studies stratified by telemedicine and non-telemedicine visits can now be conducted.
Conference/Value in Health Info