METHODS FOR EXTRACTING TREATMENT PATTERNS FOR RENAL CELL CARCINOMA (RCC) FROM SOCIAL MEDIA (SM) FORUMS USING NATURAL LANGUAGE PROCESSING (NLP) AND MACHINE LEARNING (ML)
Author(s)
Merinopoulou E¹, Ramagopalan S², Malcolm B², Cox A¹
¹Evidera, London, UK; ²Bristol-Myers Squibb, Uxbridge, UK
OBJECTIVES: Patients are increasingly turning to SM to research their condition and find support. Patient forums are a potentially rich source of information on matters important to patients, including treatment. In this study, we developed NLP and ML methods to extract and describe treatment histories for RCC patients. METHODS: We collected a corpus of 70,666 posts spanning fifteen years from 4 popular RCC-specific forums. We used ML to identify phrases in which patients or caregivers described treatment histories. The classification methods investigated were Naïve Bayes, k-Nearest Neighbour and Support Vector Machine. Multiple direct and derived features were incorporated, including bigrams and trigrams. The selected phrases were tagged with terms from the Unified Medical Language System (UMLS) Metathesaurus. Patterns of RCC therapies were extracted manually for a random sample of patients. RESULTS: The ML algorithm selected phrases in which patients or caregivers recounted treatment histories with 85% overall accuracy (precision: 94%; recall: 82%; F-measure: 88%). After sub-setting, the corpus consisted of 13,740 phrases from 2,384 patients. Fifty patients were then randomly selected for manual review and compilation of treatment histories. Posting dates ranged from 2006 to 2016, and treatment sequence could be determined for 48 patients (96%). The most common first-line therapy was sunitinib (58%), followed by sorafenib and interleukin (both 13%). Twenty-two patients (46%) reported information on second-line therapy; the most common treatments were pazopanib, everolimus and sorafenib (all 18%). The most common third-line therapy was everolimus (3/7 patients). These findings were consistent with published data. CONCLUSIONS: This preliminary work showed that extracting treatment information from patient forums is challenging but technically feasible. Future work will focus on improving the accuracy of the ML algorithms and extending automation to the assembly of treatment histories for all patients. If successful, SM could be used to describe real-world treatment sequences, which would be especially useful where data options are limited.
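As an illustration only, the bigram and trigram features mentioned in METHODS and the relationship between the reported precision, recall and F-score can be sketched in a few lines of Python. This is a minimal sketch and not the authors' implementation. The function names, the whitespace tokenisation and the example phrase are assumptions made for illustration, as is the reading of the reported F-score as the balanced harmonic mean of precision and recall.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return all contiguous n-token sequences from a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical phrase, invented for illustration (not taken from the corpus).
# Whitespace splitting is an assumed, simplistic tokenisation.
tokens = "started sunitinib after my nephrectomy".split()
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)

# Precision and recall as reported in RESULTS.
f = f_measure(0.94, 0.82)
print(round(f * 100))  # 88
```

Under that assumption, the computed value rounds to the 88% reported in RESULTS.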
Conference/Value in Health Info
2017-11, ISPOR Europe 2017, Glasgow, Scotland
Value in Health, Vol. 20, No. 9 (October 2017)
Code
RM3
Topic
Study Approaches
Disease
Urinary/Kidney Disorders