A Methodological Approach Using Sentiment Analysis of Online Medical Platforms As a Real-World Data Source of Patient Experiences

Speaker(s)

Cimino A1, Culbertson C1, Watkins E2, Li J3, Wangeshi S1
1Rogue Scholar Consulting, Baltimore, MD, USA, 2Organon, Jersey City, NJ, USA, 3Organon, Mason, OH, USA

OBJECTIVES: To describe an innovative methodology that leverages online reviews as a source of real-world data to understand disease state, patient preferences, and comparisons of products to treat medical conditions.

METHODS: Data from nine products to treat bacterial vaginosis (BV) were scraped from Drugs.com, WebMD.com, and Amazon.com. We (1) discuss ways to address pharmacovigilance and ethical concerns; (2) summarize how the data were collected, processed, and cleaned; (3) define the tokenization and determination of narrative segments; (4) describe the five lexicon-based algorithms used for sentiment analysis (i.e., sentimentr, AFINN, Bing, syuzhet, NRC); (5) explain the quantitative analyses conducted with five-star ratings, user attributes, and sentiment data; and (6) illustrate the qualitative analytic approach, including inductive and query-focused coding.
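As a rough illustration of steps (3) and (4), the following Python sketch splits a review into sentence-level segments, tokenizes each segment, and sums word scores from a lexicon. It is not the authors' implementation, which used R packages. The lexicon entries, the sentence-splitting rule, and all function names here are hypothetical and chosen only to show the general shape of lexicon-based scoring.

```python
import re

# Hypothetical toy lexicon: word -> polarity score. Published lexicons such as
# AFINN or Bing are far larger; these entries are illustrative only.
TOY_LEXICON = {
    "great": 2, "relief": 2, "worked": 1, "love": 3,
    "burning": -2, "awful": -3, "nausea": -2, "expensive": -1,
}


def split_sentences(review):
    """Split a review into sentence-level segments after ., ! or ?."""
    parts = re.split(r"(?<=[.!?])\s+", review.strip())
    return [p for p in parts if p]


def tokenize(text):
    """Lowercase the text and keep alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def score_segment(segment, lexicon=TOY_LEXICON):
    """Sum the lexicon scores of the tokens in one segment (unknown words count 0)."""
    return sum(lexicon.get(tok, 0) for tok in tokenize(segment))


def score_review(review, lexicon=TOY_LEXICON):
    """Return (segment, score) pairs and the mean segment score for a review."""
    segments = split_sentences(review)
    scored = [(seg, score_segment(seg, lexicon)) for seg in segments]
    mean = sum(score for _, score in scored) / len(scored) if scored else 0.0
    return scored, mean


if __name__ == "__main__":
    scored, mean = score_review("It worked great. The burning was awful!")
    for segment, score in scored:
        print(score, segment)
    print("mean segment score:", mean)
```

Scoring at the segment level rather than the whole-review level keeps mixed reviews (for example, an effective product with unpleasant side effects) from averaging out to a neutral score before the segments can be inspected.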

RESULTS: Across all products, 3,891 reviews were collected for analysis (245 reviews were ultimately excluded for ineligibility). A relational SQL database was used to store and retrieve the data for analysis in R. Products included five Food and Drug Administration guideline-recommended drugs and four over-the-counter supplements. The scraped information included the product name, formulary, route of administration, user/reviewer attributes, five-star ratings, and free-text review data. All sentiment scores and five-star ratings were significantly positively correlated. Visualizations, univariate summaries, and bivariate comparisons depicted patient-preferred products. Sentiment analysis scores and scatterplots revealed patient likes and dislikes regarding medication effectiveness, use, adherence, product characteristics, side effects, and value. Qualitative data included themes on the disease state, its impact on relationships, and patient interactions with healthcare providers.
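The store-then-correlate workflow might look something like the Python sketch below, which uses an in-memory SQLite database as a stand-in for the relational SQL database and computes a Pearson correlation between star ratings and sentiment scores. The abstract does not name the correlation statistic, the database engine, or the schema, so the table layout, column names, and function names are assumptions. The rows are made-up placeholders, not study data, and no significance test is included.

```python
import sqlite3
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / sqrt(sxx * syy)


def build_demo_db():
    """Create an in-memory database with a hypothetical reviews table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reviews ("
        "id INTEGER PRIMARY KEY, product TEXT, stars INTEGER, sentiment REAL)"
    )
    # Made-up placeholder rows for illustration only.
    rows = [
        ("product_a", 5, 2.0),
        ("product_a", 4, 1.0),
        ("product_a", 1, -3.0),
        ("product_b", 2, -1.0),
        ("product_b", 3, 0.5),
    ]
    conn.executemany(
        "INSERT INTO reviews (product, stars, sentiment) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return conn


def rating_sentiment_correlation(conn):
    """Retrieve star ratings and sentiment scores and correlate them."""
    rows = conn.execute("SELECT stars, sentiment FROM reviews").fetchall()
    stars = [row[0] for row in rows]
    sentiment = [row[1] for row in rows]
    return pearson(stars, sentiment)


if __name__ == "__main__":
    conn = build_demo_db()
    print(round(rating_sentiment_correlation(conn), 3))
    conn.close()
```

Because star ratings are ordinal, a rank-based statistic such as Spearman's correlation may be a reasonable alternative. The same query could feed either calculation.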

CONCLUSIONS: Online reviews of products used to treat medical conditions are a rich source of real-world data. Analyzing these data is a novel alternative to patient interviews and focus groups. The methods described here apply broadly across diseases and new and emerging therapeutic areas, and to evidence generation in outcomes research.

Code

RWD119

Topic

Methodological & Statistical Research, Real World Data & Information Systems

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Data Protection, Integrity, & Quality Assurance

Disease

No Additional Disease & Conditions/Specialized Treatment Areas, Reproductive & Sexual Health