Leveraging Machine Learning To Understand Pet Owner Experiences of Feline Pruritus Through Social Media Listening
Author(s)
Cherry G1, Mpantis A2, Rai T3, Wright A4, Brown R2, Wells K5
1University of Surrey, Guildford, SRY, UK, 2Athens Technology Center (ATC), Athens, Greece, 3University of Surrey, London, LON, UK, 4Zoetis, Babcock Ranch, FL, USA, 5University of Surrey, Guildford, Surrey, UK
OBJECTIVES: Feline pruritus is a common disease in the domestic cat with easily observable symptoms, yet its impact on pet owners' lives and on the quality of life of affected animals remains poorly understood. Traditional research methods, including surveys and interviews, are resource-intensive and require access to representative cohorts, which limits their feasibility. This study used Social Media Listening (SML) to collect pertinent conversations from social media sites. These data are freely available, in the public domain, and free from question bias.
METHODS: Keywords, content sources and topics were selected by clinical veterinary dermatology experts and augmented by research literature. Data were collected using ATC’s social intelligence platform. Extracting high-quality data using SML required well-defined relevance criteria, with posts manually labelled as relevant or irrelevant. A dataset comprising 5,000 labelled real-world posts and 3,800 synthetic posts (to mitigate data scarcity) was split into 7,000 training and 1,800 test posts for machine learning. Synthetic data were generated by language models such as OpenAI GPT from labelled data considered unsuitable for training due to post length. Data cleaning was applied to Twitter and Reddit posts (real and synthetic) to remove posts of fewer than seven words before lemmatization and spelling correction. Posts of more than 500 words were summarised using a GPT language model. Entities (synonyms and similar words) were extracted using cosine similarity.
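Two of the steps above, the minimum-length filter and the cosine similarity measure, can be illustrated with a minimal Python sketch. It is not the study's pipeline: the abstract does not say how words were tokenised or which vector representation was used, so whitespace tokenisation, the helper names `keep_post` and `cosine_similarity`, and the example posts are all assumptions made for illustration.

```python
import math

MIN_WORDS = 7  # threshold stated in the abstract: posts with fewer words are removed


def keep_post(text: str, min_words: int = MIN_WORDS) -> bool:
    """Return True when a post has at least `min_words` tokens.

    Assumption: words are counted by splitting on whitespace.
    """
    return len(text.split()) >= min_words


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


# Hypothetical example posts, not taken from the study data.
posts = [
    "my cat keeps scratching",                           # 4 words, removed
    "my cat has been scratching her ears raw all week",  # 10 words, kept
]
cleaned = [p for p in posts if keep_post(p)]
```

In a setting like the one described, candidate synonyms would presumably be ranked by the cosine similarity of their word embeddings; the embedding model is not named in the abstract.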
RESULTS: A fine-tuned variant of the BERT uncased model, trained on case-specific data for ten epochs, detected relevant posts with an F1 score of 0.8026, sensitivity of 0.8838, and precision of 0.7350.
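The reported metrics are internally consistent: F1 is the harmonic mean of precision and sensitivity (recall). The short Python check below recomputes it from the two rounded values in the abstract; the function name `f1_from` is illustrative, and the result can differ slightly in the last digits from a value computed on unrounded inputs.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values as reported in the abstract (already rounded to four decimals).
f1 = f1_from(precision=0.7350, recall=0.8838)
print(round(f1, 4))
```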
CONCLUSIONS: This study highlights the benefits of machine learning for relevance detection: it reduces human error and marker fatigue while making data analysis faster and more scalable. Applied to manually labelled posts, this approach offers promising insights into feline pruritus and its effects on feline and pet owner well-being, and may help validate health-related quality of life measures.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 6, S1 (June 2024)
Code
MSR5
Topic
Epidemiology & Public Health, Methodological & Statistical Research, Patient-Centered Research, Real World Data & Information Systems
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Data Protection, Integrity, & Quality Assurance, Patient-reported Outcomes & Quality of Life Outcomes
Disease
Sensory System Disorders (Ear, Eye, Dental, Skin), Skin (including hair loss) Diseases/Disorders, Veterinary Medicine