From Code to Quality: Identifying Quality Assurance Steps for AI Integration in Qualitative Healthcare Research: A Comparative Study of ChatGPT, Atlas.ti, and Traditional Coding

Author(s)

Chris Buckley, MSc1, Mahak Jain, MSc2, Rinchen Doma, BA, BS, MPH3, Kavita Jarodia, PhD4, Angela Stroupe, MSc5, Naomi Suminski, MS6, Alessandra Girardi, PhD7.
1Parexel, London, United Kingdom, 2Parexel, Mumbai, India, 3Parexel, Durham, NC, USA, 4Parexel, Panchkula, India, 5Parexel, Boston, MA, USA, 6Parexel, San Diego, CA, USA, 7Parexel, Milan, Italy.
OBJECTIVES: Artificial Intelligence (AI) improves efficiency, but the quality of its outputs in healthcare research remains uncertain. While patient-reported data may benefit from AI, careful examination is needed to assess result quality. This preliminary study compared the qualitative outputs produced by ChatGPT (generative AI utilising natural language processing) and Atlas.ti-AI (qualitative data-analysis software used for coding qualitative data) with traditional human-led coding on four mock interviews, to explore what human input is needed to ensure qualitative research integrity.
METHODS: Codebooks were developed independently by the AI tools and by human experts, followed by content analysis and reporting. Intercoder reliability (ICR) was calculated to ascertain agreement between the AI tools and researchers. Bespoke items were compared between the AIs' and the human experts' codebooks. An independent researcher scored items as Low, Medium, or High, with maximum scores of 12 (codebook) and 30 (report).
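The abstract does not state which ICR statistic was used; a common choice for agreement between two coders in qualitative research is Cohen's kappa, which corrects observed agreement for agreement expected by chance. The sketch below uses hypothetical concept codes for ten interview excerpts; the code labels are illustrative, not from the study.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' category assignments on the same items."""
    assert len(coder_a) == len(coder_b) and coder_a
    n = len(coder_a)
    # Observed proportion of items where the two coders agree
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement expected from each coder's marginal code frequencies
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    expected = sum(freq_a[c] * freq_b.get(c, 0) for c in freq_a) / n**2
    return (observed - expected) / (1 - expected)

# Hypothetical concept codes for ten excerpts (AI coder vs. human coder)
ai_codes    = ["fatigue", "pain", "pain", "coping", "fatigue",
               "pain", "coping", "fatigue", "pain", "coping"]
human_codes = ["fatigue", "pain", "pain", "coping", "fatigue",
               "pain", "coping", "fatigue", "pain", "fatigue"]
print(round(cohens_kappa(ai_codes, human_codes), 2))  # prints 0.85
```

Values near 1 indicate near-perfect agreement; by the common Landis–Koch benchmarks, an ICR of 0.90 as reported for ChatGPT's concept coding would fall in the "almost perfect" range.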
RESULTS: ChatGPT completed the analysis in 12 minutes, whereas Atlas.ti-AI required 2 hours. ChatGPT outperformed Atlas.ti-AI in codebook development, scoring 11/12 overall with high marks in comprehensiveness, quality, and structure. Atlas.ti-AI's codebook lacked nuance (7/12), potentially requiring substantial human input; it also failed to execute the analytical functions required for the content analysis. In contrast, ChatGPT showed promise, with performance comparable to that of human experts in concept coding (ICR=0.90). ChatGPT's content analysis scored 23/30 overall, with high scores in quality (8/9) and relevance (9/12), but lower in richness (6/9), demonstrating coherence, clarity, and flow, but possibly insufficient in-depth quotes and comprehensiveness.
CONCLUSIONS: ChatGPT is a valuable tool offering rapid analysis and coding consistent with human-led analysis, but it may be limited in its ability to fully convey a patient's experience. The integration capabilities of Atlas.ti-AI show promise but were not fully realized in this experiment due to functional limitations. Results highlight substantial efficiency gains but underscore the need for quality checks to ensure the patient's journey is fully represented. Standardized validation procedures and recommendations need to be established to ensure robust AI results in qualitative healthcare research.

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

PT29

Topic

Clinical Outcomes, Methodological & Statistical Research, Patient-Centered Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas
