Landscape of Natural Language Processing (NLP) Capabilities at Clinical Sites: Insights From a Real-World (RW) Gastric Cancer (GC) Study

Author(s)

Spencer Jones, PhD¹, Lucy Turner, BSc², Aisha Rashid, BSc², Julia Gallinaro, PhD³, Marina Borges, MSc⁴, Karina Vitanova, PhD³, Elizabeth Eldridge, MPH⁵, Merce Conill, MSc⁶, Valeria Saglimbene, PhD⁷, Ines Guerra, MSc³.
¹AstraZeneca, Zürich, Switzerland, ²AstraZeneca, Baar, Switzerland, ³IQVIA, London, United Kingdom, ⁴IQVIA, Oeiras, Portugal, ⁵IQVIA, Durham, NC, USA, ⁶IQVIA, Barcelona, Spain, ⁷IQVIA, Milan, Italy.

Presentation Documents

ISPOREurope2025_Jones_RWD113_POSTER.pdf

OBJECTIVES: Medical notes contain valuable clinical information, yet they are often underutilized in RW evidence generation because manual curation is costly and complex. NLP offers solutions for information extraction; however, its adoption across clinical sites is unclear. This study aimed to assess the current landscape of NLP capabilities for research purposes across sites participating in an RW GC study.
METHODS: A feasibility questionnaire (FQ) was developed to capture information on sites’ NLP capabilities, including technical details of the NLP approach (e.g. type of model, validation procedure and metrics), regulatory compliance and the quality assurance processes in place. The FQ was sent to 27 sites across six countries, and follow-up interviews were conducted to clarify responses. Participating sites were selected based on their expertise in GC treatment, with many sites belonging to the Oncology Evidence Network.
RESULTS: Of the 17 responding sites, nine reported having NLP capabilities (two in France, one in Italy, one in Germany, two in the United Kingdom, one in Switzerland and two in Canada). Among these, four sites (in France, Italy and Germany) had already extracted study-relevant variables using NLP, and three of the four also indicated capacity to extract additional variables. The two sites in the United Kingdom had prior NLP experience but lacked reusable algorithms. One Canadian site had piloted NLP internally; the other Canadian site and the Swiss site provided limited details. The NLP approaches varied and included rule-based, machine learning-based and deep learning-based algorithms, either developed or fine-tuned in-house, as well as commercially available software. Of the four sites with NLP-derived variables, three had data quality assurance processes in place, and two confirmed regulatory compliance (e.g. with the General Data Protection Regulation).
CONCLUSIONS: NLP adoption for variable extraction in clinical settings remains limited. While about half of the responding sites (9 of 17) have explored NLP internally or in past studies, only a subset have validated, reusable algorithms readily available for research purposes.
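The proportions implied by the counts above can be checked with a short tally. The following is a minimal sketch in Python, not part of the study: all counts are taken from the abstract, while the variable names and the `pct` helper are illustrative assumptions.

```python
# Counts as reported in the abstract; names are illustrative, not from the study.
sites_contacted = 27
sites_responded = 17

# Sites reporting NLP capabilities, by country, as listed in RESULTS.
nlp_sites_by_country = {
    "France": 2,
    "Italy": 1,
    "Germany": 1,
    "United Kingdom": 2,
    "Switzerland": 1,
    "Canada": 2,
}
sites_with_nlp = sum(nlp_sites_by_country.values())

# Sites that had already extracted study-relevant variables using NLP.
sites_with_extracted_variables = 4


def pct(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


summary = {
    "response_rate_pct": pct(sites_responded, sites_contacted),
    "nlp_among_responders_pct": pct(sites_with_nlp, sites_responded),
    "extracted_among_responders_pct": pct(sites_with_extracted_variables, sites_responded),
    "extracted_among_nlp_sites_pct": pct(sites_with_extracted_variables, sites_with_nlp),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Under these counts, the per-country breakdown sums to the nine sites reported, and the share of responding sites with NLP capabilities is just over one half, which is consistent with the wording of the conclusions.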

Conference/Value in Health Info

2025-11, ISPOR Europe 2025, Glasgow, Scotland

Value in Health, Volume 28, Issue S2

Code

RWD113

Topic

Methodological & Statistical Research, Real World Data & Information Systems, Study Approaches

Topic Subcategory

Health & Insurance Records Systems

Disease

No Additional Disease & Conditions/Specialized Treatment Areas

Presentation (CTI)