Breaking Linguistic Barriers: Cross-Language HEOR & Evidence Integration Through GenAI
Author(s)
Barinder Singh, RPh1, Vedant Soni, B.Tech2, Ritesh Dubey, PharmD2, Mrinal Mayank, BE2, Gagandeep Kaur, M.Pharm2, Shubhram Pandey, MSc2, Rajdeep Kaur, PhD2.
1Pharmacoevidence, London, United Kingdom, 2Pharmacoevidence, Mohali, India.
OBJECTIVES: In systematic literature reviews (SLRs), researchers often encounter articles published in languages other than English. These articles are generally excluded from the analysis, or are translated into English using online tools or manual translation, which can be error-prone or time-consuming. The objective of this study was to develop a framework using a large language model (LLM) and retrieval-augmented generation (RAG) to facilitate reviews across multiple languages.
METHODS: An AI-powered interface was developed with Claude Sonnet 3.5 and a RAG pipeline to support cross-language translation. The pipeline allows users to upload and process research articles in multiple languages, such as Chinese, German, Japanese, and French. Research articles in different languages focusing on safety and efficacy in breast cancer (BC) were uploaded to the interface. The dynamic RAG pipeline divided these articles into small chunks and created embeddings to facilitate efficient retrieval. An AI agent using the LLM (Claude Sonnet 3.5) translated the articles into English while maintaining contextual accuracy. Domain experts validated the key metrics retrieved from these articles.
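The chunk-and-retrieve step described in the Methods can be illustrated with a minimal sketch. The abstract does not report the chunk size, overlap, embedding model, or similarity measure, so everything below is an assumption: `chunk_text`, `embed`, `cosine`, and `retrieve` are hypothetical names, and the word-count "embedding" is only a toy stand-in for a real multilingual embedding model. Whitespace-based chunking would also not suit Chinese or Japanese text, which is not space-delimited.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In a pipeline of this kind, the retrieved chunks would typically be passed to the LLM together with the translation or extraction prompt; that step is omitted here because the abstract does not describe the prompts or API calls used.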
RESULTS: The RAG-based interface efficiently processed and translated articles published in four different languages into English. Domain experts unanimously agreed on the accuracy of the translations. Additionally, the experts used prompts on the interface to automatically extract data from the translated articles. The extracted data were cross-verified against the source documents translated by alternative methods, and domain experts confirmed the accuracy of the extractions.
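The cross-verification described in the Results was performed by domain experts; the abstract does not describe any automated comparison. As a purely illustrative aid, the sketch below shows how two sets of extracted values for one article could be compared field by field before expert review. The function name `compare_extractions`, the field names, and the numeric tolerance are all assumptions.

```python
def compare_extractions(primary, reference, tolerance=0.0):
    """Return {field: (primary_value, reference_value)} for fields that disagree.

    primary   -- values extracted via the interface (hypothetical dict)
    reference -- values from the alternatively translated source (hypothetical dict)
    tolerance -- allowed absolute difference for numeric fields
    """
    mismatches = {}
    for field in sorted(set(primary) | set(reference)):
        a, b = primary.get(field), reference.get(field)
        both_numeric = isinstance(a, (int, float)) and isinstance(b, (int, float))
        if both_numeric:
            agree = abs(a - b) <= tolerance
        else:
            agree = a == b  # also flags fields missing from one side
        if not agree:
            mismatches[field] = (a, b)
    return mismatches
```

For example, with invented values, `compare_extractions({"n": 120}, {"n": 120, "median_os": 24})` flags `median_os` as present in only one source, which a reviewer could then check against the original article.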
CONCLUSIONS: The RAG-based interface demonstrated high efficiency and accuracy in translating articles from different languages into English and extracting relevant data. These findings highlight the capabilities of GenAI solutions in overcoming language barriers in clinical research, enabling the inclusion of non-English articles for more precise evidence analysis.
Conference/Value in Health Info
2025-05, ISPOR 2025, Montréal, Quebec, CA
Value in Health, Volume 28, Issue S1
Code
MSR106
Topic
Methodological & Statistical Research
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics
Disease
No Additional Disease & Conditions/Specialized Treatment Areas