Evaluation of the Effectiveness of a Naturalness Probe for Assessing Human- and Nonhuman-Generated COA Translations in Linguistic Validation Cognitive Debriefing Interviews
Author(s)
Tim Poepsel, MS, PhD1, Chryso Hadjidemetriou, PhD2, Rebecca Israel, MS3, Rachael Emily Browning, BA4.
1Survey Research Analyst Team Lead, RWS Life Sciences, East Hartford, CT, USA, 2RWS, Croydon, United Kingdom, 3RWS Life Sciences, East Hartford, CT, USA, 4RWS, Bloxham, United Kingdom.
OBJECTIVES: The Wild et al. (2005) linguistic validation (LV) guidance establishes the goals of cognitive debriefing (CD) as probing the comprehensibility and cognitive equivalence of translations, testing translation alternatives, and flagging conceptually inappropriate items, but it does not mandate specific approaches or standard probes for conducting interviews. Patients occasionally report that a translation, while comprehensible and conceptually equivalent to the source, does not sound natural or use patient-preferred wording; without standardized naturalness probing, however, this feedback is missed in an unknown proportion of CD interviews, allowing unnatural translations to go undetected or unresolved. Emerging interest in AI-assisted LV translation amplifies this concern: pilot studies suggest that patients and quality raters can detect AI-generated translations through their divergences from naturalness and prefer human-generated, natural-sounding translations. Accordingly, direct probing of naturalness may be essential for optimizing the patient-centeredness and assessing the quality of COA translations, whether human-generated or not.
METHODS: We pilot tested a novel naturalness probe in CD interviews assessing translations of a PRO instrument in 21 languages. For each instrument sub-component (e.g., item, response option, instruction), patients rated translation naturalness, with unnaturalness defined as “language sounding ‘translated’ or ‘machine-generated’, or using uncommon or awkward grammatical conventions, phrases, or words”.
RESULTS: Overall, the naturalness probe flagged 50 translation components across 14/21 (67%) languages. In 38/50 cases, the same component was not flagged by standard probes for comprehension or paraphrasing success. Translation updates were made in 27/50 (54%) of flagged cases, and 20/27 (74%) of these changes were motivated solely by feedback to the naturalness probe.
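The reported percentages follow directly from the flag and update counts above; the short Python sketch below simply reproduces that arithmetic (the variable names and tally are illustrative and not part of the study protocol).

# Illustrative recomputation of the reported pilot proportions.
# Counts are taken from the abstract; variable names are assumptions.
flagged_total = 50               # components flagged by the naturalness probe
languages_with_flags = 14        # of the 21 languages tested
missed_by_standard_probes = 38   # not also flagged by comprehension/paraphrase probes
updated = 27                     # flagged components that led to translation updates
uniquely_motivated = 20          # updates driven solely by naturalness-probe feedback

print(f"Languages with flags:         {languages_with_flags / 21:.0%}")                  # 67%
print(f"Flags missed by other probes: {missed_by_standard_probes / flagged_total:.0%}")  # 76%
print(f"Flags leading to updates:     {updated / flagged_total:.0%}")                    # 54%
print(f"Updates unique to the probe:  {uniquely_motivated / updated:.0%}")               # 74%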
CONCLUSIONS: In pilot testing, the novel naturalness probe was well tolerated and understood by patients, and it provided unique, actionable CD feedback leading to patient-centered translation improvements. We suggest standardizing naturalness probes in CD interviews: these data support their ability to consistently flag a dimension of translation quality not captured by standard probing approaches, with potential extension to quality assessment of non-human-generated COA translations.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
PCR86
Topic
Clinical Outcomes, Methodological & Statistical Research, Patient-Centered Research
Topic Subcategory
Instrument Development, Validation, & Translation, Patient-reported Outcomes & Quality of Life Outcomes
Disease
No Additional Disease & Conditions/Specialized Treatment Areas