Linguists’ Perceptions of Human- and Machine-Translated Clinical Outcome Assessment (COA) Wording: A Mixed Methods Study
Author(s)
Delgaram-Nejad O1, Poepsel T2, Ramsey P3, Hadjidemetriou C3, Israel R4, Nolde A5, Browning R6, McKown S4, McCullough E7
1RWS Life Sciences, Dawlish, DEV, UK, 2Corporate Translations, Inc., East Hartford, CT, USA, 3RWS Life Sciences, Croydon, LON, UK, 4RWS Life Sciences, East Hartford, CT, USA, 5RWS Life Sciences, Chicago, IL, USA, 6RWS Life Sciences, Bloxham, OXF, UK, 7RWS Life Sciences, Boston, MA, USA
OBJECTIVES: The role of machine translation (MT) in linguistic validation (LV) is an emerging discussion (Vanmassenhove et al., 2019). Available evidence suggests that MT applications are unsuited to COA translation contexts, as discussed in our prior qualitative survey research with linguists. Here we extend this work by examining linguists’ perceptions of MT after completing an MT detection task.
METHODS: Participants reviewed human-translated (HT; using LV methodology) and MT phrases outside of instrument context, based on low-complexity, culturally neutral source English COA phrases. These were randomized, balanced, and length-matched. Participants were also given text prompts ('explain your choices' and 'list any relevant linguistic/cultural factors') and follow-up questions about 'the role of AI in LV' and 'performance in the experiment.'
RESULTS: In post-experiment responses, participants expected fluency and naturalness to signal HT, and over-literal translations, lack of idiomaticity, and technical (e.g., grammatical) errors to signal MT. Yet participants' (n=401, 10 languages) ability to distinguish MT from HT experimentally was variable and mixed (43-58% accuracy; SD: 15-18%), although accuracy increased for longer phrases, suggesting that accuracy depends on source-related factors. Follow-up responses emphasized the importance of human oversight of MT applications (58% of n=176).
CONCLUSIONS: Qualitative feedback from linguists identified many linguistic factors that may distinguish MT from HT, while experimental task performance showed variable success in distinguishing low-complexity, culturally neutral MT and HT phrases. Higher success in some cases, and overall performance variability, signal that phrase content, length, and language identity may affect distinguishability. Linguists' assumption that technical errors signal MT may have caused over-literal HT to be mistaken for MT, and may index both underlying distrust of MT and variable MT quality across languages. Further work with full instruments and more culturally specific COA content is planned. These findings underscore the need for caution in machine COA translation use cases.
Conference/Value in Health Info
Value in Health, Volume 27, Issue 12, S2 (December 2024)
Code
CO32
Topic
Clinical Outcomes
Topic Subcategory
Clinical Outcomes Assessment
Disease
No Additional Disease & Conditions/Specialized Treatment Areas