Evaluating Large Language Models’ Performance in Content Generation for Literature Review Reports

Speaker(s)

Finn E1, Rtveladze K2, Guerra I1, Leite J3, Caverly S1, Shah V1, Gallinaro J1, Matev K4, Lambova A4
1IQVIA, London, LON, UK, 2IQVIA, London, LON, UK, 3IQVIA, Oeiras, Portugal, 4IQVIA, Sofia, Bulgaria

OBJECTIVES: Large Language Models (LLMs) have the capability to revolutionize content generation, potentially attaining a degree of expertise that could parallel human proficiency. This study investigates the practicality and effectiveness of LLMs in generating content for Literature Review (LR) reports.

METHODS: Subject Matter Experts (SMEs) in LRs used an LLM pipeline and prompt engineering to generate content, covering diverse indications with varying complexities of economic analysis. Performance was evaluated across several dimensions, including relevance, completeness, accuracy, language quality, and overall quality, using a five-point Likert scale: ‘strongly agree’, ‘agree’, ‘neutral’, ‘disagree’, and ‘strongly disagree’. Additionally, SMEs estimated the time saved when using the LLM in comparison to manual content generation.

RESULTS: SMEs agreed or strongly agreed that all generated content was relevant to the topic under consideration. SMEs unanimously agreed that responses lacked some details, suggesting the need for more detailed LLM prompts to elicit more comprehensive responses. While the responses were largely accurate (90% strongly agreed or agreed), there were isolated cases where incorrect numbers were retrieved. It was universally agreed that responses were well written; however, they were not free of hallucinations despite the inclusion of context in the prompts, reinforcing the need for human input to carefully review and validate LLM-generated content. Overall, all SMEs agreed that they would incorporate the generated responses into their deliverables (90% strongly agreed or agreed) and expected considerable effort savings compared to manual content generation.

CONCLUSIONS: This study demonstrates that LLMs can significantly aid in content generation for LRs and offer valuable time savings. However, their performance is contingent on the complexity of the information and the level of detail provided in the prompts. SME review is essential to ensure the accuracy and completeness of the generated information.

Code

MSR12

Topic

Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics

Disease

No Additional Disease & Conditions/Specialized Treatment Areas