Performance of Specific Custom AI Models for German AMNOG Process Questions

Author(s)

Minartz C¹, Neubauer DA²
¹Institute for Health- and Pharmacoeconomics (IfGPh), München, Germany, ²Institute for Health- and Pharmacoeconomics (IfGPh), München, BY, Germany

Presentation Documents

2024-10-20_ISPOR AI Neubauer_v5142140.pdf

OBJECTIVES: Artificial intelligence (AI), specifically advanced language models such as ChatGPT, has the potential to revolutionize various aspects of healthcare. The German AMNOG assessment process is data intensive, and all key documents are available online. We therefore investigated whether customized AI can support answering specific process-related questions that are currently considered expert knowledge.

METHODS: CustomGPT.ai is an AI model that allows fast and easy setup based on website data and various file formats such as PDF. Three datasets were evaluated, each with its own custom bot: 1) procedural documents on AMNOG process methodology, 2) G-BA resolution documents from the last 2 years (2022 onward), and 3) all documents published by the G-BA for all assessments in ophthalmology. The AI custom persona was adapted to focus on accuracy and to give rigorous source citations. Multiple test queries were executed for all 3 custom bots to assess response quality. All queries were run against German-language documents.

RESULTS: Responses to process-related questions (bot 1) were mostly accurate and included the relevant sources. In contrast, the bot covering all G-BA resolutions (bot 2) could not answer even simple, specific queries, such as the added benefit level for a specific drug or endpoint results. Modifying the custom persona mostly avoided confabulation, and sources were cited correctly. Still, simple analytic queries, such as counting certain events, were not performed adequately. The disease-area-specific bot for ophthalmology, which had access to all documents (bot 3), could answer most specific queries to some degree. For specific numeric questions, however, even where the available documents contained clear answers, in most cases either no answer was given or the answer was not fully related to the question. In such cases, the correct references were nevertheless mostly cited and linked.

CONCLUSIONS: The tested custom AI models were very precise when answering specific questions directly from texts. More complex tasks, especially numeric analytic ones, were performed suboptimally.

Conference/Value in Health Info

2024-11, ISPOR Europe 2024, Barcelona, Spain

Value in Health, Volume 27, Issue 12, S2 (December 2024)

Code

HTA250

Topic

Health Technology Assessment, Methodological & Statistical Research

Topic Subcategory

Artificial Intelligence, Machine Learning, Predictive Analytics, Systems & Structure, Value Frameworks & Dossier Format

Disease

No Additional Disease & Conditions/Specialized Treatment Areas, Sensory System Disorders (Ear, Eye, Dental, Skin)
