Cartography of Biostatistics and Machine Learning Methods to Identify Prognostic Factors
Author(s)
Mounier L, Civet A, Dupin J, Pau D, Esnault C
Roche, Boulogne-Billancourt, 92, France
OBJECTIVES:
With the emergence of machine learning (ML), new opportunities arise for identifying prognostic factors. Although review articles cover biostatistical methods for this task, the opportunities offered by ML algorithms have received little attention. The overall purpose is to gather the literature and map all methods from these two fields that are applicable to identifying prognostic factors.
METHODS:
A literature review based on relevant keywords was performed on Google Scholar and PubMed. The criteria for selecting methodological papers included the date of publication and the number of citations. An iterative selection process was then conducted to deepen the search and identify new keywords, which led to more specific papers.
RESULTS:
Fifteen papers published after 2010 were selected from the literature to create a map covering three fields: feature extraction, feature selection and subgroup discovery. For independent features, feature selection methods fall into 3 families: (1) Filter (e.g. univariate and multivariate analysis), (2) Wrapper (based on e.g. sequential search, random search, exponential search), (3) Embedded (lasso, ridge, elastic net, ...). These families can also be combined into hybrid methods. Dedicated methods exist for structured features. Feature extraction covers methods that transform the variables or the dataset; to identify prognostic factors, these must be paired with interpretation methods. Finally, subgroup discovery methods include many exploratory data mining techniques to uncover patterns associated with an outcome.
CONCLUSIONS:
This research gives an overview of all existing approaches to identify prognostic factors, both in biostatistics and ML, and highlights the great diversity of these approaches. This cartography should help data experts go beyond what is usually done in studies.
Conference/Value in Health Info
2022-11, ISPOR Europe 2022, Vienna, Austria
Value in Health, Volume 25, Issue 12S (December 2022)
Code
MSR99
Topic
Methodological & Statistical Research, Study Approaches
Topic Subcategory
Artificial Intelligence, Machine Learning, Predictive Analytics, Literature Review & Synthesis
Disease
No Additional Disease & Conditions/Specialized Treatment Areas