EFFICIENT DATA MINING AND PROBABILISTIC INFERENCE WITH P-COURSE: A BAYESIAN METHOD WITH MULTILEVEL PRIORS FOR MEDICAL APPLICATIONS
Author(s)
Erkki JO Soini, Student(HE), RN, Researcher1; Janne A Martikainen, MSc, Research Director2; Jussi Lahtinen, BSc, Researcher3; Petri Myllymäki, PhD, Professor, Research Director3; Petri Kontkanen, MSc, Researcher3; Hannu Valtonen, PhD, Professor4; Olli-Pekka Ryynänen, PhD, Professor, Docent5
1 Department of Health Policy and Management, Department of Social Pharmacy, University of Kuopio, Kuopio, Finland; 2 Centre for Pharmaceutical Policy and Economics (CEPPE), Department of Social Pharmacy, University of Kuopio, Kuopio, Finland; 3 Complex Systems Computation Research Group (CoSCo), Helsinki Institute for Information Technology (HIIT), University of Helsinki, Helsinki, Finland; 4 Department of Health Policy and Management, University of Kuopio, Kuopio, Finland; 5 General Practice, Department of Public Health and Clinical Nutrition, University of Kuopio, Kuopio, Finland
OBJECTIVES: Because observations, parameters, and models are uncertain, the same data can be explained by several combinations of parameters and models. Of all the plausible explanations, the simplest can be considered the best and is expected to yield the best predictions (the "Occam's razor" principle). Prognostics therefore requires identifying the relevant parameters. This paper presents an efficient and innovative supervised method for prediction and estimation: P-Course, a new greedy Naive Bayesian Network (NB) classifier.
METHODS: Predictions are sequential by nature: the choice of the next parameter, test, or drug depends on the previous inference and on earlier experience. This sequence carries valuable information about relevance, which P-Course's hill-descending screening exploits. The greedy algorithm starts with an empty predictor set, evaluates all possible changes at each iteration, applies the parameter that gives the best improvement in the log score (an indicator of the quality of the predictive distribution), and stops when the score no longer improves.
RESULTS: P-Course offers the rare possibility of using multiple priors to improve a model's accuracy and area under the ROC curve (AUC) in exploratory or confirmatory analysis. Its functionality is available through a graphical user interface. First, the data are uploaded in ASCII format through "Administration". Then, in "Properties", the dependent variable and the independent variables (automatic/manual/ignore) are chosen. In "Priors", likelihood or weights with multilevel priors (direct/reversed) are chosen. Through "Prediction", the overall quality of the models, the defaults, and the case-by-case predictions can be tested, for example with leave-one-out cross-validation, with a new data set (substitution), or with a stratum excluded from the training set (portioning). Likelihood, posterior, and inverse probability predictions are available in the "Java Playground". Severe overfitting is rarely observed. The approach is supported by theory and predictions.
CONCLUSIONS: P-Course can utilize scarce, censored and complex data for e.g. segmentation, stratification, prediction, merging, data reduction, variable screening, interaction and adverse event identification, value of information (VOI), sensitivity analysis, inversion, diagnostics, and decision support.
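The greedy screening described under METHODS can be illustrated with a minimal Python sketch. It is not P-Course's implementation, which the abstract does not specify. It assumes discrete predictors, a naive Bayes model with a single symmetric Dirichlet prior (the hypothetical `alpha` parameter, standing in for the multilevel priors), and a log score computed by leave-one-out cross-validation. The function names `loo_log_score` and `greedy_select` and the toy data are likewise illustrative only.

```python
import math
from collections import Counter

def loo_log_score(rows, labels, features, alpha=1.0):
    """Leave-one-out log score of a categorical naive Bayes model.

    rows: list of tuples of discrete values; labels: class labels;
    features: set of column indices used as predictors;
    alpha: symmetric Dirichlet pseudo-count (an assumption of this sketch).
    """
    classes = sorted(set(labels))
    n = len(rows)
    class_counts = Counter(labels)
    values = {f: {r[f] for r in rows} for f in features}
    joint = Counter()  # (feature, value, class) -> count
    for r, y in zip(rows, labels):
        for f in features:
            joint[(f, r[f], y)] += 1
    total = 0.0
    for r, y in zip(rows, labels):
        log_post = {}
        for c in classes:
            held = 1 if c == y else 0  # drop the held-out case from the counts
            n_c = class_counts[c] - held
            lp = math.log((n_c + alpha) / (n - 1 + alpha * len(classes)))
            for f in features:
                n_fc = joint[(f, r[f], c)] - held
                lp += math.log((n_fc + alpha) / (n_c + alpha * len(values[f])))
            log_post[c] = lp
        # Normalise in log space to get log P(true class | predictors).
        m = max(log_post.values())
        log_norm = m + math.log(sum(math.exp(v - m) for v in log_post.values()))
        total += log_post[y] - log_norm
    return total

def greedy_select(rows, labels, alpha=1.0):
    """Start from an empty predictor set and apply the single best change
    (adding or dropping one predictor) until the log score stops improving."""
    selected = set()
    best = loo_log_score(rows, labels, selected, alpha)
    while True:
        trials = []
        for f in range(len(rows[0])):
            trial = selected ^ {f}  # add f if absent, drop it if present
            trials.append((loo_log_score(rows, labels, trial, alpha), trial))
        score, trial = max(trials, key=lambda t: t[0])
        if score <= best + 1e-12:
            return sorted(selected), best
        selected, best = trial, score

# Toy data (made up): column 0 determines the class, column 1 is noise.
rows = [(0, 0), (0, 1), (0, 0), (0, 1), (1, 0), (1, 1), (1, 0), (1, 1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
selected, score = greedy_select(rows, labels)
print(selected, round(score, 3))
```

On this toy data the search keeps only column 0: adding the noise column lowers the leave-one-out log score, so the stopping rule ends the search after one step.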
Conference/Value in Health Info
2006-10, ISPOR Europe 2006, Copenhagen, Denmark
Value in Health, Vol. 9, No. 6 (November/December 2006)
Code
PMC2
Topic
Clinical Outcomes, Methodological & Statistical Research, Real World Data & Information Systems, Study Approaches
Topic Subcategory
Clinical Outcomes Assessment, Health & Insurance Records Systems, Modeling and simulation, Registries
Disease
Multiple Diseases