The Official News & Technical Journal Of The International Society For Pharmacoeconomics And Outcomes Research

Issues in the Design of Database Studies: A Focus on Selection Bias

By Lieven Annemans PhD, MSc, Mman, ISPOR Past-President 2005-2006, Director, HEDM and Professor of Health Economics, University of Ghent, Meise, Belgium


This summary is based on the IMS Symposium, “Methodological Issues in the Analysis of Health Care Databases,” from the ISPOR 10th Annual International Meeting, May 16, 2005, Washington, DC.

The Pros and Cons of Database Research
Selection bias in clinical studies raises questions about the relevance of trial results when they are extrapolated to the general population. Many different types of databases are used for health economic research, such as claims databases, disease registries, and the more complete longitudinal operational databases, which are now run for research purposes, such as GPRD, MEMO, and IMS Disease Analyzer. Key advantages of their use for health economic evaluation versus prospective comparative studies are that they:

  • Provide real-world data reflecting patient care in daily clinical practice;
  • Include a different type of patient from those in a clinical trial;
  • Often provide long-term, follow-up data;
  • Allow data to be obtained faster and at a lower cost than a clinical trial; and
  • Enable comparisons that are not always allowed prospectively (e.g., because ethical approval would not be granted).

Yet, database research is subject to bias. Indeed, treatments are no longer randomly assigned, and the treatment choice is based on patient or disease characteristics (or on the physician’s personal preference). Hence, treatment is no longer the only difference between the groups being compared, and so-called “selection bias” is the result.

Bias
Types of bias are treatment selection bias (i.e., treatment choice is not random) and confounding bias (i.e., variables such as indication, age, gender, and disease severity influence the outcome). Confounding bias is the result of treatment selection bias combined with patient characteristics that differ between the treatment groups.

A simple way to account for confounding bias is to add the possible confounding factors to a regression model, which can then be adjusted. In simple terms, the outcome can be influenced by many variables, including indication, age, gender, and disease severity, as well as the chosen treatment (Drug A or Drug B). In such a multivariate regression, the treatment effect is corrected for the other variables.
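The adjustment described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the patient data are invented, disease severity is the only confounder, and the outcome is constructed so that the true treatment effect is exactly 2.0. A real analysis would normally rely on a statistical package rather than the hand-written least-squares solver used here.

```python
# Hypothetical data: (treated, severity, outcome). Drug B (treated = 1) is
# given more often to severe patients, and the outcome is constructed so that
# the true treatment effect is exactly 2.0.
patients = []
for severity in range(5):
    for i in range(10):
        treated = 1 if i < 1 + 2 * severity else 0
        outcome = 1.0 + 2.0 * treated + 3.0 * severity
        patients.append((treated, severity, outcome))


def mean(values):
    return sum(values) / len(values)


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(rows, ys):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)


# Unadjusted comparison: difference in mean outcome between the two groups.
naive_effect = (mean([y for t, _, y in patients if t == 1])
                - mean([y for t, _, y in patients if t == 0]))

# Adjusted comparison: regress the outcome on treatment AND severity.
design = [[1.0, t, s] for t, s, _ in patients]
outcomes = [y for _, _, y in patients]
intercept, adjusted_effect, severity_effect = ols(design, outcomes)

print(f"naive effect:    {naive_effect:.2f}")
print(f"adjusted effect: {adjusted_effect:.2f}")
```

In this constructed example the unadjusted difference (6.8) overstates the treatment effect because treated patients are more severe on average, whereas the coefficient on treatment in the regression recovers the built-in effect of 2.0.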

Although this is a relatively simple and straightforward process, researchers must be mindful of interaction effects (i.e., the treatment effect differs across the levels of a covariate), and calculations can become quite complex if adjustments must be made for many variables.

One alternative is to match patients according to one or more variables. Matching can be done 1 to 1 or 1 to many, depending on subject data availability. Exact matching can be applied, whereby “cases” and “controls” are matched on each variable. Alternatively, when many possible covariates are present, quasi-exact matching can be applied with the help of propensity scores.
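As a rough illustration of exact matching, the sketch below pairs each treated "case" with controls that agree on every matching variable. The records, the choice of age band and gender as matching variables, and the function name exact_match are all hypothetical; the sketch matches without replacement and simply leaves unmatched cases out.

```python
from collections import defaultdict

# Hypothetical records: (patient_id, treated, age_band, gender).
patients = [
    (1, 1, "60-69", "F"), (2, 1, "60-69", "F"), (3, 1, "70-79", "M"),
    (4, 1, "50-59", "M"),
    (5, 0, "60-69", "F"), (6, 0, "70-79", "M"), (7, 0, "70-79", "M"),
    (8, 0, "60-69", "M"),
]


def exact_match(records, controls_per_case=1):
    """Pair each treated case with controls identical on all matching variables."""
    # Pool of still-unused controls, keyed by the matching variables.
    pool = defaultdict(list)
    for pid, treated, *key in records:
        if not treated:
            pool[tuple(key)].append(pid)

    matches = {}
    for pid, treated, *key in records:
        available = pool[tuple(key)]
        if treated and len(available) >= controls_per_case:
            # Each control is used at most once (matching without replacement).
            matches[pid] = [available.pop(0) for _ in range(controls_per_case)]
    return matches


one_to_one = exact_match(patients)
one_to_two = exact_match(patients, controls_per_case=2)
print(one_to_one)
print(one_to_two)
```

With these invented records, only cases 1 and 3 find a control in 1-to-1 matching, and only case 3 has enough identical controls for 1-to-2 matching. This shows how quickly exact matching runs out of subjects as the number of matching variables grows.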

Propensity Scores
The use of propensity scores involves an additional analysis, whereby the dependent variable is not the outcome, but the probability of being assigned to a specific treatment. Hence, the propensity score lies between 0 and 1, and depends on the covariates/confounding factors. This probability (propensity), being a function of the covariates, then serves as an indicator of the ‘severity’, that is, the characteristics, of the subjects in each treatment group. For instance, in a project using the longitudinal Belgium Hospital Disease Database, patients assigned to a new treatment for fistula clearly had more severe disease, as shown by the covariates influencing the propensity score.
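A minimal sketch of this additional analysis follows, assuming that the propensity score is estimated with a logistic regression of treatment assignment on a single covariate. The data and the severity scale are invented, and the Newton-Raphson fitting routine is written out only to keep the example self-contained; in practice a statistical package would be used.

```python
import math

# Hypothetical data: (severity, treated). The new treatment (treated = 1) is
# assigned more often as severity increases.
patients = [(severity, 1 if i < 1 + 2 * severity else 0)
            for severity in range(5) for i in range(10)]


def fit_propensity_model(data, iterations=25):
    """Fit P(treated | severity) = 1 / (1 + exp(-(b0 + b1 * severity)))
    by Newton-Raphson on the logistic log-likelihood."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, t in data:
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += t - p            # gradient with respect to b0
            g1 += (t - p) * x      # gradient with respect to b1
            h00 += w               # entries of the information matrix
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def propensity(severity, b0, b1):
    """Estimated probability of receiving the new treatment."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * severity)))


b0, b1 = fit_propensity_model(patients)
scores = [propensity(s, b0, b1) for s in range(5)]
for s, score in enumerate(scores):
    print(f"severity {s}: propensity {score:.2f}")
```

Each patient's score is the fitted probability of receiving the new treatment given his or her covariates. In this sketch the score rises with severity, which mirrors the situation in which more severe patients are preferentially given the new treatment.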

These propensity scores can then be used in the final analysis in 3 different ways: 1) included as a continuous variable in a (logistic) regression model with treatment outcome as the dependent variable; 2) included as a class variable in such a (logistic) regression model (for instance 5-10 classes/subgroups, e.g., deciles); or 3) used to match patients (see above), i.e., the sample is divided into classes (usually 5-10 classes) and a stratified analysis (N to N) is performed. For instance, all patients with a propensity score for receiving treatment A between 0 and 0.05 are put together in one stratum; the next stratum contains patients with a propensity score between 0.05 and 0.10, etc.

The possibilities are summarized in Figure 1. This stratified approach was used in the example of the longitudinal Belgium Hospital Disease Database, and showed a much better effect for the new treatment in the largest stratum, and no significant difference in the other strata.
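The stratified use of propensity scores can be sketched as follows. The records are invented and are not taken from the Belgian database; the equal-width strata, the function name stratified_effect, and the size-weighted pooling of stratum results are assumptions chosen for illustration.

```python
# Hypothetical records: (propensity_score, treated, outcome); outcome 1 = success.
patients = [
    (0.12, 1, 1), (0.15, 0, 0), (0.18, 0, 1), (0.19, 0, 0),
    (0.45, 1, 1), (0.48, 1, 0), (0.52, 0, 0), (0.55, 0, 1),
    (0.81, 1, 1), (0.85, 1, 1), (0.88, 1, 0), (0.90, 0, 0),
]


def stratified_effect(records, n_strata=5):
    """Difference in mean outcome (treated minus control) within equal-width
    propensity strata, pooled with weights proportional to stratum size."""
    strata = {}
    for score, treated, outcome in records:
        index = min(int(score * n_strata), n_strata - 1)
        strata.setdefault(index, {0: [], 1: []})[treated].append(outcome)

    per_stratum, weighted_sum, total = {}, 0.0, 0
    for index, groups in sorted(strata.items()):
        if not groups[0] or not groups[1]:
            continue  # no within-stratum comparison is possible
        size = len(groups[0]) + len(groups[1])
        diff = (sum(groups[1]) / len(groups[1])
                - sum(groups[0]) / len(groups[0]))
        per_stratum[index] = diff
        weighted_sum += diff * size
        total += size
    overall = weighted_sum / total if total else None
    return per_stratum, overall


per_stratum, overall = stratified_effect(patients)
for index, diff in per_stratum.items():
    print(f"stratum {index}: difference in success rate {diff:+.2f}")
print(f"pooled difference: {overall:+.2f}")
```

Reporting the stratum-specific differences alongside the pooled figure makes it visible when the treatment effect is concentrated in one stratum, as in the example described above.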

These techniques are essential for obtaining reliable and valid answers to research questions, but issues are still likely to arise with the selection of covariates (“did we include the proper and sufficient covariates?”) and with the presence of unobservable data.

Hence, the researcher can never be 100% sure that the groups are fully comparable, since there has been no randomization and not all of the necessary, clinically important covariates can be taken into account. The way around this is to always run sensitivity analyses to test the relevance of different variables, for instance by changing the covariate subsets, or by using different propensity classifications. Finally, it is also important not to overlook the many other issues in the design of database studies that have been reported in the literature, such as:

  • Inexplicit research questions;
  • Poorly defined cohort eligibility;
  • Poorly defined index point (baseline);
  • Wrong extrapolation of the results; and
  • Issues around data quality (coding errors, incompleteness, lack of validation, etc.).
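Returning to the sensitivity analyses recommended earlier, one simple form is to repeat the adjusted comparison for every subset of candidate covariates and check how much the estimate moves. The sketch below is only an illustration under stated assumptions: the data are invented with a built-in treatment effect of 2.0, the covariate names are hypothetical, and adjustment is done by exact stratification on the chosen covariates.

```python
from itertools import combinations

# Hypothetical data in which the true treatment effect is 2.0 and severe
# patients are treated more often; age group is balanced across treatments.
patients = []
for severity in (0, 1):
    for older in (0, 1):
        n_treated = 3 if severity else 1
        for i in range(4):
            treated = 1 if i < n_treated else 0
            patients.append({
                "treated": treated, "severity": severity, "older": older,
                "outcome": 2.0 * treated + 3.0 * severity + 1.0 * older,
            })


def adjusted_effect(records, covariates):
    """Treated-minus-control difference in mean outcome within each cell of
    the chosen covariates, averaged with weights proportional to cell size."""
    cells = {}
    for r in records:
        key = tuple(r[c] for c in covariates)
        cells.setdefault(key, {0: [], 1: []})[r["treated"]].append(r["outcome"])
    weighted_sum, total = 0.0, 0
    for groups in cells.values():
        if groups[0] and groups[1]:
            size = len(groups[0]) + len(groups[1])
            diff = (sum(groups[1]) / len(groups[1])
                    - sum(groups[0]) / len(groups[0]))
            weighted_sum += diff * size
            total += size
    return weighted_sum / total


# Re-estimate the effect for every subset of the candidate covariates.
candidates = ("severity", "older")
results = {}
for k in range(len(candidates) + 1):
    for subset in combinations(candidates, k):
        results[subset] = adjusted_effect(patients, subset)
        print(subset, round(results[subset], 2))
```

In this constructed example the estimate is 3.5 whenever severity is left out and 2.0 whenever it is included, which flags severity as a covariate the conclusion depends on. A stable estimate across subsets is reassuring, but it cannot rule out confounding by unobserved variables.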

The ISPOR Guidelines for Retrospective Research address the design issues listed above, along with many others, and may be of help to researchers using retrospective data for health economic evaluation. They can be found on the ISPOR website at: http://www.ispor.org/workpaperhealthscience/ret_dbTFR0203.asp.


©2010 International Society for Pharmacoeconomics and Outcomes Research.
All rights reserved under International and Pan-American Copyright Conventions.
 