- Fredrik Berggren, PhD, Scientific
Advisor, AstraZeneca R&D, Lund Sweden
- James Chan, PharmD, Pharmacy Operator,
Kaiser Foundation, Oakland, CA USA
- Suellen Curkendall, PhD, Director of
Research, The Degge Group, Arlington, VA, USA
- William Edell, PhD, Plano, TX, USA
- Shelah Leader, PhD, Director, Medimmune,
Gaithersburg, MD, USA
- Marianne McCollum, RPH, PhD, Assistant
Professor, University of Colorado, Denver, CO, USA
- Newell McElwee, PharmD, MSPH, Senior
Director, Pfizer, Inc., New York, NY, USA
- John Walt, MBA, Manager, Allergan, Inc.,
Irvine, CA, USA
This report is published in Value in
Health. The citation for this report is:
Motheral B, Brooks J, Clark MA, et al. A checklist for retrospective
database studies - Report of the ISPOR Task Force on Retrospective
Databases.
Value in Health 2003;6:90-7
A Checklist For Retrospective Database Studies Report Of The ISPOR Task Force On Retrospective Databases
Brenda Motheral MBA, PhD1, John Brooks PhD2, Mary Ann Clark MHA3, Bill
Crown PhD4, Peter Davey MD, FRCP5, Dave Hutchins MBA, MHSA6, Bradley C.
Martin PharmD, PhD7, Paul Stang PhD8
1Express Scripts, Maryland Heights, MO, USA; 2College of Pharmacy,
University of Iowa, Iowa City, IA, USA; 3Boston Scientific Corporation,
Natick, MA, USA; 4The Medstat Group, Cambridge, MA, USA; 5Department of
Clinical Pharmacology, University of Dundee, Dundee, UK; 6Advanced PCS
Health Systems, Inc., Scottsdale, AZ, USA; 7College of Pharmacy,
University of Georgia, Athens, GA, USA; 8Galt Associates, Inc., Blue Bell,
PA, USA
ABSTRACT
INTRODUCTION: Health-related retrospective databases, in particular claims
databases, continue to be an important data source for outcomes research.
However, retrospective databases pose a series of methodological
challenges, some of which are unique to this data source.
METHODS: In an effort to assist decision-makers in evaluating the quality
of published studies that use health-related retrospective databases, a
checklist was developed that focuses on issues that are unique to
database studies or are particularly problematic in database research.
This checklist was developed primarily for the commonly used medical
claims or encounter-based databases but could potentially be used to
assess retrospective studies that employ other types of databases, such as
disease registries and national survey data.
RESULTS: Written in the form of 27 questions, the checklist can be used to
guide decision-makers as they consider the database, the study
methodology, and the study conclusions. Checklist questions cover a wide
range of issues, including relevance, reliability and validity, data
linkages, eligibility determination, research design, treatment effects,
sample selection, censoring, variable definitions, resource valuation,
statistical analysis, generalizability, and data interpretation.
CONCLUSIONS: For many of the questions, key references are provided as a
resource for those who want to further examine a particular issue.
INTRODUCTION
What Is The Purpose Of This Checklist?
This checklist is intended to assist decision-makers in evaluating the
quality of published studies that use health-related retrospective
databases. Numerous databases are available for use by researchers,
particularly within the United States. As the databases have varying
purposes, their content can vary dramatically. Accordingly, the unique
advantages and disadvantages of a particular database must be borne in
mind. In reviewing a database study, it is important to assess whether the
database is suitable for addressing the research question and whether the
investigators have used an appropriate methodology in reaching the study
conclusions. The checklist was written in the form of 27 questions to
guide decision-makers as they consider the database, the study
methodology, and the study conclusions. For many of the questions, key
references are provided as a resource for those who want to further
examine a particular issue.
Why Would A Retrospective Database Be Used For A Health-Related Research
Study?
An important strength of most retrospective databases is that they allow
researchers to examine medical care utilization as it occurs in routine
clinical care. They often provide large study populations and longer
observation periods, allowing for examination of specific subpopulations.
In addition, retrospective databases provide a relatively inexpensive and
expedient approach for answering the time-sensitive questions posed by
decision-makers. Two recent studies have suggested that adequately
controlled observational studies produce results similar to randomized
controlled trials [1,2].
How Should The Checklist Be Used?
This checklist was developed primarily for the commonly used medical
claims or encounter-based databases but could potentially be used to
assess retrospective studies that employ other types of databases, such as
disease registries and national survey data. The checklist is meant to
serve as a supplement to already available checklists for economic
evaluations [3,4]. Only those issues that are unique to database studies
or are particularly problematic in database research were included in the
checklist. Not every question will be applicable to every study. As is
true with any scale or other measure of study quality or validity, the
checklist cannot discern whether something was done in a particular study
versus whether it was reported.
In summary, this checklist should serve as a general guide, recognizing
that follow-up with study authors may be warranted when answers to
checklist questions are missing or unsatisfactory.
DATA SOURCES
Relevance: Have The Data Attributes Been Described In Sufficient Detail
For Decision-Makers To Determine Whether There Was A Good Rationale For
Using The Data Source, The Data Source’s Overall Generalizability, And How
The Findings Can Be Interpreted In The Context Of Their Own Organization?
Any given database represents a particular situation in terms of study
population, medical benefits covered, and how services are organized. To
appropriately interpret a study, key attributes should be described,
including the sociodemographic and health care profile of the population,
limitations on available services (such as those imposed by socialized
medicine), plan characteristics, and benefit design (e.g., physician
reimbursement approach, cost-sharing for office visits, drug exclusions,
mental health carve-outs). For example, in an economic evaluation that
compares two drugs, it would be important to know the formulary status of
the drugs as well as any other pharmacy benefit characteristics that could
affect the use of the drugs, such as step therapy, compliance programs,
and drug utilization review programs.
Reliability And Validity: Have The Reliability And Validity Of The Data
Been Described, Including Any Data Quality Checks And Data Cleaning
Procedures?
With any research data set, quality assurance checks are necessary to
determine the reliability and validity of the data, keeping in mind that
reliability and validity are not static attributes of a database but can
vary dramatically depending on the questions asked and analyses performed.
Quality checks are particularly important with administrative databases
from health care payers and providers because the data was originally
collected for purposes other than research, most often for claims
processing and payment. Services may not be captured in the claims
database because the particular service is not covered by the plan sponsor
or because the service is “carved-out” and not captured in the dataset
(e.g., mental health). Data fields that are not required for reimbursement
may be particularly unreliable. Similarly, data from providers who are
paid on a capitated basis often has limited utility because providers are
infrequently required to report detailed utilization information. Changes
in reporting/coding over time can also result in unreliable data. The
frequency with which particular codes are used can change over time as
well, often in response to changes in health plan reimbursement policies.
For all these reasons, investigators should describe the quality assurance
checks performed and any steps taken to normalize the data or otherwise
eliminate data suspected to be unreliable or invalid, particularly when
there is the potential to bias results to favor one study group over
another (e.g., outliers). The authors should describe any relevant changes
in reporting/coding that may have occurred over time and how such
variation affects the study findings. Data quality should be addressed
even when the data has been pre-processed (e.g., grouped into episodes)
prior to use by the researcher. Examples of important quality checks
include missing and out of range values, consistency of data (e.g.,
patient age), and claim duplicates. Other approaches that can
be used to assess the quality of a database are to compare data figures to
established norms (e.g., rates of asthma diagnosis compared to prevalence
figures) and to cite previous literature in which the database’s
reliability and validity have been examined [5].
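The quality checks listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method prescribed by the task force: the field names (claim_id, member_id, age, paid), the plausible age range of 0 to 120, and the one-year tolerance for age consistency are all hypothetical choices made for the example.

```python
from collections import Counter

# Hypothetical claim records; field names are illustrative assumptions.
claims = [
    {"claim_id": "C1", "member_id": "M1", "age": 45, "paid": 120.0},
    {"claim_id": "C2", "member_id": "M1", "age": 45, "paid": None},   # missing value
    {"claim_id": "C3", "member_id": "M2", "age": 212, "paid": 80.0},  # out-of-range age
    {"claim_id": "C4", "member_id": "M3", "age": 30, "paid": 60.0},
    {"claim_id": "C4", "member_id": "M3", "age": 30, "paid": 60.0},   # duplicate claim
    {"claim_id": "C5", "member_id": "M3", "age": 52, "paid": 15.0},   # age inconsistent with C4
]


def quality_report(records, min_age=0, max_age=120, max_age_spread=1):
    """Count missing values, out-of-range ages, duplicate claims, and
    members whose recorded ages differ by more than max_age_spread years."""
    missing = sum(1 for r in records if any(v is None for v in r.values()))
    out_of_range = sum(
        1 for r in records
        if r["age"] is not None and not (min_age <= r["age"] <= max_age)
    )
    id_counts = Counter(r["claim_id"] for r in records)
    duplicates = sum(n - 1 for n in id_counts.values() if n > 1)

    ages_by_member = {}
    for r in records:
        if r["age"] is not None:
            ages_by_member.setdefault(r["member_id"], set()).add(r["age"])
    inconsistent = sorted(
        m for m, ages in ages_by_member.items()
        if max(ages) - min(ages) > max_age_spread
    )
    return {
        "missing": missing,
        "out_of_range": out_of_range,
        "duplicates": duplicates,
        "inconsistent_age_members": inconsistent,
    }


report = quality_report(claims)
print(report)
```

A published study would be expected to report counts of this kind and state what was done with the flagged records.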
Linkages: Have The Necessary Linkages Among Data Sources And/Or Different
Care Sites Been Done Appropriately, Taking Into Account Differences In
Coding And Reporting Across Sources?
Various types of linkages can be necessary for working with claims data.
In some cases, a researcher may want to combine data from several health
plans for analysis and should describe how inconsistencies in coding and
reporting across health plans were addressed. For example, as new
procedures or services are introduced, health plans often create their own
codes so that those delivering the services can be paid. These “temporary”
codes can differ across data sources, leading to variations in how the
same events are reported. As to reporting, one simple scenario occurs when
groups of providers, who have different relationships to the health plan,
report office visits at different rates due to reimbursement arrangements.
In other cases, data from one health plan may not be integrated, requiring
the researcher to link all relevant health services (e.g., outpatient,
inpatient, mental health, pharmaceutical, laboratory, eligibility, etc.).
A particular challenge in this situation is ensuring that each
individual’s records are accurately matched across data sources. This
linkage process should be described, with note made of any problems that
could affect data validity or study findings.
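One simple way to document a linkage step is to count how many member identifiers match across files and which do not. The sketch below assumes two hypothetical inputs, a list of member IDs from an eligibility file and a list of member IDs appearing on claims; real linkages often need more than an exact ID match.

```python
from collections import Counter


def linkage_report(eligibility_ids, claim_member_ids):
    """Summarize an exact-match linkage between an eligibility file and claims.

    Members with eligibility but no claims are not flagged, because having
    no utilization is legitimate; claims without eligibility are flagged.
    """
    elig_counts = Counter(eligibility_ids)
    duplicated = sorted(i for i, n in elig_counts.items() if n > 1)
    elig = set(eligibility_ids)
    claims = set(claim_member_ids)
    return {
        "matched": len(elig & claims),
        "claims_without_eligibility": sorted(claims - elig),
        "duplicate_eligibility_ids": duplicated,
    }


# Made-up identifiers for illustration only.
linkage = linkage_report(
    eligibility_ids=["M1", "M2", "M3", "M3"],
    claim_member_ids=["M1", "M1", "M3", "M9"],
)
print(linkage)
```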
Eligibility: Have The Authors Described The Type Of Data Used To Determine
Member Eligibility?
In studies designed to examine outcomes over a particular time period at
the patient level, it is important to determine whether patients were
eligible to receive benefits during the time period. There are various
types of data and approaches that might be used to determine eligibility,
each with potential advantages and disadvantages, making it important that
the author describe how eligibility was determined. A not uncommon but
flawed approach to eligibility that is seen in the literature is the use
of a prescription claim during a particular month as evidence of
eligibility during that month. As a significant percentage of members will
not have a prescription claim in any given month for which they are
eligible, this is an inappropriate approach to eligibility determination.
METHODS
Research Design
Data Analysis Plan: Was A Data Analysis Plan, Including Study Hypotheses,
Developed A Priori?
Because of the retrospective nature and relatively easy access of claims
data, the opportunities for unsystematic data exploration are significant.
Accordingly, it is particularly important that evidence of a
well-developed a priori data analysis plan be noted for hypothesis-testing
studies. For research funded by government or other non-profit agencies,
the proposal has typically undergone a rigorous peer-review process prior
to funding. When the research has other funding or none, it may be unclear whether
the analysis plan was developed a priori unless the authors explicitly
make this statement. Hypothesis-generating studies allow for more latitude
on this issue.
Design Selection: Has The Investigator Provided A Rationale For The
Particular Research Design?
Many designs are available to the investigator, each with particular
strengths and weaknesses depending on setting, research question, and
data. The investigator should provide a clear rationale for the selection
of the design given the salient strengths and weaknesses of the design.
Research Design Limitations: Did The Author Identify And Address Potential
Limitations Of That Design?
Have the investigators described the potential biases, such as selection,
history, maturation, regression to the mean, etc., and how these potential
biases will be addressed?
Treatment Effect: For Studies That Are Trying To Make Inferences About The
Effects Of An Intervention, Does The Study Include A Comparison Group And
Have The Authors Described The Process For Identifying The Comparison
Group And The Characteristics Of The Comparison Group As They Relate To
The Intervention Group?
If the investigation attempts to make inferences about a particular
intervention, a design in which there is no comparison or control group is
rarely adequate. Without a comparison group (persons not exposed to an
intervention), there often exist too many potential biases that could
otherwise account for an observed “treatment” effect. The comparison group
should be as similar to the intervention group as possible, absent the
exposure to the intervention. A rationale should be provided for selecting
individual observations for the comparison group. The validity of a
reported “treatment” effect depends on the design selected, how similar
the comparison is to those exposed to the treatment, and the statistical
analyses used (see statistics section) [6-8].
Study Population And Variable Definitions
Sample Selection: Have The Inclusion And Exclusion Criteria And The Steps
Used To Derive The Final Sample From The Initial Population Been Described?
The inclusion/exclusion criteria are the minimum rules that are applied to
each potential subject's data in an effort to define a population for
study. Has a description been provided of the subject number for the total
population, sample, and after application of each inclusion and exclusion
criterion? In other words, is it clear who and how many were excluded and
why? Was there a rationale and discussion of the impact of study inclusion
and exclusion criteria on study findings, as the inclusion/exclusion
criteria can bias the selection of the population and distort the
applicability of the study findings?
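The step-by-step accounting asked for above can be produced mechanically. The following sketch applies a sequence of criteria and records how many subjects remain and how many are excluded at each step; the member fields and the three criteria are hypothetical examples, not criteria recommended by the checklist.

```python
# Hypothetical members; field names and criteria are illustrative assumptions.
members = [
    {"id": 1, "age": 34, "has_diagnosis": True,  "months_eligible": 12},
    {"id": 2, "age": 17, "has_diagnosis": True,  "months_eligible": 12},
    {"id": 3, "age": 52, "has_diagnosis": False, "months_eligible": 12},
    {"id": 4, "age": 61, "has_diagnosis": True,  "months_eligible": 7},
    {"id": 5, "age": 48, "has_diagnosis": True,  "months_eligible": 12},
]

criteria = [
    ("Age 18 or older", lambda m: m["age"] >= 18),
    ("Qualifying diagnosis", lambda m: m["has_diagnosis"]),
    ("12 months continuously eligible", lambda m: m["months_eligible"] >= 12),
]


def attrition_table(population, steps):
    """Apply each criterion in order; return (label, remaining, excluded) rows
    and the final sample."""
    rows = [("Initial population", len(population), 0)]
    remaining = list(population)
    for label, keep in steps:
        kept = [m for m in remaining if keep(m)]
        rows.append((label, len(kept), len(remaining) - len(kept)))
        remaining = kept
    return rows, remaining


rows, final_sample = attrition_table(members, criteria)
for label, n_remaining, n_excluded in rows:
    print(f"{label:<35} remaining={n_remaining} excluded={n_excluded}")
```

Because criteria are applied in sequence, the exclusion counts depend on their order, so the order itself should be reported.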
Eligibility: Are Subjects Eligible For The Time Period Over Which
Measurement Is Occurring?
Databases only capture information for those patients who are 'eligible'
for coverage by the payer whose data is being analyzed. Hence, it is
important that subjects actually be eligible to receive benefits with the
payer during the time period they are being observed. In some cases, it
may be essential that only subjects who are continuously eligible for the
entire study period be included (e.g., analysis of medication continuation
rates). In other cases, subjects may only be eligible for selected months
during the study period, but any outcome measures (e.g., prescription
claims) must be adjusted for the months of eligibility.
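Adjusting an outcome for months of eligibility usually means expressing it as a rate per eligible month rather than a count per subject. A minimal sketch, with made-up data and hypothetical field names (claims, eligible_months):

```python
def rate_per_member_month(subjects):
    """Total events divided by total months of eligibility."""
    total_events = sum(s["claims"] for s in subjects)
    total_months = sum(s["eligible_months"] for s in subjects)
    if total_months == 0:
        raise ValueError("no eligible months observed")
    return total_events / total_months


def mean_per_subject(subjects):
    """Unadjusted mean count per subject, ignoring eligibility."""
    return sum(s["claims"] for s in subjects) / len(subjects)


# Made-up groups: group A contains a subject eligible for only 6 months.
group_a = [{"claims": 12, "eligible_months": 12}, {"claims": 6, "eligible_months": 6}]
group_b = [{"claims": 12, "eligible_months": 12}, {"claims": 12, "eligible_months": 12}]

print(mean_per_subject(group_a), mean_per_subject(group_b))            # unadjusted
print(rate_per_member_month(group_a), rate_per_member_month(group_b))  # adjusted
```

In this example the unadjusted means differ between the groups, while the rates per eligible month are identical; the apparent difference is produced entirely by unequal eligibility.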
Censoring: Were Inclusion/Exclusion Or Eligibility Criteria Used To
Address Censoring And Was The Impact On Study Findings Discussed?
Censoring, or the time limits placed at the beginning or end of the study
period, may potentially distort the selection and generalizability of a
cohort. The investigator may choose to include only subjects who have some
fixed duration of eligibility (e.g., one year) after the intervention.
This method of right censoring (follow-up time) may bias the study if
duration of eligibility is related to other factors, such as general
health. For example, in government entitlement programs where eligibility
is determined monthly, limiting the study population to only those with
continuous eligibility would tend to include the sickest patients, as they
would most likely remain in conditions that make them eligible for
coverage. Alternatively, an investigator may wish to identify newly
treated patients and require that subjects be eligible for some period
prior to use of the medication of interest. This type of left censoring
should also be acknowledged and implications for study findings should be
discussed.
Operational Definitions: Are Case (Subjects) And Endpoint (Outcomes)
Criteria Explicitly Defined Using Diagnosis, Drug Markers, Procedure
Codes, And/Or Other Criteria?
Operational definitions are required to identify cases and endpoints,
often using ICD-9-CM codes, medication use, procedure codes, etc., to
indicate the presence or absence of a disease or treatment. The
operational definition(s) for all variables should be provided [9].
Definition Validity: Have The Authors Provided A Rationale And/Or
Supporting Literature For The Definitions And Criteria Used And Were
Sensitivity Analyses Performed For Definitions Or Criteria That Are
Controversial, Uncertain, Or Novel?
Investigators attempting to identify group(s) of persons with a particular
disorder (e.g., Alzheimer’s disease) that has some diagnostic or coding
uncertainty should provide a rationale, and when possible, cite evidence
that a particular set of coding (ICD-9-CM, CPT-4, Drug Intervention)
criteria are valid. Ideally, this evidence would take the form of
validation against a primary source but more often will involve the
citation of previous research. When there is controversial evidence or
uncertainty about such definitions, the investigator should perform a
sensitivity analysis using alternative definitions to examine the impact
of these different ways of defining events. Sensitivity analysis tests
different values or combinations of factors that define a critical measure
in an effort to determine how those differences in definition affect the
results and interpretation. The investigator may choose to perform
sensitivity analyses in a hierarchical fashion or ‘caseness’ where the
analysis is conducted using different definitions or levels of certainty
(e.g., definite, probable, and possible cases).
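A hierarchical sensitivity analysis of this kind can be sketched as follows. The rules that assign each patient to a level of certainty are hypothetical and chosen only for illustration (counts of diagnosis claims plus use of a marker drug); a real study would justify its own rules from validation work or prior literature.

```python
# Hypothetical patients; dx_claims counts claims carrying the diagnosis code,
# on_marker_drug flags use of a drug taken as a marker for the condition.
patients = [
    {"id": "P1", "dx_claims": 3, "on_marker_drug": True},
    {"id": "P2", "dx_claims": 1, "on_marker_drug": True},
    {"id": "P3", "dx_claims": 1, "on_marker_drug": False},
    {"id": "P4", "dx_claims": 0, "on_marker_drug": True},
    {"id": "P5", "dx_claims": 0, "on_marker_drug": False},
]

LEVELS = ["definite", "probable", "possible"]


def caseness(p):
    """Assign a level of diagnostic certainty (illustrative rules only)."""
    if p["dx_claims"] >= 2 and p["on_marker_drug"]:
        return "definite"
    if p["dx_claims"] >= 1 and p["on_marker_drug"]:
        return "probable"
    if p["dx_claims"] >= 1 or p["on_marker_drug"]:
        return "possible"
    return "non-case"


def cohort_sizes(records):
    """Cohort size under each nested definition: definite only,
    definite + probable, and definite + probable + possible."""
    sizes = {}
    for i, level in enumerate(LEVELS):
        included = set(LEVELS[: i + 1])
        sizes[level] = sum(1 for p in records if caseness(p) in included)
    return sizes


sizes = cohort_sizes(patients)
print(sizes)
```

The study outcome would then be re-estimated within each nested cohort to see whether the conclusions hold as the definition is loosened.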
For economic evaluations, a particularly challenging issue is the
identification of disease-related costs in a claims database. For example,
when studying depression, does one include only services with a depression
ICD-9-CM code, those with a depression-related code (e.g., anxiety), or all
services regardless of the accompanying diagnosis code? As mentioned
above, sensitivity analyses of varying operational definitions are
important in these situations.
Timing Of Outcome: Is There A Clear Temporal (Sequential) Relationship
Between The Exposure And Outcome?
Does the author account for proximity of key interventions to the actual
event (outcome) of interest and duration of the intervention? For example,
if attributing emergency room visits to use of a medication, did the
emergency room visit occur during or within a clinically reasonable time
period after use of the medication? One option is to create one variable for
the duration or cumulative amount of the intervention (in time or dose) and
another variable that reflects the time elapsed between the most proximal
intervention and the outcome itself.
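The two timing variables described above might be derived from pharmacy fills as in the sketch below. The input layout (pairs of fill date and days supplied) and the optional grace period are assumptions made for the example.

```python
from datetime import date


def exposure_timing(fills, outcome_date, grace_days=0):
    """fills is a list of (fill_date, days_supply) pairs.

    Returns None when no fill precedes the outcome. Otherwise returns the
    cumulative days supplied by fills on or before the outcome date (the full
    supply of each fill is counted, including any unused part of the last
    one), the days elapsed since the most recent fill, and whether the
    outcome fell within that fill's supply plus the grace period.
    """
    prior = [(d, s) for d, s in fills if d <= outcome_date]
    if not prior:
        return None
    last_date, last_supply = max(prior)  # most recent fill on or before the outcome
    elapsed = (outcome_date - last_date).days
    return {
        "cumulative_days_supplied": sum(s for _, s in prior),
        "days_since_last_fill": elapsed,
        "outcome_within_supply": elapsed <= last_supply + grace_days,
    }


# Made-up fill history; the April fill occurs after the outcome and is ignored.
fills = [
    (date(2002, 1, 1), 30),
    (date(2002, 2, 1), 30),
    (date(2002, 3, 5), 30),
    (date(2002, 4, 10), 30),
]
timing = exposure_timing(fills, outcome_date=date(2002, 3, 20))
print(timing)
```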
Event Capture: Is The Data, As Collected, Able To Identify The
Intervention And Outcomes If They Actually Occurred?
Some procedures may not be routinely captured in claims data (e.g., office
stool guaiac tests) or may not be reimbursed by the payer (e.g., OTC
medications, out-of-network use) and thereby not captured. Such a lack of
data can be an issue not only for case and endpoint identification but
also for appropriate costing of resources in economic evaluations.
Disease History: Is There A Link Between The Natural History Of The
Disease Being Studied And The Time Period For Analysis?
The researcher must address the pros and cons of the database in the
context of what is known about the natural history of the disease. For
example, a large proportion of the utilization for hepatitis occurs beyond
the initial year of diagnosis, typically up to 10 to 20 years after
diagnosis. Failing to account for this long follow-up or simply assuming a
cross section of patients adequately represents the natural history of the
disease is inappropriate.
Resource Valuation: For Studies That Examine Costs, Have The Authors
Defined And Measured An Exhaustive List Of Resources Affected By The
Intervention Given The Perspective Of The Study; And Have Resource Prices
Been Adjusted To Yield A Consistent Valuation That Reflects The
Opportunity Cost Of The Resource?
Reviewers should ensure that the resource costs included in the analysis
match the responsibilities of the decision-maker whose perspective is
taken in the research, as generally, patients, insurers, and society are
responsible for paying a different set of costs associated with the
intervention. For example, if the study is from the perspective of the
insurer the resource list should only include those resources that will be
paid for by the insurer, which would exclude non-covered services (e.g.,
over-the-counter medications).
With respect to measurement, the resource use described in these data is
limited by the extent of the insurance coverage. The clearest example of
this is the lack of prescription utilization data for Medicare
beneficiaries, as Medicare does not cover most outpatient prescriptions.
This problem also occurs under insurance products where portions of
benefits are carved out (e.g., mental health carve-outs) and in capitated
arrangements with providers who are not required to submit detailed claims
to the insurer.
Likewise, the resource should be valued in a manner that is consistent
with the perspective. Typically, claims data provides a number of cost
figures, including submitted charge, eligible charge, amount paid, and
member co-pay. The perspective of the study will determine which cost
figure to use. For example, if the study is from the perspective of the
insurer, the valuation should reflect the amount paid by the plan sponsor,
not the submitted or eligible charge.
Even so, the resource price information available within
retrospective databases might provide an imperfect measure of the actual
resource price because reported plan costs may not reflect additional
discounts, rebates, or other negotiated arrangements. These additional
price considerations can be particularly important for economic
evaluations of drug therapies, where rebates can represent a significant
portion of the drug cost. In addition, prices will vary over time with
inflation and across geographic areas with differences in the
cost-of-living. In most cases, prices can be adjusted to a reference year
and place using relevant price indexes [10].
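Adjusting a cost to a reference year with a price index amounts to multiplying by the ratio of the index in the reference year to the index in the year the cost was incurred. A minimal sketch follows; the index values are invented for illustration and are not taken from any published index.

```python
def adjust_to_reference_year(cost, cost_year, reference_year, price_index):
    """Restate a cost incurred in cost_year in reference_year prices."""
    return cost * price_index[reference_year] / price_index[cost_year]


# Invented index values, for illustration only.
price_index = {2000: 100.0, 2001: 104.0, 2002: 110.0}

adjusted = adjust_to_reference_year(200.0, cost_year=2000, reference_year=2002,
                                    price_index=price_index)
print(adjusted)
```

The same ratio approach applies to geographic adjustment, with an area-level index in place of the yearly one.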
Statistics
Control Variables: If The Goal Of The Study Is To Examine Treatment
Effects, What Methods Have Been Used To Control For Other Variables That
May Affect The Outcome Of Interest?
One of the greatest dangers in retrospective database studies is
incorrectly attributing an effect to a treatment that is actually due, at
least partly, to some other variable. Failure to account for the effects
of all variables that have an important influence on the outcome of
interest can lead to biased estimates of treatment effects, a problem
referred to as confounding bias. For example, a study might find that
the use of Cox-2s is associated with subsequent lower rates of
gastrointestinal (GI) events compared to NSAID users. If physicians are
more likely to prescribe Cox-2s to patients with a history of GI disease
and the study does not control for the history of GI disease, then
confounding bias is present. Two common approaches for addressing
confounding bias in the analysis include 1) the stratification of the
sample by different levels of the confounding variables with comparison of
the treatments within levels of potential confounders (e.g., age, sex); and 2) the
use of multivariate statistical techniques that allow for the estimation
of the treatment effect while controlling for one or more confounders
simultaneously. Each of these approaches has strengths and weaknesses.
Often investigations will attempt to control for comorbidities and/or
disease severity using risk adjustment techniques (e.g., Chronic Disease
Score, Charlson Index). The risk adjustment model should be suitable for
the population/disease that is being investigated, and a rationale for the
selection of the risk adjustment model should be described [11-16].
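The stratification approach can be illustrated with the Cox-2 example. In the sketch below, all counts are invented: Cox-2s are given mostly to patients with a history of GI disease, who have higher event rates whichever drug they receive. Comparing crude rates hides a difference that appears within each stratum.

```python
# Invented counts: (events, patients) by treatment within each stratum.
strata = {
    "GI history": {"cox2": (8, 80), "nsaid": (3, 20)},
    "no GI history": {"cox2": (1, 20), "nsaid": (6, 80)},
}


def crude_rates(strata):
    """Event rate per treatment, pooling all strata together."""
    totals = {}
    for arms in strata.values():
        for arm, (events, n) in arms.items():
            e, t = totals.get(arm, (0, 0))
            totals[arm] = (e + events, t + n)
    return {arm: e / t for arm, (e, t) in totals.items()}


def stratum_rates(strata):
    """Event rate per treatment within each stratum of the confounder."""
    return {
        name: {arm: e / n for arm, (e, n) in arms.items()}
        for name, arms in strata.items()
    }


crude = crude_rates(strata)
by_stratum = stratum_rates(strata)
print(crude)
print(by_stratum)
```

Here the crude rates are equal (9 percent in each group), yet within both strata the Cox-2 rate is lower than the NSAID rate, which is the pattern that uncontrolled confounding by GI history would conceal.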
In addition, in certain situations researchers can use methods (e.g.,
instrumental variable techniques) that group patients in a manner that is
related to treatment choice but theoretically unrelated to unmeasured
confounders. These approaches can be thought of as ex post randomizing
methods, and consistent estimates of treatment effects are obtained by
comparing treatment and outcome rates across groups [17].
Statistical Model: Have The Authors Explained The Rationale For The
Model/Statistical Method Used?
Statistical methods are based upon a variety of underlying assumptions.
Often these stem from the distributional characteristics of the data being
analyzed. As a result, in any given retrospective analysis, some
statistical methods will be more appropriate than others. Authors should
explain the reasons why they chose the statistical methods that were used
in the analysis. In particular, the approach to addressing skewed data, a
common issue in claims database research, should be described (e.g.,
log-transformation, two-part models).
For studies that combine data from several databases, the authors should
describe what analyses have been done to account for hierarchical or
clustered data. For example, with data pooled across plans, patients will
be grouped within health plans, and the health plan may have a significant
impact on the outcome being measured. Outcomes may be attributed to a
particular patient-level intervention, when in fact the outcome may be due
to differences in health plans, such as formularies and co-pay amounts.
Methods such as hierarchical linear modeling may be appropriate when using
pooled data, and authors should discuss this issue when describing the
selection of statistical methods.
Influential Cases: Have The Authors Examined The Sensitivity Of The
Results To Influential Cases?
The results of retrospective database studies, particularly analyses of
economic outcomes, can be very sensitive to influential cases. For
example, an individual who is depressed and attempts to commit suicide
might have extremely high medical costs that could dramatically change
conclusions about the costs of treating a patient with a particular
antidepressant therapy. Such “outliers” can be particularly problematic if
the sample is small. There are a variety of tests to measure the
sensitivity of findings to influential cases but, basically, the idea is
to see how much the results change when these cases are removed from the
analysis. Logarithmic transformations, commonly used to reduce the
skewness in economic outcome variables, can create serious problems in
making inferences about the size of statistical differences in the
original (unlogged) dollar units.
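The basic idea of checking sensitivity to influential cases, re-estimating the result with those cases removed, can be sketched as below. The cost figures are invented, and dropping the single highest-cost case is only one of many possible rules.

```python
from statistics import mean

# Invented annual costs per patient for two therapies.
drug_a = [900, 1100, 1000, 950, 1050]
drug_b = [1000, 1200, 1100, 1150, 48000]  # one very costly case


def mean_difference(a, b):
    """Mean cost of group b minus mean cost of group a."""
    return mean(b) - mean(a)


def drop_highest(costs, k=1):
    """Return the costs with the k largest values removed (k must be >= 1)."""
    return sorted(costs)[:-k]


full = mean_difference(drug_a, drug_b)
trimmed = mean_difference(drug_a, drop_highest(drug_b))
print(full, trimmed)
```

With all cases included the difference in mean cost is 9490; without the single extreme case it falls to 112.5. A study should report both figures rather than silently choosing one.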
Alternatively, analyses can be conducted on measures of underlying service
utilization (e.g., numbers of office visits) rather than the dollar values
themselves; service utilization measures tend to be less skewed than their
economic counterparts. Using this approach, any identified differences in
service utilization can be subsequently valued using an appropriate fee
schedule. A caveat with using service utilization directly is that
statistical analyses, such as regression modeling, may require the use of
more sophisticated methodologies (e.g., count models) than those commonly
used in expenditure analyses [18,19].
Relevant Variables: Have The Authors Identified All Variables Hypothesized
To Influence The Outcome Of Interest And Included All Available Variables
In Their Model?
Retrospective databases are often convenience datasets that were
constructed for a purpose completely unrelated to the research study being
conducted (e.g., the processing of medical claims). Although they can be
extremely rich, such databases often lack information on some of the
variables that would be expected to influence the outcome measure of
interest. For example, the medication that a patient receives is likely to
be partly a function of their clinical characteristics (primary diagnosis,
medical comorbidities) and partly a function of physician prescribing
patterns. Often retrospective datasets contain information on one of these
components but not the other. This is a problem because omitted variables
can lead to biased estimates for the variables that are included in the
model. In the special case where the omitted variables are correlated with
both the treatment selection and the outcome of interest, the problem is
known as selection bias. Several statistical procedures have been
developed that attempt to test for, and reduce, the bias introduced by
unobservable variables [20-24].
Testing Statistical Assumptions: Do The Authors Investigate The Validity
Of The Statistical Assumptions Underlying Their Analysis?
Any statistical analysis is based on assumptions. For example, regression
analyses may include testing for omitted variables, simultaneity of
outcomes and covariates, correlation among explanatory variables, and a
variety of others. To have confidence in the authors’ findings, model
specification tests should be discussed [25,26].
Multiple Tests: If Analyses Of Multiple Groups Are Carried Out, Are The
Statistical Tests Adjusted To Reflect This?
The more statistical tests one conducts, the greater the likelihood that a
“statistically significant” result will emerge purely by chance.
Statistical methods have been developed which adjust for the number of
tests being conducted. These methods reduce the likelihood that a
researcher will identify a statistically significant finding that is due
solely to chance [27-29].
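The text does not name specific adjustment methods. Two widely used ones, chosen here only as examples, are the Bonferroni correction and Holm's step-down procedure. The sketch below applies both to a set of invented p-values.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject a hypothesis only if its p-value is at most alpha / m."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def holm(p_values, alpha=0.05):
    """Holm step-down: compare the smallest p-value with alpha / m, the next
    with alpha / (m - 1), and so on, stopping at the first failure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break
    return reject


# Invented p-values from four comparisons.
p_values = [0.004, 0.02, 0.014, 0.20]
unadjusted = [p <= 0.05 for p in p_values]
print(unadjusted)
print(bonferroni(p_values))
print(holm(p_values))
```

In this example three of four tests are significant without adjustment, one survives Bonferroni, and three survive Holm, which shows why the choice of method should be stated.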
Model Prediction: If The Authors Utilize Multivariate Statistical
Techniques In Their Analysis, Do They Discuss How Well The Model Predicts
What It Is Intended To Predict?
Numerous approaches, such as goodness of fit or split samples, can be used
to assess a model's predictive ability. For example, in ordinary least
squares regression models, the adjusted R-square (which measures the
proportion of the variance in the dependent variable explained by the
model) is a useful measure. Nonlinear models have less intuitive goodness
of fit measures.
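For reference, the R-square and adjusted R-square of a least squares model can be computed as in the sketch below. The function names are hypothetical, and p counts the predictors excluding the intercept.

```python
def r_squared(y, y_hat):
    """Proportion of the variance in y explained by the predictions y_hat."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n, p):
    """Penalize R-square for the number of predictors p, given n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# A model explaining 10 percent of the variance with 5 predictors and 1000 observations.
adj = adjusted_r_squared(0.10, n=1000, p=5)
print(round(adj, 4))
```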
Models based on micro-level data (e.g., patient episodes) can be “good
fits” even if the proportion of the variance in the outcome variable that
they explain is 10 percent or less. In fact, models based on micro-level
data that explain more than 50 percent of the variation in the dependent
variable should be viewed with suspicion [30].
DISCUSSION/CONCLUSIONS
Theoretical Basis: Have The Authors Provided A Theory For The Findings And
Have They Ruled Out Other Plausible Alternative Explanations For The
Findings?
The examination of causal relationships is a particular challenge with
retrospective database studies because subjects are not randomized to
treatments. Accordingly, the burden is on the author to rule out plausible
alternative explanations to the findings when examining relationships
between two variables. This requires a consideration of the type of study,
its design and analysis, and the nature of the results.
Practical Versus Statistical Significance: Have The Statistical Findings
Been Interpreted In Terms Of Their Clinical Or Economic Relevance?
In retrospective database studies, the sample sizes are often extremely
large, which can render differences that are not clinically or economically
meaningful statistically significant. Conversely, in studies with
relatively small sample sizes, the large variance in cost data can render
meaningful differences statistically insignificant. Accordingly, it is
imperative that both statistical and clinical or economic relevance be
discussed.
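The effect of sample size on statistical significance can be illustrated with a simple two-sample z-test. The sketch below assumes two equal-sized groups with a common, known standard deviation, which is a simplification that real cost data rarely satisfy; the function name and all dollar amounts are hypothetical.

```python
import math


def two_sided_p_value(mean_diff, sd, n_per_group):
    """Two-sided p-value from a z-test comparing the means of two equal-sized
    groups that share a known standard deviation (a simplifying assumption)."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    z = abs(mean_diff) / standard_error
    # For a standard normal variable, P(|Z| > z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2.0))


# A hypothetical $5 difference in mean cost with a $500 standard deviation.
p_trivial_small_n = two_sided_p_value(mean_diff=5.0, sd=500.0, n_per_group=100)
p_trivial_large_n = two_sided_p_value(mean_diff=5.0, sd=500.0, n_per_group=1_000_000)

# A hypothetical $200 difference with a $2,000 standard deviation and few patients.
p_meaningful_small_n = two_sided_p_value(mean_diff=200.0, sd=2000.0, n_per_group=50)

print("Trivial difference, n = 100 per group:      ", p_trivial_small_n)
print("Trivial difference, n = 1,000,000 per group:", p_trivial_large_n)
print("Larger difference, n = 50 per group:        ", p_meaningful_small_n)
```

Under these assumptions, the same $5 difference is far from significant with 100 patients per group and highly significant with one million, while the $200 difference fails to reach significance with 50 patients per group.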
Generalizability: Have The Authors Discussed The Populations And Settings
To Which The Results Can Be Generalized?
While retrospective database studies often have greater generalizability
than randomized controlled trials, this generalizability cannot be
assumed. The authors should state explicitly the populations and
settings to which the findings can be generalized. In addition, the impact of
changes in the health care environment during and since the conduct of the
study on generalizability should be discussed. For example, economic
evaluations are sometimes conducted shortly after a product is launched,
when it has not reached full market penetration. In those cases, the patients
studied may be systematically more or less severely ill than the ultimate
population of users of that medication, which can affect effectiveness and
cost outcomes.
ACKNOWLEDGEMENTS
We would like to recognize the efforts of Fredrik Berggren, James Chan,
Suellen Curkendall, Bill Edell, Shelah Leader, Marianne McCollum, Newell
McElwee, and John Walt, reference group members who provided comments on
earlier drafts.
REFERENCES
1 Concato J, Shah N, Horwitz RI. Randomized, controlled trials,
observational studies, and the hierarchy of research designs. New England
Journal of Medicine 2000;342:1887-1892.
2 Benson K, Hartz AJ. A comparison of observational studies and randomized,
controlled trials. New England Journal of Medicine 2000;342:1878-1886.
3 Clemens K, Townsend R, Luscombe F, et al. Methodological and conduct
principles for pharmacoeconomic research. PharmacoEconomics
1995;8:169-174.
4 Weinstein M, Siegel JE, Gold MR et al. Recommendations of the panel on
cost-effectiveness in health and medicine. Journal of the American Medical
Association 1996;276:1253-1258.
5 McGlynn EA, Damberg CL, Kerr EA, Brook RH. Health Information Systems:
Design Issues and Analytic Applications. Santa Monica: Rand Health, 1998.
6 Campbell DT, Stanley JC. Experimental and Quasi-experimental Designs for
Research. Chicago: Rand McNally, 1963.
7 Cook TD, Campbell DT. Quasi-experimentation. Chicago: Rand McNally, 1979.
8 McGlynn EA, Damberg CL, Kerr EA, Brook RH. Health Information Systems:
Design Issues and Analytic Applications. Santa Monica: Rand Health, 1998.
9 Motheral BR, Fairman KA. The Use of Claims Databases for Outcomes
Research: Rationale, Challenges, and Strategies. Clin Ther 1997;
19:346-66.
10 Lave JR, Pashos CL, Anderson GF et al. Costing medical care: Using
administrative data. Med Care 1994(supplement);32:JS77-JS89.
11 Ash AS, Ellis RP, Pope GC, et al. Using Diagnoses to Describe
Populations and Predict Costs. Health Care Fin Rev 2000;21:7-28.
12 Clark DO, Von Korff M, Saunders K, et al. A chronic disease score with
empirically derived weights. Med Care 1995;33:783-95.
13 Deyo RA, Cherkin DC, Ciol MA. Adapting A Clinical Comorbidity Index for
Use with ICD-9-CM Administrative Databases. J Clin Epidemiol
1992;45:613-19.
14 Gilmer T, Kronick R, Fishman P, Ganiats TG. The Medicaid Rx model:
pharmacy-based risk adjustment for public programs. Med Care
2001;39:1188-1202.
15 Iezzoni L, Ash AS, Daley J, et al. Risk adjustment for measuring
healthcare outcomes (2nd ed). Chicago: Health Administration Press, 1997.
16 Kronick R, Gilmer T, Dreyfus T, Lee L. Improving Health-Based Payment
for Medicaid Beneficiaries: CDPS. Health Care Fin Rev 2000;21:29-64.
17 Angrist JD, Imbens GW, Rubin DB. Identification of Causal
Effects Using Instrumental Variables. Journal of the American Statistical
Association 1996;91:444-54.
18 Mullahy J. Much ado about two: Reconsidering retransformation and the
two-part model in health econometrics. J Health Econ 1998;17:247-281.
19 Cameron A, Trivedi P. Regression Analysis of Count Data. New
York: Cambridge University Press, 1998.
20 Crown W, Obenchain R, Englehart L, et al. Application of sample
selection models to outcomes research: The case of evaluating effects of
antidepressant therapy on resource utilization. Stat Med 1998;17:1943-58.
21 D’Agostino R. Tutorial in biostatistics: Propensity score methods for
bias reduction in the comparison of a treatment to a non-randomized
control group. Stat Med 1998;17:2265-81.
22 Heckman JJ. The common structure of statistical models of truncation,
sample selection, and limited dependent variables and a simple estimator
for such models. Annals of Economic and Social Measurement 1976;5:475-92.
23 Jones A. Health Econometrics. In: Culyer AJ, Newhouse JP, eds.,
Handbook of Health Economics. North Holland: Elsevier, 2000.
24 Terza J. Estimating Endogenous Treatment Effects in Retrospective Data
Analysis. Value in Health 1999;2:429-34.
25 Belsley D, Kuh E, Welsch R. Regression Diagnostics: Identifying
Influential Data and Sources of Collinearity. New York: John Wiley & Sons,
Inc., 1980.
26 Godfrey L. Misspecification Tests in Econometrics: The Lagrange
Multiplier Principle and Other Approaches. New York: Cambridge University
Press, 1991.
27 Tukey JW. The Problem of Multiple Comparisons. Unpublished Notes,
Princeton University, 1953.
28 Scheffe H. A Method for Judging All Contrasts in the Analysis of
Variance. Biometrika 1953;40:87-104.
29 Miller RG, Jr. Simultaneous Statistical Inference. New York: Springer-Verlag,
1981.
30 Greene W. Econometric Analysis (4th ed.). Englewood Cliffs, NJ:
Prentice-Hall, Inc., 1999.