The $64,000 Question: What Are We Willing to Pay for a QALY?
F. Reed Johnson PhD, RTI Health Solutions, RTP, NC, USA
The following is the third of a 3-part series taken from the
Third Plenary Session, Wednesday May 19th, 2004, at the
ISPOR 9th Annual International Meeting, Arlington, VA, USA
I’d like to start with Bernie O’Brien’s charge to me. This is
what he wrote in a letter some months ago. He said: “I’m hoping
you can speak to the research agenda issues raised by the QALY/WTP
debate. I know you have some specific views on this, [I just
might interject here that my views are informed by
impeccable logic] but I would want you to give a balanced airing
of all of the debates that have arisen associated with this
topic.” He presumably is referring to the debates related to
this topic over the last 30 years in two separate research
literatures. Unfortunately, I’m a mere mortal. I think Bernie
probably could have done this, but I’m not even going to try in
the 12½ minutes remaining to me. I will try to deal briefly
with what I think is the research agenda, however.
Let’s start with quality-adjusted life years, or QALYs. I
stole this slide from Bernie some time ago. Our problem is to
compare apples to oranges. In order to do that, we need bananas.
We offer subjects a lottery between bananas and apples and that
somehow tells us something about the value of oranges. For QALYs,
we just call apples perfect health, bananas death, and oranges
something like asthma. If we’re willing to accept the idea that
the utility of perfect health is equal to one and the utility of
death is equal to zero, and if we can get subjects to tell us
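what value of p makes the lottery have the same utility as
asthma with certainty, then we get this equation:
U(asthma) = p × U(perfect health) + (1 − p) × U(death) = p × 1 + (1 − p) × 0 = p.
That is, the QALY weight for asthma is simply p.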

The trick in doing QALYs is to look for some sort of adjustment
that equates two different utilities, one deriving from a given
health state and one deriving from some reference condition.
This is what the calculation I just showed you does. The
time-tradeoff method uses life expectancy or longevity to
equilibrate the utility of some compromised health state with
the utility of perfect health. Thus the standard gamble QALY
weight is the probability that gives the same utility for the
lottery as for the certain ill-health outcome, while the
time-tradeoff QALY weight is the ratio of time in perfect health
to time in ill health that gives the same utility. We multiply
either weight by the duration of ill health to get the
corresponding QALYs.
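As a minimal sketch of the two calculations just described (assuming, as above, that perfect health is scored 1 and death 0), the weights and the resulting QALYs could be computed as follows. The function names and the respondent answers are hypothetical, chosen only for illustration.

```python
def standard_gamble_weight(p: float) -> float:
    """Standard gamble: U(ill) = p * 1 + (1 - p) * 0 = p at indifference."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("indifference probability must lie in [0, 1]")
    return p


def time_tradeoff_weight(years_perfect: float, years_ill: float) -> float:
    """Time trade-off: years_perfect * 1 = years_ill * w at indifference."""
    if years_ill <= 0 or not 0.0 <= years_perfect <= years_ill:
        raise ValueError("need years_ill > 0 and 0 <= years_perfect <= years_ill")
    return years_perfect / years_ill


def qalys(weight: float, duration_years: float) -> float:
    """QALYs are the weight multiplied by the time spent in the health state."""
    return weight * duration_years


# Hypothetical answers for the same condition under each method.
w_sg = standard_gamble_weight(0.85)        # indifferent at p = 0.85
w_tto = time_tradeoff_weight(8.0, 10.0)    # 8 healthy years judged equal to 10 ill years
print(qalys(w_sg, 10.0), qalys(w_tto, 10.0))
```

With these made-up answers the standard gamble gives 8.5 QALYs over 10 years and the time trade-off gives 8.0, the kind of disagreement between methods raised later in the talk.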

The advantages of this approach are obvious: QALYs are simple
and intuitive. Nothing I can show you as an alternative is going
to be nearly as simple or intuitive. Thus we’re doomed to lose
this battle on that basis alone. In addition, there are readily
available, extensive lists of condition-specific QALY weights.
All you have to do is look up the condition of interest in Tammy
Tengs’ “1,000 QALY Estimates” [1]. You just look up the
corresponding QALY weight and multiply it by the duration
that the epidemiologists give you.
This is such a wonderful and enormously useful solution to a
difficult problem. Nobody needs any specialized training and the
calculations are widely accepted for cost utility analysis for
healthcare expenditures. But this slide shows the basic “Mae
West” problem with QALYs. Most of us are not indifferent between
a short, but fun, life and a long, but dull, life.

However, the area under the short-and-fun line and the area under
the long-and-dull line indicate exactly the same number of QALYs.
Mae West says that’s nuts and many of you might agree.
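The Mae West problem can be reproduced in a few lines: if QALYs are the area under a weight-by-time profile, any two profiles with the same area get the same score. The two profiles below are hypothetical.

```python
def qalys_for_profile(profile):
    """Area under a life profile given as (QALY weight, years) segments."""
    return sum(weight * years for weight, years in profile)


# Hypothetical profiles: short and fun versus long and dull.
short_and_fun = [(1.0, 20.0)]   # 20 years in perfect health
long_and_dull = [(0.5, 40.0)]   # 40 years at a QALY weight of 0.5

# Prints "20.0 20.0": the QALY calculation cannot tell these lives apart.
print(qalys_for_profile(short_and_fun), qalys_for_profile(long_and_dull))
```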
There are lots of reasons why people could prefer one of these
life profiles over the other. QALYs as we measure them in
practice (not necessarily as derived from von
Neumann-Morgenstern utility principles) are not grounded in
utility theory because utility theory does not require us to be
indifferent between these two life profiles. I’m sorry I don’t
have time to actually document this, but QALY assumptions also
fail empirical tests. Furthermore, QALY-weight estimates fail
reliability tests. Standard gamble and time trade-off methods do
not get the same QALY utility weight for the same condition.
QALYs also are insensitive to acute conditions. Many of you have
a drug that you know patients prefer for treating an acute
condition, but you can’t get any meaningful QALYs for that
preference. Why not? If patients like a treatment better than
something else, why can’t you demonstrate that the utility of
this drug is better than that of the comparator treatment? You
can’t do that with QALYs, and I’ll show why in a minute. Finally, the
over-simplified context required for estimating QALYs doesn’t
look like any real life that anyone actually lives or any real
decision that patients or physicians actually have to make.

This diagram depicts a more realistic relationship between ill
health and time. The vertical axis indicates health utility and
the horizontal axis indicates duration of some illness. The
slope of this curve is just one minus the QALY weight. If I have
perfect health, then my illness duration is zero. If I must
spend some time in ill health, I’m going to lose QALYs equal to
the illness duration times one minus the QALY weight. Because
the relation is a straight line, I can just read off the
reduction in QALYs associated with any given duration.
None of the pictures that Jim showed you used straight lines
because economists never describe utility in terms of straight
lines. We always describe utility in terms of diminishing
marginal utility. For example, the first cup of coffee in the
morning gives you a big kick. The 13th cup of coffee that day
doesn’t give you the same improvement in satisfaction.

That’s true of virtually everything in your life. If health
works like everything else, the first day you spend being sick
decreases your perceived satisfaction more than the 13th day
that you spend being sick. That explains why QALYs for acute
conditions understate the utility loss because the true utility
loss on the diagram is not here, but down here. If we were to
estimate the nonlinear preferences and the linear QALY
preferences for the same duration, the estimated QALY weight
would give us exactly the same utility loss as the true utility
loss, but only at that one duration. Of course, in practice we pay no
attention to what the “right” duration is, we just measure the
QALY weight for an arbitrary duration and then use the same
weight for every possible duration.
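A minimal sketch of this point, assuming a hypothetical concave disutility of illness (the square root of days ill, in arbitrary utility units) and a constant QALY-style weight calibrated at an arbitrary 30-day duration:

```python
import math

CALIBRATION_DAYS = 30.0  # hypothetical duration used when eliciting the weight


def true_loss(days: float) -> float:
    """Hypothetical concave disutility: each extra day of illness hurts less."""
    return math.sqrt(days)


# Constant loss per day implied by a weight measured at one duration (1 - w).
loss_per_day = true_loss(CALIBRATION_DAYS) / CALIBRATION_DAYS


def linear_loss(days: float) -> float:
    """Loss from reusing the calibrated constant weight at any duration."""
    return loss_per_day * days


for days in (1.0, 30.0, 120.0):
    print(f"{days:5.0f} days: true {true_loss(days):6.2f}, linear {linear_loss(days):6.2f}")
```

The straight line matches the curve only at the 30-day calibration point; it understates the loss for a 1-day acute spell and overstates it for a 120-day spell.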
QALYs distort true preferences in other ways. People may prefer
a treatment because it has a better side-effect profile, it
requires only one small pill a day instead of three, or it’s
cheaper. They may prefer a treatment because of the way it is
delivered: over the counter versus having to go to a doctor for a
prescription. Treatments often have features that influence
patient satisfaction, but for which you can’t get a single
QALY’s credit. In fact, the utility of your patients isn’t
represented by the nonlinear dotted line but by the solid
nonlinear line that lies below it. That is, the disutility curve
is both nonlinear and also steeper because it includes other
variables that people care about but QALYs don’t include. You
understate the treatment benefit even further by using the
straight line instead of the real utility of your patients.
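To illustrate, here is a hypothetical comparison of two treatments with identical health outcomes that differ only in pill burden. The preference weights are invented for illustration, not estimated from any survey.

```python
from dataclasses import dataclass


@dataclass
class Treatment:
    name: str
    qaly_gain: float     # health gain as a QALY calculation would score it
    pills_per_day: int   # a process attribute that QALYs ignore


# Hypothetical preference weights in utility units.
WEIGHT_PER_QALY = 10.0
WEIGHT_PER_PILL = -0.5


def qaly_score(t: Treatment) -> float:
    return t.qaly_gain


def patient_utility(t: Treatment) -> float:
    """Utility that counts pill burden as well as health gain."""
    return WEIGHT_PER_QALY * t.qaly_gain + WEIGHT_PER_PILL * t.pills_per_day


once_daily = Treatment("one pill a day", qaly_gain=0.02, pills_per_day=1)
three_daily = Treatment("three pills a day", qaly_gain=0.02, pills_per_day=3)
print(qaly_score(once_daily) == qaly_score(three_daily))           # True: no QALY difference
print(patient_utility(once_daily) > patient_utility(three_daily))  # True: patients see one
```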
What we just did in this diagram is what economists do in every
area of applied economics, except health. It is puzzling to me
why we have to have a special theory to do health economics
instead of the theory that everyone else uses. I’m going to
propose a more general approach here, but my suggestion really
is quite trivial. When I show this to anyone but a health
economist they say: So what? Of course. That’s just economics.
This is “just economics” but it may look a little radical to
people used to QALYs.
Suppose utility depends on things other than some narrow,
clinical measure of health. It depends on the time spent in the
health state and it varies non-linearly with time. Utility
depends on your consumption of other things and the value of
that consumption may depend on how healthy I am. I may not be
able to consume such things as rock climbing at all if I’m sick. It
could also work the other way around. The amount of money I have
may influence the utility I get out of different health states.
For example, wealthier people may be able to pay for services
that mitigate the impact of ill health on activities of daily
living.
In addition, we can incorporate process factors. People care
about how health is delivered, not just what is delivered. If
they care at all, that ought to count in their utility. Then
there is a myriad of individual characteristics: old people,
young people, rich people, poor people, people with a long
history of illness, people with a short history of illness,
people who live in urban areas or rural areas, people who live
in France or the United States, all of these things could
matter. QALYs usually don’t let you adjust for any of those
factors. I also want to avoid trying to get the equilibrating
measure by requiring people to trade off the probability of
immediate death or reductions in longevity. We don’t have to do
that. Survey subjects resist trading off death or a reduction in
longevity to get rid of three hours of headache. If they are
unwilling to accept that tradeoff, then you can’t estimate a
QALY weight.
Now let’s endow our subjects with a spell of ill health. It
could be 30 minutes, 30 days, or 30 years. Depending on the
illness, there is generally some therapeutically relevant
duration over which this illness typically is treated. We
probably don’t have to tinker around with what goes on at the
end of people’s lives. However, spells of good and ill health
have to add up within a given time frame. An additional day of
ill health has to reduce the number of possible days of good
health in the period by one day.
We would like to be able to derive both money and time
equivalences of ill health from the same underlying preferences,
letting people have preferences over whatever is important to
them. Start with the utility of a given amount of time in a
better health state and compare it with the utility of the same
time in a worse health state. Now calculate how much money we
would have to take away from the patient in the better state to
reduce their utility back to the level they experienced in the
worse health state.
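A sketch of this money adjustment (the compensating income reduction), assuming a hypothetical utility function equal to log income plus ALPHA times a 0-to-1 health index. Bisection is used so the same routine would still work for utility functions with no closed-form answer.

```python
import math

ALPHA = 0.5  # hypothetical weight on the health index relative to log income


def utility(health: float, income: float) -> float:
    """Hypothetical utility: log income plus ALPHA times a 0-1 health index."""
    return math.log(income) + ALPHA * health


def money_equivalent(h_better: float, h_worse: float, income: float,
                     tol: float = 1e-6) -> float:
    """Income reduction c solving utility(h_better, income - c) == utility(h_worse, income)."""
    if h_better < h_worse:
        raise ValueError("h_better must be at least h_worse")
    target = utility(h_worse, income)
    lo, hi = 0.0, income * (1 - 1e-12)  # c must leave some income to take the log of
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if utility(h_better, income - mid) > target:
            lo = mid  # still better off than in the worse state: take more away
        else:
            hi = mid
    return (lo + hi) / 2


print(round(money_equivalent(0.9, 0.7, 40_000.0), 2))
```

For this particular utility function the answer also has a closed form, income × (1 − exp(−ALPHA × Δh)), which is a handy check on the numerical result.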

Economists generally call this adjustment in income or wealth
“willingness to pay.” I would like to officially apologize for
not only the term “value of a statistical life,” but the term
“willingness to pay.” These terms cause nothing but mischief. It
would be fine if economists only talked to other economists. But
I can understand why non-economists are quite offended when they
hear us talking this way. The American Economic Association
ought to hire a marketing firm to rename all the things that we
talk about so that they don’t make us look stupid and
embarrassed in front of non-economists. What we mean to say, of
course, is that there is some reduction in consumption that would just
cancel out the utility improvement from better health. Of
course, this has nothing at all to do with willingness to do
anything. So here is my non-profound idea: if we can make the
adjustment with consumption, why not make the adjustment with
time? I’m going to call this time adjustment “willingness to
wait,” although it doesn’t have anything to do with willingness,
either.
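The same idea with time instead of money: a sketch of one way to compute a "willingness to wait," read here (our assumption) as the extra days of illness that would just cancel a given utility gain. The square-root illness disutility and BETA are hypothetical.

```python
import math

BETA = 1.0  # hypothetical scale of the illness disutility


def illness_disutility(days: float) -> float:
    """Hypothetical concave disutility of a spell of illness lasting `days`."""
    return BETA * math.sqrt(days)


def time_equivalent(gain: float, base_days: float, tol: float = 1e-9) -> float:
    """Extra illness days that would just cancel a utility gain of size `gain`."""
    if gain < 0:
        raise ValueError("gain must be non-negative")

    def extra_loss(d: float) -> float:
        return illness_disutility(base_days + d) - illness_disutility(base_days)

    hi = 1.0
    while extra_loss(hi) < gain:  # widen the bracket until it contains the answer
        hi *= 2
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if extra_loss(mid) < gain:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# A gain worth 1 utility unit, starting from a 4-day spell, equals 5 extra days here.
print(round(time_equivalent(1.0, 4.0), 6))
```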

We actually did this survey before I gave any thought to QALYs
and willingness to wait. We endowed people with a hypothetical
initial acute condition of five days. In the example shown, they
would be sick for five days with shortness of breath and
swelling in ankles if they didn’t do anything. That is, they
would experience an exacerbation of congestive heart failure
that would put them in the hospital for five days. Someone is
going to have to take care of them, but it’s covered by
insurance and won’t cost them anything. For some amount of
money, you could shorten duration to one day or avoid the
condition altogether. We asked subjects to evaluate a series
of 10 such tradeoff tasks, adjusting and rotating the features
of the conditions, and asked them in each case which of the
conditions they preferred. The pattern of those choices reveals
the underlying relative rates of substitution or relative
importance weights that are attached to various features.
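One common way to recover importance weights from paired choices like these is a binary logit on attribute differences. The sketch below simulates hypothetical answers and estimates the weights by Newton-Raphson. It is a generic stand-in, not the model used in the study described: the cost range, the log(1 + days) duration transform, and TRUE_BETA are all assumptions, and it treats every task as independent even though each respondent answered 10.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical weights used only to simulate answers:
# utility = -1.0 per $100 of cost - 2.0 per unit of log(1 + days ill).
TRUE_BETA = np.array([-1.0, -2.0])
DAYS_LEVELS = np.array([0, 1, 5])  # avoid it, shorten to 1 day, or 5 days ill
N_TASKS = 4000


def attributes(cost_hundreds: np.ndarray, days_ill: np.ndarray) -> np.ndarray:
    """Columns: cost in $100s and a concave transform of illness duration."""
    return np.column_stack([cost_hundreds, np.log1p(days_ill)])


def simulate_alternative(n: int) -> np.ndarray:
    return attributes(rng.uniform(0, 5, n), rng.choice(DAYS_LEVELS, n))


x_diff = simulate_alternative(N_TASKS) - simulate_alternative(N_TASKS)  # A minus B
p_choose_a = 1 / (1 + np.exp(-x_diff @ TRUE_BETA))
chose_a = (rng.uniform(size=N_TASKS) < p_choose_a).astype(float)


def fit_binary_logit(x: np.ndarray, y: np.ndarray, iters: int = 25) -> np.ndarray:
    """Maximum-likelihood logit on attribute differences via Newton-Raphson."""
    beta = np.zeros(x.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-x @ beta))
        grad = x.T @ (y - p)                       # score of the log-likelihood
        info = x.T @ (x * (p * (1 - p))[:, None])  # negative Hessian
        beta = beta + np.linalg.solve(info, grad)
    return beta


beta_hat = fit_binary_logit(x_diff, chose_a)

# Money equivalence of avoiding a 5-day spell: utility gain / marginal utility of $100.
utility_gain = beta_hat[1] * (np.log1p(0) - np.log1p(5))
wtp_dollars = 100 * utility_gain / -beta_hat[0]
print(beta_hat.round(2), round(float(wtp_dollars)))
```

Dividing the estimated utility gain by the cost coefficient turns the relative importance weights into a money equivalence; because duration enters nonlinearly, the value of avoiding one day depends on how long the spell would otherwise last.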
Notice that they traded off both time and money at the
same time, as we do in real life. We got disutility curves for
various conditions just like the ones I showed you earlier. They
are nonlinear and we can derive willingness to pay, willingness
to wait, and a form of time equivalence that looks like a
generalized QALY. I’m calling this measure “super QALYs,” which
may be a little arrogant since it’s such a simple idea. In any
case, from exactly the same underlying preferences we can derive
several kinds of equivalence measures for a given intervention.
We don’t have to use straight lines, we don’t have to understate the
utility loss of acute conditions, we don’t have to leave out
things that are important to patients, and we don’t have to make
patients trade off death when death isn’t a realistic disease
endpoint.

Can we build a better measure of health utility? I’ve shown you
how I think that could be done, but we could also try to fix
what’s wrong with QALYs. I don’t think this is a great idea but,
out of respect for Bernie’s memory, I can suggest a few things.
There’s no reason why we have to insist that QALYs be linear in
time. We could estimate QALYs for different time intervals. This
is more or less Gafni’s healthy-years equivalent. We should
really not attempt to extrapolate between chronic and acute
conditions using the same weights. That is, QALY weights should
pay attention to the context in which the QALY was originally
measured, both the health state and the duration that was
specified in either the time-tradeoff or the standard-gamble
elicitation. We don’t let you extrapolate outside the range of
your epidemiologic data, so why do we let you extrapolate
outside the range of your QALY data? Incidentally, why don’t we
ever see confidence intervals on those QALY weights? I don’t
know why we have to suspend both economics and statistics in
estimating QALYs. There’s also no reason why we couldn’t
estimate QALY weights as a function of the characteristics of
the survey respondents.
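On the missing confidence intervals: a percentile bootstrap is one simple way to attach an interval to a mean QALY weight. The 20 individual weights below are made up for illustration, not survey data.

```python
import random
import statistics


def bootstrap_ci(values, n_boot=5000, level=0.95, seed=1):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        statistics.fmean(rng.choices(values, k=n)) for _ in range(n_boot)
    )
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return means[lo_idx], means[hi_idx]


# Hypothetical individual QALY weights for one condition.
weights = [0.95, 0.90, 0.80, 0.85, 1.00, 0.60, 0.75, 0.90, 0.70, 0.85,
           0.95, 0.50, 0.80, 0.90, 0.65, 1.00, 0.85, 0.75, 0.90, 0.80]
mean_w = statistics.fmean(weights)
low, high = bootstrap_ci(weights)
print(f"mean weight {mean_w:.3f}, 95% CI {low:.3f} to {high:.3f}")
```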
There’s also no reason, in principle, that we couldn’t include
other factors that influence patient satisfaction.
Unfortunately, such complications may make standard-gamble or
simple time-tradeoff surveys intractable. I’ve suggested a more
valid and effective way of approaching the problem by
constructing tradeoffs that look more like the tradeoffs people
make in real healthcare settings. These hypothetical tradeoffs
reveal the rates of substitution we need to calculate time or
money equivalences of treatment benefits. I don’t have time to
discuss the controversies related to contingent valuation, or
the so-called “willingness-to-pay method.” In some ways
conventional willingness-to-pay studies share many of the same
problems as QALYs in not allowing for a full range of
opportunities to trade off in various dimensions. Basically, we
describe a treatment scenario and ask people what is the most
they would pay for it.
It doesn’t make sense to
ask them how much they would be willing to pay because they are
likely to say they aren’t willing to pay anything. Many
researchers avoid that reaction and other problems by offering
subjects a price of, say, £250 and recording whether they
would accept or reject the treatment at such a price. We get
very little information about underlying preferences from such a
question and have trouble generalizing the results to even
slightly different situations. Finally, many people are offended
by dollar values of health benefits, which, again, is why QALYs
are so attractive.
Amiram Gafni was an old friend and colleague of Bernie’s. He
once asked a question that I think we all ought to ask
ourselves. I’d like to end with his question: “When asking the
public to assist in determining health priorities, we should use
techniques that allow people to reveal their true preferences.
If not, why bother asking them at all? [2]”
REFERENCES
1 Tengs T, Wallace A. One-thousand health-related quality of
life estimates. Med Care 2000;38:583-637.
2 Gafni A, Birch S. Preferences for outcomes in economic evaluation: an
economic approach to addressing economic problems. Soc Sci Med
1995;40:767- 6.