The Official News & Technical Journal Of The International Society For Pharmacoeconomics And Outcomes Research
The $64,000 Question: What Are We Willing to Pay for a QALY?
F. Reed Johnson PhD, RTI Health Solutions, RTP, NC, USA

The following is the third of a 3-part series taken from the Third Plenary Session, Wednesday May 19th, 2004, at the
ISPOR 9TH Annual International Meeting, Arlington, VA, USA

I’d like to start with Bernie O’Brien’s charge to me. This is what he wrote in a letter some months ago. He said: “I’m hoping you can speak to the research agenda issues raised by the QALY/WTP debate. I know you have some specific views on this, [I just might interject here that my views are informed by impeccable logic] but I would want you to give a balanced airing of all of the debates that have arisen associated with this topic.” He presumably is referring to the debates related to this topic over the last 30 years in two separate research literatures. Unfortunately, I’m a mere mortal. I think Bernie probably could have done this, but I’m not even going to try in the 12½ minutes remaining to me. I will try to deal briefly with what I think is the research agenda, however.

Let’s start with quality-adjusted life years, or QALYs. I stole this slide from Bernie some time ago. Our problem is to compare apples to oranges. In order to do that, we need bananas. We offer subjects a lottery between bananas and apples and that somehow tells us something about the value of oranges. For QALYs, we just call apples perfect health, bananas death, and oranges something like asthma. If we’re willing to accept the idea that the utility of perfect health is equal to one and the utility of death is equal to zero, and if we can get subjects to tell us what value of p makes the lottery have the same utility as asthma with certainty, then we get this equation.
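
The equation on the slide is not reproduced in this transcript. Under the normalizations just described, it presumably takes the standard expected-utility form below, where the symbols U and p* (the indifference probability of perfect health) are notation assumed here for illustration.

```latex
\begin{align*}
U(\text{asthma}) &= p^{*}\,U(\text{perfect health}) + (1 - p^{*})\,U(\text{death}) \\
                 &= p^{*}\cdot 1 + (1 - p^{*})\cdot 0 \\
                 &= p^{*}
\end{align*}
```

Read this way, the elicited probability is itself the utility weight for asthma.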

The trick in doing QALYs is to look for some sort of adjustment that equates two different utilities, one deriving from a given health state and one deriving from some reference condition. This is what the calculation I just showed you does. The time-tradeoff method uses life expectancy or longevity to equilibrate the utility of some compromised health state with the utility of perfect health. Thus the standard gamble QALY weight is the probability that gives the same utility for the lottery as for the certain ill-health outcome, while the time-tradeoff QALY weight is the ratio of time in perfect health to time in ill health that gives the same utility. We multiply either weight by the duration of ill health to get the corresponding QALYs.
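
As a minimal sketch of the two weights and the final multiplication, the Python below uses made-up survey responses. The function names and the numbers are assumptions for illustration, not values from the talk.

```python
def standard_gamble_weight(indifference_probability: float) -> float:
    """Probability of perfect health at which the lottery and the certain
    ill-health state are judged equally good."""
    if not 0.0 <= indifference_probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    return indifference_probability


def time_tradeoff_weight(years_perfect_health: float, years_ill_health: float) -> float:
    """Years in perfect health judged equivalent to a longer span in ill
    health, divided by that longer span."""
    if years_ill_health <= 0 or not 0 <= years_perfect_health <= years_ill_health:
        raise ValueError("need 0 <= years_perfect_health <= years_ill_health, with years_ill_health > 0")
    return years_perfect_health / years_ill_health


def qalys(weight: float, duration_years: float) -> float:
    """QALYs are the weight multiplied by the duration of ill health."""
    return weight * duration_years


# Hypothetical responses from one subject for the same condition.
sg_weight = standard_gamble_weight(0.9)       # indifferent at p = 0.9
tto_weight = time_tradeoff_weight(8.0, 10.0)  # 8 healthy years ~ 10 ill years

print(qalys(sg_weight, 10.0), qalys(tto_weight, 10.0))
```

The two hypothetical weights differ on purpose: nothing in the arithmetic forces the standard-gamble and time-tradeoff answers to agree.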

The advantages of this approach are obvious: QALYs are simple and intuitive. Nothing I can show you as an alternative is going to be nearly as simple or intuitive. Thus we’re doomed to lose this battle on that basis alone. In addition, there are readily available, extensive lists of condition-specific QALY weights. All you have to do is look up the condition of interest in Tammy Tengs’ “1,000 QALY Estimates” [1]. You just look up the corresponding QALY weight and multiply it times the duration that the epidemiologists give to you.

This is such a wonderful and enormously useful solution to a difficult problem. Nobody needs any specialized training and the calculations are widely accepted for cost utility analysis for healthcare expenditures. But this slide shows the basic “Mae West” problem with QALYs. Most of us are not indifferent between a short, but fun, life and a long, but dull, life.

However, the area under the short and fun line and the area under the long and dull line indicate exactly the same number of QALYs. Mae West says that’s nuts and many of you might agree.
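
The arithmetic behind the Mae West problem can be sketched with two made-up life profiles. The weights and durations below are hypothetical, chosen only so that the two areas match.

```python
def total_qalys(profile):
    """Area under a life profile given as (quality_weight, years) segments."""
    return sum(weight * years for weight, years in profile)


# Hypothetical profiles, not taken from the slide.
short_and_fun = [(1.0, 40)]  # 40 years at full quality
long_and_dull = [(0.5, 80)]  # 80 years at half quality

print(total_qalys(short_and_fun), total_qalys(long_and_dull))  # 40.0 40.0
```

Because only the area counts, the calculation cannot register a preference for one shape of life over the other.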

There are lots of reasons why people could prefer one of these life profiles over the other. QALYs as we measure them in practice (not necessarily as derived from von Neumann-Morgenstern utility principles) are not grounded in utility theory because utility theory does not require us to be indifferent between these two life profiles. I’m sorry I don’t have time to actually document this, but QALY assumptions also fail empirical tests. Furthermore, QALY-weight estimates fail reliability tests. Standard gamble and time trade-off methods do not get the same QALY utility weight for the same condition.

QALYs also are insensitive to acute conditions. Many of you have a drug that you know patients prefer for treating an acute condition, but you can’t get any meaningful QALYs for that preference. Why not? If patients like a treatment better than something else, why can’t you demonstrate that the utility of this drug is better than the comparator treatment? You can’t with QALYs and I’ll show why in a minute. Finally, the over-simplified context required for estimating QALYs doesn’t look like any real life that anyone actually lives or any real decision that patients or physicians actually have to make.

This diagram depicts a more realistic relationship between ill health and time. The vertical axis indicates health utility and the horizontal axis indicates duration of some illness. The slope of this curve is just one minus the QALY weight. If I have perfect health, then my illness duration is zero. If I must spend some time in ill health, I’m going to lose QALYs equal to the illness duration times one minus the QALY weight. Because the relation is a straight line, I can just read off the reduction in QALYs associated with any given duration.

None of the pictures that Jim showed you used straight lines because economists never describe utility in terms of straight lines. We always describe utility in terms of diminishing marginal utility. For example, the first cup of coffee in the morning gives you a big kick. The 13th cup of coffee that day doesn’t give you the same improvement in satisfaction.

That’s true of virtually everything in your life. If health works like everything else, the first day you spend being sick decreases your perceived satisfaction more than the 13th day that you spend being sick. That explains why QALYs for acute conditions understate the utility loss because the true utility loss on the diagram is not here, but down here. It just so happens that, if we were to estimate the nonlinear preferences and the linear QALY preferences for the same duration, the estimated QALY weight would give us exactly the same utility loss as the true utility loss. Of course, in practice we pay no attention to what the “right” duration is, we just measure the QALY weight for an arbitrary duration and then use the same weight for every possible duration.
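
To make the comparison concrete, here is a rough numerical sketch. It assumes, purely for illustration, that the true utility loss grows with the square root of illness duration and that the QALY weight was elicited at a one-year reference duration. Neither assumption comes from the talk.

```python
import math


def true_utility_loss(days: float, scale: float = 1.0) -> float:
    """Hypothetical concave loss: each extra sick day hurts less than the last."""
    return scale * math.sqrt(days)


def linear_qaly_loss(days: float, reference_days: float, scale: float = 1.0) -> float:
    """Straight-line loss whose slope is fixed at one reference duration."""
    slope = true_utility_loss(reference_days, scale) / reference_days
    return slope * days


REFERENCE_DAYS = 365.0  # assumed duration used when the weight was elicited

for days in (3.0, 30.0, 365.0):
    true_loss = true_utility_loss(days)
    linear_loss = linear_qaly_loss(days, REFERENCE_DAYS)
    print(f"{days:6.0f} days: true loss {true_loss:6.2f}, linear loss {linear_loss:6.2f}")
```

Under these assumptions the straight line matches the curve only at the reference duration and understates the loss for every shorter spell, which is the acute-condition problem described above.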

QALYs distort true preferences in other ways. People may prefer a treatment because it has a better side-effect profile, it requires only one small pill a day instead of three, or it’s cheaper. They may prefer a treatment because of the way it is delivered: over the counter versus having to go to a doctor for a prescription. Treatments often have features that influence patient satisfaction, but for which you can’t get a single QALY’s credit. In fact, the utility of your patients isn’t represented by the nonlinear dotted line but the solid nonlinear line that lies below it. That is, the disutility curve is both nonlinear and also steeper because it includes other variables that people care about but QALYs don’t include. You underestimate the QALY gain even further by using the straight line instead of the real utility of your patients.

What we just did in this diagram is what economists do in every area of applied economics, except health. It is puzzling to me why we have to have a special theory to do health economics instead of the theory that everyone else uses. I’m going to propose a more general approach here, but my suggestion really is quite trivial. When I show this to anyone but a health economist they say: So what? Of course. That’s just economics. This is “just economics” but it may look a little radical to people used to QALYs.

Suppose utility depends on things other than some narrow, clinical measure of health. It depends on the time spent in the health state and it varies non-linearly with time. Utility depends on my consumption of other things and the value of that consumption may depend on how healthy I am. I may not be able to consume such things as rock climbing at all if I’m sick. It could also work the other way around. The amount of money I have may influence the utility I get out of different health states. For example, wealthier people may be able to pay for services that mitigate the impact of ill health on activities of daily living.

In addition, we can incorporate process factors. People care about how health is delivered, not just what is delivered. If they care at all, that ought to count in their utility. Then there are myriad individual characteristics: old people, young people, rich people, poor people, people with a long history of illness, people with a short history of illness, people who live in urban areas or rural areas, people who live in France or the United States. All of these things could matter. QALYs usually don’t let you adjust for any of those factors. I also want to avoid trying to get the equilibrating measure by requiring people to trade off the probability of immediate death or reductions in longevity. We don’t have to do that. Survey subjects resist trading off death or a reduction in longevity to get rid of three hours of headache. If they are unwilling to accept that tradeoff, then you can’t estimate a QALY weight.

Now let’s endow our subjects with a spell of ill health. It could be 30 minutes, 30 days, or 30 years. Depending on the illness, there is generally some therapeutically relevant duration over which this illness typically is treated. We probably don’t have to tinker around with what goes on at the end of people’s lives. However, spells of good and ill health have to add up within a given time frame. An additional day of ill health has to reduce the number of possible days of good health in the period by one day.

We would like to be able to derive both money and time equivalences of ill health from the same underlying preferences, letting people have preferences over whatever is important to them. We replace the utility of a given amount of time in a worse health state with the utility of the same time in a better health state. Now calculate how much money we would have to take away from the patient to reduce their utility back to the same level they experienced in the worse health state.

Economists generally call this adjustment in income or wealth “willingness to pay.” I would like to officially apologize for not only the term “value of a statistical life,” but the term “willingness to pay.” These terms cause nothing but mischief. It would be fine if economists only talked to non-economists. But I can understand why non-economists are quite offended when they hear us talking this way. The American Economic Association ought to hire a marketing firm to rename all the things that we talk about so that they don’t make us look stupid and embarrassed in front of non-economists. What we mean to say, of course, is that there is some reduction in consumption that would just cancel out the utility improvement from better health. Of course, this has nothing at all to do with willingness to do anything. So here is my non-profound idea: if we can make the adjustment with consumption, why not make the adjustment with time? I’m going to call this time adjustment “willingness to wait,” although it doesn’t have anything to do with willingness, either.
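
One way to write down the two adjustments, offered as a sketch rather than the speaker’s own formulation: let U(h, t, c) denote utility from health state h lasting for time t with consumption c, where h_0 is the worse state and h_1 the better one. This notation, and the reading of the time adjustment in the second line, are assumptions made here for illustration.

```latex
\begin{align*}
U(h_1,\, t,\, c - \mathrm{WTP}) &= U(h_0,\, t,\, c) && \text{(adjust consumption)} \\
U(h_1,\, t + \mathrm{WTW},\, c) &= U(h_0,\, t,\, c) && \text{(adjust time in the health state)}
\end{align*}
```

In both lines the right-hand side is the same reference utility; only the variable used to restore indifference changes.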


We actually did this survey before I gave any thought to QALYs and willingness to wait. We endowed people with a hypothetical initial acute condition of five days. In the example shown, they would be sick for five days with shortness of breath and swelling in the ankles if they didn’t do anything. That is, they would experience an exacerbation of congestive heart failure that would put them in the hospital for five days. Someone is going to have to take care of them, but it’s covered by insurance and won’t cost them anything. For some amount of money, they could shorten the duration to one day or avoid the condition altogether. We asked subjects to evaluate a series of 10 such tradeoff tasks, adjusting and rotating the features of the conditions, and asked them in each case which of the conditions they preferred. The pattern of those choices reveals the underlying relative rates of substitution or relative importance weights that are attached to various features.

Notice that they traded off both time and money at the same time, as we do in real life. We got disutility curves for various conditions just like the ones I showed you earlier. They are nonlinear and we can derive willingness to pay, willingness to wait, and a form of time equivalence that looks like a generalized QALY. I’ve been calling this measure “super QALYs,” which may be a little arrogant since it’s such a simple idea. In any case, from exactly the same underlying preferences we can derive several kinds of equivalence measures for a given intervention. We don’t have to use straight lines, we don’t have to understate the utility loss of acute conditions, we don’t have to leave out things that are important to patients, and we don’t have to make patients trade off death when death isn’t a realistic disease endpoint.
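
The idea of reading both a money equivalent and a time equivalent off one set of preferences can be sketched as follows. The utility function, its square-root shape, and both weights are hypothetical stand-ins for what a choice survey might estimate. They are not the model or the results from the study described here.

```python
import math

# Hypothetical preference weights, standing in for estimates from choice data.
DURATION_WEIGHT = 1.0  # disutility per square-root day of illness (assumed)
COST_WEIGHT = 0.01     # disutility per dollar paid (assumed)


def utility(sick_days: float, cost: float) -> float:
    """Assumed utility: concave in illness duration, linear in money."""
    return -DURATION_WEIGHT * math.sqrt(sick_days) - COST_WEIGHT * cost


def money_equivalent(days_before: float, days_after: float) -> float:
    """Payment that exactly offsets the gain from a shorter illness."""
    gain = utility(days_after, 0.0) - utility(days_before, 0.0)
    return gain / COST_WEIGHT


def time_equivalent(cost: float, baseline_days: float) -> float:
    """Extra sick days that lower utility as much as paying `cost`."""
    root = math.sqrt(baseline_days) + COST_WEIGHT * cost / DURATION_WEIGHT
    return root ** 2 - baseline_days


# Shortening a five-day spell to one day, as in the survey example.
payment = money_equivalent(days_before=5.0, days_after=1.0)
extra_days = time_equivalent(cost=payment, baseline_days=1.0)
print(round(payment, 2), round(extra_days, 2))
```

Converting the payment back into time recovers the four days that were saved, which is a quick check that both measures come from the same underlying function.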

Can we build a better measure of health utility? I’ve shown you how I think that could be done, but we could also try to fix what’s wrong with QALYs. I don’t think this is a great idea but, out of respect for Bernie’s memory, I can suggest a few things. There’s no reason why we have to insist that QALYs be linear in time. We could estimate QALYs for different time intervals. This is more or less Gafni’s healthy-years equivalent. We should really not attempt to extrapolate among chronic and acute conditions using the same weights. That is, QALY weights should pay attention to the context in which the QALY was originally measured, both the health state and the duration that was specified in either the time-tradeoff or the standard-gamble elicitation. We don’t let you extrapolate outside the range of your epidemiologic data, so why do we let you extrapolate outside the range of your QALY data? Incidentally, why don’t we ever see confidence intervals on those QALY weights? I don’t know why we have to suspend both economics and statistics in estimating QALYs. There’s also no reason why we couldn’t estimate QALY weights as a function of the characteristics of the survey respondents.

There’s also no reason, in principle, that we couldn’t include other factors that influence patient satisfaction. Unfortunately, such complications may make standard-gamble or simple time-tradeoff surveys intractable. I’ve suggested a more valid and effective way of approaching the problem by constructing tradeoffs that look more like the tradeoffs people make in real healthcare settings. These hypothetical tradeoffs reveal the rates of substitution we need to calculate time or money equivalences of treatment benefits.

I don’t have time to discuss the controversies related to contingent valuation, or the so-called “willingness-to-pay” method. In some ways conventional willingness-to-pay studies share many of the same problems as QALYs in not allowing for a full range of opportunities to trade off in various dimensions. Basically, we describe a treatment scenario and ask people what is the most they would pay for it. It doesn’t make sense to ask them how much they would be willing to pay because they are likely to say they aren’t willing to pay anything. Many researchers avoid that reaction and other problems by offering subjects a price of, say, £250 and recording whether they would accept or reject the treatment at such a price. We get very little information about underlying preferences from such a question and have trouble generalizing the results to even slightly different situations. Finally, many people are offended by dollar values of health benefits, which, again, is why QALYs are so attractive.

Amiram Gafni was an old friend and colleague of Bernie’s. He once asked a question that I think we all ought to ask ourselves. I’d like to end with his question: “When asking the public to assist in determining health priorities, we should use techniques that allow people to reveal their true preferences. If not, why bother asking them at all?” [2]

REFERENCES
1 Tengs T, Wallace A. One-thousand health-related quality of life estimates. Med Care 2000;38:583-637.
2 Gafni A, Birch S. Preferences for outcomes in economic evaluation: An economic approach to addressing economic problems. Soc Sci Med 1995;40:767- 6.

