Causal Inference in HEOR: Making Complex Decisions in a World of Imperfect Data
According to Judea Pearl, PhD, Professor, Computer Science and Director, Cognitive Systems Laboratory, Samueli School of Engineering, University of California, Los Angeles, USA, even a 3-year-old has a remarkable understanding of causation.
As he explains in the first chapter of his book, The Book of Why,1 humans’ ability to reason retrospectively, imagine roads not taken, and compare the observed world with counterfactual alternatives is something that even the most sophisticated artificial intelligence neural networks have not yet been able to achieve. However, he posits that there are ways machines and people “can represent causal knowledge in a way that would enable them to access the necessary information swiftly, answer questions correctly, and do it with ease, as a 3-year-old child can.”
“Machine learning amplifies one little corner of human ability and this is to handle data, to store it, to collect it, to retrieve it, to answer questions about associations, to summarize data properly, to visualize data—all this is fine,” Pearl says. “But the hard questions of causal thinking cannot be answered by machine learning alone, these must be handled by a smart symbiosis of causal models and machine learning. Whenever you do a causal inference exercise you get an answer that tells you where machine learning can be of help and how, so you can adequately divide the labor.”
Pearl’s causal metamodel is the “Ladder of Causation,” which comprises 3 parts: the lowest level, Association (seeing/observing), entails the sensing of regularities or patterns in the input data, expressed as correlations. The middle level, Intervention (doing), predicts the effects of deliberate actions, expressed as causal relationships. The highest level, Counterfactuals (imagining), involves constructing a theory of the world that explains why specific actions have specific effects and what would have happened had those actions been different.
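The gap between the first two rungs can be made concrete with a small simulation. The sketch below is only an illustration under stated assumptions: the variables (a confounder z, a treatment x, a binary outcome y) and every probability in it are hypothetical and do not come from Pearl's work or from any real dataset. It compares what is seen in observational data with what happens when the treatment is set by intervention. The third rung, counterfactuals, is not covered here.

```python
import random


def simulate(n, do_x=None, seed=0):
    """Draw n (z, x, y) records from a toy structural causal model.

    z: confounder (say, disease severity), x: treatment, y: binary outcome.
    If do_x is given, x is set by intervention and no longer depends on z.
    All probabilities are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z = int(rng.random() < 0.5)
        if do_x is None:
            # Observational regime: patients with z = 1 are treated more often.
            x = int(rng.random() < (0.8 if z else 0.2))
        else:
            # Interventional regime: treatment is assigned regardless of z.
            x = do_x
        # Assumed outcome model: the true effect of x on P(y = 1) is +0.2.
        y = int(rng.random() < 0.1 + 0.2 * x + 0.5 * z)
        rows.append((z, x, y))
    return rows


def mean_y(rows, x=None):
    """Average outcome, optionally restricted to records with treatment x."""
    ys = [y for (_, xi, y) in rows if x is None or xi == x]
    return sum(ys) / len(ys)


# Rung 1, Association: compare treated and untreated patients as observed.
observed = simulate(100_000)
association = mean_y(observed, x=1) - mean_y(observed, x=0)

# Rung 2, Intervention: compare outcomes when treatment is set for everyone.
intervention = (
    mean_y(simulate(100_000, do_x=1, seed=1))
    - mean_y(simulate(100_000, do_x=0, seed=2))
)

print(f"difference seen in observational data: {association:.2f}")
print(f"difference under intervention:         {intervention:.2f}")
```

In this toy model the observed difference overstates the effect of treatment, because the confounder drives both who gets treated and the outcome. Only the interventional comparison approximates the effect that was built into the model.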
Causal Models and Healthcare
One industry that generates a lot of data is healthcare. According to RBC Capital Markets, 30% of the world’s data volume is generated by the healthcare industry, and by 2025 the compound annual growth rate of healthcare data will reach 36%.
“Sorting through all of these data to derive information from them—especially in health economics and outcomes research (HEOR), in which much of the work is related to guiding patient-centered medical decision making and public health policy decisions—has to start with causal questions, using causal assumptions, and developing decision-analytic models,” says Uwe Siebert, MD, MPH, MSc, ScD, UMIT - University for Health Sciences, Medical Informatics and Technology, Hall in Tirol, Austria, and Harvard Chan School of Public Health in Boston, MA, USA.
Causal models are needed because often it is impossible to run real-time experiments assessing long-term consequences that affect human individuals and populations. “There are of course, limitations and strict ethical rules about performing experimental clinical studies,” Siebert says, adding that trying to run experiments on patient-relevant outcomes in real time is also problematic, especially in an ever-shifting environment such as the COVID-19 pandemic. “What if we treat, what if we don’t treat, what if we start treatment early or start late? What if you wear masks, do COVID tests, or close schools? And what if not? We likely can’t run experiments for all these decisions because by the time we get the results, it may already be too late for many of these decisions.”
Siebert says another reason why causal inference is important in health decision science, and HEOR especially, is the fact that we live in a world with imperfect data, but decisions must still be made—with the goal of gathering further evidence to improve these decisions. The causal diagrams developed by Pearl and others combined with evidence-based causal decision analysis allow temporary decisions to be made based on the best evidence available at a given time, and more data can be filled in later once additional evidence is generated. “In health economics, we have a formal framework called value-of-information analysis that guides the efficient collection of further evidence and tells us when evidence is enough,” Siebert says.
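One standard quantity in value-of-information analysis is the expected value of perfect information (EVPI): the expected net benefit of deciding with full knowledge of an uncertain parameter, minus the expected net benefit of the best decision made today. The sketch below is a minimal illustration rather than the method Siebert describes. The three scenarios, their probabilities, and the net-benefit figures are all hypothetical.

```python
# Hypothetical scenarios for one uncertain parameter (e.g., treatment
# effectiveness). Each entry gives the scenario's probability and the net
# monetary benefit per patient of each strategy under that scenario.
scenarios = [
    {"prob": 0.3, "net_benefit": {"standard care": 0.0, "new treatment": -4000.0}},
    {"prob": 0.5, "net_benefit": {"standard care": 0.0, "new treatment": 1000.0}},
    {"prob": 0.2, "net_benefit": {"standard care": 0.0, "new treatment": 6000.0}},
]


def expected_net_benefit(scenarios, strategy):
    """Probability-weighted net benefit of one strategy."""
    return sum(s["prob"] * s["net_benefit"][strategy] for s in scenarios)


def evpi_per_patient(scenarios):
    """EVPI = E[max over strategies] - max over strategies of E[net benefit]."""
    strategies = list(scenarios[0]["net_benefit"])
    # Best choice with current evidence: one strategy for all scenarios.
    value_now = max(expected_net_benefit(scenarios, d) for d in strategies)
    # Best choice with perfect information: pick the winner in each scenario.
    value_perfect = sum(
        s["prob"] * max(s["net_benefit"].values()) for s in scenarios
    )
    return value_perfect - value_now


strategies = list(scenarios[0]["net_benefit"])
best_now = max(strategies, key=lambda d: expected_net_benefit(scenarios, d))
evpi = evpi_per_patient(scenarios)

print(f"preferred strategy with current evidence: {best_now}")
print(f"EVPI per patient: {evpi:.0f}")
```

If the EVPI, scaled to the affected population, is smaller than the cost of the proposed research, collecting more evidence is unlikely to be worthwhile. That is one way to read the statement that the framework "tells us when evidence is enough."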
The Current State of Causal Modeling in HEOR
Although causal modeling has been around for decades, its penetration into healthcare and HEOR has been slow. As Siebert explains, the principal concept of causal pathways was introduced by the biologist Sewall Wright in 1921 and was forgotten until the 1980s, when Pearl and his colleagues developed a complete mathematical concept for causal diagrams. In 1999, causal diagrams were introduced to epidemiology and health sciences in a pivotal paper, “Causal Diagrams for Epidemiologic Research.”2
One of the authors, Harvard Professor James Robins, had developed a causal computation method, the “g-formula,” in 1986,3 but almost 15 years passed before Robins asked his then-doctoral candidate, Siebert, to apply the method to real data. Siebert published the first application of the parametric g-formula in a medical decision-making conference proceeding in 2002.4 It took another decade (2012) for the pharmaceutical industry to become aware of g-methods, when they were successfully used in health technology assessment (HTA) in the United Kingdom to adjust clinical trial data for treatment switching, and the drugs under investigation received reimbursement.
Anecdotally, Pearl and Robins have translated the g-formula into a graph-based sequential back-door formula, so that it could serve researchers who find graphs a convenient way of conveying scientific knowledge.5
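For a single treatment decision, the g-formula and the back-door formula reduce to the same standardization: average the stratum-specific outcomes over the population distribution of the confounder. The sketch below shows only this simplest case, assuming one measured confounder z is enough to adjust for confounding. The counts are invented for illustration. The methods cited above handle treatments and confounders that vary over time, which this sketch does not attempt.

```python
# Hypothetical counts of patients by (z, x, y):
# z = confounder stratum, x = treatment, y = binary outcome.
counts = {
    (0, 0, 0): 360, (0, 0, 1): 40,   # z=0, untreated: 400 patients
    (0, 1, 0): 70,  (0, 1, 1): 30,   # z=0, treated:   100 patients
    (1, 0, 0): 40,  (1, 0, 1): 60,   # z=1, untreated: 100 patients
    (1, 1, 0): 80,  (1, 1, 1): 320,  # z=1, treated:   400 patients
}


def crude_risk(counts, x):
    """P(y=1 | x): outcome risk among patients observed with treatment x."""
    n_x = sum(c for (_, xi, _), c in counts.items() if xi == x)
    n_x_y1 = sum(c for (_, xi, yi), c in counts.items() if xi == x and yi == 1)
    return n_x_y1 / n_x


def standardized_risk(counts, x):
    """Back-door adjustment: sum over z of P(y=1 | x, z) * P(z)."""
    total = sum(counts.values())
    risk = 0.0
    for z in sorted({zi for (zi, _, _) in counts}):
        n_z = sum(c for (zi, _, _), c in counts.items() if zi == z)
        n_zx = sum(c for (zi, xi, _), c in counts.items() if zi == z and xi == x)
        n_zx_y1 = counts.get((z, x, 1), 0)
        risk += (n_zx_y1 / n_zx) * (n_z / total)
    return risk


crude_difference = crude_risk(counts, 1) - crude_risk(counts, 0)
adjusted_difference = standardized_risk(counts, 1) - standardized_risk(counts, 0)

print(f"crude risk difference:        {crude_difference:.2f}")
print(f"standardized risk difference: {adjusted_difference:.2f}")
```

In these invented data the treated group contains far more z = 1 patients, so the crude comparison mixes the effect of treatment with the effect of z. Standardizing over z removes that imbalance, but only under the assumption that z captures all confounding.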
Pearl believes that one of the reasons causal thinking has not gained the ground it definitely should in some sciences is a difficulty with language. “The language of causal thinking is not being taught in school. In Statistics 101, you wouldn’t even be allowed to say the word ‘cause,’” he says. “The textbooks, they warn you against stating causal assumptions. Or look in the index of every textbook in statistics, you wouldn’t find ‘causal effect’ there, or any notion that is inherently causal.”
Students coming from a statistics background believe that statistics is the language of science. “Means, variance, regression coefficients, confidence intervals, testing of hypotheses, things of that sort—this is what they take to be the language of science. But it is not!” Pearl says. “Science speaks cause and effect. And it takes generations to undo this deeply entrenched paradigm.”
Today, according to Siebert, one of the main tasks in applying causal inference methods in HEOR and HTA is understanding which analytical method works best for which type of research question, and recommending what additional evidence should be generated. This will take time, although epidemiology, which is related to HEOR, has developed methods for causal data analysis that can be adopted.6,7
“In any science, not just in medicine and health science, it may take decades after some knowledge has been generated or created, or a particular method has been developed, until it is known in the broader community in a field,” Siebert says. “I’m now old enough to be able to testify that this is definitely true in health sciences, including HEOR. And once the methods are known, it may take another 1 to 2 decades to try them out in routine settings, and move up the learning curve until we are experts—and we’re not there yet. This is true for many clinical procedures and health technologies, but it’s also true for analytic methods. Causal methods must be applied to the real, routine, imperfect, ‘dirty’ data—a nicer term is ‘real-world evidence’—to gain experience with them and understand the strengths and limitations and when we should use them and when not, and when we can base decisions on the data available and when we still must stick to experiments such as randomized controlled trials.”
Pearl adds that, even if HEOR and HTA use randomized controlled experiments as the gold standard study design, the tools that are currently emerging from causal inference promise to revolutionize the industry. Examples are tools for recovering from ‘sample selection bias’, coherent aggregation of findings from several heterogeneous trials, and, most excitingly, methods of informing personalized decision making.8 “Truly personalized medicine, I dare say, is much closer to reality than most researchers imagine,” Pearl says.
Although it has been decades since Robins’ and Pearl’s groundbreaking papers, Siebert believes that causal modeling in HTA and HEOR will progress faster if there are others willing to be trailblazers and apply the theories to their work. He says the support of ISPOR is crucial. “We are still on the steep part of the learning curve increasing our experience with each application. I think this is now our most important job and we should work together across scientific disciplines. There’s a lot to be done.”
Pearl presented basics about causal inference, moderated by Siebert, in ISPOR’s January Signal episode, “The New Science of Cause and Effect: Causal Revolution Applied.”
The ISPOR Signal Series
ISPOR started the Signal program to bring a broader understanding of innovation (beyond product innovation), with the goal of putting these issues front and center for the HEOR community. Each episode in the series is a self-contained installment and not dependent on the previous episodes; however, all of them are connected by an intent to look at the concept of innovation and experience with it from different groups of healthcare stakeholders, building foresight into how these innovations might impact healthcare decision making in the next decade.
The next installment in the Signal series, “New Analytical Approaches to 21st Century Challenges,” will be held on May 16. This episode will focus on envisioning and discussing the approaches needed to analyze the behaviors that are generated by the myriad interactions of billions of people at timescales ranging from nanoseconds (as in computer trading) to millennia (as in evolution). We will cover this episode in more depth in a future issue of Value & Outcomes Spotlight.
Read more about past Signal events in Value & Outcomes Spotlight
• ISPOR Generates a Signal for Transmitting Innovation
• From Measuring Costs to Measuring Outcomes: Revamping Healthcare at a System Level
• Beyond Cost-Effectiveness: Defining and Mapping Out Innovation at NICE
About the author
Christiane Truelove is a freelance medical writer based in Bristol, PA.
References
1. Pearl J, Mackenzie D. The Book of Why: The New Science of Cause and Effect. New York, NY: Basic Books. 2018.
2. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology. 1999;10(1):37-48.
3. Robins J. A new approach to causal inference in mortality studies with a sustained exposure period—application to control of the healthy worker survivor effect. Math Model. 1986;7(9–12):1393-1512.
4. Siebert U, Hernán MA, Robins JM. Monte Carlo simulation of the direct and indirect impact of risk factor interventions on coronary heart disease. An application of the g-formula. Proceedings of the 8th Biennial Conference of the European Society for Medical Decision Making. Taormina, Sicily, Italy. June 2-5, 2002:p51.
5. Pearl J, Robins JM. Probabilistic evaluation of sequential plans from causal models with hidden variables. In: Besnard P, Hanks S, eds. Uncertainty in Artificial Intelligence: Proceedings of the Eleventh Conference. San Francisco, CA: Morgan Kaufmann; 1995:444-453.
6. Hernán MA, Robins JM. Causal Inference: What If. Boca Raton, FL: Chapman & Hall/CRC. 2020.
7. Pearl J, Glymour M, Jewell NP. Causal Inference in Statistics: A Primer. New York, NY: Wiley. 2018.
8. Mueller S, Pearl J. Personalized Decision Making—A Conceptual Introduction. UCLA Cognitive Systems Laboratory. Technical Report (R-513). Published March 2022.