Instrumental Variable Treatment Effects Estimation: A Simulation Comparison of Instrumental Variables and OLS Approaches
Author(s)
Mohammad Sameer Mansoori, MSc, Akanksha Sharma, MSc, Neha Tripathi, MPH, Parampal Bajaj, BTech, Shubhram Pandey, MSc.
Heorlytics Pvt. Ltd., Mohali, India.
OBJECTIVES: Endogeneity frequently challenges causal inference in observational studies. It arises when the exposure is correlated with unmeasured factors that also affect the outcome, which biases estimates from Ordinary Least Squares (OLS) regression. Instrumental Variable (IV) estimators, such as Two-Stage Least Squares (2SLS), address this problem by using an instrument: an external variable that influences the exposure but affects the outcome only through the exposure. The goal was to compare OLS and IV estimation methods for identifying causal effects in a simulated dataset when endogeneity is present.
METHODS: A simulated dataset containing an exposure variable, a continuous outcome variable, and an instrumental variable was analysed in R. Using the AER package, the effect of the exposure on the outcome was estimated with both OLS and IV (2SLS) regression. In the 2SLS procedure, the exposure is first predicted from the instrument, and the outcome is then regressed on this predicted exposure. The Hausman test was applied to assess endogeneity, and the first-stage F-statistic was used to assess the strength of the instrument.
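The two-stage procedure described above can be sketched in a few lines. The example below is an illustration only, written in plain Python rather than the R/AER workflow used in the study. The data-generating process (a true effect of 2.0, an instrument coefficient of 0.5, a confounder coefficient of 1.5, standard normal noise) and all function names are assumptions chosen for the sketch; the abstract does not report how the study's dataset was generated.

```python
import random


def mean(values):
    return sum(values) / len(values)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, true_effect=2.0, seed=1):
    """Hypothetical data-generating process (NOT taken from the abstract).

    z: instrument, affects the outcome only through the exposure
    u: unmeasured confounder, affects both exposure and outcome
    """
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)
        ui = rng.gauss(0, 1)
        xi = 0.5 * zi + ui + rng.gauss(0, 1)                # exposure
        yi = true_effect * xi + 1.5 * ui + rng.gauss(0, 1)  # outcome
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def ols_slope(x, y):
    """Slope from a simple regression of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def two_stage_least_squares(z, x, y):
    """2SLS point estimate with a single instrument."""
    # Stage 1: predict the exposure from the instrument.
    slope1 = cov(z, x) / cov(z, z)
    intercept1 = mean(x) - slope1 * mean(z)
    x_hat = [intercept1 + slope1 * zi for zi in z]
    # Stage 2: regress the outcome on the predicted exposure.
    return ols_slope(x_hat, y)


z, x, y = simulate()
beta_ols = ols_slope(x, y)
beta_iv = two_stage_least_squares(z, x, y)
print(f"OLS estimate:  {beta_ols:.3f}")
print(f"2SLS estimate: {beta_iv:.3f}")
```

Running the two stages by hand like this gives the correct point estimate, but the standard errors from a naive second-stage regression are not valid. Dedicated routines such as `ivreg` in AER compute them correctly, which is one reason to prefer a package for real analyses.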
RESULTS: The OLS regression coefficient was 2.68 (p < 0.001), whereas the IV model gave a smaller coefficient of 2.03 (p < 0.001); each coefficient is the estimated average change in the continuous outcome per one-unit increase in the exposure. The Hausman test (p < 0.05) indicated endogeneity in the OLS model. The first-stage F-statistic of 33.7, above the conventional rule-of-thumb threshold of 10, indicated that the instrument was strong.
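The two diagnostics reported above can also be computed by hand in the single-instrument case. The sketch below is a hedged illustration in plain Python, not the study's code: the simulated data, the parameter values, and the function names are all assumptions. The endogeneity check uses one textbook form of the Hausman statistic (the squared difference between the IV and OLS estimates divided by the difference in their variances, with a common residual variance taken from the IV fit). The abstract does not state which variant was used, and AER's `ivreg` diagnostics report a regression-based Wu-Hausman test instead.

```python
import random


def centered_sum(a, b):
    """Sum of cross-products of deviations from the means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))


def simulate(n=2000, instrument_strength=0.5, seed=7):
    """Hypothetical data with an unmeasured confounder u (an assumption)."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi, ui = rng.gauss(0, 1), rng.gauss(0, 1)
        xi = instrument_strength * zi + ui + rng.gauss(0, 1)
        yi = 2.0 * xi + 1.5 * ui + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y


def first_stage_f(z, x):
    """F-statistic for the instrument in the regression of x on z.

    With one instrument this equals the squared t-statistic of the
    first-stage slope, i.e. R^2 / (1 - R^2) * (n - 2).
    """
    n = len(z)
    r2 = centered_sum(z, x) ** 2 / (centered_sum(z, z) * centered_sum(x, x))
    return r2 * (n - 2) / (1 - r2)


def hausman_chi2(z, x, y):
    """Hausman statistic comparing IV and OLS slopes (1 degree of freedom)."""
    n = len(x)
    szz, szx, sxx = centered_sum(z, z), centered_sum(z, x), centered_sum(x, x)
    b_ols = centered_sum(x, y) / sxx
    b_iv = centered_sum(z, y) / szx
    # Residual variance from the IV fit, which remains consistent
    # even when the exposure is endogenous.
    a_iv = sum(y) / n - b_iv * sum(x) / n
    s2 = sum((yi - a_iv - b_iv * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)
    # Var(b_iv) - Var(b_ols) under the common variance estimate s2.
    var_diff = s2 * (szz / szx ** 2 - 1.0 / sxx)
    return (b_iv - b_ols) ** 2 / var_diff


z, x, y = simulate()
f_stat = first_stage_f(z, x)
h_stat = hausman_chi2(z, x, y)
CHI2_1DF_5PCT = 3.841  # 5% critical value of chi-square with 1 df
print(f"First-stage F: {f_stat:.1f} (rule of thumb: > 10)")
print(f"Hausman chi-square: {h_stat:.1f} (reject exogeneity if > {CHI2_1DF_5PCT})")
```

Lowering `instrument_strength` toward zero in this sketch shrinks the first-stage F-statistic, which is the situation the rule of thumb is meant to flag: with a weak instrument the IV estimate becomes unstable and can be more misleading than OLS.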
CONCLUSIONS: The IV method provided a better estimate than OLS in the presence of endogeneity, given that a strong instrument was available. This simulation study highlights the importance of addressing endogeneity to improve the credibility of causal inferences in observational research.
Conference/Value in Health Info
2025-11, ISPOR Europe 2025, Glasgow, Scotland
Value in Health, Volume 28, Issue S2
Code
MSR132
Topic
Economic Evaluation, Health Technology Assessment, Methodological & Statistical Research
Topic Subcategory
Confounding, Selection Bias Correction, Causal Inference
Disease
No Additional Disease & Conditions/Specialized Treatment Areas