Exploring modern machine learning methods to improve causal-effect estimation

  • Kim, Yeji (Department of Statistics, Korea University) ;
  • Choi, Taehwa (Department of Statistics, Korea University) ;
  • Choi, Sangbum (Department of Statistics, Korea University)
  • Received : 2021.08.14
  • Accepted : 2021.11.25
  • Published : 2022.03.31

Abstract

This paper addresses the use of machine learning methods for the causal estimation of treatment effects from observational data. Although randomized experimental trials remain the gold standard for revealing potential causal relationships, observational studies are another rich source for investigating exposure effects, for example, in research on the comparative effectiveness and safety of treatments, where the causal effect can be identified if the covariates contain all confounding variables. In this context, statistical regression models for the expected outcome and the probability of treatment are often imposed, and they can be combined to yield more efficient and robust causal estimators. Recently, targeted maximum likelihood estimation and causal random forests have been proposed and extensively studied for the use of data-adaptive regression in the estimation of causal inference parameters. Machine learning methods are a natural choice in these settings to improve the quality of the final estimate of the treatment effect. We explore how the design and training of several machine learning algorithms can be adapted for causal inference and study their finite-sample performance through simulation experiments under various scenarios. An application to percutaneous coronary intervention (PCI) data shows that these adaptations can improve on simple linear regression-based methods.
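
The abstract describes combining an outcome regression model with a propensity-score model to obtain a doubly robust causal estimator. The following is a minimal sketch, not taken from the paper, of an augmented inverse-probability-weighted (AIPW) estimate of the average treatment effect on simulated data, with random forests from scikit-learn standing in for the data-adaptive nuisance models; all variable names, data-generating settings, and tuning choices are illustrative assumptions.

    # Minimal AIPW sketch on simulated data (illustrative; not the paper's code).
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    rng = np.random.default_rng(0)
    n = 2000
    X = rng.normal(size=(n, 3))                 # covariates (confounders)
    e_true = 1 / (1 + np.exp(-X[:, 0]))         # true propensity score
    A = rng.binomial(1, e_true)                 # binary treatment indicator
    Y = 2 * A + X[:, 0] + rng.normal(size=n)    # outcome; true ATE = 2

    # Nuisance estimates: propensity score e(X) and outcome regressions m1(X), m0(X)
    ps_fit = RandomForestClassifier(min_samples_leaf=50, random_state=0).fit(X, A)
    e_hat = np.clip(ps_fit.predict_proba(X)[:, 1], 0.01, 0.99)  # bound away from 0/1

    m1_fit = RandomForestRegressor(min_samples_leaf=50, random_state=0).fit(X[A == 1], Y[A == 1])
    m0_fit = RandomForestRegressor(min_samples_leaf=50, random_state=0).fit(X[A == 0], Y[A == 0])
    m1_hat, m0_hat = m1_fit.predict(X), m0_fit.predict(X)

    # AIPW (doubly robust) score: regression contrast plus inverse-probability-
    # weighted residuals; consistent if either nuisance model is correctly specified.
    aipw_scores = (m1_hat - m0_hat
                   + A * (Y - m1_hat) / e_hat
                   - (1 - A) * (Y - m0_hat) / (1 - e_hat))
    print("AIPW estimate of the average treatment effect:", aipw_scores.mean())

In practice, cross-fitting (sample splitting) is commonly used when machine learning fits are plugged into the nuisance models, and targeted maximum likelihood estimation replaces the simple augmentation step above with a targeted fluctuation of the initial outcome fit; both refinements are omitted here for brevity.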

Keywords

References

  1. Athey S and Imbens G (2016). Recursive partitioning for heterogeneous causal effects, Proceedings of the National Academy of Sciences, 113, 7353-7360. https://doi.org/10.1073/pnas.1510489113
  2. Athey S, Tibshirani J, and Wager S (2019). Generalized random forests, The Annals of Statistics, 47, 1148-1178. https://doi.org/10.1214/18-aos1709
  3. Austin PC (2011). An introduction to propensity score methods for reducing the effects of confounding in observational studies, Multivariate Behavioral Research, 46, 399-424. https://doi.org/10.1080/00273171.2011.568786
  4. Breiman L (2001). Random forests, Machine Learning, 45, 5-32. https://doi.org/10.1023/A:1010933404324
  5. Choi S, Choi T, Lee HY, Han SW, and Bandyopadhyay D (2021). Double-robust inferences for difference in restricted mean lifetimes using pseudo-observations, Submitted.
  6. Funk MJ, Westreich D, Wiesen C, Sturmer T, Brookhart MA, and Davidian M (2011). Doubly robust estimation of causal effects, American Journal of Epidemiology, 173, 761-767. https://doi.org/10.1093/aje/kwq439
  7. Gruber S and van der Laan MJ (2010). A targeted maximum likelihood estimator of a causal effect on a bounded continuous outcome, The International Journal of Biostatistics, 6(1).
  8. Gulen H, Jens C, and Page TB (2020). An Application of Causal Forest in Corporate Finance: How Does Financing Affect Investment?, Microeconomics: Intertemporal Firm Choice & Growth.
  9. Helwig NE (2020). Multiple and Generalized Nonparametric Regression, SAGE Publications Limited.
  10. Hernan MA and Robins JM (2020). Causal Inference: What If, Chapman & Hall/CRC, Boca Raton, Florida.
  11. Kang JDY and Schafer JL (2007). Demystifying double robustness: A comparison of alternative strategies for estimating a population mean from incomplete data, Statistical Science, 22, 523-539. https://doi.org/10.1214/07-STS227
  12. Lee BK, Lessler J, and Stuart EA (2010). Improving propensity score weighting using machine learning, Statistics in Medicine, 29, 337-346. https://doi.org/10.1002/sim.3782
  13. Li X and Shen C (2020). Doubly robust estimation of causal effect: upping the odds of getting the right answers, Circulation: Cardiovascular Quality and Outcomes, 13, e006065. https://doi.org/10.1161/CIRCOUTCOMES.119.006065
  14. McCaffrey DF, Griffin BA, Almirall D, Slaughter ME, Ramchand R, and Burgette LF (2013). A tutorial on propensity score estimation for multiple treatments using generalized boosted models, Statistics in Medicine, 32, 3388-3414. https://doi.org/10.1002/sim.5753
  15. Robins JM, Rotnitzky A, and Zhao LP (1994). Estimation of regression coefficients when some regressors are not always observed, Journal of the American Statistical Association, 89, 846-866. https://doi.org/10.1080/01621459.1994.10476818
  16. Robins JM, Sued M, Lei GQ, and Rotnitzky A (2007). Comment: Performance of double-robust estimators when inverse probability weights are highly variable, Statistical Science, 22, 544-559. https://doi.org/10.1214/07-STS227D
  17. Rosenbaum PR and Rubin DB (1983). The central role of the propensity score in observational studies for causal effects, Biometrika, 70, 41-55. https://doi.org/10.1093/biomet/70.1.41
  18. Rubin D (1978). Bayesian inference for causal effects: The role of randomization, The Annals of Statistics, 6, 34-58. https://doi.org/10.1214/aos/1176344064
  19. Schuler MS and Rose S (2017). Targeted maximum likelihood estimation for causal inference in observational studies, American Journal of Epidemiology, 185, 65-73. https://doi.org/10.1093/aje/kww165
  20. Stuart EA (2010). Matching methods for causal inference: A review and a look forward, Statistical Science, 25, 1-21. https://doi.org/10.1214/09-STS313
  21. Tsiatis A (2007). Semiparametric Theory and Missing Data, Springer, New York.
  22. Van der Laan MJ, Polley EC, and Hubbard AE (2007). Super learner, Statistical Applications in Genetics and Molecular Biology, 6(1).
  23. Van der Laan MJ and Rose S (2011). Targeted Learning: Causal Inference for Observational and Experimental Data, Springer, California.
  24. Wager S and Athey S (2018). Estimation and inference of heterogeneous treatment effects using random forests, Journal of the American Statistical Association, 113, 1228-1242. https://doi.org/10.1080/01621459.2017.1319839