Practice of causal inference with the propensity of being zero or one: assessing the effect of arbitrary cutoffs of propensity scores

Kang, Joseph; Chan, Wendy; Kim, Mi-Ok; Steiner, Peter M.

doi:10.5351/CSAM.2016.23.1.001

Communications for Statistical Applications and Methods

Volume 23, Issue 1, Pages 1-20, 2016

pISSN 2287-7843, eISSN 2383-4757

The Korean Statistical Society (한국통계학회)

Kang, Joseph (Department of Preventive Medicine, Northwestern University) ;
Chan, Wendy (Department of Statistics, Northwestern University) ;
Kim, Mi-Ok (Department of Pediatrics, University of Cincinnati and Cincinnati Children's Hospital Medical Center) ;
Steiner, Peter M. (Department of Educational Psychology, University of Wisconsin)

Received : 2015.04.20
Accepted : 2015.11.26
Published : 2016.01.31

Abstract

Causal inference methodologies have been developed over the past decade to estimate the unconfounded effect of an exposure under several key assumptions. These assumptions include, but are not limited to, the stable unit treatment value assumption, the strong ignorability of treatment assignment, and the assumption that propensity scores be bounded away from zero and one (the positivity assumption). The first two have received much attention in the literature, whereas the positivity assumption has been discussed only recently and in only a few papers. A propensity score of zero or one indicates deterministic exposure, so causal effects cannot be defined for such subjects. These subjects therefore need to be removed, because no comparison group can be found for them. In this paper, using currently available causal inference methods, we evaluate the effect of arbitrary cutoffs in the distribution of propensity scores and the impact of those decisions on bias and efficiency. We propose a tree-based method that performs well in terms of bias reduction when the definition of positivity is based on a single confounder. This tree-based method can be easily implemented in the statistical software R. R code for the studies is available online.
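
To make the idea of an arbitrary propensity-score cutoff concrete, the following is a minimal sketch in Python. It is not the authors' R code and does not implement their tree-based method. It assumes a single simulated confounder, a known (true) propensity score, a true treatment effect of 1.0, and a normalized inverse-probability-weighted estimator. The function names `simulate`, `trim_by_cutoff`, and `ipw_ate`, and the cutoff values, are hypothetical choices made for illustration only.

```python
import math
import random


def simulate(n, seed=0):
    """Simulate n units with one confounder x (illustrative assumption).

    The steep logistic slope pushes many propensity scores close to 0 or 1,
    which is the situation where positivity is practically violated.
    """
    rng = random.Random(seed)
    units = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        ps = 1.0 / (1.0 + math.exp(-2.5 * x))   # true propensity score
        t = 1 if rng.random() < ps else 0       # treatment indicator
        y = 1.0 * t + x + rng.gauss(0.0, 1.0)   # outcome, true effect = 1.0
        units.append({"x": x, "ps": ps, "t": t, "y": y})
    return units


def trim_by_cutoff(units, cutoff):
    """Keep units whose propensity score lies in [cutoff, 1 - cutoff]."""
    return [u for u in units if cutoff <= u["ps"] <= 1.0 - cutoff]


def ipw_ate(units):
    """Normalized inverse-probability-weighted estimate of the average effect."""
    sum_w1 = sum_w0 = sum_y1 = sum_y0 = 0.0
    for u in units:
        if u["t"] == 1:
            w = 1.0 / u["ps"]
            sum_w1 += w
            sum_y1 += w * u["y"]
        else:
            w = 1.0 / (1.0 - u["ps"])
            sum_w0 += w
            sum_y0 += w * u["y"]
    if sum_w1 == 0.0 or sum_w0 == 0.0:
        raise ValueError("need at least one treated and one untreated unit")
    return sum_y1 / sum_w1 - sum_y0 / sum_w0


if __name__ == "__main__":
    data = simulate(5000)
    # Compare the estimate and the retained sample size across cutoffs.
    for cutoff in (0.0, 0.01, 0.05, 0.10):
        kept = trim_by_cutoff(data, cutoff)
        print(cutoff, len(kept), round(ipw_ate(kept), 3))
```

Running the script prints, for each cutoff, how many subjects remain and the resulting estimate. A larger cutoff discards more subjects with extreme propensity scores, which changes the population the estimate refers to.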
