An estimation method for non-response model using Monte-Carlo expectation-maximization algorithm

(Korean title) A non-response model estimation method using the Monte-Carlo expectation-maximization method

  • Received : 2016.03.15
  • Accepted : 2016.04.20
  • Published : 2016.05.31

Abstract

Non-response is a major issue when an election outcome is predicted from surveys conducted ahead of the election. A variety of non-response imputation methods can be used to address it, but the resulting forecasts tend to vary with the method chosen. In this study, to improve electoral forecasts, we examined a model-based method of non-response imputation that applies the Monte Carlo expectation-maximization (MCEM) algorithm introduced by Wei and Tanner (1990). The MCEM algorithm is applied to resolve the boundary solution problem that can arise when maximum likelihood estimates (MLEs) are computed under a non-ignorable non-response mechanism. We performed simulation studies to compare the estimation performance of MCEM, maximum likelihood estimation, and Bayesian estimation. The simulation results showed that the MCEM method is a reasonable candidate for non-response model estimation. We also applied the MCEM method to exit poll data from the 2012 Korean presidential election and assessed prediction performance using the modified within precinct error (MWPE) criterion (Bautista et al., 2007).

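Ahead of elections, polling organizations predict the outcome using a variety of methods. One problem that can arise in survey-based election forecasting is non-response, and forecasts can differ completely depending on the non-response imputation method used. In this study, we investigated a model-based approach to non-response imputation. In particular, we applied the Monte Carlo EM algorithm proposed by Wei and Tanner (1990) to resolve the boundary solution problem that can arise under a non-ignorable non-response mechanism when maximum likelihood estimation is used. Through simulation, we compared the MCEM method with conventional maximum likelihood estimation and Bayesian estimation, and the results showed that MCEM can serve as an alternative to the existing methods. We also carried out an empirical analysis using exit poll data collected on the day of the 18th presidential election, held in 2012. To compare the forecasts, we used the MWPE (modified within precinct error) proposed by Bautista et al. (2007).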
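The MCEM idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' model or code: it assumes a toy 2x2 table in which a covariate X is always observed, the answer Y is missing for non-respondents, and the response probability depends only on Y (a non-ignorable mechanism). The function name `mcem_nonresponse`, its parameters, and the counts in the usage note are all hypothetical.

```python
import numpy as np

def mcem_nonresponse(y, m, n_iter=200, n_draws=500, seed=0):
    """Monte Carlo EM for a 2x2 table with non-ignorable non-response.

    y[x, j]: respondents with covariate x and answer j.
    m[x]:    non-respondents with covariate x (answer unobserved).
    Assumed model: P(X=x, Y=j) = pi[x, j], P(respond | Y=j) = alpha[j].
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    m = np.asarray(m, dtype=int)
    total = y.sum() + m.sum()

    pi = np.full((2, 2), 0.25)      # starting cell probabilities
    alpha = np.array([0.5, 0.5])    # starting response probabilities

    for _ in range(n_iter):
        # E-step, approximated by simulation:
        # P(Y=1 | X=x, non-response) is proportional to pi[x, 1] * (1 - alpha[1]).
        w = pi * (1.0 - alpha)
        p1 = w[:, 1] / w.sum(axis=1)
        # Draw n_draws imputations of the missing answers and average them.
        z1 = rng.binomial(m, p1, size=(n_draws, 2)).mean(axis=0)
        z = np.column_stack([m - z1, z1])   # averaged imputed counts

        # M-step: closed-form updates on the completed table.
        full = y + z
        pi = full / total
        alpha = y.sum(axis=0) / full.sum(axis=0)
    return pi, alpha, z
```

With hypothetical counts such as `y = [[120, 80], [60, 140]]` and `m = [50, 90]`, calling `mcem_nonresponse(y, m)` returns the estimated cell probabilities, the response probabilities for each answer, and the averaged imputed counts for the non-respondents. The Monte Carlo average replaces the exact conditional expectation of a standard EM step; in this toy model the expectation is available in closed form, so the simulation is shown only to make the mechanics visible.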

References

  1. Baker, S. G. and Laird, N. M. (1988). Regression analysis for categorical variables with outcome subject to nonignorable nonresponse. Journal of the American Statistical Association, 83, 62-69. https://doi.org/10.1080/01621459.1988.10478565
  2. Bautista, R., Callegaro, M., Vera, J. A. and Abundis, F. (2007). Studying nonresponse in Mexican exit polls. International Journal of Public Opinion Research, 19, 492-503. https://doi.org/10.1093/ijpor/edm013
  3. Cho, Y. S., Chun, Y. M. and Hwang, D. Y. (2008). An imputation for nonresponses in the survey on the rural living indicators. Korean Journal of Applied Statistics, 21, 95-107. https://doi.org/10.5351/KJAS.2008.21.1.095
  4. Choi, B., Choi, J. W. and Park, Y. S. (2009). Bayesian methods for an incomplete two-way contingency table with application to the Ohio (Buckeye) State Polls. Survey Methodology, 35, 37-51.
  5. Choi, B. and Kim, K. M. (2012). A model selection method using EM algorithm for missing data. Journal of the Korean Data Analysis Society, 14, 767-779.
  6. Choi, B., Park, Y. S. and Lee, D. H. (2007). Election forecasting using pre-election survey data with nonignorable nonresponse. Journal of the Korean Data Analysis Society, 9, 2321-2333.
  7. Crespi, I. (1988). Pre-election polling: Sources of accuracy and error, Russell Sage, New York.
  8. Dahinden, C., Kalisch, M. and Bühlmann, P. (2010). Decomposition and model selection for large contingency tables. Biometrical Journal, 52, 233-252.
  9. Hong, N. R. and Huh, M. H. (2001). A post-examination of forecasting surveys for the 16th general election. Survey Research, 2, 1-35.
  10. Ibrahim, J. G., Zhu, H. and Tang, N. (2008). Model selection criteria for missing-data problems using the EM algorithm. Journal of the American Statistical Association, 103, 1648-1658. https://doi.org/10.1198/016214508000001057
  11. Kim, Y. W. and Kwak, E. S. (2010). A total survey error analysis of the exit polling for general election 2008 in Korea. Survey Research, 11, 33-55.
  12. Kwak, E. S., Kim, J. Y. and Kim, Y. W. (2013). Analysis of forecasting error of the exit poll for the general election of 2012 in Korea. Survey Research, 14, 1-7.
  13. Kwak, J. A. and Choi, B. (2014). A comparison study for accuracy of exit poll based on nonresponse model. Journal of the Korean Data & Information Science Society, 25, 53-64. https://doi.org/10.7465/jkdi.2014.25.1.53
  14. Lee, H. J. and Kang, S. B. (2012). Handling the nonresponse in sample survey. Journal of the Korean Data & Information Science Society, 23, 1183-1194. https://doi.org/10.7465/jkdi.2012.23.6.1183
  15. Lee, J. H., Kim, J. and Lee, K. J. (2006). Missing imputation methods using the spatial variable in sample survey. Korean Journal of Applied Statistics, 19, 57-67. https://doi.org/10.5351/KJAS.2006.19.1.057
  16. Little, R. J. A. and Rubin, D. B. (2002). Statistical analysis with missing data, second edition, Wiley, New York.
  17. Nardi, Y. and Rinaldo, A. (2012). The log-linear group-lasso estimator and its asymptotic properties. Bernoulli, 13, 945-974.
  18. Park, T. (1998). An approach to categorical data with nonignorable nonresponse. Biometrics, 54, 1579-1590. https://doi.org/10.2307/2533682
  19. Park, T. and Brown, M. B. (1994). Models for categorical data with nonignorable nonresponse. Journal of the American Statistical Association, 89, 44-52. https://doi.org/10.1080/01621459.1994.10476444
  20. Park, T. S. and Lee, S. Y. (1998). Analysis of categorical data with nonresponses. Korean Journal of Applied Statistics, 11, 83-95.
  21. Park, Y. S. and Choi, B. (2010). Bayesian analysis for incomplete multi-way contingency tables with nonignorable nonresponse. Journal of Applied Statistics, 37, 1439-1453. https://doi.org/10.1080/02664760903046078
  22. Park, Y. S., Kim, K. W. and Choi, B. (2013). Dynamic Bayesian analysis for irregularly and incompletely observed contingency tables. Journal of the Korean Statistical Society, 42, 277-289. https://doi.org/10.1016/j.jkss.2012.08.008
  23. Shim, M. S. and Choi, H. C. (1997). Studies on non-response cases of election polls. The Journal of Communication Science, 14, 137-162.
  24. Wei, G. C. G. and Tanner, M. A. (1990). A Monte Carlo implementation of the EM algorithm and the poor man's data augmentation algorithms. Journal of the American Statistical Association, 85, 699-704. https://doi.org/10.1080/01621459.1990.10474930
  25. Yoo, H. S. (2015). A model selection method for non-response model based on empirical Bayesian method, Master's thesis, Daegu University, Gyeongbuk.
  26. Yoon, Y. H. and Choi, B. (2012). Model selection method for categorical data with non-response. Journal of the Korean Data & Information Science Society, 23, 627-641. https://doi.org/10.7465/jkdi.2012.23.4.627
  27. Yoon, Y. H. and Choi, B. (2014). Analysis of missing data using an empirical Bayesian method. Korean Journal of Applied Statistics, 27, 1003-1016. https://doi.org/10.5351/KJAS.2014.27.6.1003

Cited by

  1. Variance estimation for distribution rate in stratified cluster sampling with missing values vol.28, pp.2, 2017, https://doi.org/10.7465/jkdi.2017.28.2.443