Model selection method for categorical data with non-response

  • Yoon, Yong-Hwa (Department of Statistics and Computer Science, Daegu University)
  • Choi, Bo-Seung (Department of Statistics and Computer Science, Daegu University)
  • Received : 2012.04.23
  • Accepted : 2012.06.15
  • Published : 2012.07.31

Abstract

We consider model estimation and model selection methods for multi-way contingency table data with non-response or missing values. We use a hierarchical Bayesian model to handle the boundary solution problem that can arise in maximum likelihood estimation under a non-ignorable non-response model, and we address model selection to find the best model for the data. Under the Bayesian approach, we use Bayes factors as the model selection criterion. We applied the proposed method to a pre-election survey for the 2004 Korean National Assembly race. The results favored the non-ignorable non-response model, with voting intention as the most suitable explanatory variable.
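The boundary solution problem mentioned above can be illustrated in the simplest setting. The sketch below is not the paper's multi-way model; it assumes a 2x2 table with a fully observed covariate, a binary outcome that is missing for nonrespondents, and a response probability that depends only on the outcome. Under those assumptions the model is saturated, and the maximum likelihood estimate is interior only if the nonresponse odds solving the linear system `m[i] = sum_j y[i][j] * o[j]` are all non-negative. The function names and the example counts are hypothetical.

```python
def nonresponse_odds(y, m):
    """Solve m[i] = sum_j y[i][j] * o[j] for a 2x2 table by Cramer's rule.

    y[i][j]: respondents with covariate level i and outcome j.
    m[i]:    nonrespondents with covariate level i (outcome unobserved).
    o[j]:    nonresponse odds (1 - pi_j) / pi_j for outcome j, where pi_j
             is the assumed probability of responding given outcome j.
    """
    det = y[0][0] * y[1][1] - y[0][1] * y[1][0]
    if det == 0:
        raise ValueError("singular table: the odds are not identified")
    o0 = (m[0] * y[1][1] - y[0][1] * m[1]) / det
    o1 = (y[0][0] * m[1] - y[1][0] * m[0]) / det
    return [o0, o1]


def is_boundary(odds):
    """A negative odds value means no valid interior solution exists, so
    the constrained maximum likelihood estimate lies on the boundary."""
    return any(o < 0 for o in odds)


# Hypothetical counts, chosen only for illustration.
respondents = [[40, 10], [20, 30]]

interior = nonresponse_odds(respondents, [10, 12])
boundary = nonresponse_odds(respondents, [20, 5])

print(interior, is_boundary(interior))  # both odds non-negative
print(boundary, is_boundary(boundary))  # one odds value is negative
```

In the second case the unconstrained solution would imply a response probability above one for one outcome category, which is the kind of degenerate estimate that a hierarchical prior is meant to pull away from the boundary.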

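This study addresses the estimation and selection of models that best explain the data and improve prediction accuracy when categorical data arranged as a multi-way contingency table contain missing values or non-response. To resolve the boundary solution problem that can arise in maximum likelihood estimation under a non-ignorable non-response mechanism, we considered a hierarchical Bayesian model. We also addressed the model selection problem of finding the combination of variables that improves model fit. To handle model selection under the Bayesian approach, we used the Bayes factor as the selection criterion. The proposed method was applied in an empirical analysis of opinion poll data collected ahead of the 2004 Korean National Assembly election. The analysis found that, under the non-ignorable non-response mechanism, the model using voting participation as the explanatory variable was the most suitable.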
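As a rough illustration of how a Bayes factor compares two candidate models for a contingency table, the sketch below contrasts a saturated model with an independence model on a fully observed 2x2 table. It is a simplified, complete-data example and not the non-response model of the paper: it assumes uniform Dirichlet priors on the cell, row, and column probabilities, for which the marginal likelihoods have a closed form. The function names and the example tables are hypothetical.

```python
import math


def log_multivariate_beta(alpha):
    """Log of the multivariate beta function B(alpha)."""
    return sum(math.lgamma(a) for a in alpha) - math.lgamma(sum(alpha))


def log_marginal(counts, prior):
    """Log marginal likelihood of category counts under a Dirichlet prior.

    The multinomial coefficient is omitted because it is identical for
    both models and cancels in the Bayes factor.
    """
    posterior = [c + a for c, a in zip(counts, prior)]
    return log_multivariate_beta(posterior) - log_multivariate_beta(prior)


def log_bayes_factor(table):
    """Log Bayes factor of the saturated model against independence."""
    cells = [x for row in table for x in row]
    rows = [sum(row) for row in table]
    cols = [sum(col) for col in zip(*table)]
    # Saturated model: one uniform Dirichlet prior over all cells.
    log_m_saturated = log_marginal(cells, [1.0] * len(cells))
    # Independence model: separate uniform priors on row and column margins.
    log_m_independence = (log_marginal(rows, [1.0] * len(rows))
                          + log_marginal(cols, [1.0] * len(cols)))
    return log_m_saturated - log_m_independence


# Hypothetical tables: one with strong association, one with none.
associated = [[30, 5], [5, 30]]
flat = [[20, 20], [20, 20]]

print(log_bayes_factor(associated))  # positive: favors the saturated model
print(log_bayes_factor(flat))        # negative: favors independence
```

A positive log Bayes factor favors the first model and a negative one favors the second. For models without closed-form marginal likelihoods, such as hierarchical models fitted by simulation, the marginal likelihood has to be estimated numerically instead.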
