Model selection method for categorical data with non-response

  • Yoon, Yong-Hwa (Department of Statistics and Computer Science, Daegu University)
  • Choi, Bo-Seung (Department of Statistics and Computer Science, Daegu University)
  • Received: 2012.04.23
  • Accepted: 2012.06.15
  • Published: 2012.07.31

Abstract

We consider model estimation and model selection methods for multi-way contingency table data with non-response or missing values. We also consider a hierarchical Bayesian model to handle the boundary solution problem (estimates falling on the boundary of the parameter space) that can arise in maximum likelihood estimation under a non-ignorable non-response model, and we present a model selection method for finding the model that best fits the data. Under this Bayesian approach, we use Bayes factors for model selection. We applied the proposed method to a pre-election survey for the 2004 Korean National Assembly election. The results show that the non-ignorable non-response model was favored and that the voting-intention variable was the most suitable.
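The abstract compresses several steps: data augmentation for the missing responses, Bayesian estimation that avoids boundary estimates, and a Bayes-factor comparison of ignorable and non-ignorable models. The Python sketch below illustrates these steps in a much simpler setting than the paper's. It is not the authors' implementation. All of the following are assumptions: a two-way table in which X (for example, region) is always observed and Y (for example, voting intention) is missing for nonrespondents; an ignorable mechanism in which the response probability depends only on X, compared with a non-ignorable one in which it depends only on Y; flat Dirichlet(1, ..., 1) and Beta(1, 1) priors instead of the paper's hierarchical prior; and marginal likelihoods estimated from Gibbs output with Chib's (1995) posterior-ordinate identity. The helper names `log_marginal_chib`, `log_dirichlet` and `log_beta` are hypothetical, and so are the counts.

```python
import math
import random


def log_dirichlet(p, a):
    """Log density of Dirichlet(a) evaluated at probability vector p."""
    return (math.lgamma(sum(a)) - sum(math.lgamma(x) for x in a)
            + sum((x - 1.0) * math.log(v) for x, v in zip(a, p)))


def log_beta(x, a, b):
    """Log density of Beta(a, b) evaluated at x in (0, 1)."""
    return (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
            + (a - 1.0) * math.log(x) + (b - 1.0) * math.log(1.0 - x))


def log_marginal_chib(y, z, mechanism, n_iter=4000, burn=1000, seed=1):
    """Hypothetical helper (not from the paper): Chib-style log marginal likelihood.

    y[i][j]: respondents with X=i, Y=j;  z[i]: nonrespondents with X=i (Y unknown).
    mechanism "row": P(respond | i, j) = alpha[i]  (ignorable, depends on X only)
    mechanism "col": P(respond | i, j) = alpha[j]  (non-ignorable, depends on Y only)
    Assumed priors: cell probabilities ~ Dirichlet(1, ..., 1), each alpha ~ Beta(1, 1).
    The multinomial coefficient is omitted because it is identical for both models.
    """
    rng = random.Random(seed)
    I, J = len(y), len(y[0])
    K = I if mechanism == "row" else J
    key = (lambda i, j: i) if mechanism == "row" else (lambda i, j: j)
    pi, alpha = [1.0 / (I * J)] * (I * J), [0.5] * K
    pi_sum, alpha_sum, stats = [0.0] * (I * J), [0.0] * K, []
    for it in range(n_iter):
        # Data augmentation: allocate each row's nonrespondents to Y categories
        # with probability proportional to pi_ij * (1 - response probability).
        zc = [[0] * J for _ in range(I)]
        for i in range(I):
            if z[i] > 0:
                w = [pi[i * J + j] * (1.0 - alpha[key(i, j)]) for j in range(J)]
                for j in rng.choices(range(J), weights=w, k=z[i]):
                    zc[i][j] += 1
        # Complete-data posterior: Dirichlet for pi, independent Betas for alpha.
        n = [y[i][j] + zc[i][j] for i in range(I) for j in range(J)]
        r, s = [0] * K, [0] * K
        for i in range(I):
            for j in range(J):
                r[key(i, j)] += y[i][j]
                s[key(i, j)] += zc[i][j]
        g = [rng.gammavariate(1.0 + c, 1.0) for c in n]
        total = sum(g)
        pi = [v / total for v in g]
        alpha = [rng.betavariate(1.0 + r[k], 1.0 + s[k]) for k in range(K)]
        if it >= burn:
            pi_sum = [a + b for a, b in zip(pi_sum, pi)]
            alpha_sum = [a + b for a, b in zip(alpha_sum, alpha)]
            stats.append((n, r, s))
    G = len(stats)
    pi_star = [v / G for v in pi_sum]        # posterior means: an interior point
    alpha_star = [v / G for v in alpha_sum]
    # Observed-data log likelihood at theta* = (pi_star, alpha_star).
    log_lik = 0.0
    for i in range(I):
        for j in range(J):
            log_lik += y[i][j] * math.log(pi_star[i * J + j] * alpha_star[key(i, j)])
        if z[i] > 0:
            miss = sum(pi_star[i * J + j] * (1.0 - alpha_star[key(i, j)]) for j in range(J))
            log_lik += z[i] * math.log(miss)
    log_prior = (log_dirichlet(pi_star, [1.0] * (I * J))
                 + sum(log_beta(a, 1.0, 1.0) for a in alpha_star))
    # Rao-Blackwellised posterior ordinate: average the complete-data posterior
    # density at theta* over the retained augmentation draws (log-sum-exp).
    terms = [log_dirichlet(pi_star, [1.0 + c for c in nn])
             + sum(log_beta(alpha_star[k], 1.0 + rr[k], 1.0 + ss[k]) for k in range(K))
             for nn, rr, ss in stats]
    top = max(terms)
    log_post = top + math.log(sum(math.exp(t - top) for t in terms)) - math.log(G)
    # Chib's identity: log m = log L(theta*) + log prior(theta*) - log posterior(theta*).
    return log_lik + log_prior - log_post


if __name__ == "__main__":
    # Hypothetical counts: 3 regions (rows) x 2 candidates (columns).
    y = [[40, 25], [30, 35], [20, 45]]
    z = [20, 15, 30]
    log_bf = log_marginal_chib(y, z, "col") - log_marginal_chib(y, z, "row")
    print(f"log Bayes factor (non-ignorable vs ignorable): {log_bf:.3f}")
```

A positive log Bayes factor favours the non-ignorable mechanism. Even flat priors give a proper posterior on (0, 1), so the posterior means used as θ* stay strictly inside the parameter space. This is one reason a Bayesian fit can sidestep the boundary estimates that maximum likelihood may produce. The estimate carries Monte Carlo error, so rerunning with several seeds or a larger `n_iter` is a cheap stability check.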

Acknowledgement

Supported by: Daegu University
