A study on 3-step complex data mining in society indicator survey

Application of 3-step complex data mining to social indicator surveys

  • Cho, Kwang-Hyun (Department of Early Childhood Education, Changwon National University) ;
  • Park, Hee-Chang (Department of Statistics, Changwon National University)
  • Received : 2012.08.31
  • Accepted : 2012.09.23
  • Published : 2012.09.30

Abstract

A social indicator survey captures the state of a society as a whole. Because it reflects regional public opinion, it can inform policy development, and it is an important measure of social change. Such surveys have been conducted by many local governments (Seoul, Incheon, Busan, Ulsan, Gyeongsangnam-do, etc.). However, the survey results are mostly analyzed with basic statistics only. In this study, we propose a new data mining methodology for more effective analysis: a 3-step complex data mining procedure for social indicator surveys that uses three data mining methods (intervening association rules, clustering, and decision trees).

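A social indicator survey gives an overall picture of the state of society as residents perceive it, and it has the advantage of reflecting regional public opinion in the development of various policies. It is an important measure of social change, and many local governments (Seoul, Incheon, Busan, Ulsan, Gyeongsangnam-do, and others) spend considerable budget and time conducting one. However, because the analysis of the results relies mainly on basic statistics, the survey data are not being put to full use, and a wider range of methods such as data mining needs to be applied. This paper therefore presents a new data mining methodology for the efficient analysis of social indicator surveys. We propose a 3-step complex data mining procedure that applies intervening association rules, k-means clustering, and decision trees in sequence, and we apply it to the Gyeongsangnam-do social indicator survey data collected in 2010.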
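As a rough illustration of how three such steps can be chained, here is a minimal Python sketch. It is not the authors' procedure: the abstract does not define the intervening association rule, so step 1 substitutes ordinary support and confidence screening, step 2 is a one-dimensional k-means with a fixed starting point, and step 3 picks a single Gini-based split rather than growing a full decision tree. The records, the field names (`employed`, `owns_home`, `income`, `satisfied`) and the 0.7 confidence threshold are all hypothetical.

```python
from collections import Counter

# Hypothetical toy survey records; every field name and value is invented.
rows = [
    {"employed": 1, "owns_home": 1, "income": 5.0, "satisfied": 1},
    {"employed": 1, "owns_home": 0, "income": 4.5, "satisfied": 1},
    {"employed": 1, "owns_home": 1, "income": 4.8, "satisfied": 1},
    {"employed": 1, "owns_home": 0, "income": 1.2, "satisfied": 0},
    {"employed": 0, "owns_home": 1, "income": 1.0, "satisfied": 0},
    {"employed": 0, "owns_home": 1, "income": 1.5, "satisfied": 0},
    {"employed": 0, "owns_home": 0, "income": 5.2, "satisfied": 1},
    {"employed": 0, "owns_home": 0, "income": 0.8, "satisfied": 0},
]


def rule_stats(data, antecedent, consequent):
    """Support and confidence of the rule (antecedent = 1) -> (consequent = 1)."""
    both = sum(1 for r in data if r[antecedent] == 1 and r[consequent] == 1)
    ante = sum(1 for r in data if r[antecedent] == 1)
    return both / len(data), (both / ante if ante else 0.0)


def kmeans_1d(values, k=2, iters=20):
    """Plain k-means on one numeric variable (k >= 2), centers spread from min to max."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    labels = []
    for _ in range(iters):
        # Assign each value to its nearest center, then recompute the centers.
        labels = [min(range(k), key=lambda j: abs(v - centers[j])) for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:
                centers[j] = sum(members) / len(members)
    return labels, centers


def gini(targets):
    """Gini impurity of a list of class labels."""
    n = len(targets)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(targets).values())


def best_split(data, features, target):
    """Return (feature, weighted Gini) for the best single binary split."""
    best = None
    for f in features:
        left = [r[target] for r in data if r[f] == 0]
        right = [r[target] for r in data if r[f] == 1]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(data)
        if best is None or score < best[1]:
            best = (f, score)
    return best


# Step 1 (stand-in): keep inputs whose rule confidence reaches the assumed threshold.
kept = [f for f in ("employed", "owns_home")
        if rule_stats(rows, f, "satisfied")[1] >= 0.7]

# Step 2: cluster respondents on income and store the cluster label as a new input.
labels, centers = kmeans_1d([r["income"] for r in rows], k=2)
for r, lab in zip(rows, labels):
    r["cluster"] = lab

# Step 3 (stand-in): choose the most informative split among the remaining inputs.
best = best_split(rows, kept + ["cluster"], "satisfied")
print(kept, labels, best)
```

The point of the ordering is that each step narrows or enriches the inputs for the next one: rule screening drops weakly associated variables, clustering adds a group label, and the tree step then works with that reduced, augmented set.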
