Analysis of Large Tables

Choi, Hyun-Jip

doi:10.5351/KJAS.2005.18.2.395

The Korean Journal of Applied Statistics (응용통계연구)

Volume 18 Issue 2 / Pages 395-410 / 2005 / 1225-066X (pISSN) / 2383-5818 (eISSN)

The Korean Statistical Society (한국통계학회)


Korean title: 대규모 분할표 분석 (Analysis of Large Contingency Tables)

Choi, Hyun-Jip (최현집), Department of Applied Information Statistics, Division of Economics, Kyonggi University

Published: 2005.07.01

https://doi.org/10.5351/KJAS.2005.18.2.395

Abstract

For the analysis of large tables formed by many categorical variables, we propose a method that partitions the variables into several disjoint groups such that the variables within each group are completely associated. To find the groups, we use a simple function of the Kullback-Leibler divergence as a similarity measure. Because the groups are complete hierarchical sets, the association structure of a large table can be identified using marginal log-linear models. Examples illustrate the proposed method.

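For the analysis of large contingency tables formed by many categorical variables, we propose an analysis method that exploits the collapsibility property. We propose a method, based on the Kullback-Leibler divergence measure, for determining groups of variables that are completely associated with one another; within the groups obtained by the proposed method, the associations among the variables can be identified using marginal log-linear models. As an application, we present the results of analyzing market basket data used to study consumers' purchasing behavior, a kind of large contingency table commonly encountered in data mining.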
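The grouping idea in the abstract can be sketched in a few lines of Python. The abstract does not state which function of the Kullback-Leibler divergence the paper uses, so this sketch assumes the simplest candidate: the divergence between the observed joint distribution of two variables and the product of their marginals (their mutual information). The names `pairwise_kl` and `group_variables`, the `threshold` parameter, the single-linkage merging rule, and the toy basket data are all hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def pairwise_kl(x, y):
    """KL divergence (in nats) between the observed joint distribution of
    x and y and the product of their marginals, i.e. mutual information.
    Assumption: the paper's actual similarity function may differ."""
    n = len(x)
    joint, px, py = {}, {}, {}
    for a, b in zip(x, y):
        joint[(a, b)] = joint.get((a, b), 0) + 1
        px[a] = px.get(a, 0) + 1
        py[b] = py.get(b, 0) + 1
    # Only observed cells contribute; empty cells add zero to the sum.
    return sum(
        (c / n) * math.log((c * n) / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def group_variables(data, threshold):
    """Hypothetical grouping rule: link two variables when their pairwise
    divergence exceeds `threshold`, then return the connected components
    as disjoint groups (single linkage via union-find)."""
    names = list(data)
    parent = {v: v for v in names}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in combinations(names, 2):
        if pairwise_kl(data[u], data[v]) > threshold:
            parent[find(u)] = find(v)

    groups = {}
    for v in names:
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


# Made-up purchase indicators (1 = item bought) for eight baskets.
baskets = {
    "bread":  [1, 1, 0, 0, 1, 1, 0, 0],
    "butter": [1, 1, 0, 0, 1, 1, 0, 0],
    "beer":   [1, 0, 1, 0, 1, 0, 1, 0],
    "chips":  [1, 0, 1, 0, 1, 0, 1, 0],
}

print(group_variables(baskets, threshold=0.1))
```

In this toy data, bread and butter always occur together, as do beer and chips, while the two pairs are independent of each other, so the sketch separates them into two groups. Each group could then be analyzed with its own, much smaller, marginal table.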


