Mining Quantitative Association Rules using Commercial Data Mining Tools

Kang, Gong-Mi; Moon, Yang-Sae; Choi, Hun-Young; Kim, Jin-Ho

Journal of KIISE: Databases

Vol. 35, No. 2 / pp. 97-111 / 2008 / pISSN 1229-7739

Korean Institute of Information Scientists and Engineers


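Kang, Gong-Mi (Computer Science, Kangwon National University);
Moon, Yang-Sae (Computer Science, Kangwon National University);
Choi, Hun-Young (Computer Science, Kangwon National University);
Kim, Jin-Ho (Computer Science, Kangwon National University)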

Published: 2008.04.15


Abstract

Commercial data mining tools generally support only binary attributes when mining association rules; that is, they can mine only binary association rules. In general, however, transaction databases contain quantitative attributes as well as binary attributes. In this paper, we therefore propose a systematic approach to mining quantitative association rules (association rules that contain quantitative attributes) using commercial data mining tools. To this end, we first propose an overall framework for mining quantitative association rules with commercial mining tools. The framework consists of two steps: 1) a pre-processing step that converts quantitative attributes into binary attributes, and 2) a post-processing step that converts the mined binary association rules back into quantitative association rules. For the pre-processing step, we present the concept of domain partition and, based on it, formally redefine the existing partition techniques: the mean-based and median-based techniques for bipartition, and the equi-width and equi-depth techniques for multi-partition. These existing techniques, however, do not consider the distribution of attribute values. To solve this problem, we propose an intuitive partition technique named standard deviation minimization. In standard deviation minimization, adjacent attribute values are placed in the same partition if the change in standard deviation is small, and are divided into different partitions if the change is large. For the post-processing step, we propose a method that integrates binary association rules and converts them back into the corresponding quantitative association rules. Through extensive experiments, we show that the framework works correctly and that standard deviation minimization is superior to the other partition techniques. Based on these results, we believe the proposed framework is a practical approach that lets general users easily mine quantitative association rules using commercial data mining tools.
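The pre-processing step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact algorithms, so every function name, the half-open interval convention, and in particular the `max_change` threshold used in `stdev_change_partition` are assumptions made for illustration. Each partition function returns a sorted list of cut points, and `binarize` maps a quantitative value to the label of the interval that contains it, which a binary association rule miner could then treat as an item.

```python
from bisect import bisect_right
from statistics import mean, median, pstdev


def mean_bipartition(values):
    """Bipartition: a single cut point at the mean."""
    return [mean(values)]


def median_bipartition(values):
    """Bipartition: a single cut point at the median."""
    return [median(values)]


def equi_width(values, k):
    """Multi-partition: k intervals of equal width over [min, max]."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / k
    return [lo + step * i for i in range(1, k)]


def equi_depth(values, k):
    """Multi-partition: k intervals holding roughly equal numbers of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * i) // k] for i in range(1, k)]


def stdev_change_partition(values, max_change):
    """Greedy sketch of the standard-deviation idea (an assumption, not the
    paper's algorithm): scan values in sorted order and start a new interval
    whenever adding the next value would raise the population standard
    deviation of the current interval by more than max_change."""
    ordered = sorted(values)
    cuts = []
    current = [ordered[0]]
    for v in ordered[1:]:
        change = pstdev(current + [v]) - pstdev(current)
        if change > max_change:
            cuts.append(v)      # v opens a new interval
            current = [v]
        else:
            current.append(v)
    return cuts


def binarize(value, cuts, name):
    """Map a quantitative value to a binary item label for its interval.
    Intervals are half-open: [previous cut, next cut)."""
    idx = bisect_right(cuts, value)
    lo = "-inf" if idx == 0 else cuts[idx - 1]
    hi = "+inf" if idx == len(cuts) else cuts[idx]
    return f"{name}=[{lo},{hi})"


if __name__ == "__main__":
    ages = [20, 21, 22, 23, 40, 41, 42, 60]   # hypothetical attribute values
    cuts = stdev_change_partition(ages, max_change=3)
    print(cuts)
    print([binarize(a, cuts, "age") for a in ages])
```

With the hypothetical `ages` list, the equi-width and equi-depth cut points ignore the visible gaps between the clusters of values, whereas the standard-deviation sketch places its cuts at those gaps. This is the kind of distribution sensitivity the abstract attributes to standard deviation minimization, though the choice of threshold here is purely illustrative.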

