Mining Quantitative Association Rules using Commercial Data Mining Tools

상용 데이타 마이닝 도구를 사용한 정량적 연관규칙 마이닝

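  • 강공미 (Department of Computer Science, Kangwon National University);
  • 문양세 (Department of Computer Science, Kangwon National University);
  • 최훈영 (Department of Computer Science, Kangwon National University);
  • 김진호 (Department of Computer Science, Kangwon National University)
  • Published: 2008.04.15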

Abstract

Commercial data mining tools generally support only binary attributes when mining association rules; that is, they can mine only binary association rules. In general, however, transaction databases contain quantitative attributes as well as binary ones. In this paper, we therefore propose a systematic approach to mining quantitative association rules (association rules that contain quantitative attributes) using commercial mining tools. To this end, we first propose an overall framework that mines quantitative association rules on top of commercial mining tools. The framework consists of two steps: 1) a pre-processing step that converts quantitative attributes into binary attributes and 2) a post-processing step that converts the resulting binary association rules back into quantitative association rules. For the pre-processing step, we present the concept of domain partition and, based on it, formally redefine the existing bipartition techniques (mean-based and median-based) and multi-partition techniques (equi-width and equi-depth). These existing partition techniques, however, do not take the distribution of attribute values into account. To solve this problem, we propose an intuitive partition technique called standard deviation minimization. In standard deviation minimization, adjacent attribute values are placed in the same partition if the change in their standard deviation is small, and are split into different partitions if the change is large. For the post-processing step, we propose a method that integrates binary association rules and converts them back into the corresponding quantitative rules. Through extensive experiments, we show that the framework works correctly and that standard deviation minimization is superior to the other partition techniques.
Based on these results, we believe that our framework is practical for non-expert users who want to mine quantitative association rules with commercial data mining tools.
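
The two-step framework described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: all function names, the item naming scheme (`age_p0`, `age_p1`), and the sample data are hypothetical. In particular, `sd_change_cuts` is only one assumed greedy reading of standard deviation minimization with an assumed `threshold` parameter, since the abstract does not give the exact procedure, and the rule integration part of the post-processing step is not shown.

```python
from statistics import mean, median, pstdev

def bipartition_cut(values, method="mean"):
    """Bipartition: one cut point at the mean or the median."""
    return mean(values) if method == "mean" else median(values)

def equi_width_cuts(values, k):
    """Multi-partition into k intervals of equal width (k-1 cut points)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]

def equi_depth_cuts(values, k):
    """Multi-partition into k intervals with roughly equal value counts."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // k] for i in range(1, k)]

def sd_change_cuts(values, threshold):
    """ASSUMED greedy variant: walk the sorted values and start a new
    partition when adding the next value would raise the current
    partition's standard deviation by more than `threshold`."""
    ordered = sorted(values)
    cuts, current = [], [ordered[0]]
    for v in ordered[1:]:
        if pstdev(current + [v]) - pstdev(current) > threshold:
            cuts.append(v)   # v becomes the lower bound of a new partition
            current = [v]
        else:
            current.append(v)
    return cuts

def binarize(value, cuts, name):
    """Pre-processing: map a quantitative value to a binary item name."""
    index = sum(1 for c in cuts if value >= c)
    return f"{name}_p{index}"

def to_interval(item, cuts):
    """Post-processing: map a binary item back to its value range."""
    name, index = item.rsplit("_p", 1)
    bounds = [float("-inf"), *cuts, float("inf")]
    i = int(index)
    return name, bounds[i], bounds[i + 1]

# Hypothetical quantitative attribute with two clusters of values.
ages = [21, 22, 23, 24, 50, 51, 52, 53]
print(equi_width_cuts(ages, 2))            # [37.0]
print(equi_depth_cuts(ages, 2))            # [50]
cuts = sd_change_cuts(ages, threshold=5)
print(cuts)                                # [50]
item = binarize(52, cuts, "age")
print(item, to_interval(item, cuts))       # age_p1 ('age', 50, inf)
```

On this sample the equi-width cut at 37 falls in the empty gap between the two clusters, while the assumed deviation-based variant places the cut at the first value of the second cluster, which is the kind of distribution awareness the abstract argues for. The binary items produced by `binarize` are what a commercial tool would mine; `to_interval` shows how an item appearing in a mined binary rule could be turned back into a quantitative condition.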
