An Empirical Study of Qualities of Association Rules from a Statistical View Point

  • Dorn, Maryann (Dept. of Computer Science, Southern Illinois University)
  • Hou, Wen-Chi (Dept. of Computer Science, Southern Illinois University)
  • Che, Dunren (Dept. of Computer Science, Southern Illinois University)
  • Jiang, Zhewei (Dept. of Computer Science, Southern Illinois University)
  • Published: 2008.03.31

Abstract

Minimum support and minimum confidence have been used as the criteria for generating association rules in all association rule mining algorithms. These criteria have natural appeal, such as simplicity, and few researchers have questioned the quality of the rules they generate. In this paper, we examine the rules from a more rigorous point of view by conducting statistical tests. Specifically, we use contingency tables and the chi-square test to analyze the data. Experimental results show that one third of the association rules derived from the support and confidence criteria are not statistically significant; that is, the antecedent and consequent of these rules are not correlated. This indicates that minimum support and minimum confidence alone do not ensure that the discovered associations are meaningful. The chi-square test can be considered as an enhancement or an alternative to these criteria.
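
As an illustration of the kind of test described above, the following is a minimal sketch of a chi-square test of independence on the 2x2 contingency table of a single rule A -> B. It is not the authors' implementation: the function names `chi_square_2x2` and `is_significant`, the count-based interface, and the example counts are all assumptions made for illustration. The only fixed value is 3.841, the standard chi-square critical value for 1 degree of freedom at the 0.05 significance level.

```python
# Hypothetical sketch: chi-square test of independence for a rule A -> B.
# All names and example counts are illustrative, not taken from the paper.

CRITICAL_VALUE_05_DF1 = 3.841  # chi-square critical value, df = 1, alpha = 0.05


def chi_square_2x2(n_ab, n_a, n_b, n):
    """Return the chi-square statistic for the 2x2 table of A versus B.

    n_ab: transactions containing both A and B
    n_a:  transactions containing A
    n_b:  transactions containing B
    n:    total number of transactions
    Assumes every row and column total is nonzero.
    """
    # Observed counts: rows are A / not A, columns are B / not B.
    observed = [
        [n_ab, n_a - n_ab],
        [n_b - n_ab, n - n_a - n_b + n_ab],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if A and B were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


def is_significant(n_ab, n_a, n_b, n, critical=CRITICAL_VALUE_05_DF1):
    """True if independence of A and B is rejected at the given threshold."""
    return chi_square_2x2(n_ab, n_a, n_b, n) > critical


# Rule with support 0.45 and confidence 0.75, yet A and B are independent:
# P(B) is also 0.75, so knowing A tells us nothing about B.
independent = chi_square_2x2(n_ab=450, n_a=600, n_b=750, n=1000)

# Same marginals, but A and B co-occur more often than chance predicts.
correlated = chi_square_2x2(n_ab=540, n_a=600, n_b=750, n=1000)
```

In the first example the rule would pass typical support and confidence thresholds, but its statistic is 0, so it would be rejected as not significant. In the second example the statistic is far above 3.841. Note that a large statistic signals dependence in either direction, so a practical filter would also check that the observed co-occurrence exceeds the expected count.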
