Standardization for basic association measures in association rule mining

A method for standardizing evaluation criteria in association rule mining

  • Received : 2010.07.19
  • Accepted : 2010.09.06
  • Published : 2010.09.30

Abstract

An association rule represents the relationship between two or more items by quantifying the relevance among the items in a large database, and it is among the most widely used techniques in data mining. The basic evaluation criteria for finding meaningful association rules are support, confidence, and lift, which are used to generate the rules. Lift needs to be standardized because its range differs from that of support and confidence. Support and confidence also need to be standardized so that, when a single consequent variable has several antecedent variables, we can objectively compare which antecedent is most strongly associated with the consequent. In this paper we propose a method for standardizing the association criteria that takes into account the marginal probability of each itemset, so that the degree of association can be assessed objectively and accurately. We also check whether the three conditions for interestingness measures are satisfied, and then compare the existing association criteria with the standardized criteria using concrete examples.
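
As a rough illustration of the three basic criteria named in the abstract, the sketch below computes support, confidence, and lift for a rule A -> B using their standard textbook definitions: support = P(A and B), confidence = P(B | A), lift = P(B | A) / P(B). The toy transactions and the function name `basic_measures` are assumptions made for this example and do not come from the paper. The standardization method proposed in the paper is not reproduced here.

```python
from fractions import Fraction


def basic_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent.

    Uses the standard definitions; `transactions` is a list of item sets.
    This is an illustrative helper, not the paper's standardized measures.
    """
    a, b = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # transactions containing A
    n_b = sum(1 for t in transactions if b <= t)         # transactions containing B
    n_ab = sum(1 for t in transactions if (a | b) <= t)  # transactions containing A and B
    if n == 0 or n_a == 0 or n_b == 0:
        raise ValueError("measures are undefined when A or B never occurs")
    support = Fraction(n_ab, n)        # P(A and B), always in [0, 1]
    confidence = Fraction(n_ab, n_a)   # P(B | A), always in [0, 1]
    lift = confidence / Fraction(n_b, n)  # P(B | A) / P(B), can exceed 1
    return support, confidence, lift


# Hypothetical market-basket data, for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

support, confidence, lift = basic_measures(transactions, {"butter"}, {"bread"})
print(support, confidence, lift)  # 2/5 1 5/4
```

In this toy case the lift of 5/4 lies outside the [0, 1] interval that bounds support and confidence, which is the difference in range that motivates standardizing lift.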
