Development of association rule threshold by balancing of relative rule accuracy

상대적 규칙 정확도의 균형화에 의한 연관성 측도의 개발

  • Received : 2014.09.11
  • Accepted : 2014.10.13
  • Published : 2014.11.30

Abstract

Data mining is a representative methodology for obtaining meaningful information in the era of big data. According to Wikipedia, association rule learning is a popular and well-researched method for discovering interesting relationships between itemsets in large databases using association thresholds. It is intended to identify strong rules discovered in databases using different interestingness measures. Unlike general association rule mining, inverse association rule mining finds rules stating that a particular item does not occur if another item does not occur. If the two types of association rule are considered simultaneously, we can obtain marketing information for related products as well as for a specific product. In this paper, we propose a balanced attributable relative accuracy applicable to both association rule techniques, and then check the three conditions for interestingness measures proposed by Piatetsky-Shapiro (1991). Comparative studies of rule accuracy, relative accuracy, attributable relative accuracy, and balanced attributable relative accuracy are presented through a numerical example. The results show that balanced attributable relative accuracy performs better than the other accuracy measures considered.

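Among data mining techniques, association rule mining explores the relationships among items in a database on the basis of association evaluation criteria. Unlike general association rule mining, inverse association rule mining finds rules stating that if one itemset does not occur, another itemset does not occur either. Generating inverse association rules together with general association rules lets a company identify, when it wants to sell a specific product, not only the marketing needed for that product itself but also which other products need marketing. To this end, this paper proposes balanced attributable relative rule accuracy, applicable to both types of association rule, as an association evaluation criterion. After checking the conditions that an interestingness measure should satisfy, as proposed by Piatetsky-Shapiro (1991), we compare through an example the usefulness of the proposed measure and of evaluation measures from the field of medical diagnosis that are applicable to association rules. The results confirm that when attributable relative accuracy and inverse attributable relative accuracy differ in magnitude, the degree of association is difficult to explain clearly, so it is most desirable to use balanced attributable relative rule accuracy, which considers the two measures simultaneously.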
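As a rough illustration of the quantities the abstract refers to, the sketch below computes confidence (rule accuracy) and relative accuracy for a rule A -> B and for its inverse rule (not A) -> (not B) from a 2x2 co-occurrence table. The definitions used here, accuracy as P(B|A) and relative accuracy as P(B|A) - P(B), follow the usual reading of Lavrac et al. (1999) and are an assumption about what the paper builds on. The attributable and balanced variants are not defined in this abstract, so they are not reproduced. The function name `rule_measures` and all counts are hypothetical.

```python
from fractions import Fraction


def rule_measures(n_ab, n_a_only, n_b_only, n_neither):
    """Measures for A -> B and (not A) -> (not B) from 2x2 cell counts.

    Assumes 0 < P(A) < 1 so that both conditional probabilities exist.
    """
    n = n_ab + n_a_only + n_b_only + n_neither
    p_a = Fraction(n_ab + n_a_only, n)       # P(A)
    p_b = Fraction(n_ab + n_b_only, n)       # P(B)
    p_ab = Fraction(n_ab, n)                 # P(A and B)
    p_none = Fraction(n_neither, n)          # P(not A and not B)

    confidence = p_ab / p_a                  # P(B | A)
    inverse_confidence = p_none / (1 - p_a)  # P(not B | not A)
    return {
        "confidence": confidence,
        "relative_accuracy": confidence - p_b,
        "inverse_confidence": inverse_confidence,
        "inverse_relative_accuracy": inverse_confidence - (1 - p_b),
        # Piatetsky-Shapiro's first condition asks a measure to be 0
        # when A and B are independent; this difference is the usual check.
        "leverage": p_ab - p_a * p_b,
    }


# Hypothetical counts: 30 baskets with both items, 10 with A only,
# 20 with B only, 40 with neither.
for name, value in rule_measures(30, 10, 20, 40).items():
    print(f"{name}: {value}")
```

With these hypothetical counts the relative accuracy of A -> B is 1/4 while that of the inverse rule is 1/6, which is the kind of disagreement in magnitude that the abstract gives as motivation for a balanced measure.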

References

  1. Agrawal, R., Imielinski, T. and Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.
  2. Cho, K. H. and Park, H. C. (2011a). Study on the multi intervening relation in association rules. Journal of the Korean Data Analysis Society, 13, 297-306.
  3. Cho, K. H. and Park, H. C. (2011b). A study on insignificant rules discovery in association rule mining. Journal of the Korean Data & Information Science Society, 22, 81-88.
  4. Han, J., Pei, J. and Yin, Y. (2000). Mining frequent patterns without candidate generation. Proceedings of ACM SIGMOD Conference on Management of Data, 1-12.
  5. Hwang, J. and Kim, J. (2003). Target marketing using inverse association rule. Journal of Intelligence and Information Systems, 9, 195-209.
  6. Jin, D. S., Kang, C., Kim, K. K. and Choi, S. B. (2011). CRM on travel agency using association rules. Journal of the Korean Data Analysis Society, 13, 2945-2952.
  7. Lavrac, N., Flach, P. and Zupan, B. (1999). Rule evaluation measures: A unifying view. Proceedings of the 9th International Workshop on Inductive Logic Programming, 174-185.
  8. McNicholas, P. D., Murphy, T. B. and O'Regan, M. (2008). Standardising the lift of an association rule. Computational Statistics and Data Analysis, 52, 4712-4721. https://doi.org/10.1016/j.csda.2008.03.013
  9. Park, H. C. (2010). Proposition of inverse pure association rule. Journal of the Korean Data Analysis Society, 12, 3305-3315.
  10. Park, H. C. (2011). The proposition of attributably pure confidence in association rule mining. Journal of the Korean Data & Information Science Society, 22, 235-243.
  11. Park, H. C. (2012a). Negatively attributable and pure confidence for generation of negative association rules. Journal of the Korean Data & Information Science Society, 23, 707-716.
  12. Park, H. C. (2012b). Exploration of PIM based similarity measures as association rule thresholds. Journal of the Korean Data & Information Science Society, 23, 1127-1135. https://doi.org/10.7465/jkdi.2012.23.6.1127
  13. Park, H. C. (2013). Proposition of causal association rule thresholds. Journal of the Korean Data & Information Science Society, 24, 1189-1197. https://doi.org/10.7465/jkdi.2013.24.6.1189
  14. Pei, J., Han, J. and Mao, R. (2000). CLOSET: An efficient algorithm for mining frequent closed itemsets. Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 21-30.
  15. Piatetsky-Shapiro, G. (1991). Discovery, analysis and presentation of strong rules. Knowledge Discovery in Databases, AAAI/MIT Press, 229-248.

Cited by

  1. Proposition of balanced comparative confidence considering all available diagnostic tools, vol. 26, no. 3, 2015, https://doi.org/10.7465/jkdi.2015.26.3.611