DOI QR코드

DOI QR Code

Comparison of confidence measures useful for classification model building

분류 모형 구축에 유용한 신뢰도 측도 간의 비교

  • Received : 2014.02.04
  • Accepted : 2014.03.03
  • Published : 2014.03.31

Abstract

Association rule of the well-studied techniques in data mining is the exploratory data analysis for understanding the relevance among the items in a huge database. This method has been used to find the relationship between each set of items based on the interestingness measures such as support, confidence, lift, similarity measures, etc. By typical association rule technique, we generate association rule that satisfy minimum support and confidence values. Support and confidence are the most frequently used, but they have the drawback that they can not determine the direction of the association because they have always positive values. In this paper, we compared support, basic confidence, and three kinds of confidence measures useful for classification model building to overcome this problem. The result confirmed that the causal confirmed confidence was the best confidence in view of the association mining because it showed more precisely the direction of association.

데이터 마이닝 기법 중에서 연관성 규칙은 하나의 거래나 사건에 포함되어 있는 항목들의 관련성을 파악하기 위한 탐색적 자료 분석 방법이다. 이 기법은 지지도, 신뢰도, 향상도 등과 같은 흥미도 측도들을 이용하여 연관성 규칙을 생성한다. 일반적인 연관성 규칙에서는 최소 지지도를 만족하는 빈발항목집합을 생성한 후 최저 신뢰도를 만족하는 것을 연관성 규칙으로 채택하게 된다. 이 때 규칙 여부를 결정하기 위해 가장 많이 사용되는 신뢰도는 고려하는 항목의 순서가 바뀌게 되면 그 값이 달라지는 비대칭적 측도가 되는 동시에 항상 양의 값을 가진다. 따라서 신뢰도 값의 크기로는 양의 연관성이 있는지, 아니면 음의 연관성이 있는지를 알 수 없다. 본 논문에서는 이러한 문제를 극복하기 위해 분류 모형 구축에 유용한 신뢰도 측도들을 소개하고, 신뢰도들 간의 비교 분석을 통해 유용성을 평가하였다. 그 결과, 인과적 확인 신뢰도가 연관성의 방향을 보다 정확하게 나타내고 있다는 사실을 확인 하였다.

Keywords

References

  1. Agrawal, R., Imielinski, R. and Swami, A. (1993). Mining association rules between sets of items in large databases. Proceedings of the ACM SIGMOD Conference on Management of Data, 207-216.
  2. Berzal, F., Cubero, J., Marin, N., Sanchez, D., Serrano, J. and Vila, A. (2005). Association rule evaluation for classification purposes. Actas del III Taller Nacional de Mineria de Datos y Aprendizaje, TAMIDA2005, 135-144.
  3. Cho, K. H. and Park, H. C. (2011a). Study on the multi intervening relation in association rules. Journal of the Korean Data Analysis Society, 13, 297-306.
  4. Cho, K. H. and Park, H. C. (2011b). A study on insignificant rules discovery in association rule mining. Journal of the Korean Data & Information Science Society, 22, 81-88.
  5. Han, J. and Fu, Y. (1999). Mining multiple-level association rules in large databases. IEEE Transactions on Knowledge and Data Engineering, 11, 68-77.
  6. Han, J., Pei, J. and Yin, Y. (2000). Mining frequent patterns without candidate generation. Proceedings of ACM SIGMOD Conference on Management of Data, 1-12.
  7. Jin, D. S., Kang, C., Kim, K. K. and Choi, S. B. (2011). CRM on travel agency using association rules. Journal of the Korean Data Analysis Society, 13, 2945-2952.
  8. Kodratoff, Y. (2000). Comparing machine learning and knowledge discovery in databases: An application to knowledge discovery in texts. Proceeding of Machine Learning and its Applications: Advanced Lectures, 1-21.
  9. Liu, B., Hsu, W. and Ma, Y. (1999). Mining association rules with multiple minimum supports. Proceedings of the 5th International Conference on Knowledge Discovery and Data Mining, 337-241.
  10. Park, H. C. (2011a). Association rule ranking function by decreased lift influence. Journal of the Korean Data & Information Science Society, 22, 179-188.
  11. Park, H. C. (2011b). The proposition of attributably pure confidence in association rule mining. Journal of the Korean Data & Information Science Society, 22, 235-243.
  12. Park, H. C. (2012a). Negatively attributable and pure confidence for generation of negative association rules. Journal of the Korean Data & Information Science Society, 23, 707-716.
  13. Park, H. C. (2012b). Exploration of PIM based similarity measures as association rule thresholds. Journal of the Korean Data & Information Science Society, 23, 1127-1135. https://doi.org/10.7465/jkdi.2012.23.6.1127
  14. Park, H. C. (2013). Proposition of causal association rule thresholds. Journal of the Korean Data & Information Science Society, 24, 1189-1197. https://doi.org/10.7465/jkdi.2013.24.6.1189
  15. Pasquier, N., Bastide, Y., Taouil, R. and Lakhal, L. (1999). Discovering frequent closed itemsets for association rules. Proceedings of the 7th International Conference on Database Theory, 398-416.
  16. Pei, J., Han, J. and Mao, R. (2000). CLOSET: An efficient algorithm for mining frequent closed itemsets. Proceedings of ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 21-30.
  17. Saygin, Y., Vassilios, S. V. and Clifton, C. (2002). Using unknowns to prevent discovery of association rules. Proceedings of 2002 Conference on Research Issues in Data Engineering, 45-54.

Cited by

  1. Analysis of error source in subjective evaluation results on Taekwondo Poomsae: Application of generalizability theory vol.27, pp.2, 2016, https://doi.org/10.7465/jkdi.2016.27.2.395
  2. Construction of the structural equation model on college adaptation in nursing students vol.26, pp.6, 2015, https://doi.org/10.7465/jkdi.2015.26.6.1439
  3. Proposition of balanced comparative confidence considering all available diagnostic tools vol.26, pp.3, 2015, https://doi.org/10.7465/jkdi.2015.26.3.611