An FCA-based Classification Approach for Analysis of Interval Data

  • 황석형 (Department of Computer Engineering, Sun Moon University)
  • 김응희 (Biomedical Knowledge Engineering Laboratory, Seoul National University)
  • Received : 2011.08.26
  • Accepted : 2011.10.27
  • Published : 2012.01.31

Abstract

Distributed and sharable data are growing explosively on top of internet-based infrastructure such as diverse information devices, social network systems, and cloud computing environments. As a data analysis and mining technique for extracting, analyzing, and classifying the useful knowledge and information inherent in such data, Formal Concept Analysis (FCA) on binary or many-valued data has recently been applied successfully in many fields. However, little FCA research has addressed interval data, in which each attribute takes an interval as its value. In this paper, we propose a new FCA-based approach for classifying interval data. We also present a supporting tool (iFCA) that implements the proposed approach: binarization of the interval data table, concept extraction, and construction of concept hierarchies. Finally, through experiments on real-world data sets, we demonstrate that the approach is useful and effective for analyzing and mining interval data.
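The pipeline named in the abstract (binarization of an interval data table, concept extraction, construction of a concept hierarchy) can be illustrated with a minimal Python sketch. The abstract does not specify how iFCA binarizes intervals, so the rule used here is an assumption: an object receives a binary attribute whenever its interval overlaps an analyst-chosen bin. The data, the bins, and the function names `binarize`, `extract_concepts`, and `order_pairs` are all hypothetical, and the concept enumeration is a brute-force closure over object subsets that only suits tiny tables.

```python
from itertools import combinations

# Hypothetical interval data table: one closed interval per attribute.
data = {
    "o1": {"temp": (10, 20), "humidity": (30, 50)},
    "o2": {"temp": (15, 25), "humidity": (40, 60)},
    "o3": {"temp": (30, 40), "humidity": (35, 45)},
}

# Hypothetical scale: named bins per attribute, chosen by the analyst.
scale = {
    "temp": {"temp:low": (0, 22), "temp:high": (23, 50)},
    "humidity": {"humidity:dry": (0, 45), "humidity:humid": (46, 100)},
}


def binarize(table, bins):
    """Assumed rule: an object gets a binary attribute when its interval overlaps the bin."""
    context = {}
    for obj, row in table.items():
        attrs = set()
        for name, (lo, hi) in row.items():
            for label, (bin_lo, bin_hi) in bins[name].items():
                if lo <= bin_hi and bin_lo <= hi:  # closed intervals overlap
                    attrs.add(label)
        context[obj] = frozenset(attrs)
    return context


def extract_concepts(context):
    """Enumerate formal concepts by closing every subset of objects (exponential)."""
    all_attrs = frozenset().union(*context.values())
    objects = sorted(context)
    found = set()
    for size in range(len(objects) + 1):
        for subset in combinations(objects, size):
            # Intent: attributes shared by every object in the subset.
            intent = all_attrs
            for obj in subset:
                intent &= context[obj]
            # Extent: every object that has all attributes of the intent.
            extent = frozenset(o for o in objects if intent <= context[o])
            found.add((extent, intent))
    return found


def order_pairs(concepts):
    """Return (sub, super) pairs: the subconcept has a strictly smaller extent."""
    return {(a, b) for a in concepts for b in concepts if a[0] < b[0]}


context = binarize(data, scale)
lattice = extract_concepts(context)
for extent, intent in sorted(lattice, key=lambda c: (len(c[0]), sorted(c[0]))):
    print(sorted(extent), sorted(intent))
```

For this toy table the sketch finds four concepts, with `humidity:dry` as the only attribute shared by all three objects. A different binarization rule (for example, requiring the interval to lie entirely inside a bin) would produce a different binary context and therefore a different hierarchy.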
