A Study for Determining the Best Number of Clusters on Temporal Data

Temporal 데이터의 최적의 클러스터 수 결정에 관한 연구

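  • 조영희 (Department of Computer Science, Dankook University)
  • 이계성 (Department of Computer Science, Dankook University)
  • 전진호 (Department of Computer Science, Dankook University)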
  • Published : 2006.01.01

Abstract

Clustering of temporal data takes a model-based approach, in which each cluster is represented by an automata-based model. To elicit the individual models for the clusters, global models must first be constructed for the whole data set. The preparation for building the individual models is completed by determining the number of clusters inherent in the data set. In this paper, a BIC (Bayesian Information Criterion) approximation is used to determine the number of clusters, and its applicability is confirmed. A search technique that improves efficiency is also suggested, based on an analysis of the relationship between data size and BIC values. A number of experiments on artificially generated data sets were performed to check its validity. The experiments confirm that the BIC approximation measure suggests the best number of clusters, provided that the amount of data is relatively large.
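As a rough illustration of BIC-based selection of the number of clusters, the sketch below scores each candidate number of clusters k with the standard Schwarz approximation, log-likelihood minus (d/2) log N, and picks the k with the highest score. It is not the authors' implementation: the function names, the "larger is better" sign convention, and the log-likelihood and parameter counts in the usage lines are all assumptions made for illustration. In practice the log-likelihoods would come from fitting the cluster models (for example, automata-based models) for each k.

```python
import math


def bic_score(log_likelihood, n_params, n_samples):
    """Schwarz's BIC approximation, written so that larger is better."""
    return log_likelihood - 0.5 * n_params * math.log(n_samples)


def select_num_clusters(fits, n_samples):
    """Pick the candidate k with the highest BIC score.

    fits maps k -> (log_likelihood, n_params) for a model fitted with k clusters.
    Returns (best_k, scores), where scores maps k -> BIC score.
    """
    scores = {k: bic_score(ll, d, n_samples) for k, (ll, d) in fits.items()}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Hypothetical values for illustration only; these are not results from the paper.
fits = {
    1: (-5200.0, 10),
    2: (-4700.0, 20),
    3: (-4400.0, 30),
    4: (-4390.0, 40),
}
best_k, scores = select_num_clusters(fits, n_samples=1000)
```

With these made-up numbers the likelihood gain from k = 3 to k = 4 is smaller than the added complexity penalty, so the score peaks at k = 3. Because the penalty grows only with log N while the log-likelihood grows roughly linearly with N, the trade-off between fit and complexity depends on the data size, which is in line with the abstract's remark that the measure works when the amount of data is relatively large.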

Keywords

Temporal Data; Clustering; BIC; Number of Clusters; Model-based