• Title, Summary, Keyword: hierarchical clustering

Empirical Comparisons of Clustering Algorithms using Silhouette Information

  • Jun, Sung-Hae;Lee, Seung-Joo
    • International Journal of Fuzzy Logic and Intelligent Systems / v.10 no.1 / pp.31-36 / 2010
  • Many clustering algorithms have been used in diverse fields. When a given data set must be grouped into clusters, many algorithms based on similarity or distance measures are available. Most clustering work has relied on hierarchical and non-hierarchical clustering algorithms. Researchers have generally chosen among these algorithms case by case, and they have had to select a suitable method subjectively, based on prior knowledge. In this paper, to address this subjectivity, we empirically compare popular hierarchical and non-hierarchical clustering algorithms using the silhouette measure. We use silhouette information to evaluate clustering results such as the number of clusters and the cluster variance. We verify our comparison with experiments on data sets from the UCI Machine Learning Repository. As a result, clustering algorithms can be applied efficiently and objectively.
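The silhouette value that this comparison relies on can be computed directly from pairwise distances. The following is a minimal Python sketch, assuming Euclidean distance and plain lists of points and labels; the function names are hypothetical and this is not the authors' implementation. For a point i, a(i) is the mean distance to the other members of its cluster, b(i) is the smallest mean distance to the members of any other cluster, and s(i) = (b(i) - a(i)) / max(a(i), b(i)).

```python
import math

def euclidean(p, q):
    """Euclidean distance between two equal-length point tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def silhouette_scores(points, labels):
    """Return the silhouette value s(i) for every point.

    By convention, a point alone in its cluster gets s(i) = 0.
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, p in enumerate(points):
        own = [j for j, l in enumerate(labels) if l == labels[i] and j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the rest of the point's own cluster.
        a = sum(euclidean(p, points[j]) for j in own) / len(own)
        # b: mean distance to the nearest other cluster.
        b = min(
            sum(euclidean(p, points[j]) for j, l in enumerate(labels) if l == c)
            / labels.count(c)
            for c in clusters
            if c != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores

def mean_silhouette(points, labels):
    """Average silhouette width over all points (higher is better)."""
    s = silhouette_scores(points, labels)
    return sum(s) / len(s)

# Usage: two well-separated groups give a mean silhouette close to 1.
pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
print(round(mean_silhouette(pts, [0, 0, 1, 1]), 3))
```

Comparing the mean silhouette of several candidate labelings (different algorithms, or different numbers of clusters) gives one objective basis for choosing among them.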

An Agglomerative Hierarchical Variable-Clustering Method Based on a Correlation Matrix

  • Lee, Kwangjin
    • Communications for Statistical Applications and Methods / v.10 no.2 / pp.387-397 / 2003
  • Most research that requires a variable-clustering step uses either exploratory factor analysis or a divisive hierarchical variable-clustering method based on a correlation matrix. Some researchers instead apply an object-clustering method to a distance matrix transformed from a correlation matrix, although this approach is known to be improper. This paper proposes an agglomerative hierarchical variable-clustering method based directly on the correlation matrix itself. It is derived from a geometric concept using variate-spaces and a characterizing variate.
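For contrast, the object-clustering approach that the abstract calls improper can be sketched as follows: compute the Pearson correlation between every pair of variables, turn it into a distance such as d = 1 - r, and merge variables with ordinary average linkage. This is an illustrative Python sketch of that baseline only, not the author's correlation-based method; the choice of 1 - r (rather than, say, 1 - |r|) and all function names are assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_distance_matrix(variables):
    """Distance d_ij = 1 - r_ij between every pair of variables."""
    k = len(variables)
    return [[1.0 - pearson(variables[i], variables[j]) if i != j else 0.0
             for j in range(k)] for i in range(k)]

def average_linkage(dist):
    """Agglomerative average linkage on a distance matrix.

    Returns merges as (members_a, members_b, distance), where members
    are tuples of original variable indices.
    """
    clusters = [(i,) for i in range(len(dist))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[a] for j in clusters[b]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return merges

# Usage: v0 and v1 rise together, v2 falls, so v0 and v1 merge first.
v0 = [1.0, 2.0, 3.0, 4.0, 5.0]
v1 = [2.0, 4.1, 6.0, 8.2, 9.9]
v2 = [5.0, 3.9, 3.1, 2.0, 1.2]
for left, right, d in average_linkage(correlation_distance_matrix([v0, v1, v2])):
    print(left, right, round(d, 3))
```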

Microarray data analysis using relative hierarchical clustering (상대적 계층적 군집 방법을 이용한 마이크로어레이 자료의 군집분석)

  • Woo, Sook Young;Lee, Jae Won;Jhun, Myoungshic
    • Journal of the Korean Data and Information Science Society / v.25 no.5 / pp.999-1009 / 2014
  • Hierarchical clustering analysis, through its dendrogram, makes it easy to explore massive microarray data and to understand biological phenomena. However, because hierarchical clustering algorithms consider only absolute similarity, they have difficulty representing relative dissimilarity, which considers not only the distance between a pair of clusters but also how far that pair is from the rest of the clusters. In this study, we introduced the relative hierarchical clustering method proposed by Mollineda and Vidal (2000) and compared the standard and relative hierarchical clustering methods on simulated and real data in various situations. The quality of the two methods was evaluated using the percentage of incorrectly grouped points (PIGP), homogeneity, and separation.
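Homogeneity and separation can be made concrete with one common centroid-based formulation: homogeneity is the mean distance from each point to its own cluster centroid (lower means tighter clusters), and separation is the size-weighted mean distance between cluster centroids (higher means better-separated clusters). The Python sketch below assumes these definitions with Euclidean distance; the paper's exact formulas may differ, and PIGP is not reproduced here.

```python
import math

def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def group(points, labels):
    """Map each label to the list of points carrying it."""
    groups = {}
    for p, l in zip(points, labels):
        groups.setdefault(l, []).append(p)
    return groups

def homogeneity(points, labels):
    """Mean distance from each point to its own cluster centroid."""
    cents = {l: centroid(ps) for l, ps in group(points, labels).items()}
    return sum(math.dist(p, cents[l]) for p, l in zip(points, labels)) / len(points)

def separation(points, labels):
    """Mean centroid-to-centroid distance, weighted by n_i * n_j."""
    groups = group(points, labels)
    if len(groups) < 2:
        raise ValueError("separation needs at least two clusters")
    cents = {l: centroid(ps) for l, ps in groups.items()}
    num = den = 0.0
    for a in groups:
        for b in groups:
            if a != b:
                w = len(groups[a]) * len(groups[b])
                num += w * math.dist(cents[a], cents[b])
                den += w
    return num / den

# Usage: the natural split is tighter and better separated than a bad one.
pts = [(0.0, 0.0), (0.0, 2.0), (6.0, 0.0), (6.0, 2.0)]
print(homogeneity(pts, [0, 0, 1, 1]), separation(pts, [0, 0, 1, 1]))
print(homogeneity(pts, [0, 1, 0, 1]), separation(pts, [0, 1, 0, 1]))
```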

A Performance Improvement Study On Hierarchical Clustering (Centroid Linkage) Using A Priority Queue (Priority Queue 를 이용한 Hierarchical Clustering (Centroid Linkage) 성능 개선)

  • Jeon, Yongkweon;Yoon, Sungroh
    • Proceedings of the Korea Information Processing Society Conference / pp.1837-1838 / 2010
  • The time and space complexity of conventional hierarchical clustering make it unsuitable for clustering large data sets, and such clustering is difficult to perform within the memory of an ordinary PC. To overcome these difficulties, this study proposes a new algorithm for centroid linkage, one of the conventional hierarchical clustering methods, that uses less memory and runs faster.
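As a generic illustration of how a priority queue helps with centroid linkage, the Python sketch below keeps candidate inter-cluster distances in a binary heap (`heapq`) and discards stale entries lazily once one of their clusters has been merged, so the closest active pair is always found without rescanning a full distance matrix. This is an assumed textbook-style design, not the algorithm proposed in the paper; this naive version still stores O(n^2) heap entries, which is the kind of memory cost the paper aims to reduce.

```python
import heapq
import math

def centroid_linkage(points):
    """Centroid-linkage agglomerative clustering with a lazy-deletion heap.

    Returns merges as (cluster_a, cluster_b, distance, new_id); the
    original points have ids 0..n-1 and each merge creates the next id.
    """
    # Each active cluster id maps to (centroid, size).
    active = {i: (tuple(map(float, p)), 1) for i, p in enumerate(points)}
    heap = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            heapq.heappush(heap, (math.dist(active[i][0], active[j][0]), i, j))
    merges = []
    next_id = len(points)
    while len(active) > 1:
        d, a, b = heapq.heappop(heap)
        if a not in active or b not in active:
            continue  # stale entry: one side was already merged
        (ca, na), (cb, nb) = active.pop(a), active.pop(b)
        # The merged centroid is the size-weighted mean of both centroids.
        merged = tuple((na * x + nb * y) / (na + nb) for x, y in zip(ca, cb))
        for k, (ck, _) in active.items():
            heapq.heappush(heap, (math.dist(merged, ck), k, next_id))
        active[next_id] = (merged, na + nb)
        merges.append((a, b, d, next_id))
        next_id += 1
    return merges

# Usage: two tight pairs merge first, then the two pair-clusters merge.
for merge in centroid_linkage([(0, 0), (0, 1), (5, 0), (5, 1)]):
    print(merge)
```

Lazy deletion keeps each heap operation at O(log n) and avoids locating and removing outdated entries when clusters disappear.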

Development of Clustering Algorithm and Tool for DNA Microarray Data (DNA 마이크로어레이 데이타의 클러스터링 알고리즘 및 도구 개발)

  • 여상수;김성권
    • Journal of KIISE:Computer Systems and Theory / v.30 no.10 / pp.544-555 / 2003
  • Because the data produced by DNA microarray experiments contain a large amount of gene expression information, adequate analysis methods are required. Hierarchical clustering is widely used to analyze gene expression profiles. In this paper, to make DNA microarray data analysis more efficient, we study leaf-ordering, a post-processing step applied to the dendrograms produced by hierarchical clustering. We first analyze existing leaf-ordering algorithms and then present new leaf-ordering approaches. We also introduce HCLO (Hierarchical Clustering & Leaf-Ordering Tool), our software implementation of hierarchical clustering, several existing leaf-ordering algorithms, and the algorithms presented in this paper.
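To make the leaf-ordering problem concrete: a dendrogram with n leaves admits 2^(n-1) leaf orders obtained by flipping the two children of internal nodes, and a common objective is to minimize the sum of distances between adjacent leaves. The brute-force Python sketch below enumerates every order for a small tree, assuming the tree is given as nested 2-tuples of leaf indices and distances as a matrix. It is only feasible for tiny trees and is not one of the algorithms in HCLO; practical leaf-ordering algorithms avoid this exponential enumeration.

```python
def leaf_orders(tree):
    """Yield every leaf order reachable by flipping internal nodes.

    A tree is either a leaf (an int) or a 2-tuple (left, right).
    """
    if isinstance(tree, int):
        yield [tree]
        return
    left, right = tree
    for lo in leaf_orders(left):
        for ro in leaf_orders(right):
            yield lo + ro  # children in the original orientation
            yield ro + lo  # children flipped

def order_cost(order, dist):
    """Sum of distances between adjacent leaves in the order."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def best_leaf_order(tree, dist):
    """Exhaustively pick the order with the smallest adjacent-distance sum."""
    return min(leaf_orders(tree), key=lambda o: order_cost(o, dist))

# Usage: toy 1-D expression values for four genes (hypothetical data).
pos = [0, 5, 6, 20]
dist = [[abs(a - b) for b in pos] for a in pos]
tree = ((1, 0), (3, 2))  # dendrogram as produced, before reordering
best = best_leaf_order(tree, dist)
print(order_cost([1, 0, 3, 2], dist), best, order_cost(best, dist))
```

Leaf ordering never changes the tree's topology; it only chooses the orientation of each merge, so that similar profiles end up next to each other in the displayed dendrogram.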

Selection of Cluster Topic Words in Hierarchical Clustering using K-Means Algorithm

  • Lee Shin Won;Yi Sang Seon;An Dong Un;Chung Sung Jong
    • Proceedings of the IEEK Conference / pp.885-889 / 2004
  • Fast, high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. Hierarchical clustering improves retrieval performance and makes results easy for users to understand. To improve clustering, we built a varied and readable hierarchical structure by carefully selecting cluster topic words and determining the number of clusters dynamically. Selecting topic words is important because the hierarchical clustering structure summarizes the search results. We chose nouns as cluster topic words. The quality of the topic words increased by 33% through the following measures: only nouns are extracted as the topic word of each top-level cluster, and topic words that have already been used are not reused for child clusters.
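The topic-word rules in the last sentence can be sketched as a small recursive procedure. In this Python illustration, each cluster is assumed to carry a term-frequency `Counter` and a list of children, the set of nouns is assumed to come from a part-of-speech tagger that is not shown, and picking the k most frequent eligible nouns (excluding words used by any ancestor) is an assumption; `Cluster` and `assign_topics` are hypothetical names, not the authors' implementation.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Cluster:
    """Hypothetical cluster record: term frequencies and child clusters."""
    terms: Counter
    children: list = field(default_factory=list)
    topic: list = field(default_factory=list)

def assign_topics(cluster, nouns, used=frozenset(), k=1):
    """Give each cluster its k most frequent nouns that no ancestor has used.

    `nouns` is assumed to come from a part-of-speech tagger (not shown).
    """
    candidates = [w for w, _ in cluster.terms.most_common()
                  if w in nouns and w not in used]
    cluster.topic = candidates[:k]
    for child in cluster.children:
        # Words chosen here are excluded for every descendant.
        assign_topics(child, nouns, used | set(cluster.topic), k)

# Usage: "clustering" labels the parent, so the child falls back to "tree".
child = Cluster(Counter({"clustering": 4, "tree": 3, "build": 2}))
root = Cluster(Counter({"clustering": 9, "fast": 7, "document": 5}), [child])
assign_topics(root, {"clustering", "document", "tree"})
print(root.topic, child.topic)
```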

Customer Load Pattern Analysis using Clustering Techniques (클러스터링 기법을 이용한 수용가별 전력 데이터 패턴 분석)

  • Ryu, Seunghyoung;Kim, Hongseok;Oh, Doeun;No, Jaekoo
    • KEPCO Journal on Electric Power and Energy / v.2 no.1 / pp.61-69 / 2016
  • Understanding load patterns and classifying customers is a basic step in analyzing the behavior of electricity consumers. To this end, there has been much research on clustering customers' daily load data. Today, the deployment of advanced metering infrastructure (AMI) and big-data technologies makes it easier to study customers' load data. In this paper, we study load clustering from the viewpoint of yearly and daily load patterns. We compare four clustering methods: K-means clustering, hierarchical clustering (average linkage and Ward's method), and DBSCAN (Density-Based Spatial Clustering of Applications with Noise). We also discuss the relationship between the clustering results and the Korean Standard Industrial Classification (KSIC), which is one possible label for customers' load data. We find that hierarchical clustering with Ward's method is suitable for clustering load data, and that KSIC categories are well characterized by daily load patterns but not as well by yearly load patterns.
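Ward's method merges, at each step, the pair of clusters whose union increases the total within-cluster sum of squares the least; for clusters A and B with sizes n_A, n_B and centroids c_A, c_B, that increase is n_A * n_B / (n_A + n_B) * ||c_A - c_B||^2. The naive Python sketch below applies this rule to toy load profiles; the four-slot profiles and the target of two clusters are invented for illustration, and a real study would use a library implementation on far more customers.

```python
def ward_cost(a, b):
    """Increase in within-cluster sum of squares if clusters a and b merge.

    Each argument is a (centroid, size) pair.
    """
    (ca, na), (cb, nb) = a, b
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq

def ward_clustering(profiles, n_clusters):
    """Naive agglomerative Ward clustering; returns sorted index groups."""
    # Each cluster: (centroid, size, member indices).
    clusters = [(tuple(map(float, p)), 1, [i]) for i, p in enumerate(profiles)]
    while len(clusters) > n_clusters:
        _, i, j = min(
            (ward_cost(clusters[i][:2], clusters[j][:2]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        (ci, ni, mi), (cj, nj, mj) = clusters[i], clusters[j]
        # The merged centroid is the size-weighted mean of both centroids.
        centroid = tuple((ni * x + nj * y) / (ni + nj) for x, y in zip(ci, cj))
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append((centroid, ni + nj, mi + mj))
    return sorted(sorted(m) for _, _, m in clusters)

# Usage: hypothetical kWh per (night, morning, afternoon, evening) slot.
profiles = [
    [1, 5, 6, 2],  # daytime-heavy, office-like
    [1, 6, 6, 1],
    [2, 3, 2, 6],  # evening-heavy, residential-like
    [2, 2, 3, 7],
]
print(ward_clustering(profiles, 2))
```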

Agglomerative Hierarchical Clustering Analysis with Deep Convolutional Autoencoders (합성곱 오토인코더 기반의 응집형 계층적 군집 분석)

  • Park, Nojin;Ko, Hanseok
    • Journal of Korea Multimedia Society / v.23 no.1 / pp.1-7 / 2020
  • Clustering methods essentially take a two-step approach: extracting feature vectors for dimensionality reduction, and then applying a clustering algorithm to the extracted feature vectors. However, for clustering images, traditional methods such as stacked-autoencoder-based k-means are not effective because they tend to ignore local information. In this paper, we propose a method that first reduces data dimensionality effectively with a convolutional autoencoder, which captures and reflects local information, and then accurately clusters similar data samples with a hierarchical clustering approach. The experimental results confirm that the proposed model improves clustering results in terms of clustering accuracy and normalized mutual information.
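The two evaluation metrics named in the abstract can be computed from the true labels and the predicted cluster ids. The Python sketch below uses the arithmetic-mean normalization for NMI, I(U;V) / ((H(U) + H(V)) / 2), which is one of several normalizations in use; which one the paper used is not stated in the abstract. Clustering accuracy is taken here as the fraction of samples labeled correctly under the best one-to-one mapping of clusters to classes, found by brute force over permutations, which is only practical for a handful of clusters (the Hungarian algorithm is the usual choice otherwise).

```python
import math
from collections import Counter
from itertools import permutations

def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum(c / n * math.log(c / n) for c in Counter(labels).values())

def mutual_information(u, v):
    """Mutual information between two labelings of the same samples."""
    n = len(u)
    pu, pv, puv = Counter(u), Counter(v), Counter(zip(u, v))
    return sum(c / n * math.log(c * n / (pu[a] * pv[b]))
               for (a, b), c in puv.items())

def nmi(u, v):
    """NMI with arithmetic-mean normalization."""
    hu, hv = entropy(u), entropy(v)
    if hu == 0 and hv == 0:
        return 1.0  # both labelings put everything in one group
    return mutual_information(u, v) / ((hu + hv) / 2)

def clustering_accuracy(truth, pred):
    """Best accuracy over one-to-one cluster-to-class mappings (brute force)."""
    clusters = sorted(set(pred))
    classes = sorted(set(truth))
    # Pad with None so surplus clusters can stay unmapped.
    classes += [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        best = max(best, sum(mapping[p] == t for p, t in zip(pred, truth)))
    return best / len(truth)

# Usage: a relabeled but otherwise perfect clustering scores 1 on both.
truth = [0, 0, 1, 1, 2, 2]
pred = [1, 1, 0, 0, 2, 2]
print(nmi(truth, pred), clustering_accuracy(truth, pred))
```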

Exploration of Hierarchical Techniques for Clustering Korean Author Names (한글 저자명 군집화를 위한 계층적 기법 비교)

  • Kang, In-Su
    • Journal of Information Management / v.40 no.2 / pp.95-115 / 2009
  • Author resolution disambiguates occurrences of the same author name into the real individuals they refer to. To do this, pairwise similarities are computed between author-name entities, and clustering is then performed. Many studies have employed hierarchical clustering techniques for author disambiguation, but the range of hierarchical clustering methods has not been sufficiently investigated. This study presents an empirical evaluation and analysis of hierarchical clustering applied to Korean author resolution, using multiple distance functions such as the Dice coefficient, cosine similarity, Euclidean distance, the Jaccard coefficient, and the Pearson correlation coefficient.
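The five measures listed above can be written compactly. In this Python sketch, Dice and Jaccard are computed on sets of features (for example, co-author names or keywords attached to an author-name occurrence), while cosine, Euclidean, and Pearson operate on numeric feature vectors; this feature representation is an assumption, since the abstract does not specify one. Euclidean is a distance, whereas the other four are similarities, so a clustering step typically converts similarities to distances, for instance as 1 - s.

```python
import math

def dice(a, b):
    """Dice coefficient of two sets: 2|A & B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b)) if a or b else 1.0

def jaccard(a, b):
    """Jaccard coefficient of two sets: |A & B| / |A | B|."""
    return len(a & b) / len(a | b) if a or b else 1.0

def cosine(x, y):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(p * q for p, q in zip(x, y))
    return dot / (math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y)))

def euclidean(x, y):
    """Euclidean distance (a dissimilarity, unlike the others)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))

def pearson(x, y):
    """Pearson correlation: cosine of the mean-centered vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((p - mx) * (q - my) for p, q in zip(x, y))
    return cov / math.sqrt(sum((p - mx) ** 2 for p in x) * sum((q - my) ** 2 for q in y))

# Usage: hypothetical feature sets and vectors for two name occurrences.
a = {"kim", "lee", "clustering"}
b = {"kim", "park", "clustering"}
x, y = [1, 2, 0], [2, 4, 0]
print(dice(a, b), jaccard(a, b), cosine(x, y), euclidean(x, y), pearson(x, y))
```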

A Study on Cluster Hierarchy Depth in Hierarchical Clustering (계층적 클러스터링에서 분류 계층 깊이에 관한 연구)

  • Jin, Hai-Nan;Lee, Shin-won;An, Dong-Un;Chung, Sung-Jong
    • Proceedings of the Korea Information Processing Society Conference / pp.673-676 / 2004
  • Fast, high-quality document clustering algorithms play an important role in data exploration by organizing large amounts of information into a small number of meaningful clusters. In particular, hierarchical clustering provides a view of the data at different levels, which fits large document collections to people's intuitive needs and interests. Many papers have shown that hierarchical clustering performs well but is limited by its quadratic time complexity. In contrast, K-means has a time complexity that is linear in the number of documents but is thought to produce inferior clusters. Considering simplicity, quality, and efficiency, we combine the two approaches in a new system named CONDOR [10], which builds a hierarchical structure through K-means-based document clustering, to "get the best of both worlds". The performance of CONDOR is compared with that of the VIVISIMO hierarchical clustering system [9], and we analyze its performance with respect to feature-word selection for specific topics and the optimal hierarchy depth.
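The general idea of building a hierarchy from repeated K-means runs can be sketched as follows: run K-means on a document set, then recurse into each resulting cluster until a maximum depth or a minimum cluster size is reached. This is a hypothetical Python illustration on toy 2-D vectors, not the CONDOR system; the deterministic farthest-first initialization, the fixed iteration count, and the parameters `k`, `max_depth`, and `min_size` are all assumptions. For a bounded depth, each level costs roughly O(n * k) per K-means iteration, so the whole tree stays linear in the number of documents, which is the trade-off the abstract describes.

```python
import math

def kmeans(points, k, iters=20):
    """Lloyd's K-means with deterministic farthest-first initialization."""
    centroids = [points[0]]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        if min(math.dist(far, c) for c in centroids) == 0:
            break  # fewer than k distinct points
        centroids.append(far)
    groups = [points]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            groups[nearest].append(p)
        # Recompute centroids; keep the old one if a group became empty.
        centroids = [tuple(sum(v) / len(g) for v in zip(*g)) if g else centroids[i]
                     for i, g in enumerate(groups)]
    return [g for g in groups if g]

def build_hierarchy(points, k=2, depth=0, max_depth=2, min_size=2):
    """Recursively split with K-means; returns a nested dict tree."""
    node = {"points": points, "children": []}
    if depth < max_depth and len(points) >= max(min_size, k):
        groups = kmeans(points, k)
        if len(groups) > 1:  # stop if K-means could not split the set
            node["children"] = [build_hierarchy(g, k, depth + 1, max_depth, min_size)
                                for g in groups]
    return node

# Usage: two distant groups, each made of two nearby subgroups.
A = [(0.0, 0.0), (0.0, 1.0), (3.0, 0.0), (3.0, 1.0)]
B = [(20.0, 0.0), (20.0, 1.0), (23.0, 0.0), (23.0, 1.0)]
tree = build_hierarchy(A + B)
for child in tree["children"]:
    print(child["points"], [g["points"] for g in child["children"]])
```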