Search | Korea Science

Approximate k values using Repulsive Force without Domain Knowledge in k-means

Kim, Jung-Jae;Ryu, Minwoo;Cha, Si-Ho
- KSII Transactions on Internet and Information Systems (TIIS), v.14 no.3, pp.976-990, 2020
The k-means algorithm is widely used in academia and industry because it is simple to implement and learns quickly even on complex datasets. However, k-means struggles to classify datasets without prior knowledge of the specific domain. In a previous study, we proposed the repulsive k-means (RK-means) algorithm, which improves k-means by using the concept of repulsive force to delete unnecessary cluster centroids. RK-means can therefore classify a dataset without domain knowledge. However, three main problems remain. First, the RK-means algorithm includes a cluster repulsive force offset for clusters confined within other clusters, which can cause cluster locking. Second, the previous study did not prove that RK-means converges optimally. Third, RK-means showed better performance only for particular normalization term and weight settings. This paper therefore proposes the advanced RK-means (ARK-means) algorithm to resolve these problems. We establish an initialization strategy for deploying cluster centroids and define a metric for the ARK-means algorithm. Finally, we redefine the mass and normalization terms so that they better suit general datasets. We demonstrate the feasibility of ARK-means experimentally using blob and iris datasets. The experimental results show that the proposed ARK-means algorithm performs better than k-means, k'-means, and RK-means.
https://doi.org/10.3837/tiis.2020.03.004

An Implementation of K-Means Algorithm Improving Cluster Centroids Decision Methodologies (클러스터 중심 결정 방법을 개선한 K-Means 알고리즘의 구현)

Lee Shin-Won;Oh HyungJin;An Dong-Un;Jeong Seong-Jong
- The KIPS Transactions:PartB, v.11B no.7 s.96, pp.867-874, 2004
The K-Means algorithm is a non-hierarchical (flat) reassignment technique that iterates its steps on the basis of K cluster centroids until the clustering results converge into K clusters. By its nature, the K-Means algorithm produces different results depending on the initial and new centroids. In this paper, we propose a modified K-Means algorithm that improves the methods for deciding the initial and new centroids. When the two algorithms were evaluated using the 16 weighting schemes of the SMART system, the modified algorithm showed 20% better recall and F-measure than the K-Means algorithm, and the document clustering results improved considerably.
https://doi.org/10.3745/KIPSTB.2004.11B.7.867
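
The iteration described in the abstract above, and its sensitivity to the initial centroids, can be illustrated with a minimal sketch of standard (Lloyd-style) k-means. This is not the modified algorithm from the paper; the function name `kmeans`, the `max_iter` parameter, and the toy points are assumptions made for illustration only.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Plain Lloyd-style k-means on tuples; returns (centroids, labels).

    Illustrative sketch only: the initial centroids are supplied by the
    caller, so the effect of different starting points is easy to see.
    """
    centroids = [tuple(c) for c in centroids]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        # An empty cluster keeps its previous centroid.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(xs) / len(members) for xs in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(old)
        if new_centroids == centroids:  # converged: nothing moved
            break
        centroids = new_centroids
    return centroids, labels


# Hypothetical toy data: two tight pairs of points far apart on the x-axis.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good_start = kmeans(points, [(0, 0), (10, 0)])
poor_start = kmeans(points, [(0, 0), (0, 1)])
print(good_start)
print(poor_start)
```

With the first pair of starting centroids the points are grouped into the left and right pairs; with the second pair the run converges to a different partition that splits each pair, which is the dependence on initial centroids that the abstract refers to.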

A Fast K-means and Fuzzy-c-means Algorithms using Adaptively Initialization (적응적인 초기치 설정을 이용한 Fast K-means 및 Fuzzy-c-means 알고리즘)

강지혜;김성수
- Journal of KIISE:Software and Applications, v.31 no.4, pp.516-524, 2004
This paper considers the initial value problem in clustering with K-means or Fuzzy-c-means, with the aim of reducing the number of iterations. Conventionally, the initial values are chosen randomly, which sometimes causes the clustering process to converge to undesired center points. The choice of initial values is a well-known open problem, because clustering with K-means or Fuzzy-c-means is sensitive to it. As an approach to the problem, a uniform partitioning method is employed to extract the optimal initial point for each cluster of data. Experimental results demonstrate the superiority of the proposed method, which reduces the number of iterations needed to find the center points of the clusters.

Environmental Survey Data Modeling Using K-means Clustering Techniques

Park, Hee-Chang;Cho, Kwang-Hyun
- Journal of the Korean Data and Information Science Society, v.16 no.3, pp.557-566, 2005
Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. In this paper, we use k-means clustering, chosen from among several clustering techniques. The k-means algorithm is classified as a partitional clustering method. We analyze the 2002 Gyeongnam social indicator survey data using k-means clustering techniques to obtain environmental information. The resulting outputs can be used for environmental preservation and environmental improvement.

Environmental Survey Data Modeling using K-means Clustering Techniques

Park, Hee-Chang;Cho, Kwang-Hyun
- Korean Data and Information Science Society: Conference Proceedings, 2004.10a, pp.77-86, 2004
Clustering is the process of grouping data into clusters so that objects within a cluster are highly similar to one another. In this paper, we use k-means clustering, chosen from among several clustering techniques. The k-means algorithm is classified as a partitional clustering method. We analyze the 2002 Gyeongnam social indicator survey data using k-means clustering techniques to obtain environmental information. The resulting outputs can be used for environmental preservation and environmental improvement.

An Efficient Clustering Method based on Multi Centroid Set using MapReduce (맵리듀스를 이용한 다중 중심점 집합 기반의 효율적인 클러스터링 방법)

Kang, Sungmin;Lee, Seokjoo;Min, Jun-ki
- KIISE Transactions on Computing Practices, v.21 no.7, pp.494-499, 2015
As data sizes grow, it becomes important to identify the properties of big data through analysis. In this paper, we propose an efficient k-Means-based clustering technique, called MCSK-Means (Multi centroid set k-Means), that uses the distributed parallel processing framework MapReduce. A problem with the k-Means algorithm is that clustering accuracy depends on the randomly created initial centroids. To alleviate this problem, the MCSK-Means algorithm reduces the dependency on initial centroids by using multiple sets, each consisting of k centroids. In addition, we apply agglomerative hierarchical clustering to create k centroids from the centroids in the m centroid sets produced by the clustering phase. We implemented MCSK-Means on the MapReduce framework to process big data efficiently.
https://doi.org/10.5626/KTCP.2015.21.7.494

Extensions of X-means with Efficient Learning the Number of Clusters (X-means 확장을 통한 효율적인 집단 개수의 결정)

Heo, Gyeong-Yong;Woo, Young-Woon
- Journal of the Korea Institute of Information and Communication Engineering, v.12 no.4, pp.772-780, 2008
K-means is one of the simplest unsupervised learning algorithms for solving the clustering problem. However, K-means suffers from a basic shortcoming: the number of clusters k has to be known in advance. In this paper, we propose extensions of X-means, which can estimate the number of clusters using the Bayesian information criterion (BIC). We introduce two versions of the algorithm, modified X-means (MX-means) and generalized X-means (GX-means). Both employ one full covariance matrix per cluster and so can estimate the number of clusters efficiently without the severe over-fitting that X-means suffers from because of its spherical cluster assumption. The algorithms start with one cluster and iteratively try to split a cluster to maximize the BIC score. MX-means uses the K-means algorithm to find a set of optimal clusters for the current k, which makes it simple and fast. However, it produces wrongly estimated centers when the clusters overlap. GX-means uses the EM algorithm to estimate the parameters and generates more stable clusters even when the clusters overlap. Experiments with synthetic data show that the proposed methods provide a robust estimate of the number of clusters and of the cluster parameters compared with other existing top-down algorithms.
https://doi.org/10.6109/jkiice.2008.12.4.772

K-means Clustering using Grid-based Representatives

Park, Hee-Chang;Lee, Sun-Myung
- Journal of the Korean Data and Information Science Society, v.16 no.4, pp.759-768, 2005
K-means clustering has been widely used in many applications, such as pattern analysis, data analysis, and market research. It can identify dense and sparse regions among data attributes or object attributes. However, the k-means algorithm requires many hours to obtain k clusters because it is more primitive and exploratory. In this paper, we propose a new k-means clustering method that uses grid-based representative values (arithmetic and trimmed means) of the sample. It is faster than any traditional clustering method while maintaining accuracy.
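
The idea of replacing raw samples with grid-based representatives can be sketched roughly as follows. The abstract does not give the paper's actual procedure, so everything here is an assumption for illustration: the function name `grid_representatives`, the fixed `cell_size`, the use of the arithmetic mean only (the trimmed-mean variant is not shown), and returning a member count alongside each representative.

```python
import math
from collections import defaultdict


def grid_representatives(points, cell_size):
    """Summarize points by the arithmetic mean of each occupied grid cell.

    Hypothetical sketch: returns a list of (mean_point, member_count)
    pairs, ordered by grid cell index. A clustering algorithm such as
    k-means could then be run on the (far fewer) representatives.
    """
    cells = defaultdict(list)
    for p in points:
        # Map each coordinate to the index of the grid cell containing it.
        key = tuple(math.floor(x / cell_size) for x in p)
        cells[key].append(p)

    representatives = []
    for key in sorted(cells):
        members = cells[key]
        mean = tuple(sum(xs) / len(members) for xs in zip(*members))
        representatives.append((mean, len(members)))
    return representatives


# Hypothetical data: two points share a cell, one point sits in another cell.
sample = [(0.25, 0.25), (0.75, 0.75), (5.5, 5.5)]
print(grid_representatives(sample, cell_size=1.0))
```

Keeping the member count is a design choice of this sketch: it would let a later clustering step weight each representative by how many samples it stands for.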

Fuzzy k-Means Local Centers of the Social Networks

Woo, Won-Seok;Huh, Myung-Hoe
- Communications for Statistical Applications and Methods, v.19 no.2, pp.213-217, 2012
Fuzzy k-means clustering is an attractive alternative to ordinary k-means clustering for analyzing multivariate data. Fuzzy versions yield more natural output by allowing the k groups to overlap. In this study, we modify a fuzzy k-means clustering algorithm for use with undirected social networks, apply the algorithm to both real and simulated cases, and report the results.
https://doi.org/10.5351/CKSS.2012.19.2.213

A Variable Selection Procedure for K-Means Clustering

Kim, Sung-Soo
- The Korean Journal of Applied Statistics, v.25 no.3, pp.471-483, 2012
One of the most important problems in cluster analysis is the selection of variables that truly define cluster structure, while eliminating noisy variables that mask such structure. Brusco and Cradit (2001) present the VS-KM (variable-selection heuristic for K-means clustering) procedure for selecting true variables for K-means clustering based on the adjusted Rand index. This procedure starts with a fixed number of clusters in K-means and adds variables sequentially based on the adjusted Rand index. This paper presents an updated procedure that combines VS-KM with the automated K-means procedure provided by Kim (2009). This automated variable selection procedure for K-means clustering recalculates the number of clusters and the initial cluster centers whenever a new variable is added, and adds a variable based on the adjusted Rand index. Simulation results indicate that the proposed procedure is very effective at selecting true variables and at eliminating noisy variables. The program, implemented in R, can be obtained on the website "http://faculty.knou.ac.kr/sskim/nvarkm.r and vnvarkm.r".
https://doi.org/10.5351/KJAS.2012.25.3.471
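
The adjusted Rand index that drives the variable selection above compares two partitions of the same objects while correcting for chance agreement. The following is a minimal sketch of the standard pair-counting formula, not the paper's R program; the function name `adjusted_rand_index` and the handling of degenerate inputs are assumptions of this sketch.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same objects."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        return 1.0  # assumption: treat trivially small inputs as agreement

    # Contingency table cells and its row/column totals.
    cells = Counter(zip(labels_a, labels_b))
    rows = Counter(labels_a)
    cols = Counter(labels_b)

    # Count object pairs that fall together in cells, rows, and columns.
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in rows.values())
    sum_cols = sum(comb(c, 2) for c in cols.values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # assumption: degenerate case where the index is undefined
    return (sum_cells - expected) / (max_index - expected)


# Identical partitions score 1.0 even if the label names differ.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

Because the index depends only on which objects are grouped together, relabeling the clusters does not change the score, which is what makes it usable for comparing clusterings obtained from different variable subsets.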

Search Result 17,890, Processing Time 0.043 seconds