Search | Korea Science

An Agglomerative Hierarchical Variable-Clustering Method Based on a Correlation Matrix

Lee, Kwangjin
- Communications for Statistical Applications and Methods / v.10 no.2 / pp.387-397 / 2003
Most research that requires a variable-clustering process uses either exploratory factor analysis or a divisive hierarchical variable-clustering method based on a correlation matrix. Some researchers apply an object-clustering method to a distance matrix transformed from a correlation matrix, although this approach is known to be improper. This paper proposes an agglomerative hierarchical variable-clustering method based on the correlation matrix itself. It is derived from a geometric concept using variate-spaces and a characterizing variate.
https://doi.org/10.5351/CKSS.2003.10.2.387

Variable Selection and Outlier Detection for Automated K-means Clustering

Kim, Sung-Soo
- Communications for Statistical Applications and Methods / v.22 no.1 / pp.55-67 / 2015
An important problem in cluster analysis is selecting the variables that define cluster structure while eliminating noisy variables that mask it; outlier detection is likewise a fundamental task. Here we provide an automated K-means clustering process combined with variable selection and outlier identification. The automated K-means clustering procedure consists of three processes: (i) automatically calculating the number of clusters and the initial cluster centers whenever a new variable is added, (ii) identifying outliers for each cluster depending on the variables used, and (iii) selecting the variables that define cluster structure in a forward manner. To select variables, we applied the VS-KM (variable-selection heuristic for K-means clustering) procedure (Brusco and Cradit, 2001). To identify outliers, we used a hybrid approach combining a clustering-based approach and a distance-based approach. Simulation results indicate that the proposed automated K-means clustering procedure is effective at selecting variables and identifying outliers. The implemented R program can be obtained at http://www.knou.ac.kr/~sskim/SVOKmeans.r.
https://doi.org/10.5351/CSAM.2015.22.1.055

On the Categorical Variable Clustering

Kim, Dae-Hak
- Journal of the Korean Data and Information Science Society / v.7 no.2 / pp.219-226 / 1996
The basic objective of cluster analysis is to discover natural groupings of items or variables. In general, variable clustering has been based on similarity measures between variables with binary characteristics. We propose a variable clustering method for variables that have more than two categories, ordered in some sense. We also consider some measures of association as similarities between variables. A numerical example is included.

A Variable Selection Procedure for K-Means Clustering

Kim, Sung-Soo
- The Korean Journal of Applied Statistics / v.25 no.3 / pp.471-483 / 2012
One of the most important problems in cluster analysis is the selection of variables that truly define cluster structure, while eliminating noisy variables that mask such structure. Brusco and Cradit (2001) present the VS-KM (variable-selection heuristic for K-means clustering) procedure for selecting true variables for K-means clustering based on the adjusted Rand index. This procedure starts with a fixed number of clusters in K-means and adds variables sequentially based on the adjusted Rand index. This paper presents an updated procedure combining VS-KM with the automated K-means procedure provided by Kim (2009). This automated variable selection procedure for K-means clustering calculates the number of clusters and the initial cluster centers whenever a new variable is added, and adds a variable based on the adjusted Rand index. Simulation results indicate that the proposed procedure is very effective at selecting true variables and at eliminating noisy variables. The R implementation can be obtained from the website "http://faculty.knou.ac.kr/sskim/nvarkm.r and vnvarkm.r".
https://doi.org/10.5351/KJAS.2012.25.3.471
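The abstract above relies on the adjusted Rand index to compare partitions. As a rough illustration only, the index itself can be computed as in the sketch below. The function name `adjusted_rand_index` and the toy labelings are hypothetical, and the abstract does not say how the index is applied across candidate variables, so this is not the VS-KM procedure or the author's R program.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same objects.

    Hypothetical helper for illustration; it follows the standard
    pair-counting definition of the index.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same objects")

    n = len(labels_a)
    total_pairs = comb(n, 2)
    if total_pairs == 0:
        return 1.0  # fewer than two objects: partitions agree trivially

    # Pairs placed together in both partitions (contingency table cells).
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each partition separately.
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)


# Toy labelings (made up): relabeling clusters does not change the index.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
```

An index of 1 means the two partitions coincide up to relabeling, and values near 0 mean agreement no better than chance.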

Gene Expression Pattern Analysis via Latent Variable Models Coupled with Topographic Clustering

Chang, Jeong-Ho;Chi, Sung Wook;Zhang, Byoung Tak
- Genomics & Informatics / v.1 no.1 / pp.32-39 / 2003
We present a latent variable model-based approach to the analysis of gene expression patterns, coupled with topographic clustering. The aspect model, a latent variable model for dyadic data, is applied to extract latent patterns underlying complex variations of gene expression levels. Topographic clustering is then performed to find coherent groups of genes, based on the extracted latent patterns as well as individual gene expression behaviors. Applied to cell cycle-regulated genes of the yeast Saccharomyces cerevisiae, the proposed method discovered biologically meaningful patterns related to characteristic expression behavior in particular cell cycle phases. In addition, displaying the variation in the composition of these latent patterns on the cluster map made the resulting cluster structure easier to interpret. From this, we argue that latent variable models, coupled with topographic clustering, are a promising tool for exploratory analysis of gene expression data.

Pre-Adjustment of Incomplete Group Variable via K-Means Clustering

Hwang, S.Y.;Hahn, H.E.
- Journal of the Korean Data and Information Science Society / v.15 no.3 / pp.555-563 / 2004
In classification and discrimination, we often face an incomplete group variable, typically arising from many missing values and/or unreliable cases. This paper suggests using K-means clustering to pre-adjust for the incompleteness, after which classification based on generalized statistical distance is performed. To illustrate the proposed procedure, a simulation study compares it with CART in data mining and with traditional techniques that ignore the incompleteness of the group variable. The simulation study shows that the proposed methodology outperforms them.

Tree-structured Clustering for Continuous Data (연속형 자료에 대한 나무형 군집화)

Huh Myung-Hoe;Yang Kyung-Sook
- The Korean Journal of Applied Statistics / v.18 no.3 / pp.661-671 / 2005
The aim of this study is to propose a clustering method, called tree-structured clustering, that recursively partitions continuous multivariate data based on an overall $R^2$ criterion with a practical node-splitting decision rule. The method produces easily interpretable tree-type clustering rules and includes a variable selection function. In numerical examples (Fisher's iris data and a Telecom case), we note several differences between tree-structured clustering and K-means clustering.
https://doi.org/10.5351/KJAS.2005.18.3.661
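The abstract above does not define its overall $R^2$ criterion. A minimal sketch follows, assuming the common reading of $R^2$ as the share of the total sum of squares explained by a partition (1 minus within-cluster over total sum of squares). The function name `overall_r_squared` and the toy points are hypothetical, and the paper's node-splitting decision rule is not reproduced.

```python
def overall_r_squared(points, labels):
    """Return 1 - SSW / SST for a partition of numeric points.

    Assumption (not from the source): R^2 is the proportion of the total
    sum of squares that is not left within clusters.
    """
    if not points or len(points) != len(labels):
        raise ValueError("points and labels must be non-empty and the same length")
    dim = len(points[0])

    def centroid(rows):
        # Coordinate-wise mean of a list of points.
        return [sum(row[j] for row in rows) / len(rows) for j in range(dim)]

    def sum_of_squares(rows, center):
        # Squared Euclidean distances to the given center, summed over rows.
        return sum((row[j] - center[j]) ** 2 for row in rows for j in range(dim))

    total_ss = sum_of_squares(points, centroid(points))
    if total_ss == 0:
        raise ValueError("all points are identical, so R^2 is undefined")

    # Group the points by cluster label, then sum the within-cluster scatter.
    groups = {}
    for row, label in zip(points, labels):
        groups.setdefault(label, []).append(row)
    within_ss = sum(sum_of_squares(rows, centroid(rows)) for rows in groups.values())

    return 1.0 - within_ss / total_ss


# Toy data (made up): two tight groups far apart along the first coordinate.
toy_points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
r2_split = overall_r_squared(toy_points, [0, 0, 1, 1])
r2_single = overall_r_squared(toy_points, [0, 0, 0, 0])
```

Under this reading, a candidate split that raises the value more separates the data better, which is one way a recursive partitioning rule could rank splits.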

Comparing Classification Accuracy of Ensemble and Clustering Algorithms Based on Taguchi Design (다구찌 디자인을 이용한 앙상블 및 군집분석 분류 성능 비교)

Shin, Hyung-Won;Sohn, So-Young
- Journal of Korean Institute of Industrial Engineers / v.27 no.1 / pp.47-53 / 2001
In this paper, we compare the classification performance of ensemble and clustering algorithms (Data Bagging, Variable Selection Bagging, Parameter Combining, Clustering) with that of logistic regression under various characteristics of the input data. Four factors are used to simulate the logistic model: (1) correlation among input variables, (2) variance of observations, (3) training data size, and (4) input-output function. Because the relationship between input and output is unknown in practice, we use a Taguchi design that treats the input-output function as a noise factor, which improves the practicality of our results. The experimental results indicate the following. When the variance is at a medium level, Bagging and Parameter Combining perform worse than Logistic Regression, Variable Selection Bagging, and Clustering. However, the classification performances of Logistic Regression, Variable Selection Bagging, Bagging, and Clustering are not significantly different when the variance of the input data is either small or large. When there is strong correlation among input variables, Variable Selection Bagging outperforms both Logistic Regression and Parameter Combining. In general, to our disappointment, the Parameter Combining algorithm appears to be the worst.

An Empirical Study on the Measurement of Clustering and Trend Analysis among the Asian Container Ports Using the Variable Group Benchmarking and Categorical Variable Models (가변 그룹 벤치마킹 모형과 범주형 변수모형을 이용한 아시아 컨테이너항만의 클러스터링측정 및 추세분석에 관한 실증적 연구)

Park, Rokyung
- Journal of Korea Port Economic Association / v.29 no.1 / pp.143-175 / 2013
The purpose of this paper is to show the clustering trend among 38 Asian ports over 9 years (2001-2009) by using the variable group benchmarking (VGB) and categorical variable (CV) models with 4 inputs (berth length, depth, total area, and number of cranes) and 1 output (container TEU). The main empirical results are as follows. First, the clustering results from VGB show that the Shanghai, Qingdao, and Ningbo ports played the core role in clustering. Second, the CV analysis focusing on container throughput indicates that, excluding the Chinese ports, Singapore, Keelung, Dubai, and Kaohsiung appear as the center ports of clustering. Third, the Aqaba, Dubai, Hong Kong, Shanghai, Guangzhou, and Ningbo ports are recommended as the efficient ports to target for clustering. Fourth, when the ports are classified by regional location, the Dubai, Khor Fakkan, Shanghai, Hong Kong, Keelung, Ningbo, and Singapore ports are the core ports for clustering. On the whole, other ports located in Asia should be clustered with the Dubai, Khor Fakkan, Shanghai, Hong Kong, Ningbo, and Singapore ports. The policy implication is that Korean port policy planners should introduce the VGB and CV models for clustering among international ports in order to enhance the efficiency of inputs and outputs.

Tree-structured Clustering for Mixed Data (혼합형 데이터에 대한 나무형 군집화)

Yang Kyung-Sook;Huh Myung-Hoe
- The Korean Journal of Applied Statistics / v.19 no.2 / pp.271-282 / 2006
The aim of this study is to propose a tree-structured clustering method for mixed data. We suggest a scaling method to reduce the variable selection bias among categorical variables. In numerical examples such as credit data, German credit data, we note several differences between tree-structured clustering and K-means clustering.
https://doi.org/10.5351/KJAS.2006.19.2.271

