REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
Korean Journal of Applied Statistics
Journal Basic Information
Journal DOI :
The Korean Statistical Society
Editor in Chief :
Volume & Issues
Volume 28, Issue 6 - Dec 2015
Volume 28, Issue 5 - Oct 2015
Volume 28, Issue 4 - Aug 2015
Volume 28, Issue 3 - Jun 2015
Volume 28, Issue 2 - Apr 2015
Volume 28, Issue 1 - Feb 2015
Network Classification of P2P Traffic with Various Classification Methods
Han, Seokwan ; Hwang, Jinsoo ;
Korean Journal of Applied Statistics, volume 28, issue 1, 2015, Pages 1~8
DOI : 10.5351/KJAS.2015.28.1.001
Security has become an issue due to the rapid increase in internet traffic. P2P traffic in particular poses a great challenge to network system administrators. Preemptive measures, such as blocking suspicious traffic, are necessary for network quality of service (QoS) and efficient resource management. Deep packet inspection (DPI) is the most exact way to detect an intrusion, but it may raise privacy concerns and is time-consuming. We used several machine learning methods and compared their performance in classifying network traffic data in terms of accuracy and time. The Random Forest method shows excellent performance in both accuracy and time.
Weighted L1-Norm Support Vector Machine for the Classification of Highly Imbalanced Data
Kim, Eunkyung ; Jhun, Myoungshic ; Bang, Sungwan ;
Korean Journal of Applied Statistics, volume 28, issue 1, 2015, Pages 9~21
DOI : 10.5351/KJAS.2015.28.1.009
The support vector machine (SVM) has been successfully applied to various classification areas due to its flexibility and high level of classification accuracy. However, when analyzing imbalanced data with uneven class sizes, the classification accuracy of the SVM may drop significantly in predicting the minority class because SVM classifiers are undesirably biased toward the majority class. The weighted L2-norm SVM was developed for the analysis of imbalanced data; however, it cannot identify irrelevant input variables due to the characteristics of the ridge penalty. Therefore, we propose the weighted L1-norm SVM, which uses the lasso penalty to select important input variables and weights to differentiate the misclassification of data points between classes. We demonstrate the satisfactory performance of the proposed method through simulation studies and a real data analysis.
On the Use of Weighted k-Nearest Neighbors for Missing Value Imputation
Lim, Chanhui ; Kim, Dongjae ;
Korean Journal of Applied Statistics, volume 28, issue 1, 2015, Pages 23~31
DOI : 10.5351/KJAS.2015.28.1.023
For the conventional missing value problem in statistical analysis, the k-Nearest Neighbor (KNN) method is used as a simple imputation method. When one of the k nearest neighbors is an extreme value or outlier, the KNN method can create a bias. In this paper, we propose a Weighted k-Nearest Neighbors (WKNN) imputation method that can compensate for KNN's shortcomings. A Monte Carlo simulation study is also conducted to compare the WKNN method and the KNN method using a real data set.
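The abstract does not state how the WKNN weights are defined, so the following is only a minimal sketch under an assumed inverse-distance weighting, not the authors' method. The function name `knn_impute` and the toy data are hypothetical. It shows how down-weighting a distant neighbor limits the bias that an outlier introduces into a plain KNN average.

```python
import math


def knn_impute(rows, target, col, k=3, weighted=True):
    """Impute rows[target][col] from the k nearest rows that observe col.

    Assumption: every other column is fully observed, and distance is
    Euclidean over those columns.
    """
    other_cols = [j for j in range(len(rows[target])) if j != col]
    candidates = []
    for i, row in enumerate(rows):
        if i == target or row[col] is None:
            continue
        dist = math.sqrt(sum((row[j] - rows[target][j]) ** 2 for j in other_cols))
        candidates.append((dist, row[col]))
    candidates.sort(key=lambda pair: pair[0])
    nearest = candidates[:k]
    if not weighted:
        # Plain KNN: every neighbor counts equally, including outliers.
        return sum(value for _, value in nearest) / len(nearest)
    # Assumed weighting scheme: weight proportional to 1 / distance.
    eps = 1e-9  # avoids division by zero for exact duplicates
    weights = [1.0 / (dist + eps) for dist, _ in nearest]
    total = sum(w * value for w, (_, value) in zip(weights, nearest))
    return total / sum(weights)


# Toy data: the third row is far away and has an extreme value.
data = [[1.0, 10.0], [1.1, 12.0], [5.0, 100.0], [1.05, None]]
plain = knn_impute(data, target=3, col=1, k=3, weighted=False)
weighted = knn_impute(data, target=3, col=1, k=3, weighted=True)
```

In this toy example the plain average is pulled strongly toward the extreme value, while the weighted estimate stays close to the two near neighbors.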
Approximation on the Distribution of the Overshoot by the Property of Erlang Distribution in the M/E_n/1 Queue
Lee, Sang-Gi ; Bae, Jongho ;
Korean Journal of Applied Statistics, volume 28, issue 1, 2015, Pages 33~47
DOI : 10.5351/KJAS.2015.28.1.033
We consider an M/E_n/1 queueing model where customers arrive at a facility with a single server according to a Poisson process, with customer service times assumed to be independent and identically distributed with an Erlang distribution. We concentrate on the overshoot of the workload process in the queue. The overshoot is the excess over a threshold at the moment when the workload process exceeds the threshold. An approximation of the distribution of the overshoot was proposed by Bae et al. (2011); however, the accuracy of the approximation was unsatisfactory. We derive an improved approximation using the property of the Erlang distribution. Finally, the newly proposed approximation is compared with the results of the previous study.
A Comparison Study for Ordination Methods in Ecology
Ko, Hyeon-Seok ; Jhun, Myoungshic ; Jeong, Hyeong Chul ;
Korean Journal of Applied Statistics, volume 28, issue 1, 2015, Pages 49~60
DOI : 10.5351/KJAS.2015.28.1.049
Various kinds of ordination methods, such as correspondence analysis and canonical correspondence analysis, are used in community ecology to visualize relationships among species, sites, and environmental variables. Ter Braak (1986), Jackson and Somers (1991), and Parmer (1993) compared the ordination methods using eigenvalues and distance graphs. However, these comparisons did not show the relationship between the population and the biplot because they are based only on surveyed data. In this paper, we introduce a method that measures the extent to which a biplot reflects population information, in order to compare ordination methods objectively.
A Stagewise Approach to Structural Equation Modeling
Lee, Bora ; Park, Changsoon ;
Korean Journal of Applied Statistics, volume 28, issue 1, 2015, Pages 61~74
DOI : 10.5351/KJAS.2015.28.1.061
Structural equation modeling (SEM) is widely used in social sciences such as education, business administration, and psychology. In SEM, the latent variable score is the estimate of the latent variable, which cannot be observed directly. This study uses stagewise structural equation modeling (stagewise SEM; SSEM) by partitioning the whole model into several stages. The traditional estimation method minimizes the discrepancy function using the variance-covariance of all observed variables. This method can lead to inappropriate situations where exogenous latent variables may be affected by endogenous latent variables. The SSEM approach can avoid such situations and reduce the complexity of the whole SEM in estimating parameters.
Confidence Interval for Sensitive Binomial Attribute : Direct Question Method and Indirect Question Method
Ryu, Jea-Bok ;
Korean Journal of Applied Statistics, volume 28, issue 1, 2015, Pages 75~82
DOI : 10.5351/KJAS.2015.28.1.075
We discuss confidence intervals for sensitive binomial attributes obtained by a direct question method and an indirect question method. The Randomized Response Technique (RRT) by Warner (1965) is an indirect question method that uses a randomization device to reduce the response burden of respondents. We used the mean coverage probability (MCP), root mean squared error (RMSE), and mean expected width (MEW) to compare the confidence intervals obtained by the two methods. The numerical comparisons indicated that the MEW of the RRT is too large and that the RRT is so conservative that the MCP exceeds the nominal level; therefore, it is necessary to address these problems in order to increase the utility of the indirect question method.
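As an illustration of why an RRT interval tends to be wide, the sketch below implements Warner's (1965) estimator with a simple Wald-type interval. The Wald form, the function names, and the sample figures are assumptions made for illustration; the abstract does not say which interval constructions the paper compares. Here `p` is the probability that the randomization device selects the sensitive statement rather than its complement.

```python
import math


def warner_estimate(yes_count, n, p):
    """Warner (1965) estimate of the sensitive proportion and its variance.

    P(yes) = p * pi + (1 - p) * (1 - pi), so
    pi_hat = (lambda_hat - (1 - p)) / (2p - 1), which requires p != 0.5.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 makes the design uninformative")
    lam = yes_count / n  # observed proportion of "yes" answers
    pi_hat = (lam - (1 - p)) / (2 * p - 1)
    var_hat = lam * (1 - lam) / (n * (2 * p - 1) ** 2)
    return pi_hat, var_hat


def wald_interval(estimate, variance, z=1.96):
    """Wald-type interval (an assumed choice, not taken from the paper)."""
    half_width = z * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


# Hypothetical survey: 400 "yes" answers out of 1000 with p = 0.7.
pi_hat, var_rrt = warner_estimate(400, 1000, 0.7)
rrt_low, rrt_high = wald_interval(pi_hat, var_rrt)

# Direct question with the same estimate and sample size, for comparison.
var_direct = pi_hat * (1 - pi_hat) / 1000
direct_low, direct_high = wald_interval(pi_hat, var_direct)
```

The factor 1 / (2p - 1)^2 in the variance inflates the interval width relative to a direct question, and the inflation grows as p approaches 0.5.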
Particulate Matter Prediction using Quantile Boosting
Kwon, Jun-Hyeon ; Lim, Yaeji ; Oh, Hee-Seok ;
Korean Journal of Applied Statistics, volume 28, issue 1, 2015, Pages 83~92
DOI : 10.5351/KJAS.2015.28.1.083
For public health, it is important to develop an accurate prediction method for atmospheric particulate matter (PM), because exposure to such fine dust can trigger not only respiratory diseases but also dermatoses, ophthalmopathies, and cardiovascular diseases. The National Institute of Environmental Research (NIER) employs a decision tree to predict bad weather days with a high PM concentration. However, the decision tree method (even setting aside its inherent instability) is not a suitable model for predicting bad weather days, which represent only 4% of the entire data. In this paper, we show the inaccuracy and inappropriateness of the method used by the NIER and present the utility of a new prediction model that adopts boosting with quantile loss functions. We evaluate the performance of the new method over various quantile levels and justify the proposed method through comparison.
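For readers unfamiliar with quantile loss, the following sketch shows the standard check (pinball) loss and verifies numerically that the constant minimizing it is the corresponding sample quantile. It is not the boosting procedure of the paper; the function names and toy data are hypothetical.

```python
def pinball_loss(y_true, y_pred, tau):
    """Check loss for quantile level tau in (0, 1).

    Under-prediction (y_true > y_pred) costs tau per unit,
    over-prediction costs (1 - tau) per unit.
    """
    residual = y_true - y_pred
    return tau * residual if residual >= 0 else (tau - 1) * residual


def best_constant(values, tau):
    """Return the observed value that minimizes total pinball loss."""
    def total_loss(candidate):
        return sum(pinball_loss(v, candidate, tau) for v in values)
    return min(values, key=total_loss)


data = list(range(1, 12))  # toy data: 1, 2, ..., 11
median_fit = best_constant(data, 0.5)
upper_fit = best_constant(data, 0.9)
```

With a high quantile level the loss penalizes under-prediction more heavily, which is one reason such a loss can suit rare high-concentration days better than a symmetric loss.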
Two-Stage Experimental Design for Multiple Objectives
Jang, Dae-Heung ; Kim, Youngil ;
Korean Journal of Applied Statistics, volume 28, issue 1, 2015, Pages 93~102
DOI : 10.5351/KJAS.2015.28.1.093
The D-optimal design for a nonlinear model typically depends on the unknown parameters to be estimated. Therefore, the literature strongly recommends using a sequential experimental design to estimate the parameters. In this paper, a two-stage experimental design is discussed under many different circumstances, including parameter estimation. The method is general enough to be applied to any mixture of objectives for any model, including linear models. A hybrid approach is suggested to handle more than two objectives in a two-stage experimental design. The design is discussed within the approximate design framework.
A Study on Small Business Forecasting Models and Indexes
Yoon, YeoChang ; Lee, Sung Duck ; Sung, JaeHyun ;
Korean Journal of Applied Statistics, volume 28, issue 1, 2015, Pages 103~114
DOI : 10.5351/KJAS.2015.28.1.103
The role of small and medium enterprises as an economic growth factor has been accentuated; consequently, the need to develop a business forecast model and indexes that accurately examine the business situation of small and medium enterprises has increased. Most current business models and indexes concerning small and medium enterprises, released by public and private institutions, are based on the Business Survey Index (BSI) and depend on subjective indexes; therefore, they lack the capacity to capture the actual business situation of these enterprises. The business forecast model and indexes suggested in this study have been newly developed with Principal Component Analysis (PCA) and a weighting method to accurately measure the business situation based on reference dates given by the National Statistical Office (NSO). Empirical studies are presented to show that the newly proposed business model and indexes have their basis in statistical theory and that their trend resembles the existing Composite Index.
Integer-Valued GARCH Models for Count Time Series: Case Study
Yoon, J.E. ; Hwang, S.Y. ;
Korean Journal of Applied Statistics, volume 28, issue 1, 2015, Pages 115~122
DOI : 10.5351/KJAS.2015.28.1.115
This article is concerned with count time series taking values in non-negative integers. Along with the first-order mean of the count time series, the conditional variance (volatility) has recently received attention; therefore, various integer-valued GARCH (generalized autoregressive conditional heteroscedasticity) models have been suggested in the last decade. We introduce diverse integer-valued GARCH (INGARCH, for short) processes for count time series and illustrate a real data application as a case study. In addition, zero-inflated INGARCH models are discussed to accommodate zero-inflated count time series.
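A minimal simulation sketch of the standard Poisson INGARCH(1,1) recursion may help make the model class concrete. It assumes conditionally Poisson counts with intensity lambda_t = omega + alpha * X_{t-1} + beta * lambda_{t-1}; the parameter values and function names are hypothetical, and the abstract does not say which specifications the paper fits.

```python
import math
import random


def poisson_draw(lam, rng):
    """Draw a Poisson variate by Knuth's multiplication method.

    Adequate for the small intensities used here.
    """
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_ingarch(n, omega, alpha, beta, seed=0):
    """Simulate n counts from a Poisson INGARCH(1,1) process."""
    if alpha + beta >= 1:
        raise ValueError("alpha + beta must be below 1 for stationarity")
    rng = random.Random(seed)
    lam = omega / (1 - alpha - beta)  # start at the stationary mean
    counts = []
    for _ in range(n):
        x = poisson_draw(lam, rng)
        counts.append(x)
        # Intensity update: past count and past intensity both feed forward.
        lam = omega + alpha * x + beta * lam
    return counts


counts = simulate_ingarch(20000, omega=1.0, alpha=0.3, beta=0.4, seed=1)
sample_mean = sum(counts) / len(counts)
stationary_mean = 1.0 / (1 - 0.3 - 0.4)
```

The sample mean of a long simulated path should sit near omega / (1 - alpha - beta), while the sample variance exceeds the mean, which is the overdispersion such models are meant to capture.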