REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
Journal of the Korean Data and Information Science Society
Korean Data and Information Science Society
Volume 25, Issue 2 - Mar 2014
A study on the spread of the foot-and-mouth disease in Korea in 2010/2011
Hwang, Jihyun ; Oh, Changhyuck ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 271~280
DOI : 10.7465/jkdi.2014.25.2.271
Foot-and-mouth disease (FMD) is a highly infectious and fatal viral livestock disease that affects cloven-hoofed animals, both domestic and wild, and the FMD outbreak in Korea in 2010/2011 was a disaster for the country and its economy. National-level efforts are therefore made to prevent FMD and to reduce the damage when an outbreak occurs. As one such effort, it is useful to study the spread of the disease with a probabilistic model. Indeed, after the 2001 FMD epidemic in the UK, many studies examined the spread of the disease using a variety of stochastic models in order to prepare for future outbreaks. For the 2010/2011 outbreak in Korea, however, few studies have used probabilistic models. This paper assumes a stochastic spatial-temporal susceptible-infectious-removed (SIR) epidemic model for the 2010/2011 FMD outbreak in order to understand the spread of the disease. The infection data for the 2010/2011 outbreak from the Animal and Plant Quarantine Agency and the livestock farm data from the 2011 nationwide census of Statistics Korea either lack detailed address information or contain missing values, so we generate detailed address information by randomly allocating farms within the corresponding Si/Gun area. The kernel function is estimated from the infection data, and the susceptibility and transmission parameters of the spatial-temporal stochastic SIR model are determined by simulation.
Using genetic algorithm to optimize rough set strategy in KOSPI200 futures market
Chung, Seung Hwan ; Oh, Kyong Joo ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 281~292
DOI : 10.7465/jkdi.2014.25.2.281
As algorithmic trading grows in importance, research on artificial intelligence (AI) based trading strategies is also becoming more important. However, few studies combine two or more AI methodologies in one trading system. The main aim of this study is to develop an algorithmic trading strategy based on rough set theory, a rule-based AI methodology. In particular, this study uses a genetic algorithm to optimize the profit of the rough set based strategy rules. The most important contribution of this study is an efficient combination of two different AI methodologies in an algorithmic trading system. The target of the proposed trading system is the KOSPI200 futures market. In an empirical study, we show that the proposed trading system earned a significant profit from 2009 to 2012. Moreover, our system achieved a higher Sharpe ratio than the buy-and-hold strategy.
The influence of parents' child abuse, school violence and friends attachment on mental health in childhood
Min, Dae Kee ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 293~304
DOI : 10.7465/jkdi.2014.25.2.293
A child's mental health is an important element of proper emotional development. Abuse of children by parents and by peer groups is a cause of depression and anxiety in children. These conditions become obstacles to the normal growth process and can be a contributing factor to juvenile delinquency. This study is grounded in the theoretical background on the relationship between abuse by parents and peer groups and children's emotional health. The data are analyzed through structural equation modeling.
A linearity test statistic in a simple linear regression
Park, Chun Gun ; Lee, Kyeong Eun ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 305~315
DOI : 10.7465/jkdi.2014.25.2.305
In simple linear regression, a linear relationship between an explanatory variable and a response variable can be easily recognized in their scatter plot. The lack-of-fit test for replicated data is commonly used to test linearity, but linearity is not easy to test when the explanatory variable is not replicated. In this paper, we propose three new test statistics for testing linearity regardless of replication, using the principle of average slope, and validate them through several simulations and empirical studies.
Network analysis and comparing citation index of statistics journals
Won, Dongkee ; Choi, Kyoungho ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 317~325
DOI : 10.7465/jkdi.2014.25.2.317
Evaluating the content and quality of journals, along with the research ability of researchers, has recently become an important issue. This research compared the impact of domestic statistics-related journals, centering on the Journal of the Korean Data & Information Science Society, using various KCI citation indices. It also examined the network among the journals from the perspective of social network analysis, using co-citation frequency. The following conclusions were drawn. First, the percentage of self-citation was relatively high. Second, although statistics journals had a higher impact index than mathematics, physics and chemistry journals, they were not cited very frequently in other journals. Third, the Journal of the Korean Data & Information Science Society plays a central role in the network, although more effort appears to be required.
Statistical analysis of recurrent gap time events with incomplete observation gaps
Shin, Seul Bi ; Kim, Yang Jin ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 327~336
DOI : 10.7465/jkdi.2014.25.2.327
Recurrent event data occur when a subject experiences the same type of event repeatedly, and are found in various areas such as the social sciences, economics, medicine and public health. To analyze recurrent event data, either a total time scale or a gap time scale is adopted according to the research interest. In this paper, we analyze recurrent event data with incomplete observation gaps using a gap time scale. That is, some subjects leave a study temporarily and return after a while, but the times at which the observation gaps terminate are not available. We adopt an interval censoring mechanism for estimating the termination time. Furthermore, to model the association among the gap times of a subject, a frailty effect is incorporated into the model. Programs included in the survival package of R are used to estimate the covariate effects as well as the variance of the frailty effect. YTOP (Young Traffic Offenders Program) data are analyzed with both a proportional hazards model and a Weibull regression model.
Development of quality of life with WHOQOL-HIV BREF Korean version among HIV patients in Korea
Lee, Won Kee ; Kim, Shin-Woo ; Kim, Hye-In ; Chang, Hyun-Ha ; Lee, Jong-Myung ; Kim, Yoon-Joo ; Lee, Mi-Young ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 337~347
DOI : 10.7465/jkdi.2014.25.2.337
There is no known publication assessing the quality of life (QOL) of Korean HIV patients. We aimed to assess the QOL of HIV patients. We developed a Korean version of the WHOQOL-HIV BREF (the short form of the WHOQOL-HIV, with 31 questions in 6 domains). Survey data from 220 HIV-positive adults were obtained at 14 centers in South Korea. Males were predominant (202/220, 91.8%). Mean age was
. Mean CD4+ T-cell count was
. Overall of WHOQOL-HIV BREF were
Measurements for hitting ability in the Korean pro-baseball
Lee, Jang Taek ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 349~356
DOI : 10.7465/jkdi.2014.25.2.349
In baseball, sabermetric batting statistics are used to compare the offensive performance of players. Dozens of sabermetric statistics exist, but baseball fans dislike the complexity of so many measures. This paper provides a batting grade index (BGI) built with principal component analysis from eight batting statistics: OPS, ISO, SECA, TA, RC, RC/27, wOBA and XR. We show how standardized batting statistics are aggregated and weighted to arrive at a single composite measure, the BGI. Our result also allows players to be segmented into groups using the K-means clustering algorithm.
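The aggregation step described in the abstract above might look roughly like the following sketch. It standardizes each statistic and scores players on the first principal component of the correlation matrix, found here by power iteration. The function names, the use of only three of the eight statistics, and the toy values are all assumptions made for illustration; the abstract does not give the paper's actual weights or procedure, and the K-means step is not shown.

```python
import math

def standardize(column):
    """Center a column and scale it to unit sample standard deviation."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]

def composite_index(columns, iterations=500):
    """Score each player on the first principal component.

    columns: one list per batting statistic, all of equal length.
    Returns (scores, loadings). Power iteration starts from a uniform
    vector, which suits positively correlated statistics.
    """
    z = [standardize(c) for c in columns]
    n, p = len(z[0]), len(z)
    # Correlation matrix of the standardized statistics.
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1)
             for j in range(p)] for i in range(p)]
    w = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(corr[i][j] * w[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    scores = [sum(w[j] * z[j][i] for j in range(p)) for i in range(n)]
    return scores, w

# Toy values for five hypothetical players (not real KBO data).
ops = [0.80, 0.90, 0.70, 1.00, 0.85]
iso = [0.15, 0.20, 0.10, 0.25, 0.18]
woba = [0.33, 0.37, 0.30, 0.41, 0.35]
scores, loadings = composite_index([ops, iso, woba])
```

Because the scores are built from standardized columns, they are centered at zero, so a positive score marks an above-average hitter within the sample.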
Estimation of OBP coefficient in Korean professional baseball
Lee, Jang Taek ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 357~363
DOI : 10.7465/jkdi.2014.25.2.357
OPS is a sabermetric baseball statistic calculated as the sum of a player's on-base percentage (OBP) and slugging percentage (SLG). One frequently cited problem with OPS is that it gives equal weight to its two components, OBP and SLG. In fact, OBP contributes significantly more to scoring runs than SLG does. This paper explores the correct weighting of OBP relative to SLG when adding the two together. By correlating weighted OPS under different OBP coefficients with runs scored per game (RPG), we found that the weighted OPS that weights OBP 56% more than SLG, with the coefficient taken to two decimal places, produced the highest correlation. We found that the weight of OBP increases as RPG increases. We also suggest a linear regression equation for the best OBP coefficient against RPG.
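The coefficient search described in the abstract above can be sketched as a grid search over candidate OBP weights. This is a minimal illustration, not the author's code: the function names, the candidate grid, and the toy numbers (where RPG is deliberately constructed with a weight of 2.0 so that the search has a known answer) are all assumptions.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def best_obp_coefficient(obp, slg, rpg, candidates):
    """Return (c, r) where c maximizes corr(c * OBP + SLG, RPG)."""
    best = None
    for c in candidates:
        weighted_ops = [c * o + s for o, s in zip(obp, slg)]
        r = pearson(weighted_ops, rpg)
        if best is None or r > best[1]:
            best = (c, r)
    return best

# Toy numbers, invented purely to exercise the function.
obp = [0.320, 0.335, 0.350, 0.310, 0.365, 0.342]
slg = [0.400, 0.420, 0.390, 0.380, 0.450, 0.410]
rpg = [2.0 * o + s for o, s in zip(obp, slg)]  # built with weight 2.0
candidates = [round(1.0 + 0.01 * i, 2) for i in range(201)]  # 1.00 to 3.00
coef, corr = best_obp_coefficient(obp, slg, rpg, candidates)
```

Stepping the grid by 0.01 matches a coefficient reported to two decimal places; with real team-season data the maximizing coefficient would of course depend on the sample.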
Comparison of confidence measures useful for classification model building
Park, Hee Chang ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 365~371
DOI : 10.7465/jkdi.2014.25.2.365
Association rule mining, one of the well-studied techniques in data mining, is an exploratory data analysis method for understanding the relevance among the items in a huge database. It has been used to find the relationships between sets of items based on interestingness measures such as support, confidence, lift, and similarity measures. In the typical association rule technique, we generate association rules that satisfy minimum support and confidence values. Support and confidence are the most frequently used measures, but they have the drawback that they cannot determine the direction of the association because they always take positive values. In this paper, to overcome this problem, we compare support, basic confidence, and three kinds of confidence measures useful for classification model building. The results confirm that the causal confirmed confidence is the best confidence measure for association mining because it shows the direction of association more precisely.
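For readers unfamiliar with the basic measures named in the abstract above, the following sketch computes support, confidence, and lift using their standard textbook definitions. The function names and the toy transactions are assumptions for illustration; the three additional confidence measures compared in the paper are not reproduced here because the abstract does not define them.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimated P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions, antecedent, consequent):
    """Confidence divided by the support of the consequent."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))

# Toy transactions, invented for illustration.
transactions = [
    {"bread", "milk"},
    {"bread", "milk"},
    {"bread"},
    {"milk"},
    {"eggs"},
]
s = support(transactions, {"bread", "milk"})
c = confidence(transactions, {"bread"}, {"milk"})
l = lift(transactions, {"bread"}, {"milk"})
```

Support and confidence are proportions and therefore never negative, which is the limitation the abstract points out: on their own they cannot signal a negative association.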
Identification of major risk factors association with respiratory diseases by data mining
Lee, Jea-Young ; Kim, Hyun-Ji ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 373~384
DOI : 10.7465/jkdi.2014.25.2.373
Data mining aims to clarify patterns or correlations in large data sets with complicated structure and to predict diverse outcomes. The technique is used in fields such as finance, telecommunications, circulation and medicine. In this paper, we selected risk factors for respiratory diseases in the field of medicine. The data, drawn from the Gyeongsangbuk-do database of the Community Health Survey conducted in 2012, were divided into a respiratory disease group and a healthy group. To select major risk factors, we applied data mining techniques such as neural networks, logistic regression, Bayesian networks, C5.0 and CART. We divided the data into training and testing sets and applied the models built on the training data to the testing data. By comparison of prediction accuracy, CART was identified as the best model. Depression, smoking and stress were identified as the major risk factors for respiratory disease.
A sign test for random walk hypothesis based on slopes
Kim, Tae Yoon ; Park, Cheolyong ; Kim, Seul Gee ; Kim, Chan Jin ; Kim, Hyun ; Yu, Ju Hyung ; Jang, Kyung Min ; Jang, Young Seok ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 385~392
DOI : 10.7465/jkdi.2014.25.2.385
The random walk hypothesis explains theoretically the difficulty of forecasting in financial markets. Various tests of the hypothesis have been developed so far, but they are known to suffer from low power and size distortion. In this article, a sign test based on slopes is suggested to overcome these difficulties. A simulation study is conducted to compare this test with the often-used Dickey and Fuller (1979) test.
The influence of the random censorship model on the estimation of the scale parameter of the exponential distribution
Kim, Namhyun ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 393~402
DOI : 10.7465/jkdi.2014.25.2.393
The simplest and most important distribution in survival analysis is the exponential distribution. In this paper, we investigate the influence of the random censorship model on the estimation of the scale parameter of the exponential distribution. The random censorship models considered are the Koziol-Green model and the generalized exponential distribution model; the two models have different meanings. In the simulation study, the averages of the estimated parameter values do not differ much, but the MSE of the estimator tends to be larger when the assumed model differs significantly from the true model.
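As background for the abstract above, here is a minimal sketch of the textbook maximum likelihood estimator of the exponential scale (mean) under right censoring, which applies when the censoring distribution does not involve the scale parameter. It is not the estimator derived under the Koziol-Green or generalized exponential censoring models studied in the paper; the function name and the toy data are assumptions.

```python
def exponential_scale_mle(times, events):
    """MLE of the exponential mean from right-censored data.

    times  : observed time per subject (event time or censoring time)
    events : 1 if the event was observed, 0 if the time is censored
    Assumes the censoring mechanism carries no information about the scale.
    """
    observed = sum(events)
    if observed == 0:
        raise ValueError("at least one uncensored observation is required")
    # Total time at risk divided by the number of observed events.
    return sum(times) / observed

# Toy data, invented for illustration: the second subject is censored.
times = [2.0, 3.0, 5.0, 4.0]
events = [1, 0, 1, 1]
theta_hat = exponential_scale_mle(times, events)
```

Censored subjects still contribute their time at risk to the numerator, but only uncensored subjects count in the denominator.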
An educational tool for binary logistic regression model using Excel VBA
Park, Cheolyong ; Choi, Hyun Seok ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 403~410
DOI : 10.7465/jkdi.2014.25.2.403
Binary logistic regression analysis is a statistical technique that explains a binary response variable by quantitative or qualitative explanatory variables. In the binary logistic regression model, the probability that the response variable equals one of the binary values, say 1, is explained as a transformation of a linear combination of the explanatory variables. This is one of the big barriers that non-statisticians have to overcome in order to understand the model. In this study, an educational tool is developed using Excel VBA that explains the need for binary logistic regression analysis. More precisely, the tool explains the problems that arise when the probability that the response variable equals 1 is modeled as a linear combination of explanatory variables, and then shows how these problems can be solved through transformations of the linear combination.
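The problem that the tool in the abstract above addresses can be illustrated with a short sketch: a linear combination used directly as a probability can leave the interval [0, 1], while the logistic transformation of the same combination cannot. This is written in Python rather than the Excel VBA of the original tool, and the coefficients `b0`, `b1` and the input values are hypothetical.

```python
import math

def linear_probability(x, b0, b1):
    """Model the probability directly as a linear combination."""
    return b0 + b1 * x

def logistic_probability(x, b0, b1):
    """Apply the logistic transformation to the same linear combination."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))

# Hypothetical coefficients and inputs, chosen only to show the issue.
b0, b1 = 0.5, 0.2
xs = [-4.0, 0.0, 2.0, 4.0]
linear = [linear_probability(x, b0, b1) for x in xs]
logistic = [logistic_probability(x, b0, b1) for x in xs]

# The log-odds of the logistic probability recover the linear predictor.
log_odds = [math.log(p / (1.0 - p)) for p in logistic]
```

Here the linear version yields a negative "probability" at x = -4 and one above 1 at x = 4, whereas every logistic value stays strictly between 0 and 1.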
A study on academic achievement by gender and selection method based on latent growth model: K university case
Choi, Hyun Seok ; Park, Cheolyong ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 411~422
DOI : 10.7465/jkdi.2014.25.2.411
This study analyzed how average GPA (grade point average) changes as the number of completed semesters increases, based on the estimates of the intercept, slope, and quadratic term. The students included in this study are those who were admitted in 2011 and took 6 consecutive semesters. More precisely, we analyzed whether the intercept, slope and quadratic term of average GPA differed by gender and by selection method. The results showed that the intercept differed by selection method and the slope differed by gender, but the quadratic term differed by neither selection method nor gender.
Type I projection sum of squares by weighted least squares
Choi, Jaesung ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 423~429
DOI : 10.7465/jkdi.2014.25.2.423
This paper discusses a method for obtaining Type I sums of squares by projections under a two-way fixed-effects model when the error variances are not equal. The method of weighted least squares is used to estimate the parameters of the assumed model. The model is fitted to the data sequentially by using the model comparison technique. The vector space generated by the model matrix can be decomposed into orthogonal vector subspaces spanned by submatrices consisting of the column vectors related to the parameters. We discuss how to obtain the Type I sums of squares by using the projections onto these orthogonal vector subspaces.
Estimation for the Rayleigh distribution based on Type I hybrid censored sample
Kwon, Byongwon ; Lee, Kyeongjun ; Cho, Youngseuk ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 431~438
DOI : 10.7465/jkdi.2014.25.2.431
The Type I hybrid censoring scheme is a combination of the Type I and Type II censoring schemes introduced by Epstein (1954). Epstein considered a hybrid censoring sampling scheme in which the life testing experiment is terminated at a random time, namely whichever occurs first of the following two: the time at which the kth unit is observed, or the experiment length set in advance. The likelihood equation of this scheme under the Rayleigh distribution cannot be solved explicitly, and thus we approximate the function by a Taylor series expansion. In this process, we propose four different expansion methods.
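The termination rule described in the abstract above can be sketched as follows: the experiment stops at the earlier of the kth failure time and a preset time limit, and only failures up to that point are observed. The function name, the argument names, and the toy lifetimes are assumptions; the Rayleigh likelihood and its Taylor approximations are not reproduced here.

```python
def type1_hybrid_censor(lifetimes, k, time_limit):
    """Apply Type I hybrid censoring to a list of unit lifetimes.

    Returns (stop_time, observed_failures), where stop_time is the
    earlier of the kth ordered failure time and time_limit.
    """
    ordered = sorted(lifetimes)
    kth_failure = ordered[k - 1]
    stop_time = min(kth_failure, time_limit)
    observed = [x for x in ordered if x <= stop_time]
    return stop_time, observed

# Toy lifetimes, invented for illustration.
lifetimes = [5.0, 1.2, 3.4, 0.7, 2.8]

# The third failure (2.8) arrives before the limit of 4.0.
stop_a, seen_a = type1_hybrid_censor(lifetimes, k=3, time_limit=4.0)

# The limit of 2.0 is reached before the third failure.
stop_b, seen_b = type1_hybrid_censor(lifetimes, k=3, time_limit=2.0)
```

In the first case the experiment ends with exactly k observed failures, as in Type II censoring; in the second it ends at the fixed time with fewer than k failures, as in Type I censoring.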
Quantile regression with errors in variables
Shim, Jooyong ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 439~446
DOI : 10.7465/jkdi.2014.25.2.439
Quantile regression models with errors in variables have received a great deal of attention in the social and natural sciences. Some efforts have been devoted to developing effective estimation methods for such quantile regression models. In this paper we propose an orthogonal distance quantile regression model that effectively considers the errors in both the input and response variables. The performance of the proposed method is evaluated through simulation studies.
Semi-supervised regression based on support vector machine
Seok, Kyungha ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 447~454
DOI : 10.7465/jkdi.2014.25.2.447
In many practical machine learning and data mining applications, unlabeled training examples are readily available but labeled ones are fairly expensive to obtain. Semi-supervised learning algorithms have therefore attracted much attention. However, previous research has mainly focused on classification problems. In this paper, a semi-supervised regression method based on the support vector regression (SVR) formulation is proposed. The estimator is easily obtained via the dual formulation of the optimization problem. The experimental results with simulated and real data suggest superior performance of the proposed method compared with standard SVR.
A transductive least squares support vector machine with the difference convex algorithm
Shim, Jooyong ; Seok, Kyungha ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 455~464
DOI : 10.7465/jkdi.2014.25.2.455
Unlabeled examples are easier and less expensive to obtain than labeled examples. Semisupervised approaches are used to utilize such examples in an effort to boost the predictive performance. This paper proposes a novel semisupervised classification method named transductive least squares support vector machine (TLS-SVM), which is based on the least squares support vector machine. The proposed method utilizes the difference convex algorithm to derive nonconvex minimization solutions for the TLS-SVM. A generalized cross validation method is also developed to choose the hyperparameters that affect the performance of the TLS-SVM. The experimental results confirm the successful performance of the proposed TLS-SVM.
Default Bayesian hypothesis testing for the scale parameters in the half logistic distributions
Kang, Sang Gil ; Kim, Dal Ho ; Lee, Woo Dong ;
Journal of the Korean Data and Information Science Society, volume 25, issue 2, 2014, Pages 465~472
DOI : 10.7465/jkdi.2014.25.2.465
This article deals with the problem of testing the equality of the scale parameters in half logistic distributions. We propose Bayesian hypothesis testing procedures for the equality of the scale parameters under noninformative priors. The noninformative prior is usually improper, which yields a calibration problem in which the Bayes factor is defined only up to a multiplicative constant. Thus we propose default Bayesian hypothesis testing procedures based on the fractional Bayes factor and the intrinsic Bayes factors under the reference priors. A simulation study and an example are provided.