REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
Journal of the Korean Data and Information Science Society
Journal Basic Information
Korean Data and Information Science Society
Volume & Issues
Volume 26, Issue 6 - Nov 2015
Volume 26, Issue 5 - Sep 2015
Volume 26, Issue 4 - Jul 2015
Volume 26, Issue 3 - May 2015
Volume 26, Issue 2 - Mar 2015
Volume 26, Issue 1 - Jan 2015
Long term trends in the Korean professional baseball
Lee, Jang Taek ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 1~10
DOI : 10.7465/jkdi.2015.26.1.1
This paper offers a long-term perspective on how several statistics of Korean professional baseball have changed. The data are league summaries by year over the period 1982-2013. Statistically significant positive correlations with year (p < 0.01) were found for doubles (2B), runs batted in (RBI), bases on balls (BB), strike outs (SO), grounded into double play (GIDP), hit by pitch (HBP), on base percentage (OBP), OPS, earned run average (ERA), wild pitches (WP) and walks plus hits divided by innings pitched (WHIP). There was a statistically significant decreasing trend with year (trend p < 0.01) for triples (3B), caught stealing (CS), errors (E), completed games (CG), shutouts (SHO) and balks (BK). The Box-Jenkins ARIMA model is applied to find a model for forecasting future baseball measures. Univariate time series results suggest that simple lag-1 models fit some baseball measures quite well. In conclusion, the single most important change in Korean professional baseball is the overall decline in the incidence of completed games (CG). The increase in strike outs (SO) is also very remarkable.
Various types of analysis of warranty returns data
Baik, Jaiwook ; Jo, Jinnam ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 11~19
DOI : 10.7465/jkdi.2015.26.1.11
A certain number of products are transported to be sold each month, and some of them are returned for repair. In this study we first assume that the transported products are the ones that have been sold. Then a nonparametric approach is applied to the warranty returns data to see how the reliability decreases over time. A parametric approach based on the Weibull distribution is applied to the same data, and the results of the nonparametric and parametric approaches are compared. Next we assume that there is a time lag between shipment and sale. Then both nonparametric and parametric approaches are applied to the time-lag data and the results are compared.
Influence of sociopsychological aspects, smoking habit, exercise habit on the intentions of drink-driving
Lee, Ki Hyeong ; Kwon, Yong Man ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 21~29
DOI : 10.7465/jkdi.2015.26.1.21
The purpose of this paper was to investigate, from multiple perspectives, the factors influencing the intentions of drink-driving, in order to uncover ways to reduce the number of motor accidents caused by drink-driving. We examined sociopsychological aspects as well as drivers' lifestyles, such as smoking habit and exercise habit. Among drink-drivers' sociopsychological characteristics, perception of behaviour control had the highest influence on the intentions of drink-driving, followed by the influence of smoking and exercise. This finding indicates that drivers' lifestyle, such as smoking habit or exercise habit, has more influence on the intentions of drink-driving than attitude toward drink-driving or subjective regulations, which affirms that lifestyle has significant effects on the intentions of drink-driving. Therefore, it is concluded that a rehabilitative curriculum for drink-drivers should include a program to diminish drink-driving through nonsmoking and exercise habits.
The analysis of random effects model by projections
Choi, Jaesung ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 31~39
DOI : 10.7465/jkdi.2015.26.1.31
This paper deals with a method for estimating variance components on the basis of projections under the assumption of a random effects model. It discusses how to use projections to obtain the sums of squares needed to estimate variance components. The use of projections decomposes the vector subspace generated by the model matrix into subspaces that are orthogonal to each other. A stepwise procedure is used to partition the vector space generated by the model matrix. It is shown that the suggested method is useful for obtaining the Type I sums of squares required by the ANOVA method.
A spectrum based evaluation algorithm for micro scale weather analysis module with application to time series cluster analysis
Kim, Hea-Jung ; Kwak, Hwa-Ryun ; Kim, Yu-Na ; Choi, Young-Jean ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 41~53
DOI : 10.7465/jkdi.2015.26.1.41
In the meteorological field, many researchers have tried to develop micro scale weather analysis modules for providing a real-time weather information service in the metropolitan area. This effort enables us to cope with various economic and social harms arising from serious changes in the micro meteorology of a metropolitan area due to rapid urbanization, such as quantitative expansion of urban activity, population growth, and building concentration. The accuracy of the micro scale weather analysis modules (MSWAM) is directly related to the usefulness and quality of the real-time weather information service in the metropolitan area. This paper designs an evaluation system, along with verification tools, that sufficiently accommodates the spatio-temporal characteristics of the outputs of the MSWAM. For this we propose a test for the equality of the mean vectors of the output series of the MSWAM and the corresponding observed time series by using a spectral analysis technique. As a byproduct, a time series cluster analysis method, using a function of the test statistic as the distance measure, is developed. A real data application is given to demonstrate the utility of the method.
Brain laterality and whole brain EEG on the learning senses
Kwon, Hyungkyu ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 55~64
DOI : 10.7465/jkdi.2015.26.1.55
The present study identified brain based learning activities for the individual learning senses by using brain laterality and the whole brain index. Students receive information through the visual, auditory, and kinesthetic senses according to Politano and Paquin's (2000) classification. These learning senses are reflected in the brain through various combinations of senses for learning. Measuring the types of learning senses involved in brain laterality and the whole brain is required to figure out the related learning styles. Self-directed learning involved in the learning senses shows that problem-based learning is associated with brain function by emphasizing balanced brain utilization, which is known as whole brain. The results showed that successful whole brain learning is closely associated with elevated auditory learning and elevated visual learning in the sensorimotor brainwave rhythm (SMR), while it is closely associated with elevated kinesthetic and elevated visual learning in the beta brainwave rhythm.
Determinants of employee's wage using hierarchical linear model
Park, Sungik ; Cho, Jangsik ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 65~75
DOI : 10.7465/jkdi.2015.26.1.65
This paper analyzes the determinants of wages for college and university graduates using both individual-level and industry-level variables. We note that wage determination has a multi-level structure in the sense that individual wage is influenced by individual-level (level-1) variables and industry-level (level-2) variables. Thus the assumption of classical regression that individual wages are independent is violated. Therefore, this paper uses the hierarchical linear model (HLM). The major results are as follows. First, the multiple correspondence analysis including level-1 and level-2 variables reveals that both affect individual wages, judging from the fact that the values of level-1 and level-2 variables differ across individual wage groups. Second, the decision tree analysis including level-1 and level-2 variables shows that the most influential variable in wage determination is industry-level wage, followed by industry-level working hours, age and sex, in declining order. This suggests that the use of the HLM is appropriate, since the characteristics of the industry are important in determining the individual wage. Third, it is shown that the HLM is the best compared with the other models, which do not take level-1 and level-2 variables into account simultaneously.
A redistribution model of the history-dependent Parrondo game
Jin, Geonjoo ; Lee, Jiyeon ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 77~87
DOI : 10.7465/jkdi.2015.26.1.77
Parrondo paradox is the counter-intuitive phenomenon where two losing games can be combined to win or two winning games can be combined to lose. In this paper, we consider an ensemble of players, one of whom is chosen randomly to play game A' or game B. In game A', the randomly chosen player transfers one unit of his capital to another randomly selected player. In game B, the player plays the history-dependent Parrondo game in which the winning probability of the present trial depends on the results of the last two trials in the past. We show that Parrondo paradox exists in this redistribution model of the history-dependent Parrondo game.
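As a rough illustration of the history-dependent game B on its own (not of the redistribution model studied in the paper), the sketch below computes the long-run winning probability of a game whose win probability depends on the results of the last two trials. The probabilities 9/10, 1/4, 1/4 and 7/10, each reduced by a small bias `eps`, are the values commonly quoted for the history-dependent Parrondo game; they are an assumption here because the abstract does not state them, and the function names are hypothetical.

```python
def stationary_win_probability(p_ll, p_lw, p_wl, p_ww, iterations=2000):
    """Long-run win probability of a game driven by the last two results.

    States are (result two trials ago, result of the last trial):
    0 = LL, 1 = LW, 2 = WL, 3 = WW.
    """
    win = [p_ll, p_lw, p_wl, p_ww]
    after_win = [1, 3, 1, 3]   # e.g. LL followed by a win becomes LW
    after_loss = [0, 2, 0, 2]  # e.g. LW followed by a loss becomes WL
    dist = [0.25] * 4          # start from the uniform distribution
    for _ in range(iterations):
        new = [0.0] * 4
        for s in range(4):
            new[after_win[s]] += dist[s] * win[s]
            new[after_loss[s]] += dist[s] * (1.0 - win[s])
        dist = new             # power iteration toward the stationary law
    return sum(d * w for d, w in zip(dist, win))


def game_b_win_probability(eps):
    # Assumed parameter values (not given in the abstract).
    return stationary_win_probability(0.9 - eps, 0.25 - eps, 0.25 - eps, 0.7 - eps)


print(game_b_win_probability(0.0))    # fair game when eps = 0
print(game_b_win_probability(0.003))  # slightly losing game when eps > 0
```

With these assumed values the game is fair at `eps = 0` and losing for small positive `eps`, which is the kind of losing game that the paradox combines with another.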
A study on the ordering of similarity measures with negative matches
Park, Hee Chang ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 89~99
DOI : 10.7465/jkdi.2015.26.1.89
The World Economic Forum and the Korean Ministry of Knowledge Economy have selected big data as one of the top 10 core information technologies. The key to big data is to analyze effectively the properties that the data have. Cluster analysis, one of the big data techniques, assigns a set of objects to clusters so that objects in the same cluster are more similar to each other than to objects in other clusters. Similarity measures used in cluster analysis may be classified into various types depending on the nature of the data. In this paper, we studied upper and lower bounds for binary similarity measures with negative matches, such as the Russell and Rao measure, the simple matching measure by Sokal and Michener, the Rogers and Tanimoto measure, the Sokal and Sneath measure, the Hamann measure, and the Baroni-Urbani and Buser measures I and II. Comparative studies of these measures were carried out with real data and a simulated experiment.
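The measures named above can be sketched from the usual 2x2 match counts of two binary vectors: `a` (1-1 matches), `b` (1-0), `c` (0-1) and `d` (0-0, the negative matches). The formulas below are the textbook definitions as commonly stated; the abstract does not give them, so treat them, the function names, and the choice of Sokal and Sneath variant as assumptions.

```python
from math import sqrt


def match_counts(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 sequences."""
    a = sum(1 for u, v in zip(x, y) if u == 1 and v == 1)
    b = sum(1 for u, v in zip(x, y) if u == 1 and v == 0)
    c = sum(1 for u, v in zip(x, y) if u == 0 and v == 1)
    d = sum(1 for u, v in zip(x, y) if u == 0 and v == 0)
    return a, b, c, d


def similarity_measures(x, y):
    """Binary similarity measures that use the negative matches d.

    Assumes the denominators are non-zero (not both vectors all zeros).
    """
    a, b, c, d = match_counts(x, y)
    n = a + b + c + d
    root = sqrt(a * d)
    return {
        "russell_rao": a / n,
        "simple_matching": (a + d) / n,
        "rogers_tanimoto": (a + d) / (a + d + 2 * (b + c)),
        "sokal_sneath": 2 * (a + d) / (2 * (a + d) + b + c),
        "hamann": (a + d - b - c) / n,
        "baroni_urbani_buser_1": (root + a) / (root + a + b + c),
        "baroni_urbani_buser_2": (root + a - b - c) / (root + a + b + c),
    }


m = similarity_measures([1, 1, 0, 0, 1, 0], [1, 0, 0, 0, 1, 1])
for name, value in m.items():
    print(f"{name}: {value:.4f}")
```

One ordering is easy to confirm from these formulas: the Rogers and Tanimoto value never exceeds simple matching, which in turn never exceeds this Sokal and Sneath variant, because the three differ only in how heavily mismatches are weighted.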
Small area estimations for disease mapping by using spatial model
An, Daeseong ; Han, Junhee ; Yoon, Taeho ; Kim, Changhoon ; Noh, Maengseok ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 101~109
DOI : 10.7465/jkdi.2015.26.1.101
SMRs (standardized mortality rates) for major diseases, accidents, cancer are considered in small areas of administrative units such as Eup/Myeon/Dong from years 2005 to 2008. Due to small sample issue in small areas, the precision of directly estimated crude SMR for each area can be low. In this study, we consider the HGLM (hierarchical generalized linear model) with MRF (Markov random field) to account for the spatial correlations among the small areas. The effects of covariates for cause of mortality by Dongs in Seoul and disease maps based on the estimated SMR are presented. The results suggest how we analyze and interpret the difference in mortalities by small areas such as Dongs by revealing the spatial patterns.
Efficiency analysis of the community welfare centers for people with disabilities using data envelopment analysis
Choi, Kyoungho ; Shin, Hyun-Uk ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 111~121
DOI : 10.7465/jkdi.2015.26.1.111
Until now, the operation of community welfare centers for people with disabilities has been viewed positively or generously. Nevertheless, in order to obtain a wide range of welfare outcomes efficiently, the imperative step in rehabilitation research is to determine whether reasonable and scientific services are being provided to people with disabilities in rehabilitation centers. The purpose of this study was to analyze the efficiency and productivity of 176 community welfare centers for people with disabilities. As a result, the average technical efficiency of the community welfare centers was 0.4488; pure technical efficiency and scale efficiency were 0.6040 and 0.7080, respectively. The major conclusions of this study were as follows. First, in the technical efficiency analysis, DMU2, DMU3, DMU8, DMU9, DMU11, DMU13 and DMU14 were above average. This appears to reflect political elements associated with regional social and economic differences. Second, the scale efficiency analysis indicates that inefficient centers such as DMU1, DMU5, DMU12 and DMU16 need to improve their number of employees, revenue and facility area. Finally, this study is expected to serve as an effectiveness analysis and performance evaluation for rehabilitation services.
The effects of the parent's socioeconomic status and the private education expenditure to the academic achievement
Yoo, Jiyeon ; Park, Changsoon ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 123~139
DOI : 10.7465/jkdi.2015.26.1.123
The purpose of this study is to analyze the effect of the parent's socioeconomic status on academic achievement, together with the mediation effect of private education expenditure. The structural equation modeling (SEM) method is used with the survey data on private education expenditures collected by Statistics Korea in 2011. In the SEM, the multi-group effect is also analyzed for gender, region and school level. The results show that a high parental socioeconomic status tends to increase private education expenditures but does not affect academic achievement, and that there are significant multi-group effects for gender, region, and school level.
Model assessment with residual plot in logistic regression
Kahng, Myung Wook ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 141~150
DOI : 10.7465/jkdi.2015.26.1.141
Graphical paradigms for assessing the adequacy of models in logistic regression are discussed. The residual plot has been widely used as a graphical tool for evaluating the adequacy of the model. However, this approach works well only for linear models with constant variance, and the alternative approach, the marginal model plot, has its defects as well. We suggest a Chi-residual plot that overcomes the potential shortcomings of the marginal model plot.
Analysis of English abstracts in Journal of the Korean Data & Information Science Society using topic models and social network analysis
Kim, Gyuha ; Park, Cheolyong ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 151~159
DOI : 10.7465/jkdi.2015.26.1.151
This article analyzes English abstracts of the articles published in Journal of the Korean Data & Information Science Society using text mining techniques. At first, term-document matrices are formed by various methods and then visualized by social network analysis. LDA (latent Dirichlet allocation) and CTM (correlated topic model) are also employed in order to extract topics from the abstracts. Performances of the topic models are compared via entropy for several numbers of topics and weighting methods to form term-document matrices.
A transmission distribution estimation for real time Ebola virus disease epidemic model
Choi, Ilsu ; Rhee, Sung-Suk ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 161~168
DOI : 10.7465/jkdi.2015.26.1.161
Accurate prediction of the epidemic appears to be extremely difficult. New models that show quite different results have been suggested. The basic reproductive numbers of the epidemic for consecutive time intervals are estimated based on stochastic processes. In this paper, we propose a transmission distribution estimation for an Ebola virus disease epidemic model. This estimate can be obtained more easily in real time, which is useful for informing an appropriate public health response to the outbreak. Finally, we apply our proposed method to data from the Guinea Ebola disease outbreak.
Comparison of satisfaction and need on nursing service perceived by the patients and nurses
Lee, Nae Young ; Han, Ji Young ; Heo, Mi Jin ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 169~177
DOI : 10.7465/jkdi.2015.26.1.169
This study evaluates the need of nursing (NN) and satisfaction on nursing service (SNS) in patients and nurses. Questionnaires were completed by 105 patients and 105 nurses in one hospital. The mean score of NN was [value missing] for patients and [value missing] for nurses (t=9.23, p<.001). The top score came from cure territory, while the lowest from physical territory in both patients and nurses. The mean score of SNS was [value missing] for patients and [value missing] for nurses (t=3.88, p<.001). The top score came from cure territory, while the lowest from physical territory in both patients and nurses. When NN and SNS are compared, the score of NN was higher than that of SNS in both patients (t=3.77, p<.001) and nurses (t=9.23, p<.001). As a result, they provided unsatisfactory nursing services, although nurses worked hard to improve them. Nurse administrators should develop strategies and apply them.
The diffusion and policy options of the diagnostic imaging technologies in Korea
Choi, Yoon Jung ; Kwak, Minjung ; Yoon, Min ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 179~185
DOI : 10.7465/jkdi.2015.26.1.179
The cost of advanced medical technologies is commonly considered to be a major factor in the overall escalation of expenditures on health. The use of computed tomography (CT) scanning has increased dramatically over the past decade. CT has been rapidly adopted despite its high cost. The aim of this study is to analyze the factors that increase the frequency of CT use, using the decision tree model. Finally, we propose an effective policy option for diagnostic imaging technology in Korea.
Performances analysis of football matches
Min, Dae Kee ; Lee, Young-Soo ; Kim, Yong-Rae ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 187~196
DOI : 10.7465/jkdi.2015.26.1.187
The team's performances were analyzed by evaluating the scores gained by their offense and the scores allowed by their defense. To evaluate the team's attacking and defending abilities, we also considered the factors that contributed to the team's gained points or the opposing team's gained points. In order to analyze the outcome of the games, three prediction models were used: decision trees, logistic regression, and discriminant analysis. As a result, the factors associated with the defense showed a decisive influence in determining the game results. We analyzed the offense and defense by using the response variable. This showed that the major factors predicting the offense were non-stop pass and attack speed, and the major factors predicting the defense were the distance between right and left players and the distance between front line attackers and rearmost defenders during the game.
Bayesian estimation of the Korea professional baseball players' hitting ability based on the batting average
Cho, Yong Ju ; Lee, Kwang Ho ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 197~207
DOI : 10.7465/jkdi.2015.26.1.197
In baseball, the hitting ability of a batter is frequently assessed by batting average, runs batted in, home runs, runs scored, on-base percentage, etc. Recently, more comprehensive indicators such as OPS, ISO, SECA, TA, RC and XR have often been used. However, these measures generally show large deviations since they are calculated from data for a certain period of time, and they are not estimates of a population parameter either. In this paper, we presume the pure hitting ability of Korean professional baseball players to be a parameter which depends on at bats. We estimate the parameter by using the Bayesian method.
Partially linear support vector orthogonal quantile regression with measurement errors
Hwang, Changha ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 209~216
DOI : 10.7465/jkdi.2015.26.1.209
Quantile regression models with covariate measurement errors have received a great deal of attention in both the theoretical and the applied statistical literature. A lot of effort has been devoted to develop effective estimation methods for such quantile regression models. In this paper we propose the partially linear support vector orthogonal quantile regression model in the presence of covariate measurement errors. We also provide a generalized approximate cross-validation method for choosing the hyperparameters and the ratios of the error variances which affect the performance of the proposed model. The proposed model is evaluated through simulations.
Performance study of propensity score methods against regression with covariate adjustment
Park, Jincheol ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 217~227
DOI : 10.7465/jkdi.2015.26.1.217
In observational studies, handling confounders is a primary issue in measuring the treatment effect of interest. Historically, regression with covariate adjustment (covariate-adjusted regression) has been the typical approach to estimating a treatment effect while incorporating potential confounders into the model. However, ever since the introduction of the propensity score, covariate-adjusted regression has been gradually replaced in the medical literature by various balancing methods based on the propensity score. On the other hand, there is a paucity of research assessing propensity score methods against covariate-adjusted regression. This paper examines the performance of propensity score methods in estimating risk difference and compares their performance with covariate-adjusted regression by a Monte Carlo study. The study demonstrated that, in general, covariate-adjusted regression with a variable selection procedure outperformed propensity-score-based methods in terms of both bias and MSE, suggesting that the classical regression method needs to be considered, rather than the propensity score methods, if performance is a primary concern.
VIP-targeted CRM strategies in an open market
Lee, Hanjun ; Shim, Beomsoo ; Suh, Yongmoo ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 229~241
DOI : 10.7465/jkdi.2015.26.1.229
Nowadays, the open market, which provides sellers and consumers with a cyber place for making transactions over the Internet, has emerged as a prevalent sales channel because of the convenience and relatively low prices it provides. However, there are few studies about CRM strategies based on VIP consumers for an open market, even though understanding VIP consumers' behaviors in open markets is important for increasing revenue. Therefore, we propose CRM strategies targeted at VIP customers, obtained by analyzing the transaction data of VIP customers from an open market using data mining techniques. To that end, we first defined the VIP customers in terms of recency, frequency and monetary (RFM) values. Then, we used data mining techniques to develop a model which best classifies customers into VIPs or non-VIPs and identifies influential factors. We also validate each of the promotion types in terms of effectiveness and identify association rules among the types. Then, based on the findings from these experiments, we propose strategies from the perspectives of CRM dimensions for the open market to thrive.
Noninformative priors for the common shape parameter of several inverse Gaussian distributions
Kang, Sang Gil ; Kim, Dal Ho ; Lee, Woo Dong ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 243~253
DOI : 10.7465/jkdi.2015.26.1.243
In this paper, we develop noninformative priors for the common shape parameter of several inverse Gaussian distributions. Specifically, we want to develop noninformative priors which satisfy a certain objective criterion. The probability matching priors and reference priors for the common shape parameter are developed. It turns out that the second order matching prior does not exist. The reference priors satisfy the first order matching criterion, but Jeffreys' prior is not a first order matching prior. We show through a simulation study that the proposed reference prior matches the target coverage probabilities in a frequentist sense, and an example based on real data is given.
Two optimal threshold criteria for ROC analysis
Cho, Min Ho ; Hong, Chong Sun ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 255~260
DOI : 10.7465/jkdi.2015.26.1.255
Among the many optimal threshold criteria based on the ROC curve, the closest-to-(0,1) and amended closest-to-(0,1) criteria are considered. An ROC curve that passes close to the (0,1) point indicates that two models are well classified. In this case, the ROC curve is located far from the (1,0) point. Hence we propose two criteria: the farthest-to-(1,0) and amended farthest-to-(1,0) criteria. These criteria are found to be related to the Kolmogorov-Smirnov statistic as well as to some optimal threshold criteria. Moreover, we derive a definition of the proposed criteria in more than two dimensions, together with their relations to multi-dimensional optimal threshold criteria.
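The two unamended criteria can be sketched on an empirical ROC curve as follows, with the false positive rate on the horizontal axis and the true positive rate on the vertical axis. This is a minimal illustration on made-up scores, assuming Euclidean distance and a "predict positive when score >= threshold" rule; the amended criteria are not reproduced because the abstract does not define them, and all function names are hypothetical.

```python
from math import hypot


def roc_points(scores, labels):
    """Return (threshold, FPR, TPR) for every distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    points = []
    for t in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((t, fp / negatives, tp / positives))
    return points


def closest_to_01(points):
    # Threshold whose ROC point is nearest to the ideal corner (FPR=0, TPR=1).
    return min(points, key=lambda p: hypot(p[1], 1 - p[2]))[0]


def farthest_from_10(points):
    # Threshold whose ROC point is farthest from the worst corner (FPR=1, TPR=0).
    return max(points, key=lambda p: hypot(1 - p[1], p[2]))[0]


# Toy data, invented for illustration only.
scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
labels = [0, 0, 0, 1, 0, 1, 1]
pts = roc_points(scores, labels)
print(closest_to_01(pts), farthest_from_10(pts))
```

On this toy data both criteria select the same threshold, but in general they can differ because one minimizes and the other maximizes a distance to a different corner.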
Comparative analysis of Bayesian and maximum likelihood estimators in change point problems with Poisson process
Kitabo, Cheru Atsmegiorgis ; Kim, Jong Tae ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 261~269
DOI : 10.7465/jkdi.2015.26.1.261
Nowadays the application of change point analysis is indispensable in a wide range of areas such as quality control, finance, environmetrics, medicine, geographics, and engineering. Identifying the times at which a process changes helps minimize the consequences that might follow. The main objective of this paper is to compare the change-point detection capabilities of the Bayesian estimate and the maximum likelihood estimate. We applied Bayesian and maximum likelihood techniques to formulate change points for a step change and for multiple change points in a Poisson rate. After a signal from the c-chart and Poisson cumulative sum control charts has been detected, Monte Carlo simulation is applied to investigate the performance of Bayesian and maximum likelihood estimation. It has been found that the Bayesian estimate outperforms standard control charts, especially when there is a small to medium step change. Moreover, it performs convincingly well in comparison with the maximum likelihood estimator and remains a good choice, especially for confidence interval inference.
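For a single step change in a Poisson rate, the maximum likelihood estimate of the change point can be sketched by profiling out the two rates: for each candidate split the rates are replaced by the sample means on either side, and the split with the largest log-likelihood is chosen. This is the generic textbook estimator, shown under the assumption of exactly one change; it is not necessarily the exact procedure used in the paper, and the function name and data are hypothetical.

```python
from math import log


def poisson_change_point_mle(counts):
    """Return tau such that counts[:tau] and counts[tau:] have different rates.

    Maximizes S1*log(S1/tau) + S2*log(S2/(n - tau)), which equals the
    profile log-likelihood up to terms that do not depend on tau.
    """
    n = len(counts)
    total = sum(counts)

    def term(s, m):
        # s * log(s / m), using the convention 0 * log(0) = 0.
        return s * log(s / m) if s > 0 else 0.0

    best_tau, best_value = None, float("-inf")
    running = 0
    for tau in range(1, n):
        running += counts[tau - 1]          # sum of the first tau counts
        value = term(running, tau) + term(total - running, n - tau)
        if value > best_value:
            best_tau, best_value = tau, value
    return best_tau


# Invented counts: a low rate for six periods, then a higher rate.
counts = [2, 3, 2, 1, 3, 2, 8, 9, 7, 10]
print(poisson_change_point_mle(counts))
```

A Bayesian alternative would place priors on the change point and the two rates and summarize the posterior of the change point instead of maximizing over it.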
Optimal three step-stress accelerated life tests for Type-I hybrid censored data
Moon, Gyoung Ae ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 271~280
DOI : 10.7465/jkdi.2015.26.1.271
In this paper, the maximum likelihood estimators for parameters are derived under three step-stress accelerated life tests for Type-I hybrid censored data. The exponential distribution and the cumulative exposure model are considered based on the assumption that a log quadratic relationship exists between stress and the mean lifetime. The test plan to search for optimal stress change times minimizing the asymptotic variance of the maximum likelihood estimators is presented. A numerical example to illustrate the proposed inferential procedures and some simulation results to investigate the sensitivity of the optimal stress change times to the guessed parameters are given.
Three level constant stress accelerated life tests for Weibull distribution
Moon, Gyoung Ae ;
Journal of the Korean Data and Information Science Society, volume 26, issue 1, 2015, Pages 281~288
DOI : 10.7465/jkdi.2015.26.1.281
In this paper, the maximum likelihood estimators and confidence intervals for parameters of the Weibull distribution are derived under three level constant stress accelerated life tests and the assumption that a log quadratic relationship exists between stress and the scale parameter. The compound linear plan proposed by Kim (2006) is used to allocate the test units at each stress level; it performed nearly as well as the optimum quadratic plan and had the advantage of simplicity. Some simulation studies are given.