Journal of the Korean Data and Information Science Society
Korean Data and Information Science Society
Volume 27, Issue 3 - May 2016
A statistical prediction for concentrations of Manganese in the ambient air
Kwon, Hye Ji ; Kim, Yongku ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 577~586
DOI : 10.7465/jkdi.2016.27.3.577
Hazardous air pollution caused by heavy metals in the air is at a serious level. Although manganese (Mn), one of the heavy metals, is a non-carcinogenic substance, it has a harmful influence on the human body. It is only partially measured because automatic monitoring technologies have not yet been fully established. We introduce a statistical model for the daily concentration of manganese. Incorporating a linkage between Mn and meteorology, the proposed model is formulated to identify meteorological effects and to allow for seasonal trends, enabling not only accurate estimation of manganese concentration but also evaluation of the Hazard Quotient (non-cancer risk).
An estimation method for non-response model using Monte-Carlo expectation-maximization algorithm
Choi, Boseung ; You, Hyeon Sang ; Yoon, Yong Hwa ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 587~598
DOI : 10.7465/jkdi.2016.27.3.587
In predicting the outcome of an election ahead of the vote, non-response is one of the major issues. A variety of non-response imputation methods may be employed to address it, but forecasting results tend to vary according to the method. In this study, in order to improve electoral forecasts, we study a model-based method of non-response imputation that applies the Monte Carlo Expectation Maximization (MCEM) algorithm introduced by Wei and Tanner (1990). The MCEM algorithm using maximum likelihood estimates (MLEs) is applied to solve the boundary solution problem under the non-ignorable non-response mechanism. We performed simulation studies to compare estimation performance among MCEM, maximum likelihood estimation, and Bayesian estimation. The results showed that the MCEM method can be a reasonable candidate for non-response model estimation. We also applied the MCEM method to the Korean presidential election exit poll data of 2012 and investigated prediction performance using the modified within precinct error (MWPE) criterion (Bautista et al., 2007).
Estimation of the case fatality ratio of MERS epidemics using information on patients' severity condition
Hwang, Seonyeong ; Oh, Changhyuck ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 599~607
DOI : 10.7465/jkdi.2016.27.3.599
The first patient in Korea with Middle East respiratory syndrome (MERS), caused by a novel coronavirus infection, was confirmed on May 20, 2015. After that, MERS spread over the country. In recent years, MERS patients have been found around the Arabian Peninsula, and the case fatality ratio of MERS in that area has been reported to range from 30 to 40%. In this paper, we estimate the case fatality ratio of MERS in Korea using data on 186 infections up to December 1, 2015. We propose a novel estimator of the case fatality ratio that uses information on the patients' severity condition as well as records of the days of confirmation and of death or recovery of each patient. Using publicly available data of the Department of Health and Human Services Centers for Disease Control, we evaluate the performance of the estimator and demonstrate its stability from the early stage of the epidemic.
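For context on why the estimation target above is delicate during an ongoing epidemic, the sketch below contrasts two standard textbook estimators of the case fatality ratio. It does not reproduce the severity-based estimator proposed in the paper, whose form is not given in the abstract. The function names and the toy counts are hypothetical.

```python
def naive_cfr(deaths, confirmed):
    """Deaths divided by all confirmed cases (tends to be too low mid-epidemic,
    because some confirmed patients have not yet reached an outcome)."""
    return deaths / confirmed


def resolved_cfr(deaths, recovered):
    """Deaths divided by cases with a known outcome (death or recovery)."""
    return deaths / (deaths + recovered)


# Hypothetical mid-epidemic snapshot: 100 confirmed, 10 dead, 30 recovered,
# 60 still hospitalised with unknown outcome.
early_naive = naive_cfr(10, 100)
early_resolved = resolved_cfr(10, 30)
```

The two values differ whenever some cases are unresolved and coincide once every confirmed case has either died or recovered, which is why the stability of an estimator in the early stage matters.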
RHadoop platform for K-Means clustering of big data
Shin, Ji Eun ; Oh, Yoon Sik ; Lim, Dong Hoon ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 609~619
DOI : 10.7465/jkdi.2016.27.3.609
RHadoop is a collection of R packages that allow users to manage and analyze data with Hadoop. In this paper, we implement the K-Means algorithm on the MapReduce framework with RHadoop to make the clustering method applicable to large-scale data. The main idea is to introduce a combiner that operates on the map output to decrease the amount of data that must be processed by the reducers. We show that our K-Means algorithm using RHadoop with a combiner becomes faster than the regular algorithm without a combiner as the size of the data set increases. We also implemented the Elbow method with MapReduce for finding the optimum number of clusters for K-Means clustering on large datasets. A comparison of our MapReduce implementation of the Elbow method with the classical kmeans() in R on small data showed similar results.
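The combiner idea described above can be illustrated without Hadoop. The following is a minimal single-process sketch in Python, not the authors' RHadoop code: each input split is mapped to (cluster, point) pairs, a combiner collapses them to one partial sum per cluster per split, and the reducer merges the partial sums into new centroids. All function names and the toy data are hypothetical.

```python
def nearest(point, centroids):
    """Index of the centroid closest to `point` (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])))


def map_phase(split, centroids):
    """Emit (cluster_id, (vector, count)) for every point in one input split."""
    return [(nearest(p, centroids), (p, 1)) for p in split]


def combine(pairs):
    """Sum vectors and counts per cluster; at most one pair per cluster comes out."""
    acc = {}
    for k, (vec, n) in pairs:
        if k in acc:
            s, c = acc[k]
            acc[k] = ([a + b for a, b in zip(s, vec)], c + n)
        else:
            acc[k] = (list(vec), n)
    return list(acc.items())


def reduce_phase(pairs, centroids):
    """Merge partial sums from all splits and return the updated centroids."""
    new = list(centroids)
    for k, (s, n) in combine(pairs):  # the same associative sum as the combiner
        new[k] = tuple(x / n for x in s)
    return new


def kmeans(splits, centroids, iters=10):
    """Run a fixed number of map -> combine -> reduce rounds."""
    for _ in range(iters):
        shuffled = []
        for split in splits:
            shuffled.extend(combine(map_phase(split, centroids)))
        centroids = reduce_phase(shuffled, centroids)
    return centroids


splits = [[(0.0, 0.0), (0.0, 2.0)], [(10.0, 10.0), (10.0, 12.0)]]
start = [(0.0, 1.0), (9.0, 9.0)]
result = kmeans(splits, start)
```

Because vector sums and counts are associative, the combiner can run on each split independently, so the reducer receives at most one record per cluster per split instead of one record per point.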
Categorical time series clustering: Case study of Korean pro-baseball data
Pak, Ro Jin ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 621~627
DOI : 10.7465/jkdi.2016.27.3.621
A certain professional baseball team tends to be very weak against another particular team. For example, team S, the strongest team in Korea, is relatively weak against team H. In this paper, we cluster the Korean baseball teams based on their records against team S to investigate whether the pattern of team H's record differs from those of the other teams. The technique we employ is 'time series clustering', or more specifically 'categorical time series clustering'. Three methods are considered in this paper: (i) a distance-based method, (ii) a genetic sequencing method and (iii) a periodogram method. Each method has its own advantages and disadvantages in handling categorical time series, so it is recommended to draw conclusions by considering the results of all three methods together.
Generally non-linear regression model containing standardized lift for association number estimation
Park, Hee Chang ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 629~638
DOI : 10.7465/jkdi.2016.27.3.629
Among data mining techniques, the association rule is one of the most widely used in practice because it clearly displays the relationship between two or more items in large databases by quantifying that relationship. There are three primary quality measures for association rules: support, confidence, and lift. We evaluate association rules using these measures. The approach taken in the previous literature to estimating the number of association rules has been either a determination function method or a regression modeling approach. In this paper, we propose a few non-linear regression equations useful for estimating the number of rules and also evaluate the estimated association rules using the quality measures. Furthermore, we assess their usefulness compared to conventional regression models using the values of regression coefficients, F statistics, adjusted coefficients of determination and variance inflation factors.
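The three quality measures named above have standard definitions: support is the share of transactions containing both sides of a rule, confidence is that count divided by the number of transactions containing the antecedent, and lift is confidence divided by the relative frequency of the consequent. A minimal Python sketch follows; the function name and the toy transactions are hypothetical, and the sketch assumes the antecedent and the consequent each occur at least once.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= set(t))         # contain antecedent
    n_c = sum(1 for t in transactions if c <= set(t))         # contain consequent
    n_ac = sum(1 for t in transactions if (a | c) <= set(t))  # contain both
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift


# Toy basket data (hypothetical).
baskets = [{"milk", "bread"}, {"milk"}, {"bread"}, {"milk", "bread"}]
support, confidence, lift = rule_measures(baskets, {"milk"}, {"bread"})
```

A lift below 1, as in this toy data, indicates that the two items co-occur less often than they would under independence.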
Study on equity of taxation for non-residential property by analysis of actual transaction price
Kim, Hyoung June ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 639~651
DOI : 10.7465/jkdi.2016.27.3.639
The "Law on price announcement for real estate", revised on Jan. 19, 2016 (to be enforced from Sep. 1, 2016), introduced a 'Price announcement system for non-residential property' for the first time. However, its introduction seems likely to be delayed for two reasons. First, the methodology for introducing the system has not been finalized, even though many problems have been raised about the current tax base of non-residential property. Second, changes in the tax base will place a burden on the government. In this regard, this study analyzed one year of actual transaction price data to examine the equity of taxation for non-residential property and to find the major factors affecting the tax base, in relation to the change from the current public announcement system to an actual-transaction-based system. This is the first study to apply actual transaction prices to non-residential property.
The estimation of winning rate in Korean professional baseball league
Kim, Soon-Kwi ; Lee, Young-Hoon ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 653~661
DOI : 10.7465/jkdi.2016.27.3.653
In this paper, we provide a suitable optimal exponent for the generalized Pythagorean theorem and propose using the logistic model and the probit model to estimate the winning rate in the Korean professional baseball league. Under the root-mean-square-error (RMSE) criterion, the efficiencies of the proposed models are compared with those of the Pythagorean theorem. We use the teams' historical win-loss records in the Korean professional baseball league from 1982 to the first half of 2015, and the proposed methods slightly outperform the generalized Pythagorean method under the RMSE criterion.
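The generalized Pythagorean formula estimates a team's winning rate as RS^g / (RS^g + RA^g), where RS is runs scored, RA is runs allowed and g is the exponent. The sketch below shows one simple way to choose g by minimizing RMSE over a grid. It is an illustration only: the grid search, the function names and the synthetic records are assumptions, not the paper's procedure or data.

```python
import math


def pythagorean(rs, ra, gamma):
    """Generalized Pythagorean winning rate from runs scored and runs allowed."""
    return rs ** gamma / (rs ** gamma + ra ** gamma)


def rmse(records, gamma):
    """RMSE between predicted and observed winning rates.

    `records` is a list of (runs_scored, runs_allowed, observed_rate) tuples.
    """
    errs = [(pythagorean(rs, ra, gamma) - w) ** 2 for rs, ra, w in records]
    return math.sqrt(sum(errs) / len(errs))


def best_exponent(records, grid=None):
    """Exponent on the grid (default 1.00 to 3.00, step 0.01) with the lowest RMSE."""
    grid = grid or [1.0 + 0.01 * i for i in range(201)]
    return min(grid, key=lambda g: rmse(records, g))


# Synthetic records generated with exponent 2, so the search should recover 2.
toy = [(rs, ra, pythagorean(rs, ra, 2.0)) for rs, ra in [(700, 600), (550, 650), (620, 610)]]
fitted = best_exponent(toy)
```

With real season records the minimizing exponent would generally differ from 2, which is the point of fitting it.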
The winning probability in Korean series of Korean professional baseball
Cho, Daehyeon ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 663~676
DOI : 10.7465/jkdi.2016.27.3.663
In Korean professional baseball, the championship team of the year is determined by four series of games: the semi-semi-playoff, the semi-playoff, the playoff and the Korean Series. The top five teams of the regular season earn the right to play in the postseason. In the semi-semi-playoff, the winner between the teams ranked fourth and fifth in the regular season advances to the semi-playoff. The winner of the semi-playoff advances to the playoff against the second-place team of the regular season. Finally, the championship team is determined in the Korean Series between the winner of the playoff and the first-ranked team of the regular season. We propose methods for calculating the probability that each of the top five teams advances to the Korean Series. With the proposed methods we can estimate the championship probability of each of the top five teams, as long as we know the winning probabilities between any two teams in the regular season or the postseason.
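The ladder structure described above lends itself to a short calculation. The sketch below is a simplified illustration, not the paper's method: it assumes independent games with a fixed single-game win probability for each pair of teams, assumes the semi-semi-playoff winner meets the third-place team in the semi-playoff, and treats every stage as a plain "first to k wins" series. The function names and the `wins_needed` values are hypothetical.

```python
from math import comb


def series_win_prob(p, wins_needed):
    """P(a team reaches `wins_needed` wins before its opponent does),
    with independent games and single-game win probability p."""
    q = 1.0 - p
    return sum(comb(wins_needed - 1 + k, k) * p ** wins_needed * q ** k
               for k in range(wins_needed))


def championship_probs(p, wins_needed):
    """Championship probability for each of the five teams.

    p[i][j]     : probability that team i beats team j in one game
                  (teams indexed 0..4 by regular-season rank, 0 = first place).
    wins_needed : wins required in each stage, in the order
                  [semi-semi-playoff, semi-playoff, playoff, Korean Series].
    """
    first = series_win_prob(p[3][4], wins_needed[0])
    alive = {3: first, 4: 1.0 - first}  # who survives the semi-semi-playoff
    for stage, seed in ((1, 2), (2, 1), (3, 0)):
        nxt = {seed: 0.0}
        for team, prob in alive.items():
            w = series_win_prob(p[team][seed], wins_needed[stage])
            nxt[team] = nxt.get(team, 0.0) + prob * w   # challenger advances
            nxt[seed] += prob * (1.0 - w)               # seeded team advances
        alive = nxt
    return alive


# Evenly matched teams: the result reflects only the byes built into the ladder.
even = [[0.5] * 5 for _ in range(5)]
probs = championship_probs(even, [2, 2, 3, 4])
```

With evenly matched teams the first-place team wins the title with probability 0.5 and each lower seed's chance halves with every extra series it must survive, which shows how much the format alone favours higher regular-season ranks.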
Alternative hitting ability index for KBO
Hong, Chong Sun ; Kim, Jae Young ; Shin, Dong Sik ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 677~687
DOI : 10.7465/jkdi.2016.27.3.677
Among the many sabermetric statistics for baseball batters' ability, wins above replacement (WAR) is the most popular statistic in MLB. However, it is difficult to apply WAR to KBO, since KBO data do not have the position adjustment, league adjustment and park factor that are essential in calculating WAR. In this paper, using five statistics for both KBO and MLB qualified batters, we propose the hitting ability index (HAI), an alternative sabermetric index to represent batters' ability. Comparing HAI with WAR for MLB batters, we evaluate the validity of HAI and then apply HAI to 2015 KBO data, in which HAI is analyzed statistically with respect to different teams, ages, and positions. Moreover, the linear relationship between KBO batters' HAI and their annual salary is discussed. Grouping 46 KBO batters based on the confidence region of the regression model for annual salary, we also statistically investigate batters' annual salary in these groups with respect to several factors.
Estimation of the joint conditional distribution for repeatedly measured bivariate cholesterol data using nonparametric copula
Kwak, Minjung ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 689~700
DOI : 10.7465/jkdi.2016.27.3.689
We study estimation and inference of the joint conditional distributions of bivariate longitudinal outcomes using regression models and copulas. For the estimation of marginal models we consider a class of time-varying transformation models and combine the two marginal models using nonparametric empirical copulas. Regression parameters in the transformation model can be obtained as the solution of estimating equations and our models and estimation method can be applied in many situations where the conditional mean-based models are not good enough. Nonparametric copulas combined with time-varying transformation models may allow quite flexible modeling for the joint conditional distributions for bivariate longitudinal data. We apply our method to an epidemiological study of repeatedly measured bivariate cholesterol data.
A study on analysis of packet amount of Naver's mobile portal
Ryu, Gui-Yeol ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 701~710
DOI : 10.7465/jkdi.2016.27.3.701
The purpose of this paper is to build a model of packet amount of Naver mobile portal. We collected 2004 cases by measuring the sixth per access from September, 2012 to October, 2015. We use a regression model with autoregressive errors, in which the predictors incorporated into the model were replication, date, time, week, and month. Based on AIC and adjusted R², the selected model has errors that follow an AR(36) process. In addition to model building, we also discuss some meaningful features yielded by the selected model. Considering the importance of this topic, continued research is needed.
Bivariate reliability models with multiple dynamic competing risks
Kim, Juyoung ; Cha, Ji Hwan ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 711~724
DOI : 10.7465/jkdi.2016.27.3.711
Under variable and complex operating environments, various factors can affect the lifetimes of systems. In this research, we study bivariate reliability models having multiple dynamic competing risks. As competing risks, in addition to natural failure, we simultaneously consider the increased stress caused by the failure of one component, external shocks, and the level of stress of the working environment. Considering two reliability models which take into account all of these competing risks, we derive bivariate life distributions. Furthermore, we compare these two models and also compare the distributions of the maximum and minimum statistics in the two models.
A study on demand forecasting for Jeju-bound tourists by travel purpose using seasonal ARIMA-Intervention model
Song, Junmo ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 725~732
DOI : 10.7465/jkdi.2016.27.3.725
This study analyzes the number of Jeju-bound tourists according to travellers' purposes. We classify the travellers' purposes into three categories: "Rest and Sightseeing", "Leisure and Sport", and "Conference and Business". To assess the impact of the MERS outbreak that occurred in May 2015 on the number of tourists, we fit a seasonal ARIMA-Intervention model to the monthly arrivals data from January 2005 to March 2016. The estimation results show that the numbers of tourists for "Leisure and Sport" and "Conference and Business" were significantly affected by the MERS outbreak, whereas arrivals for "Rest and Sightseeing" were little influenced. Using the fitted models, we predict the number of Jeju-bound tourists.
Application of DNA marker related with marbling score in Hanwoo cow
Lee, Yoonseok ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 733~739
DOI : 10.7465/jkdi.2016.27.3.733
The aim of this study was to evaluate the combination of each of the g.15532 C>A and g.17924 G>A SNPs of the FASN gene and the beef quality grade of progeny in Hanwoo cow. In order to analyze the SNPs, genomic DNA was obtained from 270 Hanwoo cows and their progeny steers, and the g.15532 C>A and g.17924 G>A SNPs were genotyped using single-base extension. With a GLM as the statistical model, the g.15532 C>A and g.17924 G>A SNPs had a significant effect in Hanwoo steer but no significant effect in Hanwoo cow. The combination of each of the g.15532 C>A and g.17924 G>A SNPs and the beef quality grade of progeny had a significant effect on marbling score in Hanwoo cow. Therefore, we suggest that the g.15532 C>A and g.17924 G>A SNPs contribute to genetic improvement of marbling score in Hanwoo cow.
Estimation of genetic parameter for carcass traits in commercial Hanwoo steer
Lee, Yoonseok ; Lee, Jea Young ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 741~747
DOI : 10.7465/jkdi.2016.27.3.741
The aim of this study was to estimate genetic parameters of carcass traits in commercial Hanwoo steer using the national animal model for selection of superior bulls. Analyzed data (n
The effects of emotional regulation between clinical practice stress and nursing professionalism in nursing students
Jang, Insun ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 749~761
DOI : 10.7465/jkdi.2016.27.3.749
The purpose of this study was to investigate the effects of emotional regulation between clinical practice stress and nursing professionalism in nursing students. Participants were 192 nursing students and data were collected from September to November, 2015. This study has shown that nursing professionalism is negatively associated with clinical practice stress (r
The influences of self-efficacy and attachment on SNS addiction tendency in college students
Ha, Tae Hi ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 763~772
DOI : 10.7465/jkdi.2016.27.3.763
This study examined the influence of self-efficacy and attachment on SNS addiction tendency in college students. For this purpose, 303 college students in Daegu completed the related survey. Data were collected from September 30 to October 20, 2013, and analyzed using SPSS 20.0. The major findings were as follows: 1) there was a negative relationship between SNS addiction tendency and self-efficacy; 2) there was a positive relationship between SNS addiction tendency and adult attachment. These results indicate that it is necessary to design intervention programs that increase self-efficacy and attachment stability in order to decrease college students' SNS addiction tendency.
Sleep patterns and its influencing factors of hospitalized elderly in long-term care hospital
Jang, Hyo-Yoel ; Kim, Tae-Im ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 773~789
DOI : 10.7465/jkdi.2016.27.3.773
This study was conducted to identify the sleep patterns and influencing factors of hospitalized elderly in a long-term care hospital. The sleep patterns of 142 subjects were recorded using Sleep Charts. The average sleep time of subjects was 10.7 hours a day (3.9 hours in the daytime and 6.8 hours in the nighttime). Sleep regularity among participants was 71.7% over the whole day (58.1% in the daytime and 80.5% in the nighttime). The presence of dementia patients in the room (PDPR) was identified as a statistically significant predictor of all-day sleep, and pain, PDPR, and physical function were found to be significant predictors of all-day sleep regularity among subjects. The results suggest that elderly patients in a long-term care hospital do not sleep well during the night, which leads to an increase in daytime sleep and a decrease in the quality of their sleep. Therefore, an intervention program should be developed to promote the quality of sleep among hospitalized elderly.
Factors influencing the intent to return to practice (work) of inactive RNs
Hwang, Nami ; Jang, Insun ; Park, Eunjun ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 791~801
DOI : 10.7465/jkdi.2016.27.3.791
The purpose of this study is to examine factors affecting the intent of re-employment of inactive registered nurses. This study presents a secondary analysis of data collected in the 'Nurse Turnover On-line Survey' by the Korean Nurses Association and the Korea Institute for Health and Social Affairs in 2014. The analysis shows that 70.9% of inactive RNs have an intent to return to practice, and most of them preferred 'flexible working options' (47.8%) or 'fixed day shifts' (43.3%) as a work pattern. The main reasons for resigning from their last job were found to be 'high work intensity' (18.8%) and 'difficulties of night shifts' (16.7%). Inactive married RNs who have working histories in a general hospital or a long-term care hospital, or who prefer traditional shift work, showed a stronger intent to return to practice than their reference group. Our study shows that, for inactive RNs to return to practice, it is advisable to adopt various non-traditional working patterns, to distribute staffing with labor intensity in mind, and to develop education programs designed to increase RNs' professional satisfaction.
Spatio-temporal models for generating a map of high resolution NO2 level
Yoon, Sanghoo ; Kim, Mingyu ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 803~814
DOI : 10.7465/jkdi.2016.27.3.803
Recent times have seen an exponential increase in the amount of spatial data, which is in many cases associated with temporal data. Recent advances in computer technology and in the computation of hierarchical Bayesian models have made it possible to analyze complex spatio-temporal data. Our work aims at modeling daily average nitrogen dioxide (NO2) levels obtained from 25 air monitoring sites in Seoul between 2003 and 2010. We considered an independent Gaussian process model and an auto-regressive model and carried out estimation within a hierarchical Bayesian framework with Markov chain Monte Carlo techniques. A Gaussian predictive process approximation showed better prediction performance than a hierarchical auto-regressive model for the illustrative NO2 concentration levels at unmonitored locations.
Modeling of random effects covariance matrix in marginalized random effects models
Lee, Keunbaik ; Kim, Seolhwa ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 815~825
DOI : 10.7465/jkdi.2016.27.3.815
Marginalized random effects models (MREMs) are often used to analyze longitudinal categorical data. The models permit direct estimation of marginal mean parameters and specify the serial correlation of longitudinal categorical data via the random effects. However, it is not easy to estimate the random effects covariance matrix in MREMs because the matrix is high-dimensional and must be positive-definite. To address these restrictions, we introduce two approaches to modeling the random effects covariance matrix: partial autocorrelation and the modified Cholesky decomposition. The proposed methods are illustrated with real data from the Korean genomic epidemiology study.
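As background for the second approach named above, the modified Cholesky decomposition can be written as follows. This is the standard textbook form, given here as a sketch; the symbols (the random effects b_t, the coefficients \phi_{tj}, the innovation variances \sigma_t^2 and the dimension q) are notation assumed for illustration and are not taken from the paper.

```latex
% Modified Cholesky decomposition of a q x q covariance matrix \Sigma
T \Sigma T^{\top} = D, \qquad
T =
\begin{pmatrix}
1 & 0 & \cdots & 0 \\
-\phi_{21} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
-\phi_{q1} & -\phi_{q2} & \cdots & 1
\end{pmatrix},
\qquad
D = \operatorname{diag}(\sigma_1^2, \ldots, \sigma_q^2).

% Equivalent autoregressive form for the random effects b_1, \ldots, b_q
b_t = \sum_{j=1}^{t-1} \phi_{tj}\, b_j + e_t, \qquad
\operatorname{Var}(e_t) = \sigma_t^2 .
```

Because the \phi_{tj} are unconstrained and each \log \sigma_t^2 can take any real value, any choice of these parameters yields a positive-definite \Sigma = T^{-1} D T^{-\top}, which is what makes the decomposition convenient for regression-type modeling of a covariance matrix.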
Deep LS-SVM for regression
Hwang, Changha ; Shim, Jooyong ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 827~833
DOI : 10.7465/jkdi.2016.27.3.827
In this paper, we propose a deep least squares support vector machine (LS-SVM) for regression problems, which consists of an input layer and a hidden layer. In the hidden layer, LS-SVMs are trained with the original input variables and the perturbed responses. For the final output, the main LS-SVM is trained with the outputs from the LS-SVMs of the hidden layer as input variables and the original responses. In contrast to the multilayer neural network (MNN), the LS-SVMs in the deep LS-SVM are each trained to minimize a penalized objective function. Thus, the learning dynamics of the deep LS-SVM are entirely different from those of the MNN, in which all weights and biases are trained to minimize one final error function. Compared to MNN approaches, the deep LS-SVM does not make use of any combination weights, but trains all LS-SVMs in the architecture. Experimental results from real datasets illustrate that the deep LS-SVM significantly outperforms state-of-the-art machine learning methods on regression problems.
Bayesian testing for the homogeneity of the shape parameters of several inverse Gaussian distributions
Lee, Woo Dong ; Kim, Dal Ho ; Kang, Sang Gil ;
Journal of the Korean Data and Information Science Society, volume 27, issue 3, 2016, Pages 835~844
DOI : 10.7465/jkdi.2016.27.3.835
We develop testing procedures for the homogeneity of the shape parameters of several inverse Gaussian distributions. We propose default Bayesian testing procedures for the shape parameters under the reference priors. The Bayes factor based on proper priors gives successful results for Bayesian hypothesis testing. When information is lacking, noninformative priors such as Jeffreys' prior or the reference prior can be used. However, Jeffreys' prior or the reference prior involves undefined constants in the computation of the Bayes factors. Therefore, under the reference priors, we develop Bayesian testing procedures with the intrinsic Bayes factors and the fractional Bayes factor. A simulation study of the performance of the developed testing procedures and an illustrative example are given.