REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
Journal of the Korean Data and Information Science Society
Journal Basic Information
Korean Data and Information Science Society
Volume & Issues
Volume 26, Issue 6 - Nov 2015
Volume 26, Issue 5 - Sep 2015
Volume 26, Issue 4 - Jul 2015
Volume 26, Issue 3 - May 2015
Volume 26, Issue 2 - Mar 2015
Volume 26, Issue 1 - Jan 2015
A study on tuning parameter selection for MDPDE
Yu, Donghyeon ; Kim, Byungsoo ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 549~559
DOI : 10.7465/jkdi.2015.26.3.549
The minimum density power divergence estimator (MDPDE) is an attractive alternative to the maximum likelihood estimator because of its inherent strong robustness properties. The characteristics of the MDPDE vary with the tuning parameter; in general, there is a trade-off between robustness and asymptotic efficiency. Hence, selecting the optimal tuning parameter is an important but complicated task. In this study, we introduce two optimal tuning parameter selection methods proposed by Fujisawa and Eguchi (2005) and Warwick (2006). Through a simulation study, we found that Warwick's method yields an excessively small optimal tuning parameter in certain cases, while Fujisawa and Eguchi's method performs well. Therefore, we think Fujisawa and Eguchi's method can be used generally for finding the optimal tuning parameter of the MDPDE.
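To make the role of the tuning parameter concrete, here is a minimal sketch of the MDPDE for the mean of a normal distribution, under the simplifying assumption that the scale sigma is known. It is not either of the selection methods compared in the paper; the function name `mdpde_normal_location` and the toy data are hypothetical. With sigma fixed, the estimating equation reduces to a weighted mean whose weights are the model density raised to the power alpha, so larger alpha down-weights outliers more strongly (robustness), while alpha = 0 recovers the ordinary mean (full efficiency).

```python
import math
from statistics import median


def mdpde_normal_location(xs, alpha, sigma=1.0, tol=1e-10, max_iter=500):
    """MDPDE of a normal mean when the scale sigma is treated as known.

    With sigma fixed, the integral term of the density power divergence does
    not depend on mu, so the estimating equation reduces to a weighted mean
    with weights f(x_i; mu)^alpha. Points far from mu get tiny weights, which
    is the source of robustness; alpha = 0 gives equal weights (the MLE).
    """
    mu = median(xs)  # robust starting value for the fixed-point iteration
    for _ in range(max_iter):
        weights = [math.exp(-alpha * (x - mu) ** 2 / (2 * sigma ** 2)) for x in xs]
        new_mu = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(new_mu - mu) < tol:
            return new_mu
        mu = new_mu
    return mu


# Hypothetical toy sample with one gross outlier.
data = [0.1, -0.2, 0.3, 0.0, -0.1, 10.0]
for alpha in (0.0, 0.1, 0.5):
    print(alpha, round(mdpde_normal_location(data, alpha), 3))
```

Running the loop shows the estimate moving from the outlier-inflated sample mean toward the bulk of the data as alpha grows, which is the robustness side of the trade-off the tuning parameter controls.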
Prediction of apartment prices per unit in Daegu-Gyeongbuk areas by spatial regression models
Lee, Woo Jung ; Park, Cheolyong ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 561~568
DOI : 10.7465/jkdi.2015.26.3.561
In this study, we predict apartment prices per unit in the Daegu-Gyeongbuk areas using spatial lag and spatial error models, both of which belong to the class of spatial regression models. A spatial weight matrix is constructed by the k-nearest neighbours method, and the models are then fitted to the apartment prices of March 2012 using this weight matrix. The apartment prices of March 2013 are predicted by the fitted spatial regression models, and the performances of the two models are compared by RMSE (root mean squared error), RRMSE (root relative mean squared error), and MAE (mean absolute error).
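The weight-matrix construction and the comparison criteria can be sketched as follows. This is an illustration under assumptions, not the authors' code: each location gets weight 1/k on each of its k nearest neighbours (a common row-standardised convention), distances are Euclidean on planar coordinates, and RRMSE is taken as the root mean of squared relative errors, since the abstract does not give its formula. The function names are hypothetical, and the spatial lag and error models themselves are not fitted here.

```python
import math


def knn_weights(coords, k):
    """Row-standardised k-nearest-neighbour spatial weight matrix.

    Assumption: Euclidean distance on (x, y) coordinates; each of the k
    nearest neighbours of location i gets weight 1/k, and W[i][i] = 0.
    """
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        for _, j in dists[:k]:
            W[i][j] = 1.0 / k
    return W


def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))


def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)


def rrmse(y, yhat):
    # Assumed definition: root mean of squared relative errors.
    return math.sqrt(sum(((a - b) / a) ** 2 for a, b in zip(y, yhat)) / len(y))


W = knn_weights([(0, 0), (1, 0), (0, 1), (5, 5)], k=2)
```

Row standardisation makes the spatially lagged term a local average of neighbouring prices, which keeps the spatial coefficient on an interpretable scale.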
A comparison of the statistical methods for testing the equality of crossing survival functions
Lee, Youn Ju ; Lee, Jae Won ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 569~580
DOI : 10.7465/jkdi.2015.26.3.569
The log-rank test is widely used for testing the equality of two survival functions, but it is efficient only under the proportional hazards assumption. However, crossing survival functions are common in practice, and many approaches have therefore been suggested for testing their equality. This study considers several methods that can be applied when survival functions cross (the Renyi-type test, the modified Kolmogorov-Smirnov and Cramer-von Mises tests, and the weighted log-rank test) and compares their power by simulation. Based on the simulation results, we provide useful information for choosing a suitable approach in a given situation.
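As a reference point for the alternatives above, the ordinary (unweighted) two-sample log-rank statistic can be sketched as below. At each distinct event time it compares the observed number of events in group 1 with the number expected under equal hazards, using the hypergeometric variance. The function name is hypothetical, and the weighted and Renyi-type variants studied in the paper are not implemented.

```python
def logrank_chi2(t1, e1, t2, e2):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    t1, t2: follow-up times; e1, e2: 1 if the event was observed, 0 if censored.
    """
    data = [(t, e, 1) for t, e in zip(t1, e1)] + [(t, e, 2) for t, e in zip(t2, e2)]
    event_times = sorted({t for t, e, _ in data if e == 1})
    o_minus_e = 0.0
    var = 0.0
    for s in event_times:
        n1 = sum(1 for t, _, g in data if g == 1 and t >= s)   # at risk, group 1
        n = sum(1 for t, _, _ in data if t >= s)               # at risk, total
        d = sum(1 for t, e, _ in data if t == s and e == 1)    # events, total
        d1 = sum(1 for t, e, g in data if t == s and e == 1 and g == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return o_minus_e ** 2 / var if var > 0 else 0.0


# Group 1 fails early, group 2 fails late: a clear difference.
chi2 = logrank_chi2([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
```

Because every event time contributes with equal weight, early and late differences of opposite sign cancel when survival curves cross, which is why the weighted and supremum-type tests in the paper are needed.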
Reclassification of the vulnerability group of wartime equipment
Lee, Hanwoo ; Kim, Suhwan ; Joo, Kyungsik ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 581~592
DOI : 10.7465/jkdi.2015.26.3.581
In GORRAM, the estimation of resource requirements for wartime equipment is based on the USA's ELCON. ELCON has 22 vulnerability groups, but unfortunately it is hard to determine how these 22 groups were classified. Thus, in this research we collected 505 types of basic items used in wartime and classified them into new vulnerability groups using AHP and cluster analysis. We selected 11 variables through AHP to classify the items with cluster analysis. Next, we determined the number of vulnerability groups through hierarchical clustering, and then classified the 505 types of basic items into the new vulnerability groups through K-means clustering. This paper presents new vulnerability groups for the 505 types of basic items, fitted to Korean weapon systems. Furthermore, our approach can be applied to a new weapon system that needs to be classified into a vulnerability group. We believe that our approach will provide military practitioners with a reliable and rational method for classifying wartime equipment, and will thereby support a more accurate estimation of resource requirements in wartime.
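The final K-means step can be sketched with Lloyd's algorithm, as below. This is a minimal illustration that assumes the number of groups has already been chosen (in the paper, by hierarchical clustering) and that initial centres are supplied, for example from the hierarchical solution; the 11 AHP-selected variables are not reproduced, and the function names and toy points are hypothetical.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, init_centers, max_iter=100):
    """Lloyd's algorithm: alternate nearest-centre assignment and centre update."""
    centers = [list(c) for c in init_centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda k: sq_dist(p, centers[k]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the old centre if a cluster empties
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Hypothetical items described by two (already standardised) variables.
items = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centers = kmeans(items, init_centers=[(0, 0), (10, 10)])
```

Seeding K-means from a hierarchical solution, as the two-stage procedure suggests, avoids the sensitivity of Lloyd's algorithm to random starting centres.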
Performance comparison of random number generators based on Adaptive Rejection Sampling
Kim, Hyotae ; Jo, Seongil ; Choi, Taeryon ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 593~610
DOI : 10.7465/jkdi.2015.26.3.593
Adaptive Rejection Sampling (ARS) is a well-known random number generation method for drawing a random sample from a probability distribution. It has the advantage of improving the proposal distribution during sampling, updating it to be closer to the target distribution. However, the use of ARS is limited because it applies only to target distributions with log-concave densities, and various methods have therefore been proposed to overcome this limitation. In this paper, we compare five ARS-based random number generators in terms of adequacy and efficiency. Based on an empirical analysis using simulations, we discuss and compare the results of the five methods.
Proposition of balanced comparative confidence considering all available diagnostic tools
Park, Hee Chang ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 611~618
DOI : 10.7465/jkdi.2015.26.3.611
According to Wikipedia, big data is a broad term for data sets so large or complex that traditional data processing applications are inadequate. Data mining is the computational process of discovering patterns in huge data sets, involving methods at the intersection of association rules, decision trees, clustering, artificial intelligence, and machine learning. Association rule mining is a well-researched method for discovering interesting relationships between itemsets in huge databases and has been applied in various fields. Depending on the direction of association, there are positive, negative, and inverse association rules. When setting evaluation criteria for association rules, it may be desirable to consider all three types at the same time. To this end, we propose a balanced comparative confidence that considers sensitivity, specificity, false positives, and false negatives, check the conditions for an association threshold given by Piatetsky-Shapiro, and compare it with comparative confidence and inversely comparative confidence through a few experiments.
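The ingredients mentioned above can all be computed from the 2×2 table of a rule A → B. The sketch below assumes the diagnostic-test reading in which A plays the role of the test result and B the condition; the paper's balanced comparative confidence formula itself is not stated in the abstract and is not reproduced. The Piatetsky-Shapiro measure is computed as P(A and B) − P(A)P(B), which is zero under independence. The function name is hypothetical.

```python
def rule_measures(n11, n10, n01, n00):
    """Measures for an association rule A -> B from a 2x2 count table.

    n11: A and B, n10: A and not B, n01: not A and B, n00: neither.
    Assumption: A is read as the 'test' and B as the 'condition'.
    """
    n = n11 + n10 + n01 + n00
    p_a = (n11 + n10) / n
    p_b = (n11 + n01) / n
    return {
        "support": n11 / n,
        "confidence": n11 / (n11 + n10),           # P(B | A)
        "sensitivity": n11 / (n11 + n01),          # P(A | B)
        "specificity": n00 / (n00 + n10),          # P(not A | not B)
        "false_positive_rate": n10 / (n10 + n00),  # P(A | not B)
        "false_negative_rate": n01 / (n01 + n11),  # P(not A | B)
        "piatetsky_shapiro": n11 / n - p_a * p_b,  # leverage; 0 if independent
    }


m = rule_measures(n11=30, n10=10, n01=20, n00=40)
```

A positive Piatetsky-Shapiro value indicates a positive association; a negative value points toward a negative or inverse rule, which is why thresholds on this measure are relevant when all three rule types are considered together.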
Performance analysis of volleyball games using the social network and text mining techniques
Kang, Byounguk ; Huh, Mankyu ; Choi, Seungbae ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 619~630
DOI : 10.7465/jkdi.2015.26.3.619
The purpose of this study is to provide basic information for developing a team's future game strategy by identifying the attack and pass patterns of national men's professional volleyball teams and by extracting core keywords related to game performance, using 'social network analysis' and 'text mining'. The 'social network analysis' of the whole data partitioned the players into group '0' (6 players) and group '1' (11 players). In terms of degree centrality and betweenness centrality, group '1' showed more active game performance than group '0'. The 'text mining' results for the two outcome groups (win and loss) differed significantly between the two groups ('0' and '1') obtained by 'social network analysis' (p-value: 0.001). In the clustering of each network, group '0' tended to score points through set players D and E. In group '1', player K tended to fail when attacking through a 'dig', while players C and D performed well through 'set' plays.
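The two centrality measures used to compare the groups can be sketched as follows: normalised degree centrality and betweenness centrality computed with Brandes' algorithm on an unweighted, undirected graph. Treating the pass network as undirected and unweighted is a simplifying assumption (real pass data are directed and counted), and the function names and toy star graph are hypothetical.

```python
from collections import deque


def degree_centrality(adj):
    """Degree divided by n - 1, for an adjacency dict {node: [neighbours]}."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph (unnormalised)."""
    cb = {v: 0.0 for v in adj}
    for s in adj:
        stack, preds = [], {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest s-v paths
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                  # BFS from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    return {v: c / 2 for v, c in cb.items()}  # each pair was counted twice


# Hypothetical star: every pass goes through player "S".
star = {"S": ["a", "b", "c", "d"], "a": ["S"], "b": ["S"], "c": ["S"], "d": ["S"]}
```

In the star example the hub lies on all 6 shortest paths between the 4 other players, so it has maximal degree and betweenness, which is the kind of pattern that flags a setter as the hub of a group.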
Influence of tuition and scholarship on the stop-out rate: An empirical analysis using panel regression model
Yang, Hoseok ; Choi, Jae-Seok ; Han, Jun-Tae ; Jeong, Jina ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 631~638
DOI : 10.7465/jkdi.2015.26.3.631
In this paper, we constructed four years of panel data using information from Higher Education in Korea (2010~2013) and studied, through panel analysis, the influence of tuition and scholarships on the stop-out rate at national and private universities separately. Three models are implemented, considering variables such as faculty-student ratio, employment rate, per-pupil expenditure, average tuition, and per-pupil scholarship. This study showed that lower net tuition and higher per-pupil off-campus scholarships lowered the stop-out rate at national universities, while lower tuition and higher per-pupil scholarships lowered the stop-out rate at private universities.
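The abstract does not say which panel estimator was used, so the sketch below shows one standard option as an assumption: the fixed-effects (within) estimator with a single regressor. Demeaning each university's observations over time removes its time-invariant effect, and the slope is then estimated by OLS on the demeaned data. The function name and toy data are hypothetical.

```python
from collections import defaultdict


def within_slope(entity, x, y):
    """Fixed-effects (within) OLS slope for one regressor.

    Each observation is demeaned by its own entity's means, which removes
    any time-invariant entity effect before the slope is estimated.
    """
    groups = defaultdict(list)
    for g, xi, yi in zip(entity, x, y):
        groups[g].append((xi, yi))
    means = {
        g: (sum(a for a, _ in v) / len(v), sum(b for _, b in v) / len(v))
        for g, v in groups.items()
    }
    sxy = sxx = 0.0
    for g, xi, yi in zip(entity, x, y):
        mx, my = means[g]
        sxy += (xi - mx) * (yi - my)
        sxx += (xi - mx) ** 2
    return sxy / sxx


# Two hypothetical universities, four years each, with different intercepts.
uni = ["u1"] * 4 + ["u2"] * 4
x = [1, 2, 3, 4, 2, 3, 5, 6]
y = [10 + 2 * v for v in x[:4]] + [-5 + 2 * v for v in x[4:]]
```

Pooled OLS on these data would mix the two intercepts into the slope; the within transformation recovers the common slope exactly, which is the main reason for using panel methods with university-level heterogeneity.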
Multivariate process control procedure using a decision tree learning technique
Jung, Kwang Young ; Lee, Jaeheon ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 639~652
DOI : 10.7465/jkdi.2015.26.3.639
In today's manufacturing environment, process data can easily be measured and transferred to a computer for real-time analysis. As a result, it is possible to monitor several correlated quality variables simultaneously. Various multivariate statistical process control (MSPC) procedures have been presented to detect out-of-control events. Although the classical MSPC procedures give an out-of-control signal, it is difficult to determine which variable caused the signal. Data mining and machine learning techniques can be considered to solve this problem. In this paper, we applied decision tree learning to MSPC and conducted simulations of MSPC procedures for monitoring bivariate normal process means. The simulation results show that the overall performance of the MSPC procedure using decision tree learning is similar across several values of the correlation coefficient, whereas the correct classification rates for out-of-control states differ depending on the correlation coefficient and the shift magnitude. The introduced procedure has the advantage of providing information about assignable causes, which practitioners may require.
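The abstract does not name the classical procedure, so as an assumption the sketch below uses the most common one, Hotelling's T² chart for a bivariate process with known mean vector and covariance matrix. For two variables, T² follows a chi-square distribution with 2 degrees of freedom in control, so the upper control limit is exactly −2 ln(α). The function names are hypothetical, and the decision-tree step from the paper is not shown.

```python
import math


def hotelling_t2(x, mean, cov):
    """T^2 = (x - mean)' S^{-1} (x - mean) for a bivariate observation."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))  # 2x2 inverse
    dx = (x[0] - mean[0], x[1] - mean[1])
    return sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))


def t2_ucl(alpha):
    """Chi-square(2) upper limit: P(T^2 > UCL) = exp(-UCL / 2) = alpha."""
    return -2 * math.log(alpha)


mean, cov = (0.0, 0.0), ((1.0, 0.5), (0.5, 1.0))
signal = hotelling_t2((3.0, -1.0), mean, cov) > t2_ucl(0.005)
```

T² collapses both variables into one number, so a signal says only that the process has shifted; that loss of information about which mean moved is exactly what the decision-tree classifier in the paper is meant to recover.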
Measuring the accuracy of the Pythagorean theorem in Korean pro-baseball
Lee, Jangtaek ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 653~659
DOI : 10.7465/jkdi.2015.26.3.653
The Pythagorean formula for baseball postulated by James (1982) expresses the winning percentage as a function of runs scored and runs allowed. However, the Pythagorean formula sometimes gives a less accurate estimate of winning percentage. We use the team-versus-team historical win-loss records of Korean professional baseball clubs for the seasons from 2005 to 2014. Assuming that the difference between the actual winning percentage and the Pythagorean expectation is affected by an unusual distribution of runs scored and allowed, we suppose that this difference depends on the mean, standard deviation, and coefficient of variation of runs scored per game and of runs allowed per game. In conclusion, the discrepancy is mainly related to the coefficient of variation and the standard deviation of runs allowed per game, regardless of runs scored per game.
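James's formula with exponent 2 gives the expected winning percentage RS² / (RS² + RA²). The sketch below computes it, the discrepancy from the actual winning percentage, and the per-game summary statistics the study relates that discrepancy to. Function names and the toy numbers are hypothetical, and the sample standard deviation is an assumed choice.

```python
from statistics import mean, stdev


def pythagorean_expectation(runs_scored, runs_allowed, exponent=2.0):
    """James's Pythagorean winning percentage (exponent 2 by default)."""
    rs, ra = runs_scored ** exponent, runs_allowed ** exponent
    return rs / (rs + ra)


def discrepancy(wins, losses, runs_scored, runs_allowed):
    """Actual winning percentage minus the Pythagorean expectation."""
    return wins / (wins + losses) - pythagorean_expectation(runs_scored, runs_allowed)


def per_game_summary(runs_per_game):
    """Mean, sample standard deviation and coefficient of variation."""
    m, s = mean(runs_per_game), stdev(runs_per_game)
    return m, s, s / m


# Hypothetical season: 800 runs scored, 600 allowed.
expected = pythagorean_expectation(800, 600)
```

For 800 runs scored and 600 allowed the formula gives 640000 / 1000000 = 0.64; a team whose runs allowed per game are unusually variable can win noticeably more or fewer games than that, which is the effect the study measures with the coefficient of variation.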
The effect of road weather factors on traffic accident - Focused on Busan area -
Lee, Kyeongjun ; Jung, Imgook ; Noh, Yunhwan ; Yoon, Sanggyeong ; Cho, Youngseuk ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 661~668
DOI : 10.7465/jkdi.2015.26.3.661
Traffic accidents have increased every year owing to the growing number of vehicles as well as the concentration of the population. Besides driver carelessness, many road weather factors have a great influence on traffic accidents. In particular, the number of traffic accidents is affected by precipitation, visibility, humidity, cloud amount, and temperature. The purpose of this paper is to analyse the effect of road weather factors on traffic accidents. We use 2013 data on traffic accidents, AWS weather factors (precipitation, existence of rainfall, temperature, wind speed), time zone, and day of the week, and perform statistical analysis using logistic regression and decision tree analysis. These prediction models may be used to predict traffic accidents according to weather conditions.
Effect of different rearing systems on cortisol level and fatty acid composition in M-Longissimus of Korean native steers
Ha, Jae Jung ; Oh, Dong Yep ; Yi, Jun Koo ; Lee, Jae-Young ; Lee, Ji Hong ; Park, Young Sik ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 669~675
DOI : 10.7465/jkdi.2015.26.3.669
This study was carried out to elucidate the effect of different rearing systems on the level of cortisol, a stress hormone, and on the fatty acid composition of edible muscle tissue. The steers were reared in two different systems: an antibiotic-free system (ARS) and a conservative system (CRS). In the M-Longissimus tissue, the cortisol level was significantly lower in ARS than in CRS (p=0.0176). The levels of total saturated and unsaturated fatty acids did not differ between ARS and CRS (p > 0.05), although total saturated fatty acid levels tended to be higher in CRS and total unsaturated fatty acid levels tended to be higher in ARS. However, the level of n-6 unsaturated fatty acids was higher in ARS than in CRS (p=0.004). In particular, the levels of linoleic acid (LA) and γ-linolenic acid (GLA) were significantly higher in ARS (p < 0.01). Cortisol level and n-6 fatty acid content in muscle tissue were negatively correlated (p=0.00140). In conclusion, ARS may produce higher-quality beef that contains less cortisol and more n-6 fatty acids, such as LA and GLA.
Major gene identification for SREBPs and FABP4 gene which are associated with fatty acid composition of Korean cattle
Lee, Jae-Young ; Jang, Ji-Eun ; Oh, Dong-Yep ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 677~685
DOI : 10.7465/jkdi.2015.26.3.677
Human diseases and the economic traits of livestock are affected more by the combined effects of genes than by single-gene effects. In this study, we used the SNPHarvester method, which supplements existing methods, to investigate the interactions of these genes. The genes used are SREBPs (g.3270+10274 C>T, g.13544 T>C) and FABP4 (g.2634+1018 A>T, g.2988 A>G, g.3690 G>A, g.3710 G>C, g.3977-325 T>C, g.4221 A>G), which are closely related to the fatty acid composition affecting the meatiness of Korean cattle. The economic traits used are oleic acid (C18:1), monounsaturated fatty acids (MUFA), and marbling score (MS). First, we used the SNPHarvester method to find superior gene combinations, and then used the multifactor dimensionality reduction method to identify superior genotypes within those combinations.
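The core multifactor dimensionality reduction (MDR) step pools multilocus genotype cells into two classes. Because the traits here (oleic acid, MUFA, marbling score) are quantitative, the sketch assumes the quantitative-trait variant of the rule: a cell is labelled "high" if its mean trait value exceeds the overall mean, and "low" otherwise. The abstract does not state the exact rule the authors used, and the SNPHarvester search and cross-validation are not shown. The function name and toy data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def classify_genotype_cells(genotypes, trait):
    """Label each multilocus genotype cell 'high' or 'low'.

    genotypes: one tuple per animal, e.g. ("CT", "AG") for two SNPs.
    trait: the quantitative trait value of each animal.
    Assumed rule: a cell is 'high' if its mean exceeds the overall mean.
    """
    overall = mean(trait)
    cells = defaultdict(list)
    for g, value in zip(genotypes, trait):
        cells[g].append(value)
    return {g: ("high" if mean(v) > overall else "low") for g, v in cells.items()}


# Hypothetical two-SNP genotypes and oleic acid percentages.
geno = [("CC", "AA"), ("CC", "AA"), ("CT", "AG"), ("CT", "AG")]
oleic = [45.0, 47.0, 40.0, 42.0]
cells = classify_genotype_cells(geno, oleic)
```

Collapsing many genotype cells into a single high/low factor is what makes interactions between SNPs detectable with modest sample sizes, at the cost of discarding how far each cell sits from the mean.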
Environmental factors influencing acetone and β-hydroxybutyrate acid contents in raw milk of Holstein dairy cattle
Cho, Kwang-Hyun ; Cho, Chung-Il ; Lee, Joon-Ho ; Park, Kyung-Do ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 687~693
DOI : 10.7465/jkdi.2015.26.3.687
Using 378,086 lactation records on dairy cattle, this experiment analyzed the environmental factors influencing the acetone and β-hydroxybutyrate acid contents of raw milk, which are used as indicator traits for diagnosing ketosis. Significance testing was conducted for farm, lactation stage, parity, milking time, and month of age for each trait. The results indicated highly significant differences (p < 0.01) for all factors, with lactation stage being the most significant factor. The linear regression coefficients of month of age on daily milk yield and on acetone and β-hydroxybutyrate acid contents were all positive, while the quadratic regression coefficients were negative. The least squares mean for milk yield at the second lactation stage (36~65 days) was 19.06 kg, which was 6.51 kg higher than that of the late lactation stage. The least squares means for acetone and β-hydroxybutyrate acid contents were highest at the first lactation stage (5~35 days) (0.1929 mM/L and 0.0742 mM/L, respectively); they tended to decrease as lactation progressed but increased slightly at the late stage. The least squares means for acetone and β-hydroxybutyrate acid contents at the first parity were 0.1414 mM/L and 0.0522 mM/L, respectively, which were higher than the averages from the second parity onward. The least squares means for acetone and β-hydroxybutyrate acid contents of PM milk (0.1372 mM/L and 0.0534 mM/L, respectively) were higher than those of AM milk.
Self-leadership, critical thinking disposition, satisfaction of clinical practice and clinical practice competency of nursing students
Park, Hyeon-Sook ; Han, Ji-Young ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 695~706
DOI : 10.7465/jkdi.2015.26.3.695
The purpose of this study was to examine the relationships among self-leadership, critical thinking disposition, satisfaction with clinical practice, and clinical practice competency of nursing students. Participants were 199 baccalaureate nursing students (third and fourth year) in two cities. Data were collected by questionnaires and analyzed with the SPSS/Win 21.0 program using descriptive statistics, Pearson's correlation coefficient, and multiple regression. There were significant positive correlations among self-leadership, critical thinking disposition, satisfaction with clinical practice, and clinical practice competency. The regression model explained 30.4% of the variance in satisfaction with clinical practice, and its significant predictors were clinical experience, satisfaction with the major, self-leadership, and critical thinking disposition. The regression model for clinical practice competency explained 23.7% of the variance; health status, self-leadership, and critical thinking disposition were the factors influencing clinical practice competency. Self-leadership and critical thinking disposition should therefore be strengthened to improve nursing students' satisfaction with clinical practice and their clinical practice competency.
Development and validation of an instrument to assess quality of life for end stage renal disease
Kim, Sookhyun ; Kim, Yong-Lim ; Park, Ki-Soo ; Kam, Sin ; Lee, Won Kee ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 707~714
DOI : 10.7465/jkdi.2015.26.3.707
The SF-36 is the most common instrument for assessing the quality of life of dialysis patients with chronic renal failure. However, answering its 36 items places too much burden on these patients, so we aimed to develop the RFQoL-K, a reduced form of the SF-36. Patients newly registered for dialysis were enrolled in 29 medical centers over 45 months from 2009. We developed the RFQoL-K using 355 people who completed the SF-36 at 3 and 12 months after registration and then checked its internal validity. Its external validity was checked using 411 people who answered a one-time survey after registration. In conclusion, the RFQoL-K has 14 items in total, consisting of 8 items on physical factors and 6 items on mental factors taken from the SF-36. The RFQoL-K summary scores explained 91-93% of the SF-36 summary scores. The RFQoL-K reflected the SF-36 well: the correlation and the internal consistency between the two tools were very high, at 0.96 to 0.98 and 0.96 to 0.98, respectively.
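Internal consistency is most often summarised by Cronbach's alpha, α = k/(k − 1) · (1 − Σ var(itemᵢ) / var(total)). The abstract does not name the statistic it reports, so using Cronbach's alpha here is an assumption, and the function name and toy responses are hypothetical. Sample variances are used throughout.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of respondent scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_var = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical responses: 3 items answered by 5 patients.
responses = [
    [3, 4, 2, 5, 4],
    [3, 5, 2, 4, 4],
    [2, 4, 3, 5, 5],
]
alpha = cronbach_alpha(responses)
```

Alpha rises toward 1 when items co-vary strongly (the total's variance exceeds the sum of item variances) and falls to 0 for uncorrelated items, which is why a very high value supports treating a shortened form as measuring the same construct.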
Nursing outcomes of inpatient on level of nursing staffing in long term care hospitals
Kim, Eun Hee ; Lee, Eunjoo ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 715~727
DOI : 10.7465/jkdi.2015.26.3.715
This study was conducted to explore the impact of nurse staffing on inpatient nursing outcomes in long term care hospitals. A secondary analysis was performed on national data from the Health Insurance Review and Assessment Service, including its evaluation of long term care hospitals. The number of patients per RN was a significant indicator of the Foley catheter ratio in both the high-risk and low-risk groups. The number of patients per RN and NA was a significant indicator of decline in ADL for patients with and without dementia, urinary incontinence, and new pressure ulcer development in the high-risk group. The average nursing outcomes of inpatients in high-grade long term care hospitals were higher than those in low-grade hospitals. Higher levels of nurse staffing and higher grades showed a positive effect on inpatient nursing outcomes. We therefore recommend modifying nurse staffing policy so as to make it more effective in improving nursing outcomes.
A comparison of single charts for non-normal data
Kang, Myunggoo ; Lee, Jangtaek ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 729~738
DOI : 10.7465/jkdi.2015.26.3.729
In this paper, we compare the robustness to the normality assumption of single control charts that control the mean and variance simultaneously. The charts examined were the semicircle control chart, the max chart, and the MSE chart, together with Shewhart individuals control charts. Their in-control and out-of-control performance was studied by simulation combined with computation. We calculated false alarm rates to compare the single charts while changing the subgroup size and shifting the mean of the quality characteristic. The max chart turns out to be more robust than any of the others when the process is in control. In some cases, the max chart and the MSE chart are more robust than the others when the process is out of control.
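The simulation idea behind these comparisons can be sketched with the simplest case: the in-control false alarm rate of a Shewhart individuals chart with 3-sigma limits, first for normal data and then for skewed exponential data with the same mean and standard deviation. This is only a baseline illustration; the semicircle, max and MSE charts are not implemented, and the function name, seed and sample size are arbitrary choices.

```python
import random


def false_alarm_rate(sampler, mu, sigma, n_sim=200_000, seed=1):
    """Fraction of in-control points outside mu +/- 3*sigma (individuals chart)."""
    rng = random.Random(seed)
    lcl, ucl = mu - 3 * sigma, mu + 3 * sigma
    hits = 0
    for _ in range(n_sim):
        x = sampler(rng)
        if x < lcl or x > ucl:
            hits += 1
    return hits / n_sim


# Normal(0, 1): theoretical rate about 0.0027.
normal_rate = false_alarm_rate(lambda r: r.gauss(0.0, 1.0), mu=0.0, sigma=1.0)
# Exponential(1) has mean 1 and sd 1: only the upper limit 4 can be crossed,
# so the theoretical rate is exp(-4), about 0.0183.
expo_rate = false_alarm_rate(lambda r: r.expovariate(1.0), mu=1.0, sigma=1.0)
```

The roughly sevenfold increase in false alarms under skewed data is the kind of non-robustness that motivates comparing alternative single charts on non-normal processes.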
Default Bayesian testing equality of scale parameters in several inverse Gaussian distributions
Kang, Sang Gil ; Kim, Dal Ho ; Lee, Woo Dong ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 739~748
DOI : 10.7465/jkdi.2015.26.3.739
This paper deals with the problem of testing the equality of the scale parameters in several inverse Gaussian distributions. We propose default Bayesian testing procedures for the equality of the shape parameters under the reference priors. The reference prior is usually improper, which yields a calibration problem: the Bayes factor is defined only up to a multiplicative constant. Therefore, we propose default Bayesian testing procedures based on the fractional Bayes factor and the intrinsic Bayes factors under the reference priors. A simulation study and an example are provided.
Bayesian curve-fitting with radial basis functions under functional measurement error model
Hwang, Jinseub ; Kim, Dal Ho ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 749~754
DOI : 10.7465/jkdi.2015.26.3.749
This article presents a Bayesian approach to regression splines with knots on a grid of equally spaced sample quantiles of the independent variables under a functional measurement error model. We consider a small area model using penalized splines for a non-linear pattern. Specifically, we use radial basis functions as the basis functions of the regression spline. To fit the model and estimate the parameters, we suggest a hierarchical Bayesian framework using Markov chain Monte Carlo methodology. Furthermore, we illustrate the method with an application to real data. We check convergence using the potential scale reduction factor, and we use the posterior predictive p-value and the mean logarithmic conditional predictive ordinate to compare models.
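The spline design described above can be sketched as follows: knots placed at equally spaced sample quantiles of the covariate, a linear polynomial part, and one radial basis column |x − κₖ|³ per knot. The cubic radial form is an assumption (a common choice for penalized splines; the abstract does not give the basis), and the function name is hypothetical. Penalization, the measurement error model, and the MCMC fit are not shown; in a mixed-model formulation the radial coefficients would be treated as random effects.

```python
from statistics import quantiles


def radial_design(xs, num_knots):
    """Knots at equally spaced sample quantiles plus a cubic radial basis.

    Returns (knots, fixed, radial): fixed has columns [1, x]; radial has one
    column |x - knot|^3 per knot (assumed basis form).
    """
    knots = quantiles(xs, n=num_knots + 1)  # num_knots interior cut points
    fixed = [[1.0, x] for x in xs]
    radial = [[abs(x - k) ** 3 for k in knots] for x in xs]
    return knots, fixed, radial


xs = list(range(1, 11))
knots, fixed, radial = radial_design(xs, num_knots=3)
```

Placing knots at sample quantiles puts more flexibility where the data are dense, and the penalty on the radial coefficients keeps the fit smooth regardless of how many knots are used.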
Bayesian estimation of median household income for small areas with some longitudinal pattern
Lee, Jayoun ; Kim, Dal Ho ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 755~762
DOI : 10.7465/jkdi.2015.26.3.755
One of the main objectives of the U.S. Census Bureau is the proper estimation of median household income for small areas. These estimates play an important role in the formulation of various governmental decisions and policies. Since direct survey estimates are available annually for each state or county, it is desirable to exploit the longitudinal trend in income observations in the estimation procedure. In this study, we consider Fay-Herriot type small area models that include a time-specific random effect to accommodate any unspecified time-varying income pattern. The analysis is carried out in a hierarchical Bayesian framework using Markov chain Monte Carlo methodology. We evaluated our estimates by comparing them with the corresponding 1999 census estimates using some commonly used comparison measures. Among three types of time-specific random effects, the small area model with a random-walk time-series component turns out to provide estimates superior to both the direct estimates and the Census Bureau estimates.
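The basic Fay-Herriot estimator that these models extend combines each area's direct estimate yᵢ (with known sampling variance Dᵢ) and a regression-synthetic estimate xᵢᵀβ, using the shrinkage weight γᵢ = A / (A + Dᵢ), where A is the model variance. The sketch below treats A and β as known, which is a simplifying assumption; the paper instead estimates everything in a hierarchical Bayesian model with time-specific random effects. The function name and numbers are hypothetical.

```python
def fay_herriot_estimate(y, x, beta, A, D):
    """Fay-Herriot shrinkage estimate for one small area (A and beta known).

    y: direct survey estimate; D: its sampling variance.
    x: area covariates; beta: regression coefficients; A: model variance.
    """
    synthetic = sum(b * xi for b, xi in zip(beta, x))
    gamma = A / (A + D)  # weight on the direct estimate
    return gamma * y + (1 - gamma) * synthetic


# Equal model and sampling variances: halfway between direct (10) and synthetic (6).
est = fay_herriot_estimate(10.0, x=[1.0, 2.0], beta=[2.0, 2.0], A=1.0, D=1.0)
```

Areas with noisy direct estimates (large Dᵢ) are pulled more strongly toward the regression prediction, which is why borrowing strength across areas and, in this paper, across years improves small area estimates.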
Noninformative priors for product of exponential means
Kang, Sang Gil ; Kim, Dal Ho ; Lee, Woo Dong ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 763~772
DOI : 10.7465/jkdi.2015.26.3.763
In this paper, we develop noninformative priors for the product of different powers of the means of k exponential distributions. We develop the first- and second-order matching priors. It turns out that the second-order matching prior matches the alternative coverage probabilities and is the highest posterior density matching prior. We also show that the derived reference prior is the second-order matching prior, and that Jeffreys' prior and the reference prior are the same. Through a simulation study, we show that the proposed reference prior matches the target coverage probabilities very well in a frequentist sense, and an example based on real data is given.
Statistical analysis of KNHANES data with measurement error models
Hwang, Jinseub ;
Journal of the Korean Data and Information Science Society, volume 26, issue 3, 2015, Pages 773~779
DOI : 10.7465/jkdi.2015.26.3.773
We present a statistical analysis of the fifth-wave data of the Korea National Health and Nutrition Examination Survey based on linear regression models with measurement errors. The data are obtained from a national population-based complex survey. To demonstrate the usefulness of measurement error models, the results of the general linear regression model and of the measurement error model are compared using the Akaike information criterion and the Bayesian information criterion as model selection criteria. In this study, we use the simulation extrapolation (SIMEX) algorithm for the measurement error model and the jackknife method for estimating standard errors.
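The simulation extrapolation (SIMEX) idea can be sketched for the slope of a simple linear regression whose covariate is observed with additive normal error of known standard deviation σᵤ. Extra noise with variance λσᵤ² is added for several λ > 0, the naive slope is averaged over B replicates at each λ, a quadratic in λ is fitted, and the fit is extrapolated back to λ = −1 (no measurement error). The quadratic extrapolant, the λ grid, B, and the function names are assumptions for illustration; survey weights and the jackknife standard errors are not shown.

```python
import random


def ols_slope(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def quad_fit(t, s):
    """Least-squares coefficients (c0, c1, c2) of s ~ c0 + c1*t + c2*t^2."""
    p = [sum(ti ** k for ti in t) for k in range(5)]
    M = [[p[0], p[1], p[2]], [p[1], p[2], p[3]], [p[2], p[3], p[4]]]
    r = [sum(si * ti ** k for ti, si in zip(t, s)) for k in range(3)]
    d = det3(M)
    # Cramer's rule on the 3x3 normal equations.
    return [
        det3([[r[i] if j == k else M[i][j] for j in range(3)] for i in range(3)]) / d
        for k in range(3)
    ]


def simex_slope(w, y, sigma_u, lambdas=(0.5, 1.0, 1.5, 2.0), B=50, seed=0):
    """SIMEX-corrected slope with a quadratic extrapolant to lambda = -1."""
    rng = random.Random(seed)
    lam_pts, slopes = [0.0], [ols_slope(w, y)]  # lambda = 0 is the naive fit
    for lam in lambdas:
        est = []
        for _ in range(B):
            w_b = [wi + (lam ** 0.5) * sigma_u * rng.gauss(0.0, 1.0) for wi in w]
            est.append(ols_slope(w_b, y))
        lam_pts.append(lam)
        slopes.append(sum(est) / B)
    c0, c1, c2 = quad_fit(lam_pts, slopes)
    return c0 - c1 + c2  # value of the quadratic at lambda = -1
```

Because added noise attenuates the slope in a smooth, predictable way, following that trend backward to λ = −1 approximately undoes the attenuation caused by the original measurement error; the quadratic extrapolant usually corrects only part of the bias, which is a known limitation of this choice.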