Journal of the Korean Data and Information Science Society
Journal Basic Information
Korean Data and Information Science Society
Volume 27, Issue 4 - Jul 2016
Classification of ratings in online reviews
Choi, Dongjun ; Choi, Hosik ; Park, Changyi ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 845~854
DOI : 10.7465/jkdi.2016.27.4.845
Sentiment analysis, or opinion mining, is a text mining technique used to identify subjective information or opinions of individuals from documents such as blogs, reviews, articles, or social networks. The existing literature has considered only the binary classification of ratings based on the texts of online reviews. However, because reviews can be neutral as well as positive or negative, multi-class classification is more appropriate than binary classification. To this end, we consider the multi-class classification of ratings based on review texts. In the preprocessing stage, we extract words related to ratings using the chi-square statistic. The extracted words are then used as input variables to multi-class classifiers, such as support vector machines and the proportional odds model, to compare their predictive performances.
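As a rough illustration of the chi-square word-selection step, the sketch below scores each word by the Pearson chi-square statistic of a 2 x K table (word present or absent in a review, by rating class). The function names, the tokenized input format, and the toy data are assumptions for illustration; the paper's exact preprocessing (tokenization, thresholds, number of words kept) is not described in the abstract.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Pearson chi-square statistic of independence between word presence and rating.

    docs: list of token lists (hypothetical input format); labels: rating per document.
    For each word, the 2 x K table is (present / absent) by rating class.
    """
    doc_sets = [set(d) for d in docs]
    n = len(doc_sets)
    class_totals = Counter(labels)
    vocab = set().union(*doc_sets)
    scores = {}
    for word in vocab:
        present = Counter(lab for d, lab in zip(doc_sets, labels) if word in d)
        n_present = sum(present.values())
        stat = 0.0
        for c, c_total in class_totals.items():
            # (observed count, row total) for the "present" row and the "absent" row
            for observed, row_total in ((present[c], n_present),
                                        (c_total - present[c], n - n_present)):
                expected = row_total * c_total / n
                if expected > 0:
                    stat += (observed - expected) ** 2 / expected
        scores[word] = stat
    return scores


def top_k_words(docs, labels, k):
    """Keep the k words most associated with the rating (hypothetical helper)."""
    scores = chi_square_scores(docs, labels)
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

With K rating classes, each score can be compared with a chi-square distribution with K - 1 degrees of freedom; how many top-scoring words to keep as classifier inputs is a tuning choice not fixed here.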
A simple diagnostic statistic for determining the size of random forest
Park, Cheolyong ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 855~863
DOI : 10.7465/jkdi.2016.27.4.855
In this study, a simple diagnostic statistic for determining the size of a random forest is proposed. This method is based on MV (margin of victory), a scaled difference in the votes at the infinite forest between the first and second most popular categories of the current random forest. Note that if MV is negative, there is a discrepancy between the current and infinite forests. More precisely, our method is based on the proportion of cases for which -MV is greater than a fixed small positive number (say, 0.03). We derive an appropriate diagnostic statistic for our method and then calculate the distribution of the statistic. A simulation study is performed to compare our method with a recently proposed diagnostic statistic.
The effect of dietary addition of herbal probiotics for the production of high quality Hanwoo
Kim, Byung Ki ; Ha, Jae Jung ; Yi, Jun Koo ; Oh, Dong Yep ; Jung, Dae Jin ; Hwang, Eun Gyeoung ; Lee, Jea Young ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 865~874
DOI : 10.7465/jkdi.2016.27.4.865
This study was carried out to investigate the effect of dietary addition of herbal probiotics on the physicochemical properties of Hanwoo steers. A total of 50 Hanwoo steers (5 treatment groups × 10 heads) were used. The crude fat content of beef was significantly higher in the T2 and T3 groups, and the Con 2 group had the highest heating loss (p<0.05). The water-holding capacity ranged from 56.73% to 60.16% and was generally higher in the treatment groups than in the control group. In particular, the T3 group showed significantly higher water-holding capacity (p<0.05). The cholesterol content ranged from 41.64 to 47.33 mg/100g, and overall the Con 2 group had significantly higher cholesterol content (p<0.05). Furthermore, in the fatty acid composition, oleic acid and MUFA were significantly higher in the T2 and T3 groups (p<0.05), but amino acid content did not differ between the treatment groups.
An analysis on the influence of the China government's software support policy on the revenue of software export
Choi, JeongHo ; Zhang, YongAn ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 875~886
DOI : 10.7465/jkdi.2016.27.4.875
In this study, we investigate the influence of the Chinese government's software support policy on software export revenue. Analyzing the areas of technology development, manpower development, quality control, and marketing reinforcement from 2008 to 2014, we found that the policy influence and the annual software export revenue increased simultaneously, indicating that the Chinese government's support policy is closely related to software export revenue. However, the annual ratio of software export revenue to gross software production revenue decreased over the period, which indicates that the growth of China's software industry has been driven mainly by the domestic market.
A study on the improvement of academic achievement of probability and statistics in the hardware curriculum
Lee, Seung-Woo ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 887~898
DOI : 10.7465/jkdi.2016.27.4.887
The purpose of this study is to improve the learning ability of hardware (H/W) majors in probability/statistics. First, we developed a teaching method that couples probability/statistics with the programming and multimedia signal processing courses offered in the H/W major curriculum. Using this teaching-learning method, we tried to verify its effectiveness in improving learners' academic achievement and then analyzed its educational efficiency through regression analysis. Second, by analyzing surveys and the statistical results of the educational cases, we proposed a management plan for efficient teaching-learning to cultivate learning ability in probability/statistics in the future. Lastly, we concluded that probability/statistics is a required course for learners in order to contribute to advanced technical development and enhanced competitiveness in the H/W field.
Permutation p-values for specific-category kappa measure of agreement
Um, Yonghwan ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 899~910
DOI : 10.7465/jkdi.2016.27.4.899
Asymptotic tests are often unsuitable for analyzing sparse ordered contingency tables, because asymptotic p-values may either overestimate or underestimate the true p-values. In this paper, we describe permutation procedures in which we compute exact or resampling p-values for a weighted specific-category agreement in ordered
contingency tables. We use the weighted specific-category kappa proposed by
to measure the extent to which two independent raters agree on the specific categories. We carried out comparison studies between exact p-values, resampling p-values and asymptotic p-values using
contingency data (real and artificial data sets) and
artificial contingency data.
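A minimal sketch of a resampling (Monte Carlo permutation) p-value for agreement on a single category. Because the weighted specific-category kappa and its source are not recoverable from the text above, the sketch substitutes the simpler unweighted proportion of specific agreement, 2 n_kk / (n_k+ + n_+k); the statistic, the function names, and the defaults are assumptions. An exact p-value would enumerate every permutation instead of sampling them.

```python
import random


def specific_agreement(a, b, category):
    """Unweighted specific agreement for one category: 2*n_kk / (n_k+ + n_+k)."""
    both = sum(1 for x, y in zip(a, b) if x == category and y == category)
    marg = sum(1 for x in a if x == category) + sum(1 for y in b if y == category)
    return 2.0 * both / marg if marg else 0.0


def resampling_p_value(a, b, category, n_resamples=5000, seed=0):
    """Monte Carlo permutation p-value for H0: the two raters' labels are unrelated.

    Shuffling rater b's labels keeps both raters' marginal totals fixed,
    i.e. it samples tables with the observed row and column margins.
    """
    rng = random.Random(seed)
    observed = specific_agreement(a, b, category)
    shuffled = list(b)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(shuffled)
        # count resampled statistics at least as large as the observed one
        if specific_agreement(a, shuffled, category) >= observed - 1e-12:
            hits += 1
    # the +1 terms count the observed arrangement itself
    return observed, (hits + 1) / (n_resamples + 1)
```

For sparse tables, this kind of resampling p-value avoids relying on the asymptotic null distribution, which is the concern raised above.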
Learning algorithms for big data logistic regression on RHIPE platform
Jung, Byung Ho ; Lim, Dong Hoon ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 911~923
DOI : 10.7465/jkdi.2016.27.4.911
Machine learning has become increasingly important in the big data era. Logistic regression is a classification method in machine learning and has been widely used in various fields, including medicine, economics, marketing, and the social sciences. Rhipe, which integrates the R and Hadoop environments, has received little attention from researchers owing to the difficulty of its installation and of MapReduce implementation. In this paper, we present MapReduce implementations of the gradient descent algorithm and the Newton-Raphson algorithm for logistic regression using Rhipe. The Newton-Raphson algorithm does not require a learning rate, whereas the gradient descent algorithm requires the learning rate to be chosen manually. We choose the learning rate by a mixed procedure of grid search and binary search to process big data efficiently. In the performance study, our Newton-Raphson algorithm outperforms the gradient descent algorithm on all the tested data.
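The map/reduce split for the two algorithms can be sketched in plain Python: each "map" call computes the partial gradient X'(y - p) and the partial information matrix X'WX for one chunk of rows, and "reduce" sums them. This only simulates the data flow; it does not use Rhipe or Hadoop, and the function names, the chunking, and the fixed learning rate are assumptions (the paper's grid/binary search for the rate is not reproduced).

```python
import math
from functools import reduce


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def map_chunk(chunk, beta):
    """Map: partial gradient X'(y - p) and information X'WX for one chunk of (x, y) rows."""
    d = len(beta)
    grad = [0.0] * d
    info = [[0.0] * d for _ in range(d)]
    for x, y in chunk:
        p = sigmoid(sum(b * xj for b, xj in zip(beta, x)))
        for j in range(d):
            grad[j] += (y - p) * x[j]
            for k in range(d):
                info[j][k] += p * (1.0 - p) * x[j] * x[k]
    return grad, info


def reduce_sum(a, b):
    """Reduce: element-wise sum of two (gradient, information) pairs."""
    return ([u + v for u, v in zip(a[0], b[0])],
            [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a[1], b[1])])


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [list(row) + [rhs[i]] for i, row in enumerate(A)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for j in range(c, n + 1):
                m[r][j] -= f * m[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][j] * x[j] for j in range(r + 1, n))) / m[r][r]
    return x


def newton_raphson(chunks, d, iters=10):
    """Newton-Raphson: beta += (X'WX)^{-1} X'(y - p); no learning rate needed."""
    beta = [0.0] * d
    for _ in range(iters):
        grad, info = reduce(reduce_sum, (map_chunk(c, beta) for c in chunks))
        beta = [b + s for b, s in zip(beta, solve(info, grad))]
    return beta


def gradient_descent(chunks, d, lr, iters=500):
    """Descent on the mean negative log-likelihood; lr must be picked by hand.

    grad is the log-likelihood gradient, so descending the negative
    log-likelihood means adding lr * grad / n.
    """
    n = sum(len(c) for c in chunks)
    beta = [0.0] * d
    for _ in range(iters):
        grad, _ = reduce(reduce_sum, (map_chunk(c, beta) for c in chunks))
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta
```

Both routines pass over every chunk once per iteration, but Newton-Raphson typically needs only a handful of iterations, whereas gradient descent needs many and its speed depends on the chosen learning rate.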
The study on the relevance of life management and sub-health
Shin, Jae-Kyoung ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 925~934
DOI : 10.7465/jkdi.2016.27.4.925
As we enter the 21st century, interest in health and quality of life has grown steadily. In this study, we analyzed questionnaire responses on life management and sub-health from members of a particular target group. The analysis of life management found no difference between genders at the 5% significance level. However, a gender comparison of sub-health showed that female students had significantly worse health conditions than male students in the areas of the immune system, intestine, cerebral nerves, hormones, and urinary system. We also found no significant differences among colleges in terms of life management and sub-health. In conclusion, sub-health was shown to be closely related to life management.
The wage determinants of the vocational high school graduates using mixed effects model
Ryu, Jangsoo ; Cho, Jangsik ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 935~946
DOI : 10.7465/jkdi.2016.27.4.935
In this paper, we analyzed the wage determinants of vocational high school graduates using both individual-level and work-region-level variables. We formulate the models so that wage determination has a multi-level structure, in the sense that an individual's wage is influenced by individual-level (level-1) variables and work-region-level (level-2) variables. To incorporate the dependence among individual wages into the model, we use a hierarchical linear model (HLM). The major results are as follows. First, the HLM performs better than OLS regression models, which do not take level-1 and level-2 variables into account simultaneously. Second, the random effects of sex, the meister dummy, and the engineering dummy variables are statistically significant. Third, among the level-2 variables, the fixed effects of business hours and the mean wage of regular jobs have statistically significant effects on individual-level wages. Finally, parental education level, parental income, number of licenses, and high school grades are significantly associated with higher individual-level wages.
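As a generic illustration of the two-level structure described above (not the authors' exact specification; the symbols are chosen here for illustration), a random-coefficient HLM for the wage $Y_{ij}$ of graduate $i$ working in region $j$, with an individual-level predictor $X_{ij}$ (e.g., sex) and a region-level predictor $W_j$ (e.g., business hours), can be written as:

```latex
\begin{aligned}
\text{Level 1:}\quad & Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + r_{ij}, && r_{ij} \sim N(0, \sigma^2), \\
\text{Level 2:}\quad & \beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}, \\
                     & \beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}, && (u_{0j}, u_{1j})^\top \sim N(\mathbf{0}, \mathbf{T}).
\end{aligned}
```

OLS corresponds to dropping $u_{0j}$ and $u_{1j}$, which treats wages in the same region as independent; a significant "random effect" of a level-1 variable such as sex corresponds to a nonzero variance of its $u_{1j}$ term in $\mathbf{T}$.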
A classification of the journals in KCI using network clustering methods
Kim, Jinkwang ; Kim, Sohyung ; Oh, Changhyuck ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 947~957
DOI : 10.7465/jkdi.2016.27.4.947
KCI is a database of citations of journals and papers published in Korea. The classification of a journal listed in KCI has mainly been determined by the publisher that registered the journal at the time of application. However, the KCI journal classification is known not to properly reflect the citation rates between journals. In this study, we extracted communities of the journals registered in KCI based on citation relationships using various network clustering algorithms. Among them, the Infomap algorithm produced the classification most similar to the current KCI classification in terms of modular structure.
Saddlepoint approximations for the risk measures of linear portfolios based on generalized hyperbolic distributions
Na, Jonghwa ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 959~967
DOI : 10.7465/jkdi.2016.27.4.959
Distributional assumptions on equity returns play a key role in valuation theories for derivative securities. Eberlein and Keller (1995) investigated the distributional form of compound returns and found that some standard assumptions cannot be justified; instead, generalized hyperbolic (GH) distributions fit empirical returns with high accuracy. Hu and Kercheval (2007) also show that the normal distribution leads to VaR (Value at Risk) estimates that significantly underestimate the realized empirical values, while GH distributions do not. We consider saddlepoint approximations to estimate the VaR and the ES (Expected Shortfall), which are frequently used in finance and insurance as risk-management measures. We assume GH distributions instead of normal ones as the underlying distribution of linear portfolios. Simulation results show that the saddlepoint approximations are much more accurate than normal approximations.
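The Lugannani-Rice formula is the standard saddlepoint approximation to a tail probability, P(S > x) ≈ 1 - Φ(w) + φ(w)(1/u - 1/w). Because the GH cumulant generating function is lengthy, the sketch below swaps in a much simpler loss model, S ~ Gamma(n, 1) (a sum of n unit-exponential losses), whose exact tail is known, so the approximation can be checked. It illustrates the technique under that assumption only and is not the paper's GH portfolio model; VaR is found by bisection and ES through the gamma identity x f_n(x) = n f_{n+1}(x).

```python
import math


def std_normal_sf(z):
    """Upper tail 1 - Phi(z) of the standard normal."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def lr_tail(x, n):
    """Lugannani-Rice approximation to P(S > x) for S ~ Gamma(n, 1) (assumed loss model).

    CGF K(t) = -n log(1 - t); the formula is singular at x = n (the mean), where t = 0.
    """
    t = 1.0 - n / x                        # saddlepoint: K'(t) = n / (1 - t) = x
    K = -n * math.log(1.0 - t)
    w = math.copysign(math.sqrt(2.0 * (t * x - K)), t)
    u = t * x / math.sqrt(n)               # t * sqrt(K''(t)), with K''(t) = x**2 / n
    pdf_w = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
    return std_normal_sf(w) + pdf_w * (1.0 / u - 1.0 / w)


def exact_tail(x, n):
    """Exact Erlang tail exp(-x) * sum_{j<n} x**j / j!, for integer n."""
    return math.exp(-x) * sum(x ** j / math.factorial(j) for j in range(n))


def var_es(n, level=0.99, tail=lr_tail):
    """VaR by bisection on the tail probability; ES = n * P(G_{n+1} > v) / P(G_n > v)."""
    lo, hi = 1.01 * n, 100.0 * n           # bracket above the mean, avoiding x = n
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if tail(mid, n) > 1.0 - level:
            lo = mid
        else:
            hi = mid
    v = 0.5 * (lo + hi)
    es = n * tail(v, n + 1) / tail(v, n)
    return v, es
```

Calling `var_es(5)` and `var_es(5, tail=exact_tail)` side by side compares the saddlepoint VaR and ES with the exact values for this toy model.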
Garlic yields estimation using climate data
Choi, Sungchun ; Baek, Jangsun ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 969~977
DOI : 10.7465/jkdi.2016.27.4.969
Climate change affects the growth of crops, especially those planted in open fields, and it has become increasingly important to use climate data to predict the yields of major vegetables. Variation in crop production caused by climate change is one of the significant sources of mismatch between demand and supply, and it leads to price instability. In this paper, using a panel regression model, we predicted garlic yields from the weather conditions of different regions. More specifically, we used panel data on several climate variables for 15 main garlic-producing areas from 2006 to 2015. Seven variables (average temperature, average maximum temperature, average minimum temperature, average surface temperature, cumulative precipitation, average relative humidity, and cumulative duration of sunshine) were considered for each month, and the 7 most significant variables were selected from the total of 84 variables by stepwise regression. The random effects model was chosen by the Hausman test. The average maximum temperature (January), the cumulative precipitation (March, October), and the cumulative duration of sunshine (April, October) were selected as the significant climate variables of the model.
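For reference, the Hausman statistic commonly used to choose between fixed-effects (FE) and random-effects (RE) panel models compares the two coefficient estimates. This is the standard textbook form; the abstract does not describe the paper's exact implementation.

```latex
H = \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)^{\top}
    \left[\widehat{\operatorname{Var}}(\hat{\beta}_{FE}) - \widehat{\operatorname{Var}}(\hat{\beta}_{RE})\right]^{-1}
    \left(\hat{\beta}_{FE} - \hat{\beta}_{RE}\right)
    \;\overset{H_0}{\sim}\; \chi^2_{k}
```

Under $H_0$ (the region effects are uncorrelated with the regressors), both estimators are consistent and the RE estimator is efficient, so a small $H$ (failure to reject) favors the random effects model; $k$ is the number of coefficients compared.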
Dynamic ontology construction algorithm from Wikipedia and its application toward real-time nation image analysis
Lee, Youngwhan ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 979~991
DOI : 10.7465/jkdi.2016.27.4.979
Measuring nation images was a challenging task when offline surveys were the only option. They were not only prohibitively expensive but also too time-consuming, and therefore unsuited to a rapidly changing world. Although demand for monitoring nation images in real time has been increasing, an affordable and reliable solution for measuring nation images has not been available to date. This study developed a semi-automatic ontology construction algorithm, named "double-crossing double keyword collection" (DCDKC), to measure nation images from Wikipedia in real time. The resulting ontology, WikiOnto, can be used to reflect dynamic image changes. In this study, an instance of WikiOnto was constructed by applying the algorithm to the three largest exporting countries in East Asia: Korea, Japan, and China. Then, the page views for the words in this instance of WikiOnto were counted, and the counts for each country were compared with one another to examine whether they could be used to track dynamic nation images. The results show how the images of the three countries changed over the study period, indicating that DCDKC can be used effectively for a real-time nation-image monitoring system.
Smoothing parameter selection in semi-supervised learning
Seok, Kyungha ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 993~1000
DOI : 10.7465/jkdi.2016.27.4.993
Semi-supervised learning makes it easy to use unlabeled data in supervised learning tasks such as classification. Applying semi-supervised learning to regression analysis, we propose two methods for better estimation of the regression function. The proposed methods allow the marginal densities of the independent variables and the smoothing parameters to differ between the unlabeled and labeled data. We show that an overfitted pilot estimator should be used to achieve the fastest convergence rate, and that unlabeled data may help improve the convergence rate when the smoothing parameters are well estimated. We also derive conditions on the smoothing parameters for achieving the optimal convergence rate.
The estimation of CO concentration in Daegu-Gyeongbuk area using GEV distribution
Ryu, Soorack ; Eom, Eunjin ; Kwon, Taeyong ; Yoon, Sanghoo ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1001~1012
DOI : 10.7465/jkdi.2016.27.4.1001
It is well known that air pollutants have harmful effects on human health. According to the United Nations Environment Program, 4.3 million people worldwide die annually from carbon monoxide and particulate matter. Carbon monoxide is a toxic gas and the most dangerous of the gases composed of carbon and oxygen. In this paper, we used 1-hour, 6-hour, 12-hour, and 24-hour average carbon monoxide concentration data collected between 2004 and 2013 in the Daegu-Gyeongbuk area. The parameters of the generalized extreme value distribution were estimated by maximum likelihood estimation and L-moments estimation, and goodness of fit was also evaluated. Because the sample size was small, L-moment estimation turned out to be suitable for parameter estimation. We also calculated the 5-year, 10-year, 20-year, and 40-year return levels.
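A minimal sketch of L-moment estimation for the GEV distribution and the resulting T-year return levels, using Hosking's widely used rational approximation for the shape parameter. Assumptions: the input is a sample of block maxima (for example, annual maxima of an averaged CO concentration), the GEV is written in Hosking's parametrization F(x) = exp{-[1 - k(x - ξ)/α]^(1/k)} (so k < 0 means a heavy upper tail; many texts use the opposite sign), and k ≠ 0. The maximum likelihood fit that the abstract also compares is not shown.

```python
import math


def sample_l_moments(data):
    """Sample L-moments l1, l2 and L-skewness t3 from unbiased PWMs b0, b1, b2."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * xi for i, xi in enumerate(x)) / (n * (n - 1))
    b2 = sum(i * (i - 1) * xi for i, xi in enumerate(x)) / (n * (n - 1) * (n - 2))
    l1, l2, l3 = b0, 2 * b1 - b0, 6 * b2 - 6 * b1 + b0
    return l1, l2, l3 / l2


def fit_gev_lmom(data):
    """GEV location xi, scale alpha, shape k (Hosking's sign convention, k != 0)."""
    l1, l2, t3 = sample_l_moments(data)
    c = 2.0 / (3.0 + t3) - math.log(2) / math.log(3)
    k = 7.8590 * c + 2.9554 * c * c          # Hosking's rational approximation
    g = math.gamma(1.0 + k)
    alpha = l2 * k / ((1.0 - 2.0 ** (-k)) * g)
    xi = l1 - alpha * (1.0 - g) / k
    return xi, alpha, k


def gev_quantile(p, xi, alpha, k):
    """Inverse CDF of the GEV in Hosking's parametrization."""
    return xi + alpha / k * (1.0 - (-math.log(p)) ** k)


def return_level(T, xi, alpha, k):
    """Level exceeded on average once every T blocks (e.g. years)."""
    return gev_quantile(1.0 - 1.0 / T, xi, alpha, k)
```

With fitted parameters, `[return_level(T, *params) for T in (5, 10, 20, 40)]` gives the 5-, 10-, 20-, and 40-year levels. L-moment estimates are linear combinations of ordered data, which is one reason they tend to behave better than maximum likelihood in small samples.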
Analyzing longitudinal effect of physical education activity on adolescent self-rated health evaluation changes using hierarchical linear and nonlinear models
Kim, Sae Hyung ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1013~1025
DOI : 10.7465/jkdi.2016.27.4.1013
The purpose of this study was to analyze the longitudinal effect of the physical education activity (PEA) score on self-rated health evaluation change (SHEC). This study used hierarchical linear and nonlinear models to investigate SHEC during the transition into adolescence (from the first grade of middle school to the second grade of high school). Using the Korea children and youth panel survey (KCYPA), data were collected over five years (from 2010 to 2014). The HLM 6.8 computer program was used to analyze the data. The results were as follows. First, boys' SHEC increased across the five years, whereas girls' SHEC decreased across the five years. Second, boys' self-rated health increased over three years and decreased over two years. Third, girls' self-rated health increased over two years and decreased over three years. Fourth, the PEA score in the first grade of high school showed a significant positive association with boys' SHEC. Fifth, the PEA score in the first grade of middle school showed a significant negative association with girls' SHEC.
A study on satisfaction for military food services
Kim, Joung Ae ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1027~1033
DOI : 10.7465/jkdi.2016.27.4.1027
The purpose of this study is to examine the level of satisfaction with military food services and to identify the factors that affect satisfaction. A questionnaire survey was completed by 111 soldiers. In the satisfaction analysis, 69.3% of soldiers gave a positive response. There are, however, some differences in satisfaction between the corporal and private groups and between the corporal and sergeant groups; overall, the corporal group shows low satisfaction with military food services. To identify the factors that influence satisfaction, we considered taste, nutrition, quantity, hygienic conditions, environment, ambient factors, waiting time, presence of dessert, menu, and kindness of the cook. The analysis shows that the main significant factors are the taste and quantity of the food. Thus, the taste and quantity of the food need to be consistently monitored and improved to increase satisfaction with military food services.
The relationship between sense of humor, stress and depression in the nursing students
Lee, Hae Jin ; Ko, Ye Jung ; Han, Seung Woo ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1035~1046
DOI : 10.7465/jkdi.2016.27.4.1035
This study was performed to identify the relationship between sense of humor, stress, and depression in nursing students. Data were collected from June 20 to June 30, 2015 from 227 nursing students at K university. The collected data were analyzed using frequency, percentage, mean, standard deviation, the independent t-test, one-way ANOVA, the Kruskal-Wallis test, and Pearson and Spearman rank correlation coefficients. The results show that the sense of humor differed significantly by grade (t
The effects of drinking motives, refusal self-efficacy, and outcome expectancy on high risk drinking
Lee, Eun Kyung ; Park, Jin-Hwa ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1047~1057
DOI : 10.7465/jkdi.2016.27.4.1047
The purpose of this study was to examine whether high-risk drinkers differ from normal drinkers in terms of drinking motives, drinking refusal self-efficacy, and alcohol outcome expectancy. A total of 139 male university students in the D area completed a self-report questionnaire assessing general characteristics, drinking motives, drinking refusal self-efficacy, alcohol outcome expectancy, and amount of drinking. The subjects were divided into high-risk and normal drinking groups based on a CDC guideline. The results show that the high-risk drinking group had higher odds of current smoking (adjusted OR
Robust varying coefficient model using L1 regularization
Hwang, Changha ; Bae, Jongsik ; Shim, Jooyong ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1059~1066
DOI : 10.7465/jkdi.2016.27.4.1059
In this paper, we propose a robust version of varying coefficient models based on regularized regression with L1 regularization. We use the iteratively reweighted least squares procedure to solve the L1-regularized objective function of the varying coefficient model in locally weighted regression form. This approach provides efficient computation of the coefficient function estimates and variable selection for a given value of the smoothing variable. We present a generalized cross-validation function and an Akaike-information-type criterion for model selection. Applications of the proposed model are illustrated through artificial examples and a real example of predicting the effect of the input variables and the smoothing variable on the output.
Run related probability function and their application to start-up demonstration tests
Bi, Yi-Ming ; Oh, Jung-Taek ; Cho, Gyo-Young ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1067~1074
DOI : 10.7465/jkdi.2016.27.4.1067
A start-up demonstration test is a mechanism commonly used to determine the reliability of equipment such as water pumps, car batteries, and power generators. The simplest and oldest start-up demonstration tests are called CS (consecutive successes) tests, which have been studied by Hahn and Gage (1983) and Viveros and Balakrishnan (1993). Hahn and Gage (1983) first discussed the start-up demonstration test, which was based on i.i.d. (independently and identically distributed) binary outcomes with a specified number of consecutive successful start-ups. Oh (2016) studied CSNCF (consecutive successes, but not consecutive failures). In this paper, we investigated the CS and CSNCF models and their applications to start-up demonstration tests. The numerical results showed that, for every specified number of successes, the expectations and variances of the total number of attempted start-ups until the acceptance of the unit increase gradually as p (the probability of a successful start-up in a single trial) decreases from 0.99 to 0.90. The difference between the means of the CS and CSNCF models is small, but the difference between their variances is large.
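For the CS test, the number of start-up attempts until acceptance is the classical waiting time for k consecutive successes in i.i.d. Bernoulli(p) trials, which has closed-form mean and variance. The sketch below codes those formulas together with a Monte Carlo check; it assumes the plain CS rule (accept after k successes in a row, with no rejection rule) and does not implement the CSNCF model. The function names are hypothetical.

```python
import random


def cs_mean_var(p, k):
    """Exact mean and variance of the number of start-ups until k consecutive successes.

    Classical waiting-time results for i.i.d. Bernoulli(p) start-ups (CS test).
    """
    q = 1.0 - p
    mean = (1.0 - p ** k) / (q * p ** k)
    var = (1.0 - (2 * k + 1) * q * p ** k - p ** (2 * k + 1)) / (q ** 2 * p ** (2 * k))
    return mean, var


def simulate_cs(p, k, runs=20000, seed=0):
    """Monte Carlo estimate of the mean number of attempts until k successes in a row."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        attempts = streak = 0
        while streak < k:
            attempts += 1
            # a failure resets the run of consecutive successes
            streak = streak + 1 if rng.random() < p else 0
        total += attempts
    return total / runs
```

Evaluating `cs_mean_var` for p from 0.99 down to 0.90 with a fixed k shows both the mean and the variance increasing, in line with the trend reported above for the CS model.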
Comparison study of SARIMA and ARGO models for influenza epidemics prediction
Jung, Jihoon ; Lee, Sangyeol ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1075~1081
DOI : 10.7465/jkdi.2016.27.4.1075
Big data analysis has received much attention from researchers in various fields because big data has great potential for detecting or predicting future events such as epidemic outbreaks and changes in stock prices. Reflecting the current popularity of big data analysis, many authors have proposed methods for tracking influenza epidemics based on internet-based information. The recently proposed "autoregressive model using Google" (ARGO) (Yang et al., 2015) is one such influenza tracking model; it harnesses Google search queries as well as reports from the Centers for Disease Control (CDC), and appears to outperform existing methods such as Google Flu Trends (GFT). Although ARGO predicts influenza outbreaks well, this study demonstrates that a classical seasonal autoregressive integrated moving average (SARIMA) model can outperform ARGO. The SARIMA model captures the seasonality of past influenza activity more accurately and uses fewer input variables. Our findings show that the SARIMA model is a useful tool for monitoring influenza epidemics.
Joint HGLM approach for repeated measures and survival data
Ha, Il Do ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1083~1090
DOI : 10.7465/jkdi.2016.27.4.1083
In clinical studies, different types of outcomes (e.g., repeated measures data and time-to-event data) are often observed for the same subject, and these outcomes can be correlated. For example, a response variable of interest can be measured repeatedly over time on the same subject while an event time representing a terminating event is also recorded. Joint modelling using a shared random effect is useful for analyzing such data. Inferences based on the marginal likelihood may require evaluating analytically intractable integrals over the random-effect distributions. In this paper, we propose a joint HGLM approach for analyzing such outcomes using the hierarchical generalized linear model (HGLM) method based on the h-likelihood (hierarchical likelihood), which avoids these integrations altogether. The proposed method is demonstrated through various numerical studies.
Noninformative priors for linear function of parameters in the lognormal distribution
Lee, Woo Dong ; Kim, Dal Ho ; Kang, Sang Gil ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1091~1100
DOI : 10.7465/jkdi.2016.27.4.1091
This paper considers noninformative priors for a linear function of the parameters of the lognormal distribution. The lognormal distribution is applied in various areas, such as occupational health research, environmental science, and monetary units. The linear function of the parameters includes the expectation, median, and mode of the lognormal distribution. We derive the probability matching priors and the reference priors for the linear function of the parameters. We then show that the derived reference priors do not satisfy the first-order matching criterion. Under general priors, including the derived noninformative priors, we check the conditions for the propriety of the posterior distribution. A numerical study under the developed priors is performed, and real examples are given.
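The phrase "linear function of parameters" can be made concrete with standard lognormal facts: on the log scale, the expectation, median, and mode are all of the form $\mu + c\,\sigma^2$. This is our illustration; the paper may define the linear function more generally (e.g., for an arbitrary constant $c$).

```latex
\begin{aligned}
X \sim \mathrm{LN}(\mu, \sigma^2):\quad
\log E(X) &= \mu + \tfrac{1}{2}\sigma^2, \\
\log \operatorname{median}(X) &= \mu, \\
\log \operatorname{mode}(X) &= \mu - \sigma^2,
\end{aligned}
\qquad\text{i.e. } \theta = \mu + c\,\sigma^2,\quad c \in \{\tfrac{1}{2},\, 0,\, -1\}.
```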
A meta-regression analysis on the effects of parenting programs for children with disabilities in Korea
Kim, Young A ; Cho Chung, Hyang-In ; Yoon, Sanghoo ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1101~1113
DOI : 10.7465/jkdi.2016.27.4.1101
This study was conducted to analyze the effects of parenting programs for children with disabilities through a meta-regression analysis of experimental studies published in Korea. Twenty-two studies with a randomized or non-randomized control-group pre-post test design were included in the analysis. Parenting programs had a significant effect on parenting stress (ES
Erratum to "Categorical time series clustering: Case study of Korean pro-baseball data"
Pak, Ro Jin ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1115~1115
DOI : 10.7465/jkdi.2016.27.4.1115
Retraction notice to "Goodness-of-fit tests for a proportional odds model"
Lee, Hyun Yung ;
Journal of the Korean Data and Information Science Society, volume 27, issue 4, 2016, Pages 1117~1117
DOI : 10.7465/jkdi.2016.27.4.1117
Journal of the Korean Data & Information Science Society, Vol. 24, No. 6, 1465-1475, 2013 (http://dx.doi.org/10.7465/jkdi.2013.24.6.1465). This article has been retracted by the research ethics committee.