Journal of the Korean Data and Information Science Society
Korean Data and Information Science Society
On the characteristics of the Hamming distances in medical diagnosis
Ahn, Jeong-Yong ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 227~234
DOI : 10.7465/jkdi.2012.23.2.227
Hamming distances are used in medical science for the diagnosis of diseases. The differences between the distances, however, are often very small and do not follow a standard statistical distribution such as the normal or chi-square distribution. In this study, we explore the characteristics and significance of the differences between Hamming distances generated in medical diagnosis.
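As a rough illustration of the idea in this abstract, the sketch below computes a normalized Hamming distance between a patient's symptom profile and two disease profiles, then assigns the diagnosis with the smallest distance. The membership values, the disease names, and the function name `normalized_hamming` are hypothetical and are not taken from the paper, whose distance definition may differ.

```python
def normalized_hamming(a, b):
    """Mean absolute difference between two equal-length profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# Hypothetical symptom membership degrees in [0, 1] (illustrative only).
patient = [0.75, 0.5, 0.25, 0.5]
disease_profiles = {
    "disease_A": [0.5, 0.5, 0.25, 0.75],
    "disease_B": [0.5, 0.25, 0.5, 0.5],
}

# Distance from the patient to every disease profile.
distances = {
    name: normalized_hamming(patient, profile)
    for name, profile in disease_profiles.items()
}

# The diagnosis is the disease whose profile is closest to the patient.
diagnosis = min(distances, key=distances.get)

# The gap between the best and second-best distance is what decides the
# diagnosis, and it can be small.
ranked = sorted(distances.values())
gap = ranked[1] - ranked[0]

print(distances, diagnosis, gap)
```

In this toy case the two distances differ only slightly, which is the situation the abstract describes: the decision rests on a small difference whose distribution is not a standard one.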
A credit classification method based on generalized additive models using factor scores of mixtures of common factor analyzers
Lim, Su-Yeol ; Baek, Jang-Sun ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 235~245
DOI : 10.7465/jkdi.2012.23.2.235
Logistic discrimination is a useful statistical technique for quantitative analysis in the financial service industry. In particular, it is easy to implement and has a good classification rate. The generalized additive model is useful for credit scoring since it shares the advantages of logistic discrimination and can also account for nonlinear effects of the explanatory variables. It may, however, need too many additive terms when the number of explanatory variables is very large, and there may be dependencies among the variables. Mixtures of factor analyzers can be used for dimension reduction of high-dimensional features. This study proposes using the low-dimensional factor scores of mixtures of factor analyzers as new features in the generalized additive model. Its application is demonstrated in the classification of some real credit scoring data. A comparison of the correct classification rates of competing techniques shows the superiority of the generalized additive model using factor scores.
Estimation methods of fuel consumption using distance traveled: Focused on Monte Carlo method
Park, Chun-Gun ; Soh, Jin-Young ; Lee, Yung-Seop ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 247~256
DOI : 10.7465/jkdi.2012.23.2.247
Recently, the estimation of greenhouse gas (GHG) emissions has emerged as an important global issue. This study compares various statistical methods for estimating fuel consumption, which is necessary for calculating GHG emissions in the road transportation sector. Existing methods have focused on using only the transportation fuel supply or the distance traveled to calculate fuel consumption. Estimates of GHG emissions based on fuel supply, however, cannot reflect vehicle type or model year. From a statistical point of view, this study suggests and compares several methods that estimate the fuel consumption of each vehicle by combining distance traveled and fuel efficiency (mileage), as well as the total fuel consumption of all vehicles. It also suggests practical measures for incorporating vehicle type and model year into the suggested methods in future research.
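The combination of distance traveled and fuel efficiency described in this abstract can be sketched as a small Monte Carlo simulation: each vehicle's fuel use is its distance divided by its efficiency, and repeating the simulation gives a spread for the fleet total. The distributions, their parameters, and the function names below are assumptions made for illustration and do not come from the paper.

```python
import random
import statistics


def simulate_total_fuel(n_vehicles, rng):
    """Simulate one fleet and return its total fuel consumption in litres."""
    total = 0.0
    for _ in range(n_vehicles):
        # Assumed annual distance: gamma with mean 4.0 * 3000 = 12,000 km.
        distance_km = rng.gammavariate(4.0, 3000.0)
        # Assumed fuel efficiency: uniform between 8 and 16 km per litre.
        efficiency_km_per_l = rng.uniform(8.0, 16.0)
        total += distance_km / efficiency_km_per_l
    return total


def monte_carlo_fuel(n_vehicles=1000, n_reps=200, seed=0):
    """Return the mean and an approximate 95% percentile interval."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    totals = sorted(simulate_total_fuel(n_vehicles, rng) for _ in range(n_reps))
    mean_total = statistics.mean(totals)
    lower = totals[int(0.025 * n_reps)]
    upper = totals[int(0.975 * n_reps) - 1]
    return mean_total, lower, upper


mean_total, lower, upper = monte_carlo_fuel()
print(f"total fuel: {mean_total:,.0f} L (approx. 95% interval {lower:,.0f} to {upper:,.0f})")
```

Replacing the two sampling lines with distributions fitted per vehicle type or model year would be one way to reflect the refinements the abstract leaves to future research.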
Survival analysis on the business types of small business using Cox's proportional hazard regression model
Park, Jin-Kyung ; Oh, Kwang-Ho ; Kim, Min-Soo ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 257~269
DOI : 10.7465/jkdi.2012.23.2.257
The global crisis is accelerating changes in the industrial environment and putting small enterprises in danger of mass bankruptcy. Because of this, domestic small enterprises are in urgent need of restructuring. Based on small business data registered with the Credit Guarantee Fund, we estimated survival probabilities in the context of survival analysis. We also analyzed how survival times differ by type of business. The effects of financial variables were also analyzed by type of business using Cox regression. By type of business, the wholesale and retail trade industry and the services industry had relatively higher survival probabilities than the light, heavy, and construction industries. The construction industry in particular showed the lowest survival probability. In addition, we found that in the construction industry, the larger the BIS (Bank for International Settlements capital ratio) and the current ratio are, the smaller the default rate is, but the larger the borrowing bond is, the larger the default rate is. In the light industry, the larger the BIS and ROA (return on assets) are, the smaller the default rate is. In the wholesale and retail trade industry, the larger the BIS and the current ratio are, the smaller the default rate is. In the heavy industry, the larger the BIS, ROA, and current ratio are, the smaller the default rate is. Finally, in the services industry, the larger the current ratio is, the smaller the default rate is.
Prediction of the industrial stock price index using domestic and foreign economic indices
Choi, Ik-Sun ; Kang, Dong-Sik ; Lee, Jung-Ho ; Kang, Min-Woo ; Song, Da-Young ; Shin, Seo-Hee ; Son, Young-Sook ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 271~283
DOI : 10.7465/jkdi.2012.23.2.271
In this paper, we predicted the rise or fall of eleven major industrial stock price indices, unlike existing studies that predict the KOSPI, which combines all industries. As input variables, we used not only domestic economic indices but also foreign economic indices from the U.S.A., Japan, China and Europe, which have affected the Korean stock market. Numerical analysis with SAS E-miner showed an accuracy of around 60% using the logistic regression and neural network models.
A study on relationship between the performance of professional baseball players and annual salary
Seung, Hee-Bae ; Kang, Kee-Hoon ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 285~298
DOI : 10.7465/jkdi.2012.23.2.285
This research deals with the relationship between the performance of Korean professional baseball players and their annual salaries. It is based on sabermetrics, which measures the performance of baseball batters in a refined way. We collect the records of batters from eight professional baseball clubs during the 2009 and 2010 seasons. Then, we calculate every sabermetrics index. Principal component analysis is used to examine the relationship between those sabermetrics indexes and the annual salary for the following year. In general, batters who show higher performance receive higher salaries. The results of this research can be useful for reaching an agreement on salary between a player and his club.
A study on decision tree creation using marginally conditional variables
Cho, Kwang-Hyun ; Park, Hee-Chang ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 299~307
DOI : 10.7465/jkdi.2012.23.2.299
Data mining is a method of searching for interesting relationships among items in a given database. The decision tree is a typical data mining algorithm that classifies or predicts a group by dividing it into subgroups. In general, when researchers create a decision tree model, the generated model can become complicated depending on the model creation criteria and the number of input variables. In particular, if a decision tree has a large number of input variables, the generated model can be complex and difficult to analyze. When creating a decision tree model, any marginally conditional variables (intervening variables, external variables) among the input variables are not directly relevant. In this study, we suggest a method of creating a decision tree using marginally conditional variables and apply it to actual data to examine its efficiency.
Comparison of physique and physical fitness in sports talent children with TES program
Lee, Mi-Sook ; Eo, Su-Ju ; Park, Cheol-Yong ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 309~315
DOI : 10.7465/jkdi.2012.23.2.309
The purpose of this study was to compare physique and physical fitness according to participation in the TES (Talented-Educational in Sport) program run by H University in 2009-2011. For this study, data were collected from 668 elementary students aged 7 to 13 living in the Seoul and Gyeonggi area (2009: 297, 2010: 194, 2011: 177 or 1st: 506, 2nd: 104, 3rd: 58). The subjects were measured on physique variables (5) and physical fitness variables (7). Mean comparisons (ANOVA) were conducted for each gender in order to compare the mean differences by attendance number. For association analysis, the Pearson correlation coefficient was used to find associations between the physique and physical fitness variables. Some physical fitness variables (sit up, half squat jump, side step, standing long jump, flexibility in male children; sit up, half squat jump, side step in female children) increased significantly with attendance number, but the physique variables did not. The results show that the TES program was effective for the physical fitness variables (muscle endurance & agility) in sports talent children.
Structural relationship among servant leadership, empowerment and sports satisfaction of badminton coaches
Lee, Mi-Sook ; Nam, Jung-Hoon ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 317~331
DOI : 10.7465/jkdi.2012.23.2.317
The purpose of this study was to verify the relationship among servant leadership, empowerment and sports satisfaction of badminton coaches through self-leadership. Among national badminton players, a total of 343 responses were collected by random sampling and used in the study. The normality of the data and the validity and reliability of each factor were confirmed through descriptive statistics, exploratory and confirmatory factor analysis, and reliability analysis using the SPSS 18.0 and AMOS 18.0 programs. The relationships among the factors were analyzed by correlation analysis and structural model analysis. The results were as follows. First, the servant leadership of badminton coaches had a positive effect on empowerment. Second, the servant leadership of badminton coaches had a positive effect on self-leadership. Third, the servant leadership of badminton coaches had a positive effect on sports satisfaction. Fourth, empowerment had a positive effect on sports satisfaction. Fifth, self-leadership had a positive effect on sports satisfaction. Sixth, empowerment and self-leadership had indirect effects on the relationship between servant leadership and sports satisfaction.
A structural equation model for career maturity
Lee, Jung-Min ; Park, Cheol-Yong ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 333~342
DOI : 10.7465/jkdi.2012.23.2.333
This study is conducted to see if college students' career identity and career decision-making self-efficacy affect job-seeking stress and if these three factors have influence on career maturity. Data is collected from 259 college students enrolled in three universities in Daegu. The results show that career identity and career decision-making self-efficacy both have negative direct effects on job-seeking stress and positive indirect effects on career maturity. It is also found that job-seeking stress has a negative direct effect on career maturity.
Analysis of academic achievement based on the university admission factors -A university case in 2011-
Choi, Hyun-Seok ; Ha, Jeong-Cheol ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 343~351
DOI : 10.7465/jkdi.2012.23.2.343
We analyse the relations between academic achievement and university admission factors among the class of 2011 at A university, classified by sex and entrance test type. Identifying the admission factors most closely related to academic achievement can provide helpful tools for university entrance policy. We found that the effects of the university admission factors on academic achievement differed according to sex and entrance test type. Female students and regular-admission students achieved more than male students and occasional-admission students, respectively. The Korea scholastic aptitude test had more effect on academic achievement for male students and regular admission type NA, but academic achievement in high school had more effect for female students and regular admission type DA.
Noninformative priors for common scale parameter in the regular Pareto distributions
Kang, Sang-Gil ; Kim, Dal-Ho ; Kim, Yong-Ku ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 353~363
DOI : 10.7465/jkdi.2012.23.2.353
In this paper, we introduce noninformative priors, such as matching priors and reference priors, for the common scale parameter in the Pareto distributions. It turns out that the posterior distribution under the reference priors is not proper, and Jeffreys' prior is not a matching prior. It is shown through a simulation study that the proposed first order prior matches the target coverage probabilities in a frequentist sense.
Pricing weather derivatives: An application to the electrical utility
Zou, Zhixia ; Lee, Kwang-Bong ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 365~374
DOI : 10.7465/jkdi.2012.23.2.365
Weather derivatives designed to manage casual changes of weather, as opposed to catastrophic risks of weather, are a relatively new class of financial instruments. There are still many theoretical and practical challenges to the effective use of these instruments. The objective of this paper is to develop a pricing approach for valuing weather derivatives and to present a case study that is practical enough to be used by the risk managers of electrical utility firms. Utilizing daily average temperature data of Guangzhou, China from January 1978 to December 2010, this paper adopts a univariate time series model to describe weather behavior dynamics and calculates equilibrium prices for weather futures and options for an electrical utility firm in the region. The results imply that the risk premium is an important part of derivatives prices and that the market price of risk affects option values much more than forward prices. It also demonstrates that weather innovation as well as weather risk management significantly affect the utility's financial outcomes.
Study on the ensemble methods with kernel ridge regression
Kim, Sun-Hwa ; Cho, Dae-Hyeon ; Seok, Kyung-Ha ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 375~383
DOI : 10.7465/jkdi.2012.23.2.375
The purpose of ensemble methods is to increase the accuracy of prediction by combining many classifiers. Recent studies have shown that random forests and forward stagewise regression have good accuracy in classification problems. However, they have large prediction errors at separation boundary points because they use decision trees as base learners. In this study, we use kernel ridge regression instead of decision trees in random forests and boosting. The usefulness of our proposed ensemble methods is shown by simulation results on the prostate cancer and Boston housing data.
Semiparametric kernel logistic regression with longitudinal data
Shim, Joo-Yong ; Seok, Kyung-Ha ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 385~392
DOI : 10.7465/jkdi.2012.23.2.385
Logistic regression is a well-known binary classification method in the field of statistical learning. Mixed-effect regression models are widely used for the analysis of correlated data such as those found in longitudinal studies. We consider kernel extensions with semiparametric fixed effects and parametric random effects for logistic regression. The estimation is performed through the penalized likelihood method based on the kernel trick, and our focus is on efficient computation and effective hyperparameter selection. For the selection of optimal hyperparameters, cross-validation techniques are employed. Numerical results are then presented to indicate the performance of the proposed procedure.
Two-step LS-SVR for censored regression
Bae, Jong-Sig ; Hwang, Chang-Ha ; Shim, Joo-Yong ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 393~401
DOI : 10.7465/jkdi.2012.23.2.393
This paper deals with the estimation of least squares support vector regression when the responses are subject to random right censoring. The estimation is performed in two steps - ordinary least squares support vector regression and least squares support vector regression with censored data. We use the empirical fact that the estimated regression functions subject to random right censoring are closer to the true regression functions than the observed failure times subject to random right censoring. The hyper-parameters of the model, which affect the performance of the proposed procedure, are selected by a generalized cross validation function. Experimental results are then presented which indicate the performance of the proposed procedure.
Estimation of Freund model under censored data
Cho, Kil-Ho ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 403~409
DOI : 10.7465/jkdi.2012.23.2.403
We consider a life testing experiment in which several two-component shared parallel systems are put on test, and the test is terminated at a predesigned experiment time. In this study, the maximum likelihood estimators for the parameters of Freund's bivariate exponential distribution under system level life testing are obtained. Results of comparative studies based on Monte Carlo simulation are presented.
Simulation study on the estimation of multinomial proportions
Kim, Dae-Hak ;
Journal of the Korean Data and Information Science Society, volume 23, issue 2, 2012, Pages 411~417
DOI : 10.7465/jkdi.2012.23.2.411
In this paper, we consider the estimation of multinomial proportions. The multinomial distribution is the most important multivariate distribution. Estimation of the parameters of the multinomial distribution is widely applicable to many practical research areas, including genetics. We investigated the properties of several frequency substitution estimates and derived the maximum likelihood estimate of multinomial proportions under Hardy-Weinberg proportions. Phenotype and genotype frequencies of alleles are used for the estimation of multinomial proportions. These estimates are then analyzed using numerical data. A small sample Monte Carlo simulation is conducted to compare the considered estimates of multinomial proportions.
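To make the estimators named in this abstract concrete, the sketch below uses the standard textbook Hardy-Weinberg setup, in which genotypes AA, Aa and aa occur with probabilities p^2, 2p(1-p) and (1-p)^2. It computes the maximum likelihood estimate of the allele frequency p and two frequency substitution estimates from genotype counts. The counts and function names are hypothetical, and the paper's own estimators may be defined differently.

```python
import math


def allele_frequency_estimates(n_AA, n_Aa, n_aa):
    """Estimate the frequency p of allele A from genotype counts."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("at least one observation is required")
    return {
        # MLE: share of A alleles among all 2n observed alleles.
        "mle": (2 * n_AA + n_Aa) / (2 * n),
        # Frequency substitution: solve p^2 = n_AA / n for p.
        "from_AA": math.sqrt(n_AA / n),
        # Frequency substitution: solve (1 - p)^2 = n_aa / n for p.
        "from_aa": 1.0 - math.sqrt(n_aa / n),
    }


def genotype_proportions(p):
    """Multinomial cell probabilities implied by Hardy-Weinberg at p."""
    q = 1.0 - p
    return (p * p, 2.0 * p * q, q * q)


# Hypothetical genotype counts (illustrative only).
estimates = allele_frequency_estimates(30, 50, 20)
proportions = genotype_proportions(estimates["mle"])

print(estimates)
print(proportions)
```

The three estimates generally disagree on finite samples, which is why a simulation comparison such as the one described in the abstract is informative.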