REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
Journal of the Korean Data and Information Science Society
Journal Basic Information
Korean Data and Information Science Society
Volume & Issues
Volume 20, Issue 6 - Nov 2009
Volume 20, Issue 5 - Sep 2009
Volume 20, Issue 4 - Jul 2009
Volume 20, Issue 3 - May 2009
Volume 20, Issue 2 - Mar 2009
Volume 20, Issue 1 - Jan 2009
An estimation method based on autocovariance in the simple linear regression model
Park, Cheol-Yong ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 251~260
In this study, we propose a new estimation method based on autocovariance for selecting optimal estimators of the regression coefficients in the simple linear regression model. Although the method does not seem intuitively attractive, the resulting estimators are unbiased for the corresponding regression coefficients. When the explanatory variable takes equally spaced values between 0 and 1, under mild conditions that are satisfied when the errors follow an autoregressive moving average model, we show that these estimators have asymptotically the same distributions as the least squares estimators. Under the same conditions, we also provide a self-contained proof that these estimators converge in probability to the corresponding regression coefficients.
Nonparametric homogeneity tests of two distributions for credit rating model validation
Hong, Chong-Sun ; Kim, Ji-Hoon ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 261~272
The Kolmogorov-Smirnov (K-S) statistic has been widely used for testing the homogeneity of two distributions in credit rating models; the validation criteria that Joseph (2005) obtained from it are the best known. Other homogeneity test statistics exist, such as the Cramer-von Mises, Anderson-Darling, and Watson statistics. In this paper, these statistics are introduced, and criteria for them are obtained by extending Joseph's (2005) work. Another set of alternative criteria is suggested for various sample sizes, type I error rates, and ratios of bads to goods, using data simulated under conditions similar to real credit rating data. We compare Joseph's criteria with the two proposed sets of criteria and discuss their applications.
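The homogeneity statistics named in this abstract all have ready-made two-sample versions in SciPy. The sketch below applies them to simulated score distributions of "goods" and "bads"; the score scale and sample sizes are invented for illustration, and the exact criteria of the paper are not reproduced.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
goods = rng.normal(loc=600, scale=50, size=1000)  # scores of good borrowers
bads = rng.normal(loc=520, scale=60, size=100)    # scores of bad borrowers

ks = stats.ks_2samp(goods, bads)                # Kolmogorov-Smirnov
cvm = stats.cramervonmises_2samp(goods, bads)   # Cramer-von Mises
ad = stats.anderson_ksamp([goods, bads])        # Anderson-Darling (k-sample)

# A large K-S distance (and small p-value) means the rating model
# separates the two score distributions well.
print(ks.statistic, cvm.statistic, ad.statistic)
```

Note that `cramervonmises_2samp` requires a reasonably recent SciPy (1.7+); a Watson two-sample test is not in SciPy and would have to be coded separately.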
A study on the effects of DINESERV's 5-dimensions by multiply-model on satisfaction, revisit intention and customer loyalty
Cho, Yoon-Shik ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 273~281
The gap (P-E) model is based on the disconfirmation paradigm, which tries to understand the effect of the gap between before-purchase expectations and after-purchase perceptions of product performance on dependent variables such as customer satisfaction. Bhote (1998) proposed the multiply (P×E) model instead of the gap (P-E) model. This paper focuses on Bhote's multiply (P×E) model in the food service industry. The purpose of this research is to test whether DINESERV's five dimensions under the multiply (P×E) model fit in explaining satisfaction, revisit intention, and customer loyalty. The F-value of the regression model was used to test the fit of the multiply (P×E) model. Through analysis, it was found that the multiply (
A Study on the DR program operation method based on the pattern analysis
Kang, Jung-Chul ; Lee, Hyun-Woo ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 283~292
Recently, the stable supply of electric power has been threatened by the rapid development of industries that consume large amounts of electricity. KEPCO is therefore trying to establish countermeasures against rising power consumption, both by running an efficient and flexible 12-month price policy for the base price and by raising the electricity price sharply during peak consumption periods to suppress consumers' maximal power demand. To address these problems, this study proposes a method for calculating the hourly CBL (customer baseline load) from the hourly consumption at non-event times, using the pattern of power consumption, and proposes an operation method for the DR (demand response) program.
Predicting ozone warning days based on an optimal time series model
Park, Cheol-Yong ; Kim, Hyun-Il ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 293~299
In this article, we consider linear models such as regression, ARIMA (autoregressive integrated moving average), and regression+ARIMA (regression with ARIMA errors) for predicting the hourly ozone concentration level in two areas of Daegu. Based on RASE (root average squared error), the ARIMA model is best in one area and the regression+ARIMA model is best in the other. We further analyze the residuals from the optimal models in order to predict ozone warning days, on which at least one hourly ozone concentration level exceeds 120 ppb. Based on training data from the years 2000 to 2003, 35 ppb is found to be a good residual cutoff value for predicting ozone warning days. In one area of Daegu, our method correctly predicts one of the two ozone warning days of 2004 as well as all of the remaining 364 non-warning days. In the other area, our method correctly predicts the single ozone warning day and all 365 non-warning days of 2004.
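The two quantities this abstract relies on are easy to write down. The sketch below computes RASE and applies one plausible reading of the residual-cutoff rule (flag a day whose largest residual exceeds 35 ppb); the ozone values are made up for illustration and the paper's fitted models are not reproduced.

```python
import numpy as np

observed = np.array([80.0, 95.0, 130.0, 70.0, 110.0])   # hourly ozone (ppb)
predicted = np.array([85.0, 90.0, 100.0, 75.0, 100.0])  # model output (ppb)

# RASE: root average squared error over the hourly predictions.
rase = np.sqrt(np.mean((observed - predicted) ** 2))

# Warning-day rule (assumed form): flag the day if any hourly residual
# exceeds the cutoff found on the training years.
cutoff = 35.0
residuals = observed - predicted
flag_warning = residuals.max() > cutoff
```

Here the largest residual is 30 ppb, so the day is not flagged even though one observed hour exceeds 120 ppb; misses of this kind are exactly what the abstract reports for one of the two warning days.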
A study on the efficiency of multidimensional scaling using bootstrap method
Kim, Woo-Jong ; Kang, Kee-Hoon ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 301~309
Multidimensional scaling (MDS) is a statistical multivariate analysis technique often used in information visualization for exploring similarities or dissimilarities in data. To analyse and visualize data, MDS measures the dissimilarities between objects and uses them, or their mean when they are repeatedly measured. When outliers exist or the variation of the data is too large, research based on MDS can hardly yield reliable results. In this paper, we consider MDS based on the bootstrap method when the variation of the data is large. The standardized residual sum of squares is used to measure the goodness of fit of the model. A real data analysis is included to examine our approach.
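A minimal sketch of the ingredients: classical (Torgerson) MDS from a dissimilarity matrix, plus a bootstrap over repeated dissimilarity measurements. The paper's exact resampling scheme is not given in the abstract, so resampling the replicates and averaging them is an illustrative assumption, as are the toy dissimilarities.

```python
import numpy as np

def classical_mds(d, k=2):
    """Embed a symmetric dissimilarity matrix d into k dimensions."""
    n = d.shape[0]
    j = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    b = -0.5 * j @ (d ** 2) @ j               # double-centered matrix
    vals, vecs = np.linalg.eigh(b)
    idx = np.argsort(vals)[::-1][:k]          # keep the largest eigenvalues
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))

rng = np.random.default_rng(1)
true_d = np.array([[0.0, 2.0, 5.0],
                   [2.0, 0.0, 4.0],
                   [5.0, 4.0, 0.0]])
# Twenty repeated noisy measurements of each dissimilarity.
reps = true_d[None, :, :] + rng.normal(0, 0.3, size=(20, 3, 3))

# Bootstrap: resample the replicates, average, embed.
coords = []
for _ in range(50):
    take = rng.integers(0, 20, size=20)
    d_bar = reps[take].mean(axis=0)
    d_bar = (d_bar + d_bar.T) / 2             # keep it symmetric
    np.fill_diagonal(d_bar, 0.0)
    coords.append(classical_mds(d_bar))

x = classical_mds(true_d)                     # embedding of the clean matrix
```

The spread of the 50 bootstrap configurations (after Procrustes alignment, not shown) is what would indicate how stable the MDS solution is under the data's variation.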
A comparison study of classification methods based on SVM and data depth in microarray data
Hwang, Jin-Soo ; Kim, Jee-Yun ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 311~319
Jornsten (2004) used a robust L1 data depth in clustering and classification, in methods called DDclus and DDclass. SVM-based classification works well in most situations but shows some weakness in the presence of outliers. Proper gene selection is important in classification since there are many redundant genes. Selecting appropriate genes, or combining gene clustering with a classification method, enhances the overall performance of classification. The performance of the depth-based method is evaluated against several SVM-based classification methods.
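A maximum-depth classifier in the spirit of DDclass can be sketched with the spatial (L1) depth; Jornsten's actual algorithm is not reproduced here, and the two Gaussian "expression" classes are invented.

```python
import numpy as np

def l1_depth(x, sample):
    """Spatial (L1) depth of point x within a sample (rows = points)."""
    diffs = sample - x
    norms = np.linalg.norm(diffs, axis=1)
    keep = norms > 1e-12                       # drop exact coincidences
    units = diffs[keep] / norms[keep, None]
    return 1.0 - np.linalg.norm(units.mean(axis=0))

def dd_classify(x, classes):
    """Assign x to the class in which it is deepest (max-depth rule)."""
    depths = [l1_depth(x, c) for c in classes]
    return int(np.argmax(depths))

rng = np.random.default_rng(2)
class0 = rng.normal(0.0, 1.0, size=(100, 5))   # e.g. 5 selected genes
class1 = rng.normal(3.0, 1.0, size=(100, 5))

label = dd_classify(np.full(5, 3.0), [class0, class1])
```

Because depth depends on the whole point cloud rather than a separating margin, a few outlying samples move it far less than they move an SVM boundary, which is the robustness argument the abstract makes.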
A study on the properties of sensitivity analysis in principal component regression and latent root regression
Shin, Jae-Kyoung ; Chang, Duk-Joon ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 321~328
In regression analysis, the ordinary least squares estimates of regression coefficients become poor when the correlations among predictor variables are high. This phenomenon, called multicollinearity, causes serious problems in actual data analysis. To overcome multicollinearity, many methods have been proposed: ridge regression, shrinkage estimators, and methods based on principal component analysis (PCA) such as principal component regression (PCR) and latent root regression (LRR). In the last decade, many statisticians have discussed sensitivity analysis (SA) in ordinary multiple regression and the same topic in PCR, LRR, and logistic principal component regression (LPCR), in all of which PCA plays an important role; SA in PCA and related multivariate methods has also been discussed. We introduce the methods of PCR and LRR, introduce the methods of SA in PCR and LRR, and discuss the properties of SA in PCR and LRR.
More effective application of importance-performance analysis in the case of cyber lecture
Pak, Ro-Jin ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 329~338
The importance-performance analysis is a simple and condensed analytic method for decision making based on levels of importance and performance (or satisfaction). Many studies have demonstrated the usefulness of importance-performance analysis, but it also has some drawbacks from a statistical point of view. In this article, some additional techniques for the importance-performance analysis are introduced and shown to be very informative. The importance-performance analysis uses the arithmetic average as its main statistic, but by using the median, frequencies, and cluster analysis it is shown that the analysis can be carried out with more crucial information. In addition, it is demonstrated that combining the analytic hierarchy process with importance-performance analysis could enable more reliable decision making.
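The basic importance-performance grid is a two-way split of attributes. The sketch below uses the median split the article recommends over the arithmetic mean; the attribute names and scores are invented to stand in for cyber-lecture survey items, and the quadrant labels are the conventional IPA ones.

```python
import numpy as np

attributes = ["content", "audio", "navigation", "feedback"]
importance = np.array([4.5, 3.0, 4.2, 2.8])    # mean/median survey ratings
performance = np.array([3.1, 4.0, 4.4, 2.5])

imp_cut = np.median(importance)                # median split, as suggested
perf_cut = np.median(performance)

quadrant = {}
for name, imp, perf in zip(attributes, importance, performance):
    if imp >= imp_cut and perf < perf_cut:
        quadrant[name] = "concentrate here"    # important but underperforming
    elif imp >= imp_cut:
        quadrant[name] = "keep up the good work"
    elif perf >= perf_cut:
        quadrant[name] = "possible overkill"
    else:
        quadrant[name] = "low priority"
```

Replacing the cutoffs with means, or replacing the per-attribute averages with frequencies or cluster centers, changes only the two `*_cut` lines, which is why the article's extensions slot in easily.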
The effects of motorized flexion-distraction treatment on the lumbosacral region angle in patients with chronic low back pain
Ma, Sang-Yeol ; Gong, Won-Tae ; Cho, Gyo-Young ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 339~348
This study examines the effects of motorized flexion-distraction treatment on the pain, lumbosacral angle, lumbar lordosis angle, and lumbar 5 (L5) intervertebral disc angle in patients with chronic low back pain. We selected 30 cases of chronic low back pain, evenly divided into an experimental group and a control group. We applied the same hot pack, interferential current therapy, and ultrasound therapy to both groups. The experimental group received additional motorized flexion-distraction therapy, and the control group received additional stretching exercise. For each subject, the pain, lumbosacral angle, lumbar lordosis angle, and L5 intervertebral disc angle were measured before and after treatment. While both groups showed significant improvements after treatment, more significant effects were found in the experimental group.
A comparison study on the estimation of the relative risk for the unemployed rate in small area
Park, Jong-Tae ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 349~356
In this study, we suggest estimation methods for the relative risk in unemployment statistics of small areas such as si, gun, and gu in Korea. The methods considered are the usual pooled estimator, a weighted estimator with the inverse of the log-variance as weights, and the jackknife estimator. We compare the efficiency of the three estimators by estimating their bias and mean square errors using real data from the 2002 Economically Active Population Survey of Gyeonggi-do. We compute the male and female unemployment rates in small areas and then estimate the common relative risk for the unemployment rate between males and females. The stability and reliability of the three estimators of the common relative risk are evaluated using their RB (relative bias) and RRMSE (relative root mean square error). The jackknife estimator turned out to be much more efficient than the other estimators.
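A jackknife estimator of a common relative risk can be sketched as follows. The area counts are invented, and leaving out one area at a time is an illustrative choice; the paper's exact pooling and weighting are not reproduced.

```python
import numpy as np

# Invented small-area counts: male/female unemployed and labor force.
unemp_m = np.array([30, 22, 41, 18])
labor_m = np.array([900, 700, 1100, 600])
unemp_f = np.array([21, 25, 30, 16])
labor_f = np.array([800, 900, 950, 700])

def pooled_rr(idx):
    """Pooled male-to-female relative risk over the areas in idx."""
    p_m = unemp_m[idx].sum() / labor_m[idx].sum()
    p_f = unemp_f[idx].sum() / labor_f[idx].sum()
    return p_m / p_f

n = len(unemp_m)
full = pooled_rr(np.arange(n))
loo = np.array([pooled_rr(np.delete(np.arange(n), i)) for i in range(n)])

# Bias-corrected jackknife estimate of the common relative risk.
jackknife_rr = n * full - (n - 1) * loo.mean()
```

The spread of the leave-one-out values `loo` also yields a jackknife variance, which is what the RB/RRMSE comparison in the abstract measures.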
On statistical methods used in medical research
Choi, Young-Woong ; Kang, Kee-Hoon ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 357~367
With the development of modern medical science, related research can be found in many fields. To obtain correct research results, the research design, research process, and analysis of results should be carried out in an objective and reasonable manner, and various statistical analysis approaches are therefore widely used. In this paper, we investigate the usage of statistical methods in research papers published in four medical journals between 2004 and 2007.
Data analysis of the fourth Jeollabuk-do local election result
Choi, Kyoung-Ho ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 369~375
The next local election for Jeollabuk-do will be held in 2010. In preparation for this, we conducted a study to observe whether minute regionalism occurred during the fourth nationally coordinated local election, held on May 31, 2006. This study is based on the Jeollabuk-do provincial governor election data. We introduce an RS index, which measures how evenly each candidate for provincial governor received votes, and chi-statistics that measure each candidate's local intimacy. Furthermore, we checked whether minute regionalism occurred by applying correspondence analysis. As a result, we could confirm that minute regionalism occurred for a few candidates. After reviewing the measurements, we found that the RS index's validity is not high.
An empirical study on the selection of the optimal covariance pattern model for the weight loss data
Jo, Jin-Nam ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 377~385
Twenty-five female students in Seoul participated in an experiment on the weight loss effect of two treatments and were divided into two groups. Fourteen students (treatment A group), randomly chosen, were fed diet foods and exercised over 8 weeks, while the remaining students (treatment B group) were fed diet foods only for the same period. The weights of the 25 students were measured repeatedly four times at two-week intervals during the 8 weeks. Mixed model analysis of the repeated measurements data selected a separate Toeplitz pattern for each treatment group as the optimal covariance pattern. Under the optimal covariance pattern model, the baseline effect and time effect were found to be highly significant, but the treatment-time interaction effect was insignificant. Finally, the students with diet foods and exercise were more effective in losing weight than the students with diet foods only.
Design and evaluation of a dissimilarity-based anomaly detection method for mobile wireless networks
Lee, Hwa-Ju ; Bae, Ihn-Han ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 387~399
Mobile wireless networks continue to be plagued by identity theft and intrusion. Both problems can be addressed in two different ways, either by misuse detection or by anomaly-based detection. In this paper, we propose a dissimilarity-based anomaly detection method which can effectively identify abnormal behavior, such as abnormal mobility patterns, in mobile wireless networks. In the proposed algorithm, a normal profile is constructed from the normal mobility patterns of mobile nodes. From the constructed normal profile, a dissimilarity is computed by a weighted dissimilarity measure. If the weighted dissimilarity exceeds a dissimilarity threshold, which is a system parameter, an alert message is issued. The performance of the proposed method is evaluated through simulation, from which we find that the proposed method outperforms other anomaly detection methods using dissimilarity measures.
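The detection rule described here is simple to sketch: build a normal profile from mobility patterns, compute a weighted dissimilarity for a new pattern, and raise an alert past a threshold. The feature set, the weights, the scaling, and the threshold value below are all illustrative assumptions, not the paper's.

```python
import numpy as np

# Normal mobility patterns, e.g. [cells visited/hr, handoffs/hr, mean dwell].
normal_patterns = np.array([
    [3.0, 2.0, 18.0],
    [2.5, 2.2, 20.0],
    [3.2, 1.8, 17.0],
])
profile = normal_patterns.mean(axis=0)          # the normal profile
weights = np.array([0.5, 0.3, 0.2])             # feature importance weights

def weighted_dissimilarity(pattern):
    """Weighted, scale-normalized distance from the normal profile."""
    scale = normal_patterns.std(axis=0) + 1e-9  # avoid division by zero
    return float(np.sum(weights * np.abs(pattern - profile) / scale))

THRESHOLD = 3.0                                 # system parameter
observed = np.array([9.0, 7.5, 4.0])            # suspiciously fast-moving node
alert = weighted_dissimilarity(observed) > THRESHOLD
```

Tuning `THRESHOLD` trades false alarms against missed intrusions, which is the axis along which the simulation in the paper compares methods.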
Power analysis for 3×3 Latin square design
Choi, Young-Hun ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 401~410
Due to the characteristics of the 3×3 Latin square design, which is composed of two block effects and one main effect, the powers of the rank transformed statistic for testing the main effect are superior to the powers of the parametric statistic regardless of the type of population distribution. In order, when all three effects are fixed, when one block effect is random, and when two block effects are random, the rank transform statistic for testing the main effect shows relatively high powers compared with the parametric statistic. Further, when the size of the main effect is big, with one block effect of equivalent size and the other block effect small, the powers of the rank transformed statistic for testing the main effect demonstrate a clear advantage over the powers of the parametric statistic.
Prediction in run-off triangle using Bayesian linear model
Lee, Ju-Mi ; Lim, Jo-Han ; Hahn, Kyu-S. ; Lee, Kyeong-Eun ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 411~423
In the current paper, extending Verall (1990)'s work, we propose a new Bayesian model for analyzing run-off triangle data. While Verall's (1990) model accounts only for the calendar year and evolvement time effects, our model further accounts for "absolute time" effects. We also suggest a Markov chain Monte Carlo method for estimating the proposed model. We apply the proposed method to three empirical examples. The results demonstrate that our method significantly reduces prediction error compared with existing methods.
Design and implementation of data mining tool using PHP and WEKA
You, Young-Jae ; Park, Hee-Chang ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 425~433
Data mining is a method of finding useful information in large amounts of data in databases. It is used to discover hidden knowledge in massive data, such as unexpected patterns and new association rules. We need a data mining tool to explore such information. There are many data mining tools and solutions, such as E-Miner, Clementine, WEKA, and R. Most of them are focused on diversity and general purpose, and they are not easy for laymen to use. In this paper, we design and implement a web-based data mining tool using PHP and WEKA. The system's results are easy to interpret, so general users are able to handle it. We implement the Apriori algorithm for association rules, the K-means algorithm for cluster analysis, and the J48 algorithm for decision trees.
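The frequent-itemset step of Apriori, which this tool delegates to WEKA, can be sketched compactly (WEKA's Java implementation is not reproduced; the transactions and support threshold are invented):

```python
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
    {"bread", "butter", "milk"},
]
min_support = 0.6  # minimum fraction of transactions

def apriori(transactions, min_support):
    """Return all itemsets with support >= min_support, with supports."""
    n = len(transactions)
    items = sorted({i for t in transactions for i in t})
    frequent = {}
    k_sets = [frozenset([i]) for i in items]
    while k_sets:
        counts = {s: sum(s <= t for t in transactions) for s in k_sets}
        survivors = {s: c / n for s, c in counts.items() if c / n >= min_support}
        frequent.update(survivors)
        # Join surviving k-sets into candidate (k+1)-sets.
        keys = list(survivors)
        k_sets = list({a | b for a, b in combinations(keys, 2)
                       if len(a | b) == len(a) + 1})
    return frequent

freq = apriori(transactions, min_support)
```

From the frequent itemsets, association rules are read off by comparing supports (e.g. confidence of bread→milk is support({bread, milk}) / support({bread})).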
Change point estimators in monitoring the parameters of an IMA(1,1) model
Lee, Ho-Yun ; Lee, Jae-Heon ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 435~443
Knowing the time of a process change could lead to quicker identification of the responsible special cause and less process down time, and it could help reduce the probability of incorrectly identifying the special cause. In this paper, we propose the maximum likelihood estimator (MLE) of the process change point when a control chart is used to monitor the parameters of a process whose observations can be modeled as an IMA(1,1) process.
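To show the flavor of such an estimator, the sketch below computes the change-point MLE for the simpler textbook case of a step shift in the mean of independent normal observations; the paper derives the analogous estimator for IMA(1,1) data, and that likelihood is not reproduced here. The data are simulated.

```python
import numpy as np

rng = np.random.default_rng(3)
mu0, tau_true = 0.0, 60
x = np.concatenate([rng.normal(mu0, 1, tau_true),
                    rng.normal(mu0 + 3.0, 1, 40)])   # 3-sigma step shift
n = len(x)

# MLE of the change point: the candidate t maximizing the likelihood
# gain from letting observations after t have their own mean.
scores = [(n - t) * (x[t:].mean() - mu0) ** 2 for t in range(n - 1)]
tau_hat = int(np.argmax(scores))
```

In practice the estimator is evaluated only after a control chart signals, using the observations collected up to the signal.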
Credibility estimation via kernel mixed effects model
Shim, Joo-Yong ; Kim, Tae-Yoon ; Lee, Sang-Yeol ; Hwa, Chang-Ha ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 445~452
Credibility models are actuarial tools to distribute premiums fairly among a heterogeneous group of policyholders. Many existing credibility models can be expressed as special cases of linear mixed effects models. In this paper we propose a nonlinear credibility regression model by reforming the linear mixed effects model through a kernel machine. The proposed model can be seen as a prediction method applicable in any setting where repeated measures are made for subjects with different risk levels. Experimental results are presented which indicate the performance of the proposed estimating procedure.
Numerical study on Jarque-Bera normality test for innovations of ARMA-GARCH models
Lee, Tae-Wook ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 453~458
In this paper, we consider the Jarque-Bera (JB) normality test for the innovations of ARMA-GARCH models. In financial applications, the JB test based on the residuals is routinely used to test the normality of ARMA-GARCH innovations without justification. However, the validity of the JB test should be justified before actual practice (Lee et al., 2009). Through a simulation study, we find that the validity of the JB test depends on the form of the test statistic. Specifically, when a constant term is involved in the ARMA model, a certain type of residual-based JB test produces severe size distortions.
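For reference, the JB statistic itself combines sample skewness and excess kurtosis. The sketch below computes it from scratch; the "residuals" here are just simulated i.i.d. draws, not residuals from a fitted ARMA-GARCH model.

```python
import numpy as np

def jarque_bera(resid):
    """Jarque-Bera statistic: n/6 * (S^2 + K^2/4)."""
    n = len(resid)
    e = resid - resid.mean()
    m2 = np.mean(e ** 2)
    skew = np.mean(e ** 3) / m2 ** 1.5
    kurt = np.mean(e ** 4) / m2 ** 2 - 3.0   # excess kurtosis
    return n / 6.0 * (skew ** 2 + kurt ** 2 / 4.0)

rng = np.random.default_rng(4)
jb_normal = jarque_bera(rng.normal(size=2000))       # normal innovations
jb_heavy = jarque_bera(rng.standard_t(df=3, size=2000))  # heavy-tailed

# Under normality JB is asymptotically chi-square(2); the 5% critical
# value is about 5.99, so the heavy-tailed sample should far exceed it.
```

The paper's point is that when the JB statistic is computed from residuals rather than the true innovations, its null distribution can deviate from chi-square(2) depending on how the statistic is formed.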
Notes on a skew-symmetric inverse double Weibull distribution
Woo, Jung-Soo ;
Journal of the Korean Data and Information Science Society, volume 20, issue 2, 2009, Pages 459~465
For an inverse double Weibull distribution that is symmetric about zero, we obtain the distribution and moments of the ratio of independent inverse double Weibull variables, and also obtain the cumulative distribution function and moments of a skew-symmetric inverse double Weibull distribution. We also introduce a skew-symmetric inverse double Weibull distribution generated by a double Weibull distribution.