Journal of the Korean Data and Information Science Society
Journal Basic Information
Korean Data and Information Science Society
Volume 23, Issue 4 - Jul 2012
Model selection method for categorical data with non-response
Yoon, Yong-Hwa ; Choi, Bo-Seung ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 627~641
DOI : 10.7465/jkdi.2012.23.4.627
We consider model estimation and model selection methods for multi-way contingency table data with non-response or missing values. We also consider a hierarchical Bayesian model to handle the boundary solution problem that can arise in maximum likelihood estimation under a non-ignorable non-response model, and we develop a model selection method to find the best model for the data. We use Bayes factors to handle the model selection problem under the Bayesian approach. We applied the proposed method to the pre-election survey for the 2004 Korean National Assembly election. As a result, the non-ignorable non-response model was favored, and the voting-intention variable was the most suitable.
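The Bayes factor idea above can be sketched on a complete two-way table. This is not the authors' non-response model: it is a minimal illustration, assuming symmetric Dirichlet priors (hypothetical `alpha = 1`), that compares a saturated model against an independence model through their marginal likelihoods. The function names are hypothetical.

```python
from math import lgamma


def log_dirichlet_multinomial(counts, alpha=1.0):
    """Log marginal likelihood of category counts under a symmetric Dirichlet(alpha) prior.

    The multinomial coefficient is omitted because it cancels in the Bayes factor.
    """
    k, n = len(counts), sum(counts)
    return (lgamma(k * alpha) - lgamma(n + k * alpha)
            + sum(lgamma(c + alpha) - lgamma(alpha) for c in counts))


def log_bayes_factor(table, alpha=1.0):
    """log BF of the saturated model against independence (> 0 favours association)."""
    cells = [c for row in table for c in row]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    log_m_saturated = log_dirichlet_multinomial(cells, alpha)
    # Under independence the cell probabilities factor as p_i * q_j, so the likelihood
    # depends only on the margins; rows and columns get independent Dirichlet priors.
    log_m_independence = (log_dirichlet_multinomial(row_totals, alpha)
                          + log_dirichlet_multinomial(col_totals, alpha))
    return log_m_saturated - log_m_independence
```

A table whose cells are exactly proportional to its margins gives a negative log Bayes factor (the simpler model wins), while a strongly associated table gives a large positive one.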
Check for regression coefficient using jackknife and bootstrap methods in clinical data
Sohn, Ki-Cheul ; Shin, Im-Hee ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 643~648
DOI : 10.7465/jkdi.2012.23.4.643
Many analyses are used to determine the relationship between a dependent variable and explanatory variables. Regression analysis is often used for this purpose: it shows how strongly the explanatory variables are related to the dependent variable and how much of the data the regression model explains. However, the validity of a regression model is usually judged only by the coefficient of determination. The regression coefficients should also be validated with other methods. This paper introduces a method for checking the validity of regression coefficients using jackknife regression and bootstrap regression on clinical data.
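The jackknife and bootstrap checks described above can be sketched for the slope of a simple linear regression. This is a minimal illustration, not the paper's procedure: it assumes one explanatory variable and case (pair) resampling, and the function names and the default of 1000 bootstrap replicates are hypothetical choices.

```python
import random
import statistics


def ols_slope(xs, ys):
    """Least squares slope of y on x."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx


def jackknife_slope(xs, ys):
    """Mean of the leave-one-out slopes and the jackknife standard error."""
    n = len(xs)
    loo = [ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:]) for i in range(n)]
    mean_loo = statistics.fmean(loo)
    se = ((n - 1) / n * sum((b - mean_loo) ** 2 for b in loo)) ** 0.5
    return mean_loo, se


def bootstrap_slope(xs, ys, n_boot=1000, seed=0):
    """Mean and standard deviation of slopes refitted on resampled (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    slopes = []
    while len(slopes) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        bx = [xs[i] for i in idx]
        if len(set(bx)) < 2:
            continue  # slope is undefined when every resampled x is identical
        slopes.append(ols_slope(bx, [ys[i] for i in idx]))
    return statistics.fmean(slopes), statistics.stdev(slopes)
```

If the jackknife and bootstrap slopes stay close to the full-data slope with small standard errors, the coefficient is stable; a large spread suggests that a few observations drive the estimate.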
Lipid metabolic effects of caffeine using meta-analysis
Kim, Na-Jung ; Choi, Ki-Heon ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 649~656
DOI : 10.7465/jkdi.2012.23.4.649
The present study was carried out to summarize the effect of caffeine on lipid metabolism by meta-analysis. The effect measure used to test the effect of caffeine was Hedges's standardized mean difference (HG). Under the fixed-effect model of Hedges's standardized mean difference, weight gain, heart weight, serum total lipid, serum triglycerides and liver triglycerides were significantly decreased (p < 0.05), while serum HDL cholesterol and serum LDL cholesterol were significantly increased. For heterogeneous variables, the random-effects model was applied. Under this model, weight gain, heart weight, serum total lipid, serum triglycerides, serum LDL cholesterol and liver triglycerides were significantly decreased in the caffeine-treated group, and HDL cholesterol was significantly increased in the caffeine-treated group.
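Hedges's standardized mean difference and the fixed- and random-effects pooling mentioned above can be sketched as follows. This is a minimal version, assuming the usual small-sample correction factor J, the common large-sample variance approximation, inverse-variance weights for the fixed-effect model and the DerSimonian-Laird estimate of between-study variance for the random-effects model. The paper may have used different variance formulas, and the function names are hypothetical.

```python
import math


def hedges_g(m1, s1, n1, m2, s2, n2):
    """Hedges's g (treatment minus control) and its approximate variance."""
    df = n1 + n2 - 2
    s_pooled = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df)
    d = (m1 - m2) / s_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample bias correction
    var_d = (n1 + n2) / (n1 * n2) + d ** 2 / (2 * (n1 + n2))
    return j * d, j ** 2 * var_d


def pool(effects, variances):
    """Fixed-effect and DerSimonian-Laird random-effects pooled estimates."""
    w = [1 / v for v in variances]
    fixed = sum(wi * g for wi, g in zip(w, effects)) / sum(w)
    q = sum(wi * (g - fixed) ** 2 for wi, g in zip(w, effects))  # Cochran's Q
    k = len(effects)
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]
    random_eff = sum(wi * g for wi, g in zip(w_star, effects)) / sum(w_star)
    return {"fixed": fixed, "fixed_se": math.sqrt(1 / sum(w)), "Q": q,
            "tau2": tau2, "random": random_eff,
            "random_se": math.sqrt(1 / sum(w_star))}
```

When Q is not larger than its degrees of freedom, tau2 is zero and the two models coincide. Otherwise the random-effects weights become more even and the pooled standard error grows.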
Outlier detection using Grubb test and Cochran test in clinical data
Sohn, Ki-Cheul ; Shin, Im-Hee ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 657~663
DOI : 10.7465/jkdi.2012.23.4.657
Survey data in various fields often contain very small and/or very large values that fall outside the normal range. Outliers arise for two reasons: errors in the data-input process and unusual responses from respondents. If the data contain outliers, summary statistics such as the mean and the variance can give misleading information, so researchers should take care to detect outliers in their data. This is a particularly important problem in clinical fields, where experiments are very costly. This article introduces the Grubbs test and the Cochran test for detecting outliers and applies them to clinical data.
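The two test statistics named above can be computed as follows. This sketch assumes the two-sided Grubbs test for a single outlier and Cochran's C test for one outlying variance among groups of equal size. The critical value for Grubbs requires a t quantile (from tables, or for example `scipy.stats.t.ppf(1 - alpha / (2 * n), n - 2)`), which is passed in rather than computed here. The function names are hypothetical.

```python
import math
import statistics


def grubbs_statistic(values):
    """Two-sided Grubbs statistic G = max|x_i - mean| / s and the index of that point."""
    mean = statistics.fmean(values)
    s = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    idx = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return abs(values[idx] - mean) / s, idx


def grubbs_critical(n, t):
    """Critical G given t, the upper alpha/(2n) quantile of t with n - 2 df."""
    return (n - 1) / math.sqrt(n) * math.sqrt(t ** 2 / (n - 2 + t ** 2))


def cochran_statistic(groups):
    """Cochran's C = largest group variance / sum of group variances (equal sizes)."""
    variances = [statistics.variance(g) for g in groups]
    largest = max(variances)
    return largest / sum(variances), variances.index(largest)
```

A point is flagged when G exceeds `grubbs_critical(n, t)`. Cochran's C is compared against its own tabulated critical value for the given number of groups and group size.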
A comparison study on characteristics of IPTV and digital CATV subscribers
Ryu, Gui-Yeol ; Rhee, Eun-Jun ; Lee, Hyun-Woo ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 665~675
DOI : 10.7465/jkdi.2012.23.4.665
The objective of this study is to compare the characteristics of IPTV (Internet protocol television) subscribers and digital CATV (community antenna television) subscribers using adjusted residuals. We conducted a gang survey of 100 subscribers in the Seoul, Kyeonggi and Incheon regions. Compared with digital CATV, IPTV appears to have content better suited to interactive services and a high level of satisfaction with cost, but a high level of dissatisfaction with channels. Digital CATV subscribers tend to learn about and subscribe to the service passively, compared with IPTV subscribers. IPTV subscribers are more satisfied than digital CATV subscribers even though IPTV costs more. The main content of digital CATV is sports, which is not interactive content. People regard IPTV as an innovative service that lets subscribers watch the content they want at any time, but see digital CATV as an extension of CATV. As a result, digital CATV cannot exploit the advantages of interactive services. Although the two services are almost the same, they are perceived very differently.
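The adjusted residuals used above compare each cell of a cross-tabulation (for example, service type by response category) with its expected count under independence. This is a minimal sketch of Haberman's adjusted residuals. The example tables are hypothetical, not the survey's data.

```python
import math


def adjusted_residuals(table):
    """Haberman's adjusted residuals for a two-way contingency table."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    out = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = rows[i] * cols[j] / n
            # variance of (observed - expected) under independence
            var = expected * (1 - rows[i] / n) * (1 - cols[j] / n)
            out_row.append((observed - expected) / math.sqrt(var))
        out.append(out_row)
    return out
```

Each residual is approximately standard normal under independence, so cells with an absolute value above about 1.96 are the ones that differ noticeably between the groups at roughly the 5% level.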
Study for the sampling method using simulation in clinical data
Sohn, Ki-Cheul ; Kim, Dal-Ho ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 677~682
DOI : 10.7465/jkdi.2012.23.4.677
Many sampling designs are used for sample surveys in various fields. Sampling is especially important for clinical data, because the basic characteristic variables of each group in the population (the experimental group and the control group) should be reflected in the sample. That is, the frequencies, measures of center and measures of dispersion of the variables by group in the population should be similar in the sample. However, common sampling designs are very complicated and therefore difficult for researchers to use in practice. In this paper, we consider a sampling method using simulation. We applied the proposed method to colon cancer data from a hospital and compared the basic characteristic variables between the population and the sample using means, frequencies and statistical hypothesis tests.
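The abstract does not specify the simulation procedure. One simple reading, shown here purely as a hypothetical sketch, is to draw many stratified random samples (one stratum per group) and keep the draw whose group means and standard deviations are closest to the population's. The discrepancy measure, the sampling fraction and the function names are all assumptions.

```python
import random
import statistics


def discrepancy(pop, sample):
    """Scaled distance between sample and population in mean and standard deviation."""
    sd = statistics.stdev(pop)
    return (abs(statistics.fmean(sample) - statistics.fmean(pop))
            + abs(statistics.stdev(sample) - sd)) / sd


def simulated_sample(groups, frac, n_sim=500, seed=0):
    """groups: dict of group name -> list of values. Keep the best of n_sim stratified draws."""
    rng = random.Random(seed)
    best, best_score = None, float("inf")
    for _ in range(n_sim):
        draw = {g: rng.sample(vals, max(2, round(frac * len(vals))))
                for g, vals in groups.items()}
        score = sum(discrepancy(groups[g], draw[g]) for g in groups)
        if score < best_score:
            best, best_score = draw, score
    return best, best_score
```

In practice one would apply the same idea to several variables at once and check the selected sample with the hypothesis tests mentioned above.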
Power study for 4 × 4 graeco-latin square design
Choi, Young-Hun ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 683~691
DOI : 10.7465/jkdi.2012.23.4.683
graeco-latin square design, the powers of the rank transformed statistic for testing the main effect are superior to those of the parametric statistic, regardless of whether the effect levels are equally or unequally spaced and regardless of the type of population distribution (exponential, double exponential, normal or uniform). As the number of block effects or the effect sizes decrease, the powers of the rank transformed statistic become much higher than those of the parametric statistic. When the block effects are smaller than the main effect, or when one block effect is larger than the other block effects, the powers of the rank transformed statistic are much higher than those of the parametric statistic in a graeco-latin square design with three block effects and one main effect.
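The power comparison above can be sketched by Monte Carlo for a 4 × 4 Graeco-Latin square. This sketch assumes a specific pair of orthogonal Latin squares, exponential errors, and the rank-transform approach of replacing the observations by their ranks before applying the usual F test. With 15 total degrees of freedom and 3 for each of the four factors, the error has 3 df, and the 5% critical value of F(3, 3) is about 9.28. The effect sizes and function names are hypothetical.

```python
import random

# Two orthogonal 4 x 4 Latin squares: LATIN assigns the treatment (main effect),
# while rows, columns and GREEK are the three blocking factors.
LATIN = [[0, 1, 2, 3], [1, 0, 3, 2], [2, 3, 0, 1], [3, 2, 1, 0]]
GREEK = [[0, 1, 2, 3], [2, 3, 0, 1], [3, 2, 1, 0], [1, 0, 3, 2]]
F_CRIT = 9.2766  # upper 5% point of F(3, 3)


def main_effect_f(y):
    """ANOVA F statistic for the Latin-letter treatment in a 4 x 4 Graeco-Latin square."""
    cells = [(i, j) for i in range(4) for j in range(4)]
    grand = sum(y[i][j] for i, j in cells) / 16

    def factor_ss(level):
        totals = [0.0] * 4
        for i, j in cells:
            totals[level(i, j)] += y[i][j]
        return sum(4 * (t / 4 - grand) ** 2 for t in totals)

    ss_total = sum((y[i][j] - grand) ** 2 for i, j in cells)
    if ss_total == 0:
        return 0.0  # constant data: no evidence of a treatment effect
    ss_treat = factor_ss(lambda i, j: LATIN[i][j])
    ss_blocks = (factor_ss(lambda i, j: i) + factor_ss(lambda i, j: j)
                 + factor_ss(lambda i, j: GREEK[i][j]))
    ss_error = ss_total - ss_treat - ss_blocks
    if ss_error <= 1e-12 * ss_total:
        return float("inf")  # treatment plus blocks explain the data exactly
    return (ss_treat / 3) / (ss_error / 3)


def rank_transform(y):
    """Replace the 16 observations by their ranks (ties get the average rank)."""
    flat = sorted((y[i][j], i, j) for i in range(4) for j in range(4))
    ranks = [[0.0] * 4 for _ in range(4)]
    k = 0
    while k < 16:
        m = k
        while m + 1 < 16 and flat[m + 1][0] == flat[k][0]:
            m += 1
        for _, i, j in flat[k:m + 1]:
            ranks[i][j] = (k + m) / 2 + 1
        k = m + 1
    return ranks


def power(effects, row_eff, col_eff, greek_eff, n_rep=2000, seed=1):
    """Monte Carlo rejection rates of the parametric and rank-transformed F tests."""
    rng = random.Random(seed)
    hits_param = hits_rank = 0
    for _ in range(n_rep):
        y = [[effects[LATIN[i][j]] + row_eff[i] + col_eff[j] + greek_eff[GREEK[i][j]]
              + rng.expovariate(1.0)  # exponential errors; swap in other distributions
              for j in range(4)] for i in range(4)]
        hits_param += main_effect_f(y) > F_CRIT
        hits_rank += main_effect_f(rank_transform(y)) > F_CRIT
    return hits_param / n_rep, hits_rank / n_rep
```

For example, `power([0, 0.5, 1.0, 1.5], [0] * 4, [0] * 4, [0] * 4)` returns the estimated powers of the two tests for equally spaced treatment effects with no block effects.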
Inflow and outflow analysis of double majors using social network analysis
Cho, Jang-Sik ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 693~701
DOI : 10.7465/jkdi.2012.23.4.693
Recently, the number of students pursuing double majors has tended to increase at many universities. As a result, many problems arise because an excessive inflow of double-major students is concentrated in specific popular departments. In this paper, we study the characteristics of the inflow and outflow of double majors using social network analysis and decision tree analysis. According to the results, SAT score affected the inflow of double majors the most, followed in order by department category, course evaluation score and employment rate. On the other hand, department category affected the outflow of double majors the most, followed in order by SAT score, employment rate and course evaluation score.
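In a social network view of double majors, each student is a directed edge from the home department to the double-major department. Inflow and outflow are then the weighted in-degree and out-degree of each department node. This minimal sketch computes only that step, not the decision tree analysis. The department names in the example are hypothetical.

```python
from collections import Counter


def inflow_outflow(moves):
    """moves: (home_department, double_major_department) pairs, one per student."""
    inflow = Counter(dst for src, dst in moves if src != dst)   # weighted in-degree
    outflow = Counter(src for src, dst in moves if src != dst)  # weighted out-degree
    departments = sorted(set(inflow) | set(outflow))
    return {d: {"in": inflow[d], "out": outflow[d], "net": inflow[d] - outflow[d]}
            for d in departments}


moves = [("Physics", "Business"), ("History", "Business"),
         ("Business", "Economics"), ("Physics", "Economics")]
flows = inflow_outflow(moves)
```

Departments with a large positive net flow are the "popular" destinations where inflow concentrates. These node-level counts could then serve as response variables for a decision tree on department attributes.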
The effects of participation in a combined exercise program on the metabolic syndrome indices and physical fitness in the obese middle-aged women
Ban, Sung-Min ; Lee, Kyung-Jun ; Yang, Jeong-Ok ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 703~715
DOI : 10.7465/jkdi.2012.23.4.703
The purpose of this study is to examine the effects of a 12-week comprehensive exercise program on the metabolic syndrome indices and general health of overweight middle-aged women. The metabolic syndrome indices and health-related fitness of the participants were measured before and after the exercise program. The measurements were analyzed with SPSS 18.0 to calculate the mean and standard deviation of all response variables. To test for changes in the response variables before and after the 12-week program, the Wilcoxon signed-rank test was performed at a significance level of α = .05. The results show that the 12-week comprehensive exercise program had a positive effect on the metabolic syndrome indices and health-related fitness of overweight middle-aged women.
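The before/after comparison above can be sketched with the Wilcoxon signed-rank test. This version uses the large-sample normal approximation with a tie correction and no continuity correction. For very small samples, exact tables are preferable, and the paper does not say which variant was used. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(before, after):
    """Signed-rank statistic W+, z score and two-sided p-value (normal approximation)."""
    diffs = [b - a for b, a in zip(before, after) if b != a]  # zero differences dropped
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0
    k = 0
    while k < n:
        m = k
        while m + 1 < n and abs(diffs[order[m + 1]]) == abs(diffs[order[k]]):
            m += 1
        for idx in order[k:m + 1]:
            ranks[idx] = (k + m) / 2 + 1  # average rank for tied |differences|
        t = m - k + 1
        tie_term += t ** 3 - t
        k = m + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, z, p
```

Here a positive difference means the value decreased after the program, so a large W+ with p < .05 indicates a significant decrease (for example, in waist circumference).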
Association rule thresholds considering the number of possible rules of interest items
Park, Hee-Chang ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 717~725
DOI : 10.7465/jkdi.2012.23.4.717
Data mining is a method for finding useful information in large amounts of data stored in databases. One of the well-studied problems in data mining is the exploration of association rules. Association rule mining searches for interesting relationships among items in a given database using support, confidence and lift. Existing association rule thresholds can lead to errors through information loss, because they do not consider the magnitude of occurrence frequencies. In this paper, we propose new association rule thresholds that take into account the number of possible rules of interest items, and we compare them with existing association rule thresholds using an example and real data. The results show that the new association rule thresholds are more useful than the existing ones.
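The three standard measures named above can be computed directly from a list of transactions. This sketch covers only the existing measures (support, confidence, lift), not the new thresholds proposed in the paper. The example transactions are hypothetical.

```python
def rule_measures(transactions, lhs, rhs):
    """Support, confidence and lift of the rule lhs -> rhs over a list of item sets."""
    n = len(transactions)
    lhs, rhs = set(lhs), set(rhs)
    n_lhs = sum(1 for t in transactions if lhs <= t)
    n_rhs = sum(1 for t in transactions if rhs <= t)
    n_both = sum(1 for t in transactions if (lhs | rhs) <= t)
    support = n_both / n                                # P(lhs and rhs)
    confidence = n_both / n_lhs if n_lhs else 0.0       # P(rhs | lhs)
    lift = confidence / (n_rhs / n) if n_rhs else 0.0   # P(rhs | lhs) / P(rhs)
    return support, confidence, lift


baskets = [{"milk", "bread"}, {"milk"}, {"bread", "butter"}, {"milk", "bread", "butter"}]
s, c, l = rule_measures(baskets, {"milk"}, {"bread"})
```

Here the rule milk → bread has support 0.5 and confidence 2/3, but a lift of 8/9 (below 1), so buying milk does not actually raise the chance of buying bread.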
The effects of Cox distraction manipulation on functional assessment measures and disc herniation index in patients with L4-5 herniated disc
Kwon, Won-An ; Ryu, Young-Sang ; Ma, Sang-Yeol ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 727~738
DOI : 10.7465/jkdi.2012.23.4.727
The purpose of the present study was to determine the effect of a 4-week course of Cox distraction manipulation (CDM) combined with therapeutic modalities in the treatment of patients with L4-5 herniated nucleus pulposus (HNP). A total of 15 patients with L4-5 HNP (mean age, 37.76 years; age range, 20-50 years) participated in the study. The 4-week course of CDM combined with therapeutic modalities was delivered 6 days per week for the first two weeks and three times per week for the following two weeks, for a total of 18 visits over the 4-week period. Changes in muscle strengthening (MS), straight leg raise (SLR) and the Oswestry disability index (ODI) were compared at pre-intervention, after two weeks of treatment sessions and at discharge (after 18 treatment sessions). Changes in the disc herniation index (DHI) between pre-intervention and discharge were analyzed using the paired t-test. There were significant improvements in the outcome measures of MS (lbs), the SLR test and the ODI score after 2 weeks and after 4 weeks of CDM combined with therapeutic modalities, compared with pre-intervention. However, there was no significant difference between pre-test and post-test DHI. CDM combined with therapeutic modalities appears to be a safe, efficacious and noninvasive treatment for patients with L4-5 HNP.
Mixed effects least squares support vector machine for survival data analysis
Hwang, Chang-Ha ; Shim, Joo-Yong ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 739~748
DOI : 10.7465/jkdi.2012.23.4.739
In this paper, we propose a mixed effects least squares support vector machine (LS-SVM) for censored data observed from different groups. We use weights that account for random right censoring in the nonlinear regression; the weights are formed from Kaplan-Meier estimates of the censoring distribution. The proposed model includes a random effects term representing inter-group variation. Furthermore, a generalized cross validation function is proposed for selecting the optimal values of the hyper-parameters. Experimental results comparing the proposed LS-SVM with a standard LS-SVM for censored data are presented to illustrate its performance.
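The censoring weights described above can be sketched as inverse-probability-of-censoring weights δ_i / G(T_i−), where G is the Kaplan-Meier estimate of the censoring survival function. This is a common construction and an assumption about the exact form used in the paper. The sketch assumes that failures tied with a censoring time are treated as occurring first, and it leaves out the mixed effects LS-SVM fit that would use the weights. The function names are hypothetical.

```python
def censoring_survival_left(times, events):
    """Kaplan-Meier estimate of the censoring survival G, returned as t -> G(t-).

    events[i] = 1 for an observed failure, 0 for a right-censored time.
    """
    cens_times = sorted({t for t, e in zip(times, events) if e == 0})

    def g_left(t):
        g = 1.0
        for s in cens_times:
            if s >= t:
                break  # only censorings strictly before t enter G(t-)
            at_risk = sum(1 for ti in times if ti >= s)
            n_cens = sum(1 for ti, e in zip(times, events) if ti == s and e == 0)
            g *= 1 - n_cens / at_risk
        return g

    return g_left


def km_weights(times, events):
    """Weights delta_i / G(T_i-): censored points get 0 and failures are up-weighted."""
    g_left = censoring_survival_left(times, events)
    return [e / g_left(t) if e else 0.0 for t, e in zip(times, events)]
```

The weights would multiply the squared-error terms of the LS-SVM objective, so uncensored points stand in for the censored points that would have failed later.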
M-quantile kernel regression for small area estimation
Shim, Joo-Yong ; Hwang, Chang-Ha ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 749~756
DOI : 10.7465/jkdi.2012.23.4.749
A widely used approach to small area estimation is based on linear mixed models. However, when the relationship between the response and the input variables is not linear, this approach may lead to biased estimators of the small area parameters. In this paper, we propose M-quantile kernel regression for small area mean estimation, which allows for nonlinear relationships between the response and the input variables. Numerical studies are presented that show the sample properties of the proposed estimation method.
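The building block of M-quantile kernel regression can be sketched as a local-constant M-quantile fit at a single point. This version assumes a Gaussian kernel, the asymmetric Huber influence function with tuning constant c = 1.345, a fixed scale and iteratively reweighted least squares. It does not reproduce the paper's estimator or its small area mean step, and the function names are hypothetical.

```python
import math


def psi_q(u, q, c):
    """Asymmetric Huber influence: Huber psi weighted by q above zero and 1 - q below."""
    return 2 * max(-c, min(c, u)) * (q if u > 0 else 1 - q)


def mq_kernel(x0, xs, ys, q=0.5, h=1.0, c=1.345, scale=1.0, n_iter=200, tol=1e-10):
    """Local-constant M-quantile of y at x0 via iteratively reweighted least squares."""
    k = [math.exp(-0.5 * ((xi - x0) / h) ** 2) for xi in xs]  # Gaussian kernel weights
    m = sum(ki * yi for ki, yi in zip(k, ys)) / sum(k)        # start at the kernel mean
    for _ in range(n_iter):
        w = []
        for ki, yi in zip(k, ys):
            u = (yi - m) / scale
            # IRLS weight psi_q(u) / u; at u = 0 use the limit of the u <= 0 branch
            w.append(ki * (psi_q(u, q, c) / u if u != 0 else 2 * (1 - q)))
        m_new = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m
```

With q = 0.5 and a very large c, the fit reduces to the Nadaraya-Watson kernel mean. Smaller c downweights outliers, and q above or below 0.5 tracks the upper or lower part of the conditional distribution. In practice the scale would be estimated robustly, for example from the median absolute deviation of the residuals.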
Effect of complex sample design on Pearson test statistic for homogeneity
Heo, Sun-Yeong ; Chung, Young-Ae ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 757~764
DOI : 10.7465/jkdi.2012.23.4.757
This research compares test statistics for homogeneity when the data are collected under a complex sample design. Survey data collected under a complex sample design do not satisfy the independence condition required by the standard Pearson multinomial-based chi-squared test. Today, many data sets are collected by complex sample designs, yet tests for categorical data are still conducted with the standard Pearson chi-squared test. In this study, we compared the performance of three test statistics for homogeneity between two populations (the standard Pearson test, the unbiased Wald test, and the Pearson-type test with survey-based point estimates) using data from the 2009 customer satisfaction evaluation survey of services provided by the Gyeongsangnam-do regional offices of education. Through empirical analyses, we first showed that the standard Pearson test inflates the values of the test statistic considerably, so its results are not reliable. Second, in comparing the Wald test and the Pearson-type test, we found that the test results are affected by the number of categories and by the mean and standard deviation of the eigenvalues of the design matrix.
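A related and widely used correction (not necessarily one of the three statistics compared in the paper) shows how the eigenvalues enter: the first-order Rao-Scott adjustment divides the Pearson statistic by the mean of the generalized design effects. This sketch assumes those eigenvalues have already been estimated from the survey covariance matrix, which is not shown. The function names are hypothetical.

```python
def pearson_homogeneity(table):
    """Standard Pearson chi-squared statistic for a populations x categories table."""
    n = sum(map(sum, table))
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    return sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(len(rows)) for j in range(len(cols)))


def rao_scott_first_order(x2, design_effects):
    """Divide X^2 by the mean generalized design effect (mean eigenvalue)."""
    return x2 / (sum(design_effects) / len(design_effects))
```

With design effects around 2, a typical value for clustered samples, the naive statistic is roughly twice as large as it should be, which matches the inflation of the standard Pearson test reported above.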
Bandwidth selections based on cross-validation for estimation of a discontinuity point in density
Huh, Jib ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 765~775
DOI : 10.7465/jkdi.2012.23.4.765
Cross-validation is a popular method for selecting the bandwidth in all types of kernel estimation. Maximum likelihood cross-validation, least squares cross-validation and biased cross-validation have been proposed for bandwidth selection in kernel density estimation. For a probability density function with a discontinuity point, Huh (2012) proposed a bandwidth selection method using maximum likelihood cross-validation. In this paper, two forms of cross-validation with one-sided kernel functions are proposed for selecting the bandwidth used to estimate the location and jump size of a discontinuity point of a density. These methods are motivated by least squares cross-validation and biased cross-validation. Using simulated examples, the finite sample performance of the two proposed methods is compared with that of the method of Huh (2012).
A study on the outbound call center optimization
Kang, Jung-Chul ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 777~785
DOI : 10.7465/jkdi.2012.23.4.777
Recently, the rapid development of the internet and information technology has led to rapid changes in many industries, and call centers are one of the most rapidly developing. In almost all public institutions, financial institutions including insurance companies, and shopping malls, many call center staff provide consultations. However, a shortage of call center staff is leading to many customer complaints. The functions of outbound call centers, such as promotion and product sales, are also experiencing problems because of the insufficient number of consulting staff. In this study, we propose a call center model that maximizes the call connection rate and suggest the best call center model for channel distribution using data mining techniques.
The effects of the 16-weeks' combined exercise program on metabolic syndrome and autonomic nerve system of low-level physical strength group
Han, Jin-Man ; Lee, Kyeong-Jun ; Yang, Jeong-Ok ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 787~796
DOI : 10.7465/jkdi.2012.23.4.787
The aim of this study is to examine the changes in metabolic syndrome indices and the autonomic nervous system after a 16-week combined exercise program carried out with a low-level physical strength group (students at PAPS levels 4-5). The students were divided into two groups: an exercise training group (15) and a control group (15). The program consisted of warm-ups, main activities and cool-downs, five times a week, with each session lasting 50 minutes. The means and standard deviations of all dependent variables were calculated with SPSS 19.0. We first performed the Shapiro-Wilk normality test on the variables. Before verifying the effect of the combined exercise program, we tested the equality of the means of the variables between the combined exercise group and the control group with a two-sample t-test, and we then carried out paired t-tests to check whether the changes in the variables of the two groups before and after the 16 weeks were statistically significant. Every statistical test was performed at a significance level of α = .05. The results are as follows. For the metabolic syndrome indices, there were statistically significant changes in waist circumference, triglycerides, fasting glucose and HDL-C. For the autonomic nervous system, there were significant changes in all variables. Consequently, the 16-week combined exercise program appears to have positive effects on students with low-level physical strength.
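The paired t-test used above for the before/after comparison within each group can be sketched as follows. The sketch returns only the statistic and its degrees of freedom, because the Python standard library has no t distribution. With 15 students per group (14 df), the two-sided 5% critical value is about 2.145. The function name is hypothetical.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic (mean change / its standard error) and degrees of freedom."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.fmean(diffs) / se, n - 1
```

For example, with 14 df, a change is significant at α = .05 when `abs(t) > 2.145`.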
Hidden truncation circular normal distribution
Kim, Sung-Su ; Sengupta, Ashis ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 797~805
DOI : 10.7465/jkdi.2012.23.4.797
Many circular distributions are known to be not only asymmetric but also bimodal. The hidden truncation method of generating asymmetric distributions is applied to a bivariate circular distribution to generate an asymmetric circular distribution. While many existing asymmetric circular distributions can model only asymmetric data, this new circular model has great flexibility in terms of both asymmetry and bimodality. Some properties of the new model, such as the trigonometric moment generating function, and asymptotic inference about the truncation parameter are presented. Simulation and real data examples are provided at the end to demonstrate the utility of the new distribution.
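Trigonometric moments, mentioned above as a property of the new model, have simple sample counterparts. A fitted model's theoretical moments can be compared against them. This sketch computes the p-th sample trigonometric moment and the derived mean direction and mean resultant length (angles in radians). It does not implement the hidden truncation density itself, and the function names are hypothetical.

```python
import cmath
import math


def trig_moment(angles, p=1):
    """p-th sample trigonometric moment (1/n) * sum of exp(i * p * theta)."""
    return sum(cmath.exp(1j * p * a) for a in angles) / len(angles)


def direction_and_length(angles, p=1):
    """Argument and modulus of the p-th sample trigonometric moment."""
    m = trig_moment(angles, p)
    return math.atan2(m.imag, m.real), abs(m)
```

For data clustered at two opposite directions, the first moment is close to zero while the second has modulus close to one. This is one way bimodality shows up in the moments.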
Multivariate EWMA control charts for monitoring the variance-covariance matrix
Jeong, Jeong-Im ; Cho, Gyo-Young ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 807~814
DOI : 10.7465/jkdi.2012.23.4.807
Exponentially weighted moving average (EWMA) control charts are known to be sensitive in detecting relatively small shifts. Multivariate EWMA control charts are considered for monitoring the variance-covariance matrix when the process variables follow a multivariate normal distribution. The performance of the proposed EWMA control charts is evaluated in terms of average run length (ARL). Performance is investigated for three types of shifts in the variance-covariance matrix: changes in the variances, changes in the covariances, and changes in both the variances and the covariances. Numerical results show that all multivariate EWMA control charts considered in this paper are effective in detecting several kinds of shifts in the variance-covariance matrix.
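One generic way to build a multivariate EWMA chart for the variance-covariance matrix is shown below. It is not necessarily one of the charts in the paper. The sketch smooths the outer products of bivariate observations (with the process mean assumed known and equal to zero) and monitors a likelihood-ratio-type distance from the in-control matrix Σ0. The smoothing constant and the function name are hypothetical, and no control limit is computed; in practice the limit would be calibrated by simulation to a target in-control ARL.

```python
import math


def mewma_cov_stats(obs, sigma0, lam=0.1):
    """EWMA of outer products for bivariate zero-mean data.

    Returns D_t = tr(A) - ln det(A) - 2 with A = inv(sigma0) @ Z_t. D_t >= 0, and it
    equals 0 only when the smoothed matrix Z_t equals sigma0.
    """
    (a, b), (c, d) = sigma0
    det0 = a * d - b * c
    inv0 = [[d / det0, -b / det0], [-c / det0, a / det0]]
    z = [row[:] for row in sigma0]  # start the EWMA at the in-control covariance
    stats = []
    for x1, x2 in obs:
        outer = [[x1 * x1, x1 * x2], [x2 * x1, x2 * x2]]
        z = [[lam * outer[i][j] + (1 - lam) * z[i][j] for j in range(2)] for i in range(2)]
        m = [[sum(inv0[i][k] * z[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
        tr = m[0][0] + m[1][1]
        det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
        stats.append(tr - math.log(det) - 2)
    return stats
```

A shift in the variances, the covariance or both moves the smoothed matrix away from Σ0 and increases D_t. The number of samples until D_t crosses its limit is the run length, which is averaged over many simulations to obtain the ARL.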
Two model comparisons of software reliability analysis for Burr type XII distribution
An, Jeong-Hyang ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 815~823
DOI : 10.7465/jkdi.2012.23.4.815
In this paper, a reliability growth model is proposed in which the operating time between successive failures is a continuous random variable. The model is based on the two-parameter Burr type XII distribution and is discussed in two versions: an order statistics version and a non-homogeneous Poisson process version. Two software reliability measures are obtained. The performance of the two versions of the proposed model is tested on a real data set with the U-plot and the Y-plot, using the Kolmogorov distance.
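The ingredients named above can be sketched separately. The sketch includes the two-parameter Burr type XII CDF, a finite-failure NHPP mean value function m(t) = a·F(t) (a common construction, assumed here), and the Kolmogorov distance between a set of u-values and the uniform distribution. The u-values would come from one-step-ahead predictive CDFs of successive failure times, which are not fitted here. The function names are hypothetical.

```python
def burr_xii_cdf(t, c, k):
    """Burr type XII CDF F(t) = 1 - (1 + t^c)^(-k) for t >= 0 and shapes c, k > 0."""
    return 1 - (1 + t ** c) ** (-k)


def nhpp_mean_value(t, a, c, k):
    """Expected number of failures by time t when m(t) = a * F(t)."""
    return a * burr_xii_cdf(t, c, k)


def kolmogorov_distance(u_values):
    """Largest gap between the empirical CDF of the u-values and the Uniform(0, 1) CDF."""
    u = sorted(u_values)
    n = len(u)
    return max(max((i + 1) / n - ui, ui - i / n) for i, ui in enumerate(u))
```

The U-plot uses the distance on the raw u-values to detect systematic bias in the predictions. The Y-plot first transforms the u-values and then applies the same distance to detect trend.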
Estimating multiplicative competitive interaction model using kernel machine technique
Shim, Joo-Yong ; Kim, Mal-Suk ; Park, Hye-Jung ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 825~832
DOI : 10.7465/jkdi.2012.23.4.825
We propose a novel way of forecasting the market shares of several brands simultaneously in a multiplicative competitive interaction model, using kernel regression combined with the kernel machine technique applied in support vector machines and other machine learning techniques. Traditionally, the market share attraction model is estimated by maximum likelihood under the assumption that the data are drawn from a normal distribution. The proposed method is shown to be a good candidate forecasting method for the market share attraction model when a normal distribution is not assumed. We applied the proposed method to forecast the market shares of 4 Korean car brands simultaneously, and it performed better than the maximum likelihood estimation procedure.
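The structure of the multiplicative competitive interaction (MCI) model referred to above can be sketched as follows. Each brand's attraction is a product of powered attributes, and the shares are attractions normalized to sum to one. The log-centering transform turns the model into a linear one that a regression method (kernel-based or otherwise) can fit. This sketch covers only the model structure, not the paper's kernel machine estimator, and the parameter values in any example would be hypothetical.

```python
import math


def mci_shares(alpha, beta, x):
    """Market shares from an MCI attraction model.

    alpha[i]: brand constant; beta[k]: elasticity of attribute k;
    x[i][k] > 0: value of attribute k for brand i.
    Attraction A_i = exp(alpha_i) * prod_k x[i][k] ** beta[k]; share_i = A_i / sum_j A_j.
    """
    log_attr = [a + sum(b * math.log(v) for b, v in zip(beta, row))
                for a, row in zip(alpha, x)]
    top = max(log_attr)  # subtract the maximum for numerical stability
    attr = [math.exp(la - top) for la in log_attr]
    total = sum(attr)
    return [a / total for a in attr]


def log_centered(shares):
    """log(s_i / geometric mean of the shares), which is linear in the MCI parameters."""
    logs = [math.log(s) for s in shares]
    mean_log = sum(logs) / len(logs)
    return [lg - mean_log for lg in logs]
```

After log-centering, the unknown normalizing sum cancels, so the transformed shares depend only on the centered brand constants and centered log attributes.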
Noninformative priors for the ratio of the scale parameters in the half logistic distributions
Kang, Sang-Gil ; Kim, Dal-Ho ; Lee, Woo-Dong ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 833~841
DOI : 10.7465/jkdi.2012.23.4.833
In this paper, we develop noninformative priors for the ratio of the scale parameters of half logistic distributions. We develop the first and second order matching priors. It turns out that the second order matching prior matches the alternative coverage probabilities and is a highest posterior density matching prior. We also show that the one-at-a-time reference prior and Jeffreys' prior are second order matching priors. Through a simulation study, we show that the proposed reference prior matches the target coverage probabilities in a frequentist sense, and an example based on real data is given.
Optimal three step stress accelerated life tests under periodic inspection and type I censoring
Moon, Gyoung-Ae ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 843~850
DOI : 10.7465/jkdi.2012.23.4.843
This paper studies inference from data obtained under periodic inspection and type I censoring in a three step stress accelerated life test. A failure rate function with a log-quadratic relation to stress and the tampered failure rate model are considered under the exponential distribution. The optimal stress change times, which minimize the asymptotic variance of the maximum likelihood estimators of the parameters, are determined, and the maximum likelihood estimates of the model parameters are obtained. A numerical example is given to illustrate the proposed inferential procedures.
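The likelihood ingredients for this setting can be sketched as follows, under stated assumptions. The failure rate at stress s is exp(b0 + b1·s + b2·s²) (the log-quadratic relation). Lifetimes are exponential, and each tampering factor equals the ratio of consecutive step rates, so the hazard is piecewise constant. Units are inspected at fixed times, with type I censoring at the last inspection. The parametrization in the paper may differ, and the function names are hypothetical.

```python
import math


def step_rates(stresses, b0, b1, b2):
    """Failure rate at each stress level under a log-quadratic life-stress relation."""
    return [math.exp(b0 + b1 * s + b2 * s * s) for s in stresses]


def cumulative_hazard(t, rates, change_times):
    """Piecewise-constant hazard: rates[k] applies between consecutive stress change times."""
    bounds = [0.0] + list(change_times) + [math.inf]
    return sum(r * max(0.0, min(t, hi) - lo)
               for r, lo, hi in zip(rates, bounds, bounds[1:]))


def inspection_probs(inspection_times, rates, change_times):
    """Probability of failing in each inspection interval, plus survival to the last one."""
    surv = [math.exp(-cumulative_hazard(t, rates, change_times))
            for t in [0.0] + list(inspection_times)]
    interval = [s0 - s1 for s0, s1 in zip(surv, surv[1:])]
    return interval, surv[-1]
```

The multinomial log-likelihood of the interval failure counts and the number of survivors is built from these probabilities. Its Fisher information is what the optimal stress change times are chosen to make small in the variance sense.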
Bayesian analysis for the bivariate Poisson regression model: Applications to road safety countermeasures
Choe, Hyeong-Gu ; Lim, Joon-Beom ; Won, Yong-Ho ; Lee, Soo-Beom ; Kim, Seong-W. ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 851~858
DOI : 10.7465/jkdi.2012.23.4.851
We consider a bivariate Poisson regression model to analyze discrete count data when two dependent variables are present. We estimate the regression coefficients associated with several safety countermeasures. Markov chain Monte Carlo techniques are used for the computations. A simulation and a real data analysis are performed to demonstrate the model fitting performance of the proposed model.
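The likelihood that an MCMC sampler for this model would evaluate can be sketched with the common trivariate-reduction form of the bivariate Poisson distribution: X = Y1 + Y3 and Y = Y2 + Y3, with independent Poisson Y1, Y2, Y3. The abstract does not give the paper's exact parametrization, so this form and the log-linear link are assumptions, and the MCMC step itself is not shown. The function names are hypothetical.

```python
import math


def bivariate_poisson_pmf(x, y, lam1, lam2, lam3):
    """P(X = x, Y = y) under trivariate reduction; lam3 is Cov(X, Y)."""
    base = (math.exp(-(lam1 + lam2 + lam3))
            * lam1 ** x / math.factorial(x) * lam2 ** y / math.factorial(y))
    total = sum(math.comb(x, k) * math.comb(y, k) * math.factorial(k)
                * (lam3 / (lam1 * lam2)) ** k for k in range(min(x, y) + 1))
    return base * total


def regression_rates(covariates, beta1, beta2, beta3):
    """Log-linear link: lam_j = exp(covariates . beta_j) for each of the three components."""
    dot = lambda b: sum(c * bj for c, bj in zip(covariates, b))
    return math.exp(dot(beta1)), math.exp(dot(beta2)), math.exp(dot(beta3))
```

Summing the log of `bivariate_poisson_pmf` over sites, with rates from `regression_rates`, gives the log-likelihood that a Metropolis-Hastings or similar sampler would combine with priors on the coefficients.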
Switching properties of CUSUM charts for controlling mean vector
Chang, Duk-Joon ; Heo, Sun-Yeong ;
Journal of the Korean Data and Information Science Society, volume 23, issue 4, 2012, Pages 859~866
DOI : 10.7465/jkdi.2012.23.4.859
Some switching properties of multivariate control charts are investigated when the interval between two consecutive samples is not fixed but changes according to the result of the previous sample. Many articles have shown that variable sampling interval control charts are more efficient than fixed sampling interval control charts in terms of average run length (ARL) and average time to signal (ATS). Unfortunately, the ARL and the ATS provide no information on how frequently switches are made. We evaluate several switching properties of two-sampling-interval Shewhart and CUSUM procedures for controlling the mean vector of correlated quality variables.
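The switching behaviour described above can be sketched for any chart statistic, such as a Hotelling T² or a multivariate CUSUM value, which is assumed to be computed elsewhere. Under a two-interval rule, the next sample is taken after the short interval when the statistic falls between the warning and control limits, and after the long interval otherwise. The interval lengths 0.1 and 1.9 and the function name are hypothetical.

```python
def vsi_switching(stats, warning_limit, control_limit, short=0.1, long=1.9):
    """Intervals used, number of interval switches, and index of the first signal (or None)."""
    intervals = []
    signal = None
    for i, s in enumerate(stats):
        if s > control_limit:
            signal = i
            break
        intervals.append(short if s > warning_limit else long)
    switches = sum(1 for a, b in zip(intervals, intervals[1:]) if a != b)
    return intervals, switches, signal
```

The sum of the intervals before the signal gives the time to signal. The switch count is the extra quantity that ARL and ATS do not capture, and averaging it over simulated runs gives the switching properties studied in the paper.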