Journal of the Korean Data and Information Science Society
Korean Data and Information Science Society
Volume 14, Issue 2 - May 2003
Comparative Study on Imputation Procedures in Exponential Regression Model with missing values
Park, Young-Sool ; Kim, Soon-Kwi ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 143~152
A data set with missing observations is often completed by using imputed values. In this paper, the performance and accuracy of five imputation procedures are evaluated when missing values exist only in the response variable of the exponential regression model. Our simulation results show that the adjusted exponential regression imputation procedure compensates well for missing data, particularly in comparison with the other imputation procedures. An illustrative example using real data is provided.
Test of the Hypothesis based on Nonlinear Regression Quantiles Estimators
Choi, Seung-Hoe ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 153~165
This paper considers the likelihood ratio test statistic based on nonlinear regression quantile estimators for testing hypotheses about the regression parameter, and derives the asymptotic distribution of the proposed test statistic under the null hypothesis and under a sequence of local alternative hypotheses. The paper also investigates the asymptotic relative efficiency of the proposed test with respect to tests based on the least squares estimators or the least absolute deviation estimators, and gives some examples to illustrate the application of the main result.
Korean Document Classification using Characteristics of Word Information
Kim, Seok-Ki ; Han, Kyung-Soo ; Ahn, Jeong-Yong ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 167~175
In document classification, the target of analysis is not the document itself but the words that appear in it. Word information is therefore a significant factor in document classification. In this study, we deal with the classification of Korean documents based on words and feature vectors. First, we present the performance of document classification using nouns and keywords. Second, we compare the results across different sizes of feature vectors.
An application to Multivariate Zero-Inflated Poisson Regression Model
Kim, Kyung-Moo ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 177~186
Zero-inflated Poisson regression is a model for count data with excess zeros. When correlated response variables are of interest, the univariate zero-inflated regression model has to be extended to a multivariate model. In this paper, we study and simulate the multivariate zero-inflated regression model and apply it to a real example. Regression parameters are estimated by maximum likelihood (MLEs). We also compare the fit of the multivariate zero-inflated Poisson regression model with that of the decision tree model.
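As a rough illustration of what "excess zeros" means, the following sketch evaluates the standard univariate zero-inflated Poisson probability mass function. It is not the multivariate regression model studied in the paper; the function name `zip_pmf` and the parameter names `lam` (Poisson mean) and `omega` (extra-zero probability) are assumptions introduced here.

```python
import math


def zip_pmf(k, lam, omega):
    """Univariate zero-inflated Poisson pmf (illustrative sketch).

    With probability omega the count is a structural zero; otherwise it
    is drawn from a Poisson distribution with mean lam.
    """
    if k < 0:
        raise ValueError("k must be a non-negative integer")
    if lam <= 0 or not 0 <= omega < 1:
        raise ValueError("need lam > 0 and 0 <= omega < 1")
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    extra_zero = omega if k == 0 else 0.0
    return extra_zero + (1 - omega) * poisson


# Compare the zero probability with and without inflation.
plain_zero = zip_pmf(0, lam=2.0, omega=0.0)
inflated_zero = zip_pmf(0, lam=2.0, omega=0.3)
print(plain_zero, inflated_zero)
```

The mean of this distribution is `(1 - omega) * lam`, so inflation raises the share of zeros while lowering the mean.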
Tests for Mean Change with the Modified Cusum Statistics
Kim, Jae-Hee ; Kim, Na-Yeon ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 187~199
We deal with the problem of testing a sequence of independent normal random variables with constant, known or unknown, variance for no change in mean versus alternatives with a single change-point. Various tests based on the likelihood ratio, recursive residuals, score statistics and cusums are studied. The proposed tests are modified versions of Buckley's cusum statistics. The change-point test statistics are compared by Monte Carlo simulation using S-PLUS software.
Study on Optimum Sizes of Experimental Units
Chang, Suk-Hwan ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 201~208
Because no information is available on optimum plot sizes for field experiments on the major food crops in Korea, the plot sizes currently used by research institutes were examined for rice, barley and wheat, soybean, potatoes, red pepper, garlic and onion. The optimum plot sizes in field experiments on these crops were estimated on the basis of the soil fertility indices (Smith's regression coefficients) reported by Chang (1983).
Semiparametric Evaluation of Environmental Goods: Local Linear Model Approach
Jeong, Ki-Ho ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 209~216
The contingent valuation method (CVM) is a principal method for valuing nonmarket goods, for which markets either do not exist at all or exist only incompletely; environmental goods are an example. The dichotomous choice approach, the most popular type of CVM in environmental economics, employs binary discrete choice models as statistical estimation models. In this paper, we propose a semiparametric dichotomous choice CVM using the local linear model of Fan and Gijbels (1996), in which the probability distribution of the error term is specified parametrically but the latent structural function is specified nonparametrically. The computational procedures of the proposed method are illustrated with a simple simulation design.
A Study of the Reliability of Web Services using Client Sides Errors
Lee, Sang-Bock ; Kim, Mal-Suk ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 217~221
Modeling the reliability of distributed systems requires a good understanding of the reliability of the components. For thousands of web users, competitiveness in web services means a successful presence on the web. Failure rates for the presence of a web site are considered in terms of the client-side errors defined in RFC 2068. Data were collected from some host via the internet.
Web Learning Guidance for Elementary School Students
Kim, Hae-Gue ; Oh, Kwang-Sik ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 223~235
When using the Internet, most students are still exposed to inappropriate commercial information and are even drawn toward consumption-oriented behavior. Various programs have been deployed to block harmful information. From an educational standpoint, however, guidance is more instructive than blocking. The focus of this study is therefore to induce and motivate students' self-directed learning activities with Internet guide contents. We develop a learning guidance material as one of the information platforms. Furthermore, we assess the usefulness of such learning guidance materials through interviews and observation. We find that an environment offering easy and meaningful Internet access and use influences children's self-directed learning ability.
Scaling MDS for Preference Data Using Target Configuration
Hwang, S.Y. ; Park, S.K. ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 237~245
MDS (multidimensional scaling) for preference data is a graphical tool that shows how consumers perceive and evaluate certain products. This article is mainly concerned with optimal scaling for MDS when a target configuration is available. Rotation of axes and SUR (seemingly unrelated regression) methods are employed to obtain a new configuration that is as close to the target as possible. The methodologies developed here are also illustrated with a real data set.
On the Equality of Two Distributions Based on Nonparametric Kernel Density Estimator
Kim, Dae-Hak ; Oh, Kwang-Sik ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 247~255
Hypothesis testing for the equality of two distributions is considered. Nonparametric kernel density estimates are used to test the equality of the distributions, with the bandwidth chosen by cross-validation. The sampling distribution of the test statistic is obtained by a resampling method, the bootstrap. Small-sample Monte Carlo simulations are conducted, and the empirical power of the tests is compared for a variety of distributions.
A Study for the Features of Data Analysis Methods Used in Medical Research
Sin, Jae-Gyeong ; Jang, Deok-Jun ; Mun, Seung-Ho ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 257~264
In Korea's medical research, both awareness of the importance of statistical methods for processing medical data and the practical use of such methods are insufficient. From this standpoint, in order to examine the features of the data analysis methods used in the medical journals of Korea and America, we examined research papers published in representative medical journals of both countries. The results show a large difference between Korea and America in both quantity and quality. In Korean medical research in particular, the use of statistical methods was comparatively low. Researchers in the medical field are therefore encouraged to use more statistical methods in processing medical data.
Comparison of prediction methods for Nonlinear Time series data with Intervention
Lee, Sung-Duck ; Kim, Ju-Sung ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 265~274
Time series data are influenced by external events such as holidays, strikes, oil shocks, and political change, and such events cause sudden changes in the data. We regard an observation that occurred as a result of an external event as an outlier. In general, when the period and the cause of the external event are known, it is called an intervention, and it makes it difficult for an analyst to establish a time series model. It is therefore important to analyze the forms and effects of interventions. In this paper, we consider the linear time series model with intervention and compare it with nonlinear time series models such as the ARCH and GARCH models, as well as with the combination prediction method introduced by Tong (1990). In a practical case study, we compare the prediction power, measured by RMSE, of the linear and nonlinear time series models with intervention and of the combination prediction method.
Combined Procedure of Direct Question and Randomized Response Technique
Choi, Kyoung-Ho ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 275~278
In this paper, a simple and straightforward procedure is presented for estimating the population proportion of a sensitive group. The suggested procedure combines a direct question with the randomized response technique. The proposed procedure is found to be more efficient than Warner's (1965).
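For orientation, the sketch below implements only the baseline that the abstract compares against: the classical randomized response estimator of Warner (1965). It is not the combined procedure proposed in the paper. It assumes each respondent is handed the statement "I belong to the sensitive group" with probability `p` and its negation otherwise, and truthfully answers yes or no; the function and variable names are introduced here for illustration.

```python
import random


def warner_estimate(yes_count, n, p):
    """Warner (1965) estimate of the sensitive proportion.

    yes_count: number of "yes" answers among n respondents
    p: probability that the randomizing device selects the sensitive
       statement (must differ from 0.5, or the design is uninformative)
    Returns (pi_hat, plug-in variance estimate).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 < p < 1 or p == 0.5:
        raise ValueError("p must lie in (0, 1) and differ from 0.5")
    lam_hat = yes_count / n  # observed proportion of "yes" answers
    pi_hat = (lam_hat - (1 - p)) / (2 * p - 1)
    var_hat = lam_hat * (1 - lam_hat) / (n * (2 * p - 1) ** 2)
    return pi_hat, var_hat


def simulate_yes_count(n, true_pi, p, rng):
    """Simulate truthful answers under Warner's design (assumed setup)."""
    yes = 0
    for _ in range(n):
        member = rng.random() < true_pi
        got_sensitive_statement = rng.random() < p
        # "yes" when the statement received matches the respondent's status
        if member == got_sensitive_statement:
            yes += 1
    return yes


rng = random.Random(0)
yes = simulate_yes_count(20000, true_pi=0.2, p=0.7, rng=rng)
print(warner_estimate(yes, 20000, p=0.7))
```

The factor `(2 * p - 1) ** 2` in the denominator shows why randomization costs efficiency relative to a direct question: the closer `p` is to 0.5, the larger the variance.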
A Study on Properties of the survival function Estimators with Weibull approximation
Lee, Jae-Man ; Cha, Young-Joon ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 279~287
In this paper we propose a local smoothing of the Nelson-type estimator of the survival function based on an approximation by the Weibull distribution function. The mean squared error and bias of the smoothed Nelson-type survival function estimator appear to be significantly smaller than those of the smoothed Kaplan-Meier survival function estimator.
Multivariate Control Charts for Autocorrelated Process
Cho, Gyo-Young ; Park, Mi-Ra ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 289~301
In this paper, we propose Shewhart and EWMA control charts for autocorrelated data, which are common in the chemical and process industries and which increase the number of false alarms when conventional control charts are applied. The effect of autocorrelation is modeled as an autoregressive process, and canonical analysis is used to reduce the dimensionality of the data set and to find the canonical variables that explain as much of the data variation as possible. Charting statistics are constructed from the residual vectors of the canonical variables, which are uncorrelated over time, and the control charts for these statistics can attenuate the autocorrelation in the process data. The charting procedures are illustrated with a numerical example, and a simulation is conducted to investigate the performance of the proposed control charts.
A Nonparametric Goodness-of-Fit Test for Sparse Multinomial Data
Baek, Jang-Sun ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 303~311
We consider the problem of testing cell probabilities in sparse multinomial data. Aerts et al. (2000) presented a test statistic based on the local polynomial estimator and derived its asymptotic distribution. When the cell probabilities differ considerably in size, giving the difference between the estimator and the hypothesized probability the same weight at every cell, as their test statistic does, is not an appropriate measure of overall goodness of fit. We consider a Pearson-type goodness-of-fit test statistic instead and show that it is asymptotically normally distributed.
Outlier Detection in Growth Curve Model
Shim, Kyu-Bark ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 313~323
This paper discusses the problem of detecting outliers in the growth curve model with an arbitrary covariance structure, known as an unstructured covariance matrix. To detect outliers in the growth curve model, a test statistic based on the U-distribution is established. After detecting outliers, we test for homogeneous and/or heterogeneous covariance matrices using the PSR Quasi-Bayes Criterion. For illustration, a numerical example comparing the results before and after outlier deletion is discussed.
Multiple Response Optimization for Robust Design using Desirability Function
Kwon, Yong-Man ; Hong, Yeon-Woong ; Chang, Duk-Joon ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 325~335
Robust design aims to identify settings of the control factors that make the system's performance robust to changes in the noise factors that represent the source of variation. In Taguchi parameter design, the product array approach using orthogonal arrays is mainly used. However, it often requires an excessive number of experiments. An alternative, the combined array approach, was suggested by Welch et al. (1990) and studied by others. In these studies, only a single response variable was considered. We propose a way to optimize multiple responses simultaneously when the combined array approach is used.
Prediction Intervals for LS-SVM Regression using the Bootstrap
Shim, Joo-Yong ; Hwang, Chang-Ha ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 337~343
In this paper we present a bootstrap method for estimating prediction intervals in least squares support vector machine (LS-SVM) regression, which allows us to perform even nonlinear regression by constructing a linear regression function in a high-dimensional feature space. The bootstrap is used to generate bootstrap samples for estimating the covariance of the regression parameters, which consist of the optimal bias and Lagrange multipliers. Experimental results are then presented to indicate the performance of this algorithm.
Incremental Eigenspace Model Applied To Kernel Principal Component Analysis
Kim, Byung-Joo ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 345~354
An incremental kernel principal component analysis (IKPCA) is proposed for nonlinear feature extraction from data. One problem with batch kernel principal component analysis (KPCA) is that the computation becomes prohibitive when the data set is large. Another is that, to update the eigenvectors with new data, all the eigenvectors must be recomputed. IKPCA overcomes these problems by incrementally updating the eigenspace model. IKPCA requires less memory than batch KPCA and can easily be improved by re-learning the data. In our experiments we show that IKPCA is comparable in performance to batch KPCA for a classification problem on a nonlinear data set.
Unified Non-iterative Algorithm for Principal Component Regression, Partial Least Squares and Ordinary Least Squares
Kim, Jong-Duk ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 355~366
A unified procedure for principal component regression (PCR), partial least squares (PLS) and ordinary least squares (OLS) is proposed. The process gives solutions for PCR, PLS and OLS in a unified and non-iterative way. This enables us to see the interrelationships among the three regression coefficient vectors, and it is seen that the so-called E-matrix in the solution expression plays the key role in differentiating the methods. In addition to setting out the procedure, the paper also supplies a robust numerical algorithm for its implementation, which is used to show how the procedure performs on a real world data set.
Tests for the exponential distribution based on Type-II censored samples
Kang, Suk-Bok ; Cho, Young-Suk ; Choi, Sei-Yeon ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 367~376
Two explicit estimators of the scale parameter in an exponential distribution based on Type-II censored samples are proposed by appropriately approximating the likelihood function. Two types of tests, the modified Cramer-von Mises test and the Kolmogorov-Smirnov test, are then developed for the exponential distribution based on Type-II censored samples by using the proposed estimators. For each test, Monte Carlo techniques are used to generate critical values. The powers of these tests are investigated under several alternative distributions.
Large Sample Test for Independence in the Bivariate Pareto Model with Censored Data
Cho, Jang-Sik ; Lee, Jea-Man ; Lee, Woo-Dong ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 377~383
In this paper, we consider a two-component system in which the lifetimes follow the bivariate Pareto model with randomly censored data. We assume that the censoring time is independent of the lifetimes of the two components. We develop large sample tests for independence between the two components. We also present a simulation study of the test for independence based on the asymptotic normal distribution.
Recurrence Formula for the Central Moments of Number of Successes with n Poisson Trials
Moon, Myung-Sang ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 385~391
A sequence of n Bernoulli trials that violates the constant success probability assumption is termed "Poisson trials". In this paper, a recurrence formula for the r-th central moment of the number of successes in n Poisson trials is derived. Romanovsky's method, based on differentiation of the characteristic function, is used to derive the recurrence formula for the central moments of the conventional binomial distribution; in this paper it is applied to Poisson trials. Some central moment calculations are given to compare the central moments of Poisson trials with those of the conventional binomial distribution.
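To make the object of study concrete, the following sketch computes central moments of the number of successes in n Poisson trials by brute force: it builds the exact distribution by convolution and sums over it. This is a numerical check only, not the recurrence formula derived in the paper; the function names and the example probabilities are assumptions introduced here.

```python
def poisson_trials_pmf(probs):
    """Exact pmf of the number of successes in independent Bernoulli
    trials with success probabilities probs (possibly unequal)."""
    pmf = [1.0]  # zero trials: zero successes with probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this trial fails
            nxt[k + 1] += mass * p     # this trial succeeds
        pmf = nxt
    return pmf


def central_moment(pmf, r):
    """r-th central moment of a distribution on 0, 1, ..., len(pmf) - 1."""
    mean = sum(k * mass for k, mass in enumerate(pmf))
    return sum((k - mean) ** r * mass for k, mass in enumerate(pmf))


unequal = poisson_trials_pmf([0.2, 0.5, 0.9])
# Binomial with the same n and the same average success probability.
binomial = poisson_trials_pmf([(0.2 + 0.5 + 0.9) / 3] * 3)
for r in (2, 3, 4):
    print(r, central_moment(unequal, r), central_moment(binomial, r))
```

For r = 2 the output reflects the known fact that, for a fixed mean, unequal success probabilities give a smaller variance than the binomial case.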
On Multipurpose Replacement Policies for the General Failure Model
Cha, Ji-Hwan ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 393~403
In this paper, various replacement policies for the general failure model are considered. There are two types of failure in the general failure model. One is Type I failure (minor failure), which can be removed by a minimal repair, and the other is Type II failure (catastrophic failure), which can be removed only by a complete repair. In this model, when the unit fails at age t, Type I failure occurs with probability 1-p(t) and Type II failure occurs with probability p(t). Under the model, optimal replacement policies for the long-run average cost rate and the limiting efficiency are considered. In addition, taking the cost and the efficiency into consideration at the same time, the properties of the optimal policies under the Cost-Priority-Criterion and the Efficiency-Priority-Criterion are obtained.
Large Sample Tests for Independence and Symmetry in the Bivariate Weibull Model under Random Censorship
Cho, Jang-Sik ; Ko, Jeong-Hwan ; Kang, Sang-Kil ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 405~412
In this paper, we consider a two-component system in which the lifetimes have a bivariate Weibull distribution with randomly censored data. Here the censoring time is independent of the lifetimes of the components. We construct large sample tests for independence and symmetry between the two components based on the maximum likelihood estimators and the natural estimators. We also present a numerical study.
Statistical Inference Concerning Local Dependence between Two Multinomial Populations
Oh, Myong-Sik ;
Journal of the Korean Data and Information Science Society, volume 14, issue 2, 2003, Pages 413~428
If a restriction is imposed only on a (proper) subset of the parameters of interest, we call it a local restriction. Statistical inference under a local restriction in the multinomial setting is studied. Maximum likelihood estimation under a local restriction and likelihood ratio tests for and against a local restriction are discussed. A real data set is analyzed for illustrative purposes.