Journal of the Korean Data and Information Science Society
Korean Data and Information Science Society
Volume 18, Issue 2 - Apr 2007
Application of Statistical Methods in Quantitative Linguistics Study
Choi, Kyung-Ho ; Hwang, Yong-Joo ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 269~278
Quantitative methods have become a necessary tool in many areas of linguistics, so knowledge of statistical methods is now a requisite for linguists. Unfortunately, linguists still face difficulties in acquiring that knowledge, and they need help from statisticians who take an active role. This study therefore emphasizes that statisticians should take more interest in quantitative linguistics. It also shows that using statistical methods makes the analysis of linguistic research more objective and scientific.
Association Rule Mining by Environmental Data Fusion
Cho, Kwang-Hyun ; Park, Hee-Chang ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 279~287
Data fusion is the process of combining multiple data sets in order to produce information of tactical value to the user. It is generally defined as the use of techniques that combine data from multiple sources and gather that information in order to draw inferences. Data fusion is also called data combination or data matching. It is divided into five types: exact matching, judgemental matching, probability matching, statistical matching, and data linking. In this paper, we develop a macro program for statistical matching, one of the five types of data fusion. We then apply data fusion and association rule techniques to environmental data.
Estimation of VaR in Stock Return Using Change Point
Lee, Seung-S. ; Jo, Ju-H. ; Chung, Sung-S. ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 289~300
Stock returns change because of internal and external factors or because of changes in the market system. However, most studies do not consider changes in the distribution of stock returns when estimating the value at risk (VaR), which may lead to wrong conclusions. In this paper, we calculate the VaR of price-to-earnings ratios using a distribution that accounts for the change point, after applying a transformation to satisfy normality.
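As a rough illustration of why the change point matters, the sketch below computes a one-period VaR under a normal model, once from the full sample and once from the observations after an assumed change point. It is not the authors' procedure: the function name `normal_var`, the sample returns, and the known change-point index are all hypothetical, and no normalizing transformation is applied.

```python
from statistics import NormalDist, mean, stdev

def normal_var(returns, alpha=0.05, change_point=0):
    """One-period VaR at level alpha under a normal model.

    Only observations from index `change_point` onward are used, on the
    assumption that the return distribution shifted at that known index.
    """
    segment = returns[change_point:]
    mu, sigma = mean(segment), stdev(segment)
    # The alpha-quantile of returns is the loss threshold; report it as a positive loss.
    return -NormalDist(mu, sigma).inv_cdf(alpha)

# Hypothetical returns whose volatility increases after index 5.
returns = [0.01, -0.02, 0.015, -0.005, 0.0, 0.03, -0.04, 0.02, -0.03, 0.01]
var_all = normal_var(returns)                   # ignores the change point
var_post = normal_var(returns, change_point=5)  # uses the post-change regime only
```

In this toy series, pooling both regimes understates the risk of the more volatile recent period, which is the kind of error the abstract warns about.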
Separate Fuzzy Regression with Crisp Input and Fuzzy Output
Yoon, Jin-Hee ; Choi, Seung-Hoe ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 301~314
The aim of this paper is to present a method for constructing a separate fuzzy regression model with crisp input and fuzzy output data, using a best response function for the center and the width of the predicted output. We also introduce the crisp mean and variance of the predicted fuzzy value and give some examples that compare the performance of the proposed fuzzy model with various other fuzzy regression models.
Design and Evaluation of a Dynamic Anomaly Detection Scheme Considering the Age of User Profiles
Lee, Hwa-Ju ; Bae, Ihn-Han ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 315~326
The rapid proliferation of wireless networks and mobile computing applications has changed the landscape of network security. Anomaly detection is a pattern recognition task whose goal is to report the occurrence of abnormal or unknown behavior in a monitored system. This paper presents a dynamic anomaly detection scheme that can effectively identify a group of especially harmful internal masqueraders in cellular mobile networks. Our scheme uses a user's wireless application layer trace data as feature values. Based on the feature values, the usage pattern of a mobile user can be captured by rough sets, and abnormal behavior of the mobile can be detected effectively by applying a roughness membership function with both the age of the user profile and weighted feature values. The performance of our scheme is evaluated by simulation. The simulation results demonstrate that anomalies are well detected by the proposed dynamic scheme that considers the age of user profiles.
Shrinkage Structure of Ridge Partial Least Squares Regression
Kim, Jong-Duk ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 327~344
Ridge partial least squares regression (RPLS) is a regression method obtained by combining ridge regression and partial least squares regression. It is intended to provide better predictive ability and to be less sensitive to overfitting. In this paper, explicit expressions for the shrinkage factor of RPLS are developed. The structure of the shrinkage factor is explored and compared with those of other biased regression methods, such as ridge regression, principal component regression, ridge principal component regression, and partial least squares regression, using a near-infrared data set.
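For context, the shrinkage factors of the simpler biased methods named in this abstract can be written in a common textbook form. This is not the paper's RPLS expression. The notation is assumed here: $X$ is a centered design matrix of full column rank $p$ with singular value decomposition $X = U D V^{\top}$, singular values $d_1 \ge \dots \ge d_p > 0$, left singular vectors $u_j$, ridge penalty $k \ge 0$, and $m$ retained principal components.

```latex
\[
\hat{y} \;=\; \sum_{j=1}^{p} f_j \, u_j u_j^{\top} y ,
\]
\[
f_j^{\mathrm{OLS}} = 1, \qquad
f_j^{\mathrm{ridge}} = \frac{d_j^{2}}{d_j^{2} + k}, \qquad
f_j^{\mathrm{PCR}} =
\begin{cases}
1, & j \le m, \\
0, & j > m .
\end{cases}
\]
```

Ridge regression shrinks every direction smoothly, most strongly where $d_j$ is small, while principal component regression keeps or discards each direction outright.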
A CUSUM Algorithm for Early Detection of Structural Changes in Won/Dollar Exchange Market
Song, Gyu-Moon ; Park, Byung-Chun ; Kang, Hoon-Kyu ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 345~356
This study deals with the early detection of structural change in the won/dollar exchange market. A CUSUM algorithm is developed to monitor relevant economic variables that indicate structural change in this market. We apply the CUSUM algorithm to examine whether it would have been possible to raise an alarm in advance of Korea's 1997 economic crisis.
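The general idea of a CUSUM monitor can be sketched as follows. This is a generic one-sided tabular CUSUM, not the algorithm developed in the paper; the function name `cusum_upper`, the parameters `target`, `slack`, and `threshold`, and the sample series are all illustrative assumptions.

```python
def cusum_upper(xs, target, slack, threshold):
    """Return the index of the first upper-CUSUM alarm, or None if there is none."""
    s = 0.0
    for i, x in enumerate(xs):
        # Accumulate deviations above the target, less an allowance (slack),
        # and reset to zero whenever the running sum would go negative.
        s = max(0.0, s + (x - target - slack))
        if s > threshold:
            return i
    return None

# Hypothetical monitored variable that shifts upward from index 4 onward.
series = [0.1, -0.2, 0.0, 0.1, 1.2, 1.1, 1.3, 0.9]
alarm = cusum_upper(series, target=0.0, slack=0.5, threshold=1.5)
```

With these values the statistic stays at zero before the shift and crosses the threshold on the third shifted observation. A smaller threshold gives earlier alarms at the cost of more false ones.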
Conjoint Analysis for the Preferred Subjects of Elementary School Computer Education
Hur, Ji-Sun ; Pak, Ro-Jin ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 357~364
This article identifies the preferred subjects of after-school computer education in elementary schools by means of conjoint analysis. We surveyed fourth-, fifth-, and sixth-grade students from three schools in Seoul. We found that graphics-related courses are the most preferred, even though such courses are taught in public schools. Based on this research, we propose a new curriculum for after-school computer education.
On Sample Size Determination of Bioequivalence Trials
Park, Sang-Gue ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 365~373
Sample size determination plays an important role in designing a bioequivalence trial. Sample size formulae based on Schuirmann's two one-sided tests procedure are given for bioequivalence studies with the crossover design and the two-sample parallel design. A practical discussion of the relationship among these formulae is given.
Development of a Dynamic Geometry Environment to Collect Learning History Data
Mun, Kill-Sung ; Han, Beom-Soo ; Han, Kyung-Soo ; Ahn, Jeong-Yong ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 375~384
As teaching that uses ICT becomes more popular, many studies on the dynamic geometry environment (DGE) are under way. An important factor emphasized in these studies is the practical use of learners' learning activities. In this study, we first define the learning history data in a DGE. Second, we develop a prototype DGE that can collect and analyze the learning history data automatically. The environment makes it possible not only to track learning history but also to create and manage new learning objects.
Image Feature Detection and Contrast Enhancement Algorithms Based on Statistical Tests
Kim, Yeong-Hwa ; Nam, Ji-Ho ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 385~399
In many image processing applications, random noise causes trouble because most video enhancement functions produce visual artifacts if the a priori information about the noise is incorrect. The basic difficulty is that the noise and the signal are hard to distinguish. Typical unsharp masking (UM) enhances the visual appearance of images, but it also amplifies the noise components of the image. Hence, the applications of UM are limited when noise is present. This paper proposes statistical algorithms based on parametric and nonparametric tests to adaptively enhance image features while accounting for noise when applying UM. With the proposed algorithms, the local contrast of an image can be enhanced without amplifying the noise.
An ANP-Based Performance Model for ERP System's Implementation
Ko, Je-Suk ; Park, Soon-Hak ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 401~409
This paper presents a performance evaluation model for ERP system implementation using the Analytic Network Process (ANP) technique. The performance variables are identified from the perspectives of cost, business process, systems operation, and change management. The empirical study also investigates the factors that affect the performance variables in order to find the causal relationships between them using the ANP approach. The data for the empirical analysis were collected from manufacturing companies that have implemented ERP systems. The findings indicate that the proposed model is effective in showing that the indirect relationship between influencing factors and managerial effectiveness, mediated by employee satisfaction, is an important one.
Sample Size Comparison for Non-Inferiority Trials
Kim, Dong-Wook ; Kim, Dong-Jae ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 411~418
Sample size calculation is very important in clinical trials. In this paper, we propose a sample size calculation method for non-inferiority trials that uses the method suggested by Wang et al. (2003), based on Wilcoxon's rank sum test. A sample size comparison between the parametric method and the proposed method is also presented.
The Errors of Population Projections for Korea on Korean Information Statistical System
Yoon, Yong-Hwa ; Kim, Jong-Tae ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 419~427
Recently, the Korean National Statistical Office released the results of population projections for Korea from 1960 to 2050. The purpose of this paper is to suggest reasonable assumptions for the population survey and then to detect the errors in the surveyed population (1960-2005) on the Korean Information Statistical System.
A Fuzzy Differential Diagnosis of Headache
Kim, Young-Hyun ; Kim, Soon-Ki ; Oh, Sun-Young ; Ahn, Jeong-Yong ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 429~438
Headache is one of the most common reasons for neurological consultation. Headache has many causes and symptoms, so a screening method based on a questionnaire is helpful in diagnosing it. This paper proposes a medical diagnostic method that identifies a patient's disease using the relations between symptoms and diseases. For this purpose, we develop an interview chart that assigns IF (intuitionistic fuzzy) grades to the relations among symptoms and three labels of headache. The method can be used to classify a patient's disease with certain degrees of belief, along with its associated symptoms.
Analyses of Computation Time on Snakes and Gradient Vector Flow
Kwak, Young-Tae ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 439~445
Gradient vector flow (GVF) can solve two difficulties with Snakes: setting the initial contour and progressing into boundary concavities. However, GVF takes much longer to compute than the existing Snakes because of its edge map and partial derivatives. This paper therefore analyzes the computation time of GVF and Snakes. In simulations, both algorithms took nearly the same computation time on simple images. On real images, GVF took about twice the computation time of Snakes.
Development of Discriminant Analysis System by Graphical User Interface of Visual Basic
Lee, Yong-Kyun ; Shin, Young-Jae ; Cha, Kyung-Joon ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 447~456
Recently, multivariate statistical analysis has been used to extract meaningful information from various data. In this paper, we develop a multivariate statistical analysis system that combines Fisher discriminant analysis, logistic regression, neural networks, and decision trees, using Visual Basic 6.0.
Process Capability Analysis by a New Process Incapability Index
Kim, Hee-Jung ; Cho, Gyo-Young ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 457~469
Process capability indexes (PCI) are used as measures in process capability analysis and are statistical tools for efficient process control. The fourth-generation index is constructed from an earlier index by introducing an additional factor in the numerator as an extra penalty for the departure of the process mean from the preassigned target value T. Process incapability indexes (PII) are obtained by inverting the PCI and retain the information of the PCI. This paper introduces a PII that provides managers with various information about the process and includes gage R&R.
A Proportional Odds Mixed-Effects Model for Ordinal Data
Choi, Jae-Sung ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 471~479
This paper discusses how to build a mixed-effects model for analysing ordinal response data using cumulative logits. Random factors are assumed to arise from the designed sampling scheme for choosing observational units. Since the observed responses of individuals are ordinal, a proportional odds model with two random effects is suggested. The estimation procedure for the unknown parameters of the suggested model is also discussed through an illustrative example.
Initial Mode Decision Method for Clustering in Categorical Data
Yang, Soon-Cheol ; Kang, Hyung-Chang ; Kim, Chul-Soo ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 481~488
The k-means algorithm is well known for its efficiency in clustering large data sets. However, because it works only on numeric values, it cannot be used to cluster real-world data containing categorical values. The k-modes algorithm extends the k-means paradigm to categorical domains. The algorithm requires a pre-set or randomly selected set of initial points (modes) for the clusters. This paper addresses this problem of the k-modes algorithm using the Max-Min method, which is one of the methods for deciding initial values in the k-means algorithm. We introduce new similarity measures for clustering categorical data. Tests on the mushroom and soybean data sets show that the proposed algorithm performs well in two respects: accuracy and run time.
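A max-min style initialization for categorical data can be sketched as follows. This is only an illustration of the general idea: it uses simple Hamming (mismatch-count) distance rather than the new similarity measures introduced in the paper, and the names `hamming`, `max_min_init`, and the toy records are hypothetical.

```python
def hamming(a, b):
    """Number of attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def max_min_init(records, k):
    """Pick k initial modes: start with the first record, then repeatedly add
    the record whose distance to its nearest already-chosen mode is largest."""
    modes = [records[0]]
    while len(modes) < k:
        farthest = max(records, key=lambda r: min(hamming(r, m) for m in modes))
        modes.append(farthest)
    return modes

# Hypothetical categorical records (color, shape, size).
records = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("blue", "square", "large"),
    ("blue", "square", "small"),
    ("green", "flat", "tiny"),
]
modes = max_min_init(records, k=3)
```

Spreading the initial modes apart in this way avoids the run-to-run variability of random selection, at the cost of extra distance computations before clustering starts.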
Introduction to Gene Prediction Using HMM Algorithm
Kim, Keon-Kyun ; Park, Eun-Sik ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 489~506
Gene structure prediction, which predicts protein-coding regions in a given nucleotide sequence, is the most important process in annotating genes and greatly affects gene analysis and genome annotation. As eukaryotic genes have more complicated structures in DNA sequences than prokaryotic genes, analysis programs for eukaryotic gene structure prediction use more diverse and more complicated computational models. The methods for predicting eukaryotic genes are the ab initio method, the similarity-based method, and the ensemble method. Each method uses various algorithms. This paper introduces how to predict genes using the HMM (hidden Markov model) algorithm and presents the process of gene prediction with well-known gene prediction programs.
Unified Estimates for Parameter Changes in a Pareto Model with an Exponential Outlier
Ryu, Se-Gi ; Lee, Chang-Soo ; Chang, Chu-Seock ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 507~514
We propose several estimators for the scale parameter in the Pareto distribution with an unidentified exponential outlier when the scale parameter is a function of a known exposure level, and we obtain expectations and variances for the proposed estimators. We also numerically compare the efficiencies of the proposed estimators of the scale and shape parameters for small sample sizes.
Fuzzy Local Linear Regression Analysis
Hong, Dug-Hun ; Kim, Jong-Tae ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 515~524
This paper deals with local linear estimation of fuzzy regression models based on Diamond (1998) as a new class of non-linear fuzzy regression. The purpose of this paper is to introduce the use of smoothing in testing for lack of fit of parametric fuzzy regression models.
Statistically Proper Multiple Range Tests for a Within Subject Factor in a Repeated Measures Design
Park, Cheol-Yong ; Park, Sang-Bum ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 525~534
It is common practice in many research areas to apply multiple range tests designed for a between-subject factor, such as Tukey's, to a within-subject factor in a repeated measures design. The Tukey procedure, however, sometimes detects no pairs with different means even when the hypothesis that all level means are equal is rejected. This study attempts to provide a rationale for the proposition that Tukey is an inappropriate post hoc procedure for a within-subject factor in which the observations are correlated. We introduce two multiple range tests, Bonferroni and Scheffe, for a within-subject factor and show that Bonferroni is more appropriate than Scheffe for pairwise multiple comparisons. A subsequent simulation study indicates that Tukey has significantly less power than Bonferroni in detecting actual differences between the means of some pairs when the observations of a within-subject factor are highly correlated.
Prediction of MTBF Using the Modulated Power Law Process
Na, Myung-Hwan ; Son, Young-Sook ; Yoon, Sang-Hoo ; Kim, Moon-Ju ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 535~541
The non-homogeneous Poisson process is probably the most popular model since it can model systems that are deteriorating or improving. The renewal process is a model that is often used to describe the random occurrence of events in time. Both of these models, however, rest on overly restrictive assumptions about the effect of the repair action. The modulated power law process (MPLP) is a suitable model for describing the failure pattern of repairable systems when both renewal-type behavior and a time trend are present. In this paper, we propose maximum likelihood estimation of the next failure time after the system has experienced some failures, that is, the mean time between failures (MTBF) for the MPLP model.
Reliability In a Half-Triangle Distribution and a Skew-Symmetric Distribution
Woo, Jung-Soo ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 543~552
We consider estimation of the right-tail probability in a half-triangle distribution and inference on reliability, and we derive the k-th moment of the ratio of two independent half-triangle distributions with different supports. We also define a skew-symmetric random variable from a triangle distribution that is symmetric about the origin and derive its k-th moment.
Multiple Comparisons for a Bivariate Exponential Populations Based On Dirichlet Process Priors
Cho, Jang-Sik ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 553~560
In this paper, we consider a two-component system whose lifetimes follow Freund's bivariate exponential model with equal failure rates. We propose a Bayesian multiple comparisons procedure for the failure rates of I Freund's bivariate exponential populations based on Dirichlet process priors (DPP). The family of DPP is applied in the form of a baseline prior and likelihood combination to provide the comparisons. Because analytic evaluation is intractable, the posterior probabilities of all possible hypotheses are computed through a Markov chain Monte Carlo (MCMC) method, namely Gibbs sampling. The whole process of the multiple comparisons problem for the failure rates of bivariate exponential populations is illustrated through a numerical example.
A Computationally Efficient Optimal Allocation Algorithms for Large Data
Kwon, Il-Hyung ; Kim, Ju-Sung ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 561~572
In this paper, we describe various efficient optimization algorithms for obtaining an optimal customer allocation in a telephone call center. The main advantages of the proposed algorithms are that they are simple, fast, and well suited to massive data sets. The proposed algorithms also provide performance comparable to that of other, more sophisticated linear programming methods. The proposed optimal allocation algorithms increase customer contact, response rate, and management product, and they optimize the performance of call centers. Simulation results are given to demonstrate the effectiveness of our algorithms.
The Approximate MLE in a Skew-Symmetric Laplace Distribution
Son, Hee-Ju ; Woo, Jung-Soo ;
Journal of the Korean Data and Information Science Society, volume 18, issue 2, 2007, Pages 573~584
We define a skew-symmetric Laplace distribution from a symmetric Laplace distribution and evaluate its coefficient of skewness. We derive an approximate maximum likelihood estimator (AME) and a moment estimator (MME) of the skew parameter in a skew-symmetric Laplace distribution and compare the simulated mean squared errors of these estimators. We also compare the asymptotic mean squared errors of two estimators of reliability defined for two independent skew-symmetric distributions.