REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
Journal of the Korean Data and Information Science Society
Journal Basic Information
Korean Data and Information Science Society
Volume & Issues
Volume 21, Issue 6 - Nov 2010
Volume 21, Issue 5 - Sep 2010
Volume 21, Issue 4 - Jul 2010
Volume 21, Issue 3 - May 2010
Volume 21, Issue 2 - Mar 2010
Volume 21, Issue 1 - Jan 2010
The relative risk of major risk factors of ischemic heart disease
Ko, Min-Jung ; Han, Jun-Tae ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 201~209
Due to the dramatic increase in mortality from ischemic heart disease (IHD) during the last decade, an effective prevention strategy is highly warranted. This study therefore identified the major risk factors of IHD over 10 years of follow-up among 2,268,018 participants of the 1996 National Health Insurance Examination, using the Cox proportional hazards model. In men, BMI, blood pressure, and smoking were significantly associated with IHD, whereas hypertension, perceived health status, and γ-GTP were related to IHD in women.
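In a Cox proportional hazards analysis like the one described above, the relative risk reported for a factor is typically the hazard ratio exp(β) of its regression coefficient. A minimal sketch in Python; the coefficients below are purely hypothetical and not taken from the paper:

```python
import math

def hazard_ratio(beta):
    """Relative risk (hazard ratio) implied by a Cox regression coefficient."""
    return math.exp(beta)

# Hypothetical coefficients, for illustration only (not from the paper)
coefs = {"smoking": 0.47, "BMI, per unit": 0.05, "SBP, per 10 mmHg": 0.18}
for factor, beta in coefs.items():
    print(f"{factor}: HR = {hazard_ratio(beta):.2f}")
```

A hazard ratio above 1 indicates elevated risk for the exposed group; exp(0) = 1 corresponds to no effect.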
Alternative optimization procedure for parameter design using neural network without SN
Na, Myung-Whan ; Kwon, Yong-Man ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 211~218
Taguchi used the signal-to-noise ratio (SN) to find the operating conditions under which variability around the target is low in Taguchi parameter design. Many statisticians criticize the Taguchi techniques of analysis, particularly those based on the SN. Moreover, there are difficulties in practical application, such as complex and nonlinear relationships among quality characteristics and design (control) factors, and interactions occurring among control factors. Neural networks have a learning capability and model-free characteristics. These characteristics make neural networks a competitive tool for multivariable input-output modeling. In this paper we propose a substantially simpler optimization procedure for parameter design using a neural network, without resorting to the SN. An example is presented to illustrate the difference between the Taguchi method and the neural network method.
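For context, the SN ratio that the paper moves away from has a direct closed form; the nominal-the-best variant is 10·log10(ȳ²/s²), which rewards low spread around a target mean. A minimal sketch, with invented measurement data:

```python
import math

def sn_nominal_the_best(y):
    """Taguchi nominal-the-best signal-to-noise ratio: 10 * log10(mean^2 / variance)."""
    n = len(y)
    ybar = sum(y) / n
    s2 = sum((v - ybar) ** 2 for v in y) / (n - 1)
    return 10 * math.log10(ybar ** 2 / s2)

# Two hypothetical runs with the same mean but different spread
print(sn_nominal_the_best([9.8, 10.0, 10.2]))  # tighter spread -> higher SN
print(sn_nominal_the_best([9.0, 10.0, 11.0]))
```

Higher SN means less variability relative to the mean, which is exactly the property the paper's neural network procedure tries to achieve without computing SN at all.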
Distribution fitting for the rate of return and value at risk
Hong, Chong-Sun ; Kwon, Tae-Wan ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 219~229
There has been much research on risk management due to the rapid increase of various risk factors for financial assets. As a method for comprehensive risk management, Value at Risk (VaR) was developed. For estimation of VaR, it is an important task to handle the asymmetric, heavy-tailed distribution of the rate of return. Most real distributions of the return rate have high positive kurtosis and low negative skewness. In this paper, some alternative distributions are fitted to real distributions of the return rate of financial assets, and VaR estimates obtained from these fitted distributions are compared with those obtained from the real distribution. The normal mixture distribution is found to fit best, with skewness and kurtosis close to those of the real distribution, and VaR estimation using the normal mixture distribution is more accurate than that using any other distribution, including the normal.
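As a baseline for the parametric fits discussed above, VaR at level α is simply the α-quantile of the return distribution. A historical-simulation sketch, with an invented return series for illustration:

```python
def empirical_var(returns, alpha=0.05):
    """Value at Risk as the empirical alpha-quantile of returns (loss reported as positive)."""
    srt = sorted(returns)
    idx = max(0, int(alpha * len(srt)) - 1)
    return -srt[idx]

# Invented daily return series, for illustration only
returns = [-0.042, -0.018, -0.007, 0.001, 0.004, 0.009, 0.012, 0.015, 0.021, 0.033]
print(empirical_var(returns, alpha=0.10))
```

The paper's point is that replacing this raw empirical quantile with a quantile from a well-fitted parametric family (here, a normal mixture) stabilizes the estimate in the heavy left tail.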
Rank transformation analysis for 4×4 balanced incomplete block design
Choi, Young-Hun ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 231~240
If only fixed effects exist in a 4×4 balanced incomplete block design, the power of the FR statistic for testing a main effect is highest with only a few replications. Under the exponential and double exponential distributions, the FR statistic shows relatively high power, with large differences compared with the F statistic. Further, in a traditional balanced incomplete block design with a fixed main effect and a random block effect, the power of the FR statistic is superior in all situations, regardless of the size of the main effect, the parameter size, and the type of population distribution of the block effect. The power of the FR statistic increases rapidly as replications increase. The overall power advantage of the FR statistic for testing a main effect stems from the unique structure of a balanced incomplete block design, which has one main effect and one block effect with missing observations and responds sensitively to small increases in main effect and sample size.
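The rank transform behind an FR-type statistic simply replaces the observations by their (mid)ranks before computing the usual F statistic; ties receive the average of the ranks they span. A minimal helper, as a sketch:

```python
def midranks(values):
    """Rank-transform data, giving tied observations their average (mid) rank."""
    positions = {}
    for i, v in enumerate(sorted(values)):
        positions.setdefault(v, []).append(i + 1)
    return [sum(positions[v]) / len(positions[v]) for v in values]

print(midranks([3, 1, 2, 2]))  # the tied 2s share rank (2 + 3) / 2 = 2.5
```

Applying the ordinary ANOVA F computation to these ranks yields the rank-transform statistic, which is what gives the robustness under heavy-tailed distributions described above.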
Modeling on asymmetric circular data using wrapped skew-normal mixture
Na, Jong-Hwa ; Jang, Young-Mi ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 241~250
Over the past few decades, several studies have been made on the modeling of circular data, but these studies focused mainly on symmetric cases, including the von Mises distribution. Recently, many studies with the skew-normal distribution have been conducted in the linear case. In this paper, we deal with the problem of fitting non-symmetric circular data with the wrapped skew-normal distribution, which can be derived using the principle of wrapping. The wrapped skew-normal distribution is very flexible for asymmetric as well as symmetric data. Multi-modal data are also fitted using a mixture of wrapped skew-normal distributions. To estimate the parameters of the mixture, we suggest an EM algorithm. Finally, we verify the accuracy of the suggested algorithm through simulation studies. An application with real data is also considered.
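The principle of wrapping mentioned above maps a linear random variable X onto the circle via Θ = X mod 2π, and the wrapped density sums the linear density over all full turns. A sketch, shown here with a standard normal rather than the skew-normal used in the paper:

```python
import math

def wrap(x):
    """Wrap a linear observation onto the circle [0, 2*pi)."""
    return x % (2 * math.pi)

def wrapped_pdf(theta, linear_pdf, k_max=50):
    """Approximate wrapped density: sum the linear density over all full turns."""
    return sum(linear_pdf(theta + 2 * math.pi * k) for k in range(-k_max, k_max + 1))

# Illustration with a standard normal density (the paper wraps the skew-normal)
std_normal = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
print(wrapped_pdf(0.0, std_normal))
```

Truncating the infinite wrapping sum at a moderate k_max is harmless for light-tailed densities, since terms beyond a few turns are negligible.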
Data driven approach for information system adoption: Applied in CRM case
Park, Jong-Han ; Lee, Seok-Kee ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 251~262
While outsourcing has become a basic strategy of information system adoption, there is an emerging need to analyze the gap between the required data and the existing data for a new system from the adopting company's perspective. In CRM adoption failure cases, the primary reason is that the adopting company pays no attention to the data that will support the investment and systems. So far, there has been no attempt to consider a data-driven approach in the field of information system adoption. Hence, we propose the Information System Adoption Model based on Data (ISAMD) and show how to use it in the real world through simulation. Using ISAMD, an information system adoption decision maker can simulate the needed data and related cost for various information system alternatives in short-term and long-term planning. ISAMD can prevent the possible threat of unexpected data cost when adopting a new system at the adoption decision stage.
Decision process for right association rule generation
Park, Hee-Chang ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 263~270
Data mining is the process of sorting through large amounts of data and picking out useful information. An important goal of data mining is to discover, define, and determine the relationships among several variables. Association rule mining is an important research topic in data mining. An association rule technique finds the relations among items in a massive database. It consists of two steps: finding frequent itemsets and then extracting interesting rules from those itemsets. Several interestingness measures have been developed in association rule mining; they are useful because they show, statistically or logically, the grounds for pruning uninteresting rules. This paper explores some problems with two interestingness measures, confidence and net confidence, and then proposes a decision process for right association rule generation using these measures.
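The two-step structure described above rests on the standard support and confidence measures: support is the fraction of transactions containing an itemset, and confidence of a rule A → B is support(A ∪ B) / support(A). A minimal sketch with invented transaction data:

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent) for the rule antecedent -> consequent."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)

# Invented transaction database, for illustration only
transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c"}]
print(confidence({"a"}, {"b"}, transactions))
```

The paper's concern is that confidence alone can rank rules misleadingly, which is why it pairs it with net confidence in the proposed decision process.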
Two stage Chang's randomized response technique
Choi, Kyoung-Ho ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 271~277
The randomized response technique is an indirect questioning method that employs a randomizing device to protect respondents' privacy. Chang et al. (2004) suggested an improved forced-answer technique, now considered one of the most efficient of the newly developed techniques, and found conditions under which it is more efficient than Warner's (1965). A weakness of the technique, however, is that it loses more information than a direct response technique does. Therefore, many studies have developed new techniques to reduce the loss of information, to enhance estimation efficiency, and to use the collected information efficiently. In this spirit, this paper also improves upon Chang's technique: it suggests an extension of Chang's technique and finds conditions under which it is more efficient than both Chang's technique and Mangat and Singh's (1990).
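For background, the classical Warner (1965) estimator recovers the sensitive proportion π from the observed 'yes' rate λ̂ by inverting λ = pπ + (1 − p)(1 − π), where p is the probability the randomizing device selects the sensitive question. The forced-answer variants that the paper builds on follow the same moment-inversion pattern. A sketch:

```python
def warner_estimate(lam_hat, p):
    """Warner (1965) estimator of a sensitive proportion.

    lam_hat: observed proportion of 'yes' answers
    p:       probability the device selects the sensitive question (p != 0.5)
    """
    return (lam_hat - (1 - p)) / (2 * p - 1)

print(warner_estimate(0.42, 0.7))
```

The privacy protection comes precisely from the noise the device injects, which is also why such estimators have higher variance than direct questioning, as the abstract notes.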
A simple statistical model for determining the admission or discharge of dyspnea patients
Park, Cheol-Yong ; Kim, Tae-Yoon ; Kwon, O-Jin ; Park, Hyoung-Seob ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 279~289
In this study, we propose a simple statistical model for determining the admission or discharge of 668 patients with a chief complaint of dyspnea. For this, we use 11 explanatory variables which are chosen to be important by clinical experts among 55 variables. As a modification process, we determine the discharge interval of each variable by the kernel density functions of the admitted and discharged patients. We then choose the optimal model for determining the discharge of patients based on the number of explanatory variables belonging to the corresponding discharge intervals. Since the numbers of the admitted and discharged patients are not balanced, we use, as the criteria for selecting the optimal model, the arithmetic mean of sensitivity and specificity and the harmonic mean of sensitivity and precision. The selected optimal model predicts the discharge if 7 or more explanatory variables belong to the corresponding discharge intervals.
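The selected rule above is a simple vote: predict discharge when at least 7 of the 11 explanatory variables fall in their discharge intervals. A sketch with hypothetical intervals (in the paper, the real intervals come from the kernel density estimates of admitted and discharged patients):

```python
def predict_discharge(values, intervals, threshold=7):
    """Predict discharge when at least `threshold` variables fall in their discharge interval."""
    hits = sum(lo <= v <= hi for v, (lo, hi) in zip(values, intervals))
    return hits >= threshold

# Hypothetical intervals and patient values for 11 variables, for illustration only
intervals = [(0, 1)] * 11
patient = [0.5] * 8 + [2.0] * 3  # 8 of 11 variables inside their interval
print(predict_discharge(patient, intervals))
```

Counting interval hits rather than fitting a full multivariate model is what keeps the rule simple enough for bedside use, at the cost of ignoring interactions between variables.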
Some versatile tests based on percentile tests
Park, Hyo-Il ; Kim, Ju-Sung ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 291~296
In this paper, we consider a versatile test based on percentile tests. The versatile test may be useful when the underlying distributions are unknown or of quite different types. We consider two kinds of combining functions for the percentile statistics, the quadratic and summing forms, and obtain the limiting distributions under the null hypothesis. We then illustrate our procedure with an example. Finally, we discuss some interesting features of the test as concluding remarks.
Default Bayesian testing for normal mean with known coefficient of variation
Kang, Sang-Gil ; Kim, Dal-Ho ; Le, Woo-Dong ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 297~308
This article deals with the problem of testing the mean of a normal distribution when the coefficient of variation is known. We propose Bayesian hypothesis testing procedures for the normal mean under a noninformative prior. The noninformative prior is usually improper, which yields a calibration problem: the Bayes factor is defined only up to a multiplicative constant. So we propose objective Bayesian hypothesis testing procedures based on the fractional Bayes factor and the intrinsic Bayes factor under the reference prior. Specifically, we develop intrinsic priors that give asymptotically the same Bayes factor as the intrinsic Bayes factor under the reference prior. A simulation study and a real data example are provided.
Support vector quantile regression for longitudinal data
Hwang, Chang-Ha ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 309~316
Support vector quantile regression (SVQR) can provide a more complete description of the linear and nonlinear relationships between response and input variables. In this paper we propose a weighted SVQR for longitudinal data. Furthermore, we introduce the generalized approximate cross-validation function to select the hyperparameters that affect the performance of SVQR. Experimental results are then presented, which illustrate the performance of the proposed SVQR.
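SVQR targets a conditional quantile by minimizing the check (pinball) loss rather than squared error: under-prediction is charged τ per unit and over-prediction 1 − τ per unit. A minimal sketch of that loss:

```python
def pinball_loss(y, yhat, tau):
    """Check (pinball) loss for quantile level tau."""
    u = y - yhat
    return tau * u if u >= 0 else (tau - 1) * u

print(pinball_loss(2.0, 1.0, 0.9))  # under-prediction is expensive at tau = 0.9
print(pinball_loss(1.0, 2.0, 0.9))  # over-prediction is cheap at high tau
```

Minimizing the expected pinball loss over predictions yhat is what makes the fitted function estimate the τ-quantile of the response rather than its mean.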
Goodness-of-fit test for the half logistic distribution based on multiply Type-II censored samples
Kang, Suk-Bok ; Cho, Young-Seuk ; Han, Jun-Tae ; SaKong, Jin ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 317~325
In this paper, we develop four modified empirical distribution function (EDF) type tests using approximate maximum likelihood estimators for the half-logistic distribution based on multiply Type-II censored samples. We also propose a modified normalized sample Lorenz curve plot and new test statistics. We compare the above test statistics in terms of power for various censored samples, and present an example to illustrate the method.
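One classical member of the EDF family, the Kolmogorov-Smirnov distance, can be computed directly from the sorted sample; the sketch below covers the complete-sample (uncensored) case, which the paper modifies for multiply Type-II censoring:

```python
def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance between the empirical CDF of sample and cdf."""
    n = len(sample)
    srt = sorted(sample)
    return max(max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(srt))

# Example against the Uniform(0, 1) CDF
print(ks_statistic([0.1, 0.4, 0.6, 0.9], lambda x: min(max(x, 0.0), 1.0)))
```

The two terms inside the inner max handle the jump of the empirical CDF at each order statistic, where the supremum distance can be attained from either side.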
The significance of proxies for agency costs under different governance approaches
Shin, Yang-Gyu ; Reddy, Krishna ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 327~333
This study examines the impact that different proxies of agency costs have on companies under different governance approaches. The two proxies of agency costs used are: (i) the ratio of operating expenses to annual sales; and (ii) the ratio of annual sales to total assets. Our study builds on the earlier work of Ang et al. (2000) and Fleming et al. (2005). A comparison of results for small unlisted companies in both the US and Australia indicates that the agency cost measures give statistically: (1) different results under rule-based governance mechanisms; and (2) the same results under principle-based governance mechanisms. Our findings support the view that the effectiveness of different measures of agency cost depends on country-specific governance factors as well as on the governance approaches adopted. Our results offer insights to both practitioners and policy makers regarding the usefulness of different proxies of agency costs when companies adopt principle-based corporate governance approaches versus rule-based approaches.
Noninformative priors for the common scale parameter in Pareto distributions
Kang, Sang-Gil ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 335~343
In this paper, we develop reference priors for the common scale parameter in the nonregular Pareto distributions with unequal shape parameters. We derive the reference priors as noninformative priors and prove the propriety of the joint posterior distribution under a general prior that includes the reference priors. Through a simulation study, we show that the proposed reference priors match the target coverage probabilities in a frequentist sense.
Likelihood ratio in estimating gamma distribution parameters
Rahman, Mezbahur ; Muraduzzaman, S. M. ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 345~354
The gamma distribution is widely used in engineering and industrial applications. Estimation of the parameters of the two-parameter gamma distribution is revisited. The parameters are estimated by minimizing likelihood ratios. A comparative study among the method of moments, the maximum likelihood method, the method of product spacings, and minimization of three different likelihood ratios is performed using simulation. For the scale parameter, the maximum likelihood estimate performs better, and for the shape parameter, the product spacings estimate performs better. Among the three likelihood ratio statistics considered, the Anderson-Darling statistic has inferior performance compared with the Cramér-von Mises statistic and the Kolmogorov-Smirnov statistic.
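For reference, the method-of-moments benchmark in this comparison has a closed form: with sample mean m and sample variance v, the shape estimate is m²/v and the scale estimate is v/m. A sketch:

```python
def gamma_mom(sample):
    """Method-of-moments estimates (shape, scale) for the two-parameter gamma."""
    n = len(sample)
    mean = sum(sample) / n
    var = sum((x - mean) ** 2 for x in sample) / (n - 1)
    return mean ** 2 / var, var / mean

print(gamma_mom([1, 2, 3]))  # sample mean 2, variance 1 -> shape 4.0, scale 0.5
```

The other estimators compared in the paper (maximum likelihood, product spacings, likelihood-ratio minimization) have no such closed form and require numerical optimization.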
An empirical study on the material distribution decision making
Ko, Je-Suk ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 355~361
This paper addresses a mathematical approach to decision making in a real-world material distribution situation. The problem is characterized by a low-volume, highly varied mix of products, so there is much material movement between facilities. This study focuses especially on providing the transportation scheduler with a tool that can be used to quantitatively analyze the volume of material moved, the type of truck to be used, production schedules, and due dates. In this research, we have developed a mixed integer programming model using a minimum-cost, multiperiod, multi-commodity network flow approach that minimizes overall material movement costs. The results suggest that the optimization approach provides a set of feasible solution routes with the objective of reducing the overall fleet cost.
Mixed-effects LS-SVR for longitudinal data
Cho, Dae-Hyeon ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 363~369
In this paper we propose a mixed-effects least squares support vector regression (LS-SVR) for longitudinal data. We add a random-effect term to the optimization function of LS-SVR in order to incorporate random effects into LS-SVR for analyzing longitudinal data. We also present a model selection method that employs a generalized cross-validation function for choosing the hyperparameters that affect the performance of the mixed-effects LS-SVR. A simulated example is provided to indicate the usefulness of the mixed-effects method for analyzing longitudinal data.
Simple hypotheses testing for the number of trees in a random forest
Park, Cheol-Yong ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 371~377
In this study, we propose two informal hypothesis tests which may be useful in determining the number of trees in a random forest for use in classification. The first test declares that a case is 'easy' if the hypothesis of equality of the probabilities of the two most popular classes is rejected. The second test declares that a case is 'hard' if the hypothesis that the relative difference, or margin of victory, between the probabilities of the two most popular classes is greater than or equal to some small number, say 0.05, is rejected. We propose to continue generating trees until all (or all but a small fraction) of the training cases are declared easy or hard. The advantage of combining the second test with the first is that the number of trees required to stop becomes much smaller than with the first test alone, where all (or all but a small fraction) of the training cases must be declared easy.
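The first test can be sketched with a normal approximation to the difference of the two top vote shares across B trees, using Var(p̂₁ − p̂₂) = [p₁ + p₂ − (p₁ − p₂)²]/B for multinomial proportions; the exact test used in the paper may differ, so treat this as an illustrative assumption:

```python
import math

def is_easy(n1, n2, B, z_crit=1.96):
    """Declare a case 'easy' when equality of the two top class probabilities
    is rejected, given the top two vote counts n1 >= n2 out of B trees.
    Uses a normal approximation with the multinomial variance of p1 - p2."""
    p1, p2 = n1 / B, n2 / B
    diff = p1 - p2
    if diff == 0:
        return False
    se = math.sqrt((p1 + p2 - diff ** 2) / B)
    return diff / se > z_crit

print(is_easy(90, 10, 100))  # lopsided vote: clearly easy
print(is_easy(51, 49, 100))  # near tie: not yet easy, keep growing trees
```

The 'hard' test follows the same pattern with the null shifted by the chosen small margin (e.g. 0.05), so cases whose vote shares stay provably close can also stop tree generation.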
Semiparametric Bayesian estimation under functional measurement error model
Hwang, Jin-Seub ; Kim, Dal-Ho ;
Journal of the Korean Data and Information Science Society, volume 21, issue 2, 2010, Pages 379~385
This paper considers a Bayesian approach to modeling a flexible regression function under the functional measurement error model. The regression function is modeled by semiparametric regression with penalized splines. Model fitting and parameter estimation are carried out in a hierarchical Bayesian framework using Markov chain Monte Carlo methodology. The performance is compared with that of estimators under the functional measurement error model without the semiparametric component.