Journal of the Korean Data and Information Science Society
Korean Data and Information Science Society
Volume 24, Issue 6 - Nov 2013
Exploratory data analysis for Korean daily exchange rate data with recurrence plots
Jang, Dae-Heung ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1103~1112
DOI : 10.7465/jkdi.2013.24.6.1103
Exploratory data analysis focuses mostly on data exploration instead of model fitting. We can use the recurrence plot as a graphical exploratory data analysis tool. With the recurrence plot, we can obtain the structural pattern of the time series and recognize the structural change points in time series at a glance.
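As a rough illustration of the idea, a recurrence plot marks every pair of time points whose values lie close together. The sketch below is a minimal version under stated assumptions: closeness is measured on the raw values with a fixed threshold `eps`, and the toy series is made up (it is not exchange rate data, and this is not necessarily the construction used in the paper).

```python
def recurrence_matrix(series, eps):
    """Return R with R[i][j] = 1 if |x_i - x_j| <= eps, else 0."""
    n = len(series)
    return [
        [1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
        for i in range(n)
    ]


def render(matrix):
    """Draw the matrix as text: '#' for a recurrence, '.' otherwise."""
    return "\n".join("".join("#" if v else "." for v in row) for row in matrix)


# Hypothetical series with a level shift halfway through.
toy = [1.0, 1.1, 0.9, 1.0, 5.0, 5.1, 4.9, 5.0]
R = recurrence_matrix(toy, eps=0.5)
print(render(R))
```

In the printed picture the two regimes appear as two separate square blocks on the diagonal, which is the kind of structural change point the abstract refers to.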
Bankruptcy prediction using ensemble SVM model
Choi, Ha Na ; Lim, Dong Hoon ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1113~1125
DOI : 10.7465/jkdi.2013.24.6.1113
Corporate bankruptcy prediction has long been an important topic in accounting and finance. Several data mining techniques have been used for bankruptcy prediction. However, a single model has many limitations when applied to real classification problems. This study proposes an ensemble SVM (support vector machine) model that combines SVM models with different kernel functions. The ensemble is built and evaluated by v-fold cross-validation. The k top-performing models are recruited into the ensemble, and classification is then carried out by majority vote of the ensemble. We investigate the performance of the ensemble SVM classifier in terms of accuracy, error rate, sensitivity, specificity, ROC curve, and AUC, and compare it with single SVM classifiers on a financial ratios dataset and a simulation dataset. The results confirm the advantages of our method: it is robust while providing good performance.
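The selection and voting steps can be sketched without any SVM library. In the example below, the cross-validation accuracies and the per-model predictions are hypothetical placeholders (in practice they would come from fitted SVMs with different kernels), and `top_k_models` and `majority_vote` are illustrative names, not the authors' code.

```python
from collections import Counter


def top_k_models(cv_scores, k):
    """Return the names of the k models with the highest CV accuracy."""
    return sorted(cv_scores, key=cv_scores.get, reverse=True)[:k]


def majority_vote(predictions):
    """Combine per-model label lists (equal length) by majority vote."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical cross-validated accuracies, one per kernel.
cv_accuracy = {"linear": 0.81, "rbf": 0.86, "poly": 0.79, "sigmoid": 0.70}
# Hypothetical predictions on four test firms (1 = bankrupt, 0 = healthy).
predictions = {
    "linear": [1, 0, 1, 1],
    "rbf": [1, 1, 0, 1],
    "poly": [0, 0, 1, 1],
    "sigmoid": [0, 1, 0, 0],
}

chosen = top_k_models(cv_accuracy, k=3)
ensemble = majority_vote([predictions[name] for name in chosen])
print(chosen, ensemble)
```

Choosing an odd k with binary labels avoids ties; with an even k a tie-breaking rule would have to be added.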
Noise reduction by sigma filter applying orientations of feature in image
Kim, Yeong-Hwa ; Park, Youngho ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1127~1139
DOI : 10.7465/jkdi.2013.24.6.1127
When images are acquired by various visual equipment, the addition of noise to the original image is common, and it is practically impossible to prevent noise completely. Noise detection and reduction are therefore fundamental tasks. In this study, we detect the orientation of image features and estimate the level of noise variance by measuring the relative proportion of noise. We then apply the estimated noise level to the sigma filter in the noise reduction algorithm. Using the orientation of image features as weights, we propose an effective algorithm for eliminating noise. The proposed statistical noise reduction methodology provides significantly better results than the usual sigma filter, regardless of the estimated level of the noise variance.
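For orientation, the usual sigma filter that the proposed method is compared against can be sketched in one dimension: each value is replaced by the mean of those neighbours that lie within k standard deviations of it. The window radius, the factor k = 2, and the toy signal are assumptions for illustration; the orientation weighting proposed in the paper is not reproduced here.

```python
def sigma_filter_1d(signal, sigma, radius=2, k=2.0):
    """Plain sigma filter: average only neighbours within k*sigma of the centre."""
    n = len(signal)
    out = []
    for i, center in enumerate(signal):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        # The centre value always qualifies, so `close` is never empty.
        close = [v for v in signal[lo:hi] if abs(v - center) <= k * sigma]
        out.append(sum(close) / len(close))
    return out


# Hypothetical noisy step edge; sigma is the assumed noise standard deviation.
noisy = [10, 11, 9, 10, 50, 51, 49, 50]
smoothed = sigma_filter_1d(noisy, sigma=1.0)
print(smoothed)
```

Because values on the other side of the edge fall outside the k*sigma band, the step between the fourth and fifth samples is preserved while the noise on each side is averaged out.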
An analysis of time series models for toilet and laundry water-uses
Myoung, Sungmin ; Kim, Donggeon ; Lee, Doo-Jin ; Kim, Hwa Soo ; Jo, Jinnam ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1141~1148
DOI : 10.7465/jkdi.2013.24.6.1141
End uses of household water are influenced by internal factors such as housing type, lifestyle, and housing area, and by external factors such as water rates, weather, and water supply facilities. Analyzing the factors that influence household water consumption helps explain changing trends and predict end-use water demand. In this paper, we used real data to predict toilet and laundry water use with a linear regression model with autoregressive errors. The results showed that the monthly autoregressive error models explained about 71% of the end-use water demand for toilet and laundry.
A study on academic achievements of college students admitted by admissions officer selection: K university case
Choi, Hyun Seok ; Park, Cheolyong ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1149~1157
DOI : 10.7465/jkdi.2013.24.6.1149
In this study we compare the academic achievements of college students admitted by admissions officer selection with those of students admitted by general selection. The two measures of academic achievement considered are GPA (grade point average) and the relative ascending rank of GPA. Through this comparison, we assess the effectiveness of admissions officer selection and provide a basis for screening good students by that selection. The results of the data analysis indicate that the academic achievements of admissions officer selection students tend to be lower than those of early general admission students, and that those of regular general admission students tend to be higher than those of early general admission students.
A review of life table applications and an introduction of its application method
Shin, Kyoungjin ; Choi, Boseung ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1159~1175
DOI : 10.7465/jkdi.2013.24.6.1159
The life table summarizes life expectancy at each age. Because life tables can be applied in many research and industry areas, this study reclassifies and summarizes their applications. We reviewed papers published in Korea and internationally until 2011 and considered several classification standards based on application, base period, gender, and observation period. Each standard divides life tables into two or three categories. By application, they are grouped into general and applied life tables. By base period, they are divided into abridged life tables, which use 5-year units, and complete life tables, which use 1-year units. By gender, they are classified into male, female, and unisex life tables. By observation period, they are period life tables and cohort life tables. By analyzing how life tables are constructed, this study shows how they can be employed in many other areas.
The effect of university students' knowledge sharing on the educational performance
Choi, Hyun Seok ; Kim, Seul Gee ; Ha, Jeongcheol ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1177~1188
DOI : 10.7465/jkdi.2013.24.6.1177
Modern society is changing rapidly toward an intellectual society. Knowledge sharing is in high demand for growth and progress throughout society. Even though universities create knowledge in various fields, university members have not been able to share it effectively for several reasons. This case study of K university students reveals that IT usage, college sponsorship, and a communicative culture affect educational performance both directly and indirectly through knowledge sharing.
Proposition of causal association rule thresholds
Park, Hee Chang ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1189~1197
DOI : 10.7465/jkdi.2013.24.6.1189
Data mining is the process of analyzing a huge database from different perspectives and summarizing it into useful information. One of the well-studied problems in data mining is association rule generation. Association rule mining finds relationships among items in massive databases using interestingness measures such as support, confidence, and lift. Typical applications of this technique include retail market basket analysis, item recommendation systems, cross-selling, and customer relationship management. However, these interestingness measures cannot be used to establish a causal relationship between antecedent and consequent item sets. This paper proposes causal association thresholds to compensate for this problem and then checks the three conditions of interestingness measures. A numerical example compares the basic and causal association thresholds. The results show that causal association thresholds are better than basic association thresholds.
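The three basic interestingness measures named above have standard definitions, sketched below for a rule "antecedent implies consequent" over a list of transactions. The basket data are invented, and the sketch assumes the antecedent and the consequent each occur in at least one transaction. The causal thresholds proposed in the paper are not given in the abstract, so they are not reproduced.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    a, c = set(antecedent), set(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a <= t)         # contain antecedent
    n_c = sum(1 for t in transactions if c <= t)         # contain consequent
    n_ac = sum(1 for t in transactions if (a | c) <= t)  # contain both
    support = n_ac / n
    confidence = n_ac / n_a
    lift = confidence / (n_c / n)
    return support, confidence, lift


# Hypothetical market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]
support, confidence, lift = rule_measures(baskets, {"bread"}, {"milk"})
print(support, confidence, lift)
```

A lift below 1, as in this toy example, means the two items co-occur slightly less often than independence would predict; none of the three numbers says anything about the direction of causation.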
Generalized kernel estimating equation for panel estimation of small area unemployment rates
Shim, Jooyong ; Kim, Youngwon ; Hwang, Changha ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1199~1210
DOI : 10.7465/jkdi.2013.24.6.1199
A high unemployment rate is one of the major problems in most countries today. Hence, the demand for small area labor statistics has increased rapidly over the past few years. However, since sample surveys for producing official statistics are mainly designed for large areas, it is difficult to produce reliable statistics at the small area level because of small sample sizes. Most existing studies on small area estimation concern the estimation of parameters based on cross-sectional data. Since many official statistics are collected repeatedly at regular intervals, for instance monthly, quarterly, or yearly, we need an alternative model that can handle this type of panel data. In this paper, we derive the generalized kernel estimating equation, which can model time dependency among response variables and handle repeated measurement or panel data. We compare the proposed estimating equation with the generalized linear model and the generalized estimating equation through simulation, and apply it to estimating the unemployment rates of 25 areas in Gyeongsangnam-do and Ulsan for 2005.
Saddlepoint approximation for distribution function of sample mean of skew-normal distribution
Na, Jong-Hwa ; Yu, Hye-Kyung ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1211~1219
DOI : 10.7465/jkdi.2013.24.6.1211
Recently, the skew-normal distribution has increasingly been used in place of the classical normal distribution in many statistical theories and applications. In this paper, we deal with the saddlepoint approximation for the distribution function of the sample mean of the skew-normal distribution. Compared to the normal approximation, the saddlepoint approximation provides very accurate results for small sample sizes as well as for moderate or large ones. The saddlepoint approximations related to the skew-normal distribution suggested in this paper can be used as an approximate alternative to the classical methods of Gupta and Chen (2001) and Chen et al. (2004), which need very complicated calculations. Through a simulation study, we verified the accuracy of the suggested approximation and applied it to Robert's (1966) twin data.
Study of interaction of teachers and family for behavior problems and social skills of children with autism
Kang, Minchae ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1221~1229
DOI : 10.7465/jkdi.2013.24.6.1221
The purpose of this study is to examine the impact of teacher and family interaction on the behavior problems and social skills of children with autism spectrum disorder. A survey was conducted with 147 paired samples, each consisting of a teacher and the parents of a child with autism, to analyze the influence on behavior problems and social skills. The samples were classified into three groups: a high group, in which the interaction level is high for both teacher and family; a middle group, in which it is high for either teacher or family; and a low group, in which it is low for both. The results show that the correlation between teacher interaction and the social skills of children with autism is significant, and that the level of social skills in the high interaction group is higher than in the middle and low interaction groups.
Development of brand equity index model and a strategy to improve brand equity: Focus on National Federation of Fisheries Cooperatives
Cho, Yong Jun ; Myoung, Soo Ah ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1231~1239
DOI : 10.7465/jkdi.2013.24.6.1231
Recently, successful brand management has become more important than anything else for enhancing the competitiveness of enterprises and increasing customer loyalty. Most customers evaluate the value and image of an enterprise according to their experience of its goods and services. This study focuses on the representative brand of the Fisheries Cooperative Association and seeks to establish a scheme for assessing it from the marketing and attitude points of view. We suggest a model that can measure the brand equity index (BEI). Based on a survey, we derive important factors and provide strategic direction for improving brand equity.
Effects of aerobic and combined exercise on body composition and blood lipid in the middle-aged women
Kim, Yong Cheol ; Kim, Young Soo ; Yang, Jeong Ok ; Lee, Bom Jin ; Lee, Joong Sook ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1241~1251
DOI : 10.7465/jkdi.2013.24.6.1241
The purpose of this study was to investigate and compare the effects of aerobic and combined exercise on body composition and blood lipids in middle-aged women. Sixteen women aged 40 to 50 years were included in the sample. The sample was divided into two groups: (a) aerobic exercise group (n
"Statistics is difficult"? - Textbooks problems
Lee, Wonwoo ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1253~1262
DOI : 10.7465/jkdi.2013.24.6.1253
This study examines not only how difficult those who studied Statistics in college found it but also why they found it difficult. Most of the surveyed researchers, 80.8 percent, say "Statistics was difficult". They selected the item "textbooks were hard to understand" as the main reason (62.5%). According to the explanatory survey of textbooks, many textbooks do not distinguish the small letter x from the capital letter X. Hence, this study concludes that one of the main reasons most of the researchers found Statistics difficult must be the ambiguity of the notation. If authors keep in mind the importance of the difference between capital and small letters in Statistics, learners' perceived difficulty of Statistics will decline.
Validity assessment of VaR with Laplacian distribution
Byun, Bu-Guen ; Yoo, Do-Sik ; Lim, Jongtae ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1263~1274
DOI : 10.7465/jkdi.2013.24.6.1263
VaR (value at risk), which represents the expectation of the worst loss that may occur over a period of time within a given level of confidence, is currently used by various financial institutions for the purpose of risk management. In the majority of previous studies, the probability of return has been modeled with the normal distribution. Recently, Chen et al. (2010) measured VaR with the asymmetric Laplacian distribution. However, it is difficult to estimate the mode, the skewness, and the degree of variance that determine the shape of an asymmetric Laplacian distribution with the limited data available in the real-world market. In this paper, we show, using real-world stock market data and various statistical measures, that VaR estimated with the (symmetric) Laplacian distribution model is more accurate than VaR estimated with the normal distribution model or the asymmetric Laplacian distribution model.
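A minimal sketch of quantile-based VaR under a symmetric Laplacian model follows. It assumes the textbook maximum likelihood fit (location = sample median, scale = mean absolute deviation from the median) and reports VaR as the negated lower quantile of returns; the paper's exact estimation and backtesting procedure is not described in the abstract, and the return series below is invented.

```python
import math
import statistics


def fit_laplace(returns):
    """MLE for Laplace(mu, b): mu is the median, b the mean |r - mu|."""
    mu = statistics.median(returns)
    b = sum(abs(r - mu) for r in returns) / len(returns)
    return mu, b


def laplace_quantile(p, mu, b):
    """Inverse CDF of the Laplace(mu, b) distribution for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p < 0.5:
        return mu + b * math.log(2.0 * p)
    return mu - b * math.log(2.0 * (1.0 - p))


def laplace_var(returns, confidence=0.99):
    """VaR as a positive loss: minus the (1 - confidence) return quantile."""
    mu, b = fit_laplace(returns)
    return -laplace_quantile(1.0 - confidence, mu, b)


# Hypothetical daily returns in percent.
sample_returns = [-2.0, -1.0, 0.0, 1.0, 2.0]
print(fit_laplace(sample_returns), laplace_var(sample_returns, 0.99))
```

Because the Laplace tail decays exponentially rather than like a Gaussian, the same data generally give a larger 99% VaR than a normal model with matching variance.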
Parameter estimation in a readjustment procedure in the multivariate integrated process control
Cho, Gyo-Young ; Park, Jong Suk ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1275~1283
DOI : 10.7465/jkdi.2013.24.6.1275
This paper considers the multivariate integrated process control procedure for detecting special causes in a multivariate IMA(1, 1) process. When the multivariate control chart signals, the special cause is detected and eliminated from the process. However, when eliminating the special cause is costly or not practically possible, an alternative action is to readjust the process with an approximately modified adjustment scheme. In this paper, we estimate the parameters in the readjustment procedure after a true signal in the multivariate integrated process control.
Determinants of student course evaluation using hierarchical linear model
Cho, Jang Sik ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1285~1296
DOI : 10.7465/jkdi.2013.24.6.1285
The fundamental concern of this paper is to analyze the effects of subject characteristic and student characteristic variables on student course evaluation. We use a 2-level hierarchical linear model since the data structure of subject characteristic and student characteristic variables is multilevel. We consider four models: (1) null model, (2) random coefficient model, (3) means as outcomes model, (4) intercepts and slopes as outcomes model. The results of the analysis are as follows. First, the null model showed that the effects of subject characteristics on course evaluation were much larger than those of student characteristics. Second, the conditional model specifying subject and student level predictors revealed that class size, grade, tenure, mean GPA of the class, and native class at level 1, and sex, department category, admission method, and mean GPA of the student at level 2, had statistically significant effects on course evaluation. The explained variance was 13% at the subject level and 13% at the student level.
An LV-CAST algorithm for emergency message dissemination in vehicular networks
Bae, Ihn-Han ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1297~1307
DOI : 10.7465/jkdi.2013.24.6.1297
Several multi-hop applications developed for vehicular ad hoc networks use broadcast as a means to either discover nearby neighbors or disseminate useful traffic information to other vehicles located within a certain geographical area. However, the conventional broadcast mechanism may lead to the so-called broadcast storm problem, a scenario in which there is a high level of contention and collision at the link layer due to an excessive number of broadcast packets. To solve the broadcast storm problem, we propose an RPB-MACn-based LV-CAST, a vehicular broadcast algorithm for disseminating safety-related emergency messages. The proposed LV-CAST identifies the last node within transmission range by computing the distance extending on 1 hop from the sending node of an emergency message to the next node of the receiving node of the emergency message, and only the last node re-broadcasts the emergency message. The performance of LV-CAST is evaluated through simulation and compared with other message dissemination algorithms.
Estimation for the generalized exponential distribution under progressive type I interval censoring
Cho, Youngseukm ; Lee, Changsoo ; Shin, Hyejung ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1309~1317
DOI : 10.7465/jkdi.2013.24.6.1309
There are various parameter estimation methods for the generalized exponential distribution under progressive type I interval censoring. Chen and Lio (2010) studied parameter estimation by the maximum likelihood estimation method, the mid-point approximation method, the expectation maximization algorithm, and the method of moments. Among those, the mid-point approximation method has the smallest mean square error for the generalized exponential distribution under progressive type I interval censoring. However, the maximum likelihood estimation method does not readily yield a closed-form solution for the parameter estimates. In this paper, we propose two types of approximate maximum likelihood estimates to solve that problem. The simulation results show that the obtained estimators perform well in the sense of mean square error. The proposed method also yields a closed-form solution for the parameter estimates of the generalized exponential distribution under progressive type I interval censoring.
Validation of the coach-athlete relationship scale of amateur golf players: Rasch rating scale model
Kim, Sae Hyung ; Choi, Jae Il ; Lee, Jun Woo ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1319~1329
DOI : 10.7465/jkdi.2013.24.6.1319
The purpose of this research was to develop and validate a coach-athlete relationship scale suitable for amateur golf players by applying the Rasch rating scale model. The Korean form of the scale developed by Kim and Park (2008), revised based on evidence from content inspection, was used to survey 217 amateur golf athletes. Unidimensionality, the basic assumption of the Rasch model, was verified using the WINSTEPS program, and the appropriateness of the item categories was established through step calibration. The goodness of fit of each question was tested with the goodness-of-fit index, and differential item functioning (DIF) was estimated according to golf career. A question was judged unfit when its goodness-of-fit index was 1.30 or more, and the significance level in all analyses was set at .05. The results showed that the measure variance explained by the Rasch measurement model (33.7%) exceeded 20%, so the unidimensionality assumption was satisfied for the 11 questions (..hospitable posture when my coach is teaching). The 7-point item category was found to be unfit by step calibration, but after rescoring into a 5-point scale it was found to be fit. In estimating goodness of fit with the rescored 5-point item category, Question 10 (...my best when my coach is teaching) and Question 11 were found to be unfit, and in estimating differential item functioning according to golf career, Question 11 was found to function differentially. Thus the 5-point scale with 9 questions, obtained after eliminating the two questions that were unfit or functioned differentially, was validated as a coach-athlete relationship scale suitable for amateur golf athletes.
Major gene identification for LPL gene in Korean cattles
Jin, Mi-Hyun ; Oh, Dong-Yep ; Lee, Jea-Young ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1331~1339
DOI : 10.7465/jkdi.2013.24.6.1331
The lipoprotein lipase (LPL) gene can be considered a functional candidate gene that regulates fatty acid composition. Oh et al. (2013) investigated the relationship between unsaturated fatty acids and five novel SNPs and confirmed that three polymorphic SNPs (c.322G>A, c.329A>T and c.1591G>A) were associated with fatty acid composition. We used a generalized linear model to adjust for environmental effects and the multifactor dimensionality reduction (MDR) method to identify gene-gene interaction effects. We applied the MDR method to identify major interaction effects of exonic single nucleotide polymorphisms (SNPs) in the LPL gene for economic traits in the Korean cattle population.
Comparison of data mining methods with daily lens data
Seok, Kyungha ; Lee, Taewoo ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1341~1348
DOI : 10.7465/jkdi.2013.24.6.1341
To solve classification problems, various data mining techniques have been applied to database marketing, credit scoring, and market forecasting. In this paper, we compare techniques such as bagging, boosting, LASSO, random forest, and support vector machine on daily lens transaction data. The classical techniques (decision tree and logistic regression) are used too. The experiment shows that the random forest has a slightly smaller misclassification rate and standard error than the other methods. The performance of the SVM is good in terms of misclassification rate and bad in terms of standard error. Taking model interpretation and computing time into consideration, we conclude that the LASSO gives the best result.
Sample size determination based on placements for non-inferiority trials
Kim, Jiyeon ; Kim, Dongjae ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1349~1357
DOI : 10.7465/jkdi.2013.24.6.1349
In clinical research, sample size determination is one of the most important tasks. For non-inferiority trials, sample size can be determined by a parametric method using the t-test or by the nonparametric method of Kim and Kim (2007) based on Wilcoxon's rank sum test. In this paper, we propose a sample size calculation method for non-inferiority trials based on the placements method of Orban and Wolfe (1982), using the power calculated by Kim (1994). We compare the proposed sample size with that from the formula of Kim and Kim (2007) and that from the parametric t-test. The sample size calculated by the proposed placement-based method is the smallest. The proposed method is therefore preferable to parametric methods when it is hard to assume a specific population distribution, and it is more efficient in time and cost than the method based on Wilcoxon's rank sum test.
Study on realization for objective evaluation algorithm of grade by admission office system
Choi, Seungbae ; Lee, Younghak ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1359~1368
DOI : 10.7465/jkdi.2013.24.6.1359
An "admission officer system" has been introduced to change the paradigm of the "current admission system", which mainly relies on scholastic ability test (SAT) scores to select students for university entrance. The admission officer system focuses not only on students' school records but also on their potential. In addition, this system lets a university screen students according to its own founding philosophy. One feature of the admission officer system is that it cultivates talent by caring for the selected students consistently. Typically, students are selected under an admission officer system through document screening, which includes curriculum and non-curriculum scores, discussions, and interviews. On the other hand, the admission officer system might lack objectivity in student selection because of the admission officer's own subjectivity. In this study, we present an algorithm with which the admission officer system can maintain objectivity in student selection, so that students are not disadvantaged by the process.
Analysis of factor of life planners' satisfaction after turnover using the cumulative logit model
Lee, Deogro ; Chun, Heuiju ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1369~1384
DOI : 10.7465/jkdi.2013.24.6.1369
In this study, we investigate factors affecting five kinds of life planners' satisfaction after turnover: general satisfaction, human relations within the organization, sales environment support, economic satisfaction, and the life planner management system. We also suggest theoretical and practical implications. The results of the survey of life planners are as follows. First, the factors influencing life planners' general satisfaction after turnover are the insurance company they belong to, recognition of their own sales ability, life planners' satisfaction level, finance- and insurance-related awards, education level, marital status, size of branch, and surrounding recognition of life planners. Second, the factors affecting life planners' satisfaction with human relations within the organization after turnover are size of branch, surrounding recognition of life planners, and the insurance company they belong to. Third, the factors affecting life planners' satisfaction with sales environment support after turnover are surrounding recognition of life planners, the insurance company they belong to, certificates relating to finance or insurance, and size of branch. Fourth, solicitors' economic satisfaction after turnover depends mainly on demographic factors such as education level, marital status, and age, and life planners' satisfaction level is also an influential factor. Last, for solicitors' satisfaction with the management system, only the type of turnover experienced is an influential factor.
An actuarial structure of income replacement ratio in pensions and individual annuity
Han, Jeonglim ; Lee, Hangsuck ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1385~1400
DOI : 10.7465/jkdi.2013.24.6.1385
This paper discusses income replacement ratios of national pension, retirement pension and individual annuities in Korea. These ratios are useful indicators for the assessment of retirement income security of workers. This paper projects income replacement ratios, using the pension entry rate, decrement rates, and life tables of the National Statistical Office. The result of the actuarial projection is that the income replacement ratio of national pension is approximately 21.0 to 22.7%, that the income replacement ratio of retirement pension is about 5.8 to 9.7%, and that the income replacement ratio of an individual annuity is about 13.5 to 21.0%, respectively. The income replacement ratio by income varies due to the effects of income redistribution in national pension and retirement pension, but the income replacement ratio of an individual annuity is constant, regardless of income.
Nonparametric method using placement in a randomized complete block design
Sim, Sujin ; Kim, Dongjae ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1401~1408
DOI : 10.7465/jkdi.2013.24.6.1401
Kim and Kim (1992) proposed a typical nonparametric method for umbrella alternatives in a randomized block design with replications. In this paper, we consider a test procedure for umbrella alternatives in a randomized block design using an extension of the two-sample placement tests described in Orban and Wolfe (1982) and the treatment tests described in Kim (1999). We perform a Monte Carlo study to compare the empirical powers of the test statistics for several underlying distributions.
Analysis of latent growth model using repeated measures ANOVA in the data from KYPS
Lee, Hwa-Jung ; Kang, Suk-Bok ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1409~1419
DOI : 10.7465/jkdi.2013.24.6.1409
We analyzed the data from KYPS using the latent growth model, which has been widely studied as a method for analyzing longitudinal data. In this study, we applied repeated measures ANOVA to the unconditional model in order to decide on the unconditional model of the latent growth model more quickly. We also compared the six-type models, the quadratic model, and the model to which repeated measures ANOVA is applied.
Study for independence of hits in professional baseball games
Kim, Byungsoo ; Park, Youngwook ; Jang, Nayoung ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1421~1428
DOI : 10.7465/jkdi.2013.24.6.1421
In this paper, we test whether a hit at a particular at-bat depends on the hitting results of the previous at-bats in professional baseball games. For this purpose, we used the 2011 Korean Baseball League data. After reviewing the conditional probability of a hit at each at-bat and the lift, we find that the hitting percentage at a particular at-bat does not depend on a hit at the previous at-bat. From the independence tests of hits at consecutive at-bats, and of a hit at a particular at-bat given no hits at previous at-bats, we conclude that hits at particular at-bats do not depend on the hits at previous at-bats in most cases. Hence, we can safely conclude that a hit at a particular at-bat is statistically independent of the hits at the previous at-bats.
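As a rough illustration of the two quantities mentioned in the abstract, the sketch below computes the conditional probability of a hit given a hit at the previous at-bat, and the lift (that conditional probability divided by the overall hit probability). The function name hit_lift and the 0/1 outcome sequence are hypothetical and are not taken from the paper's data; a lift close to 1 is what one would expect under independence.

```python
def hit_lift(outcomes):
    """Return (P(hit | previous hit), P(hit), lift) for a 0/1 sequence.

    `outcomes` is a hypothetical list of at-bat results for one batter,
    in order, with 1 for a hit and 0 otherwise.
    """
    # Pair each at-bat with the one immediately before it.
    pairs = list(zip(outcomes, outcomes[1:]))
    if not pairs:
        raise ValueError("need at least two at-bats")

    # Marginal hit probability over at-bats that have a predecessor.
    p_hit = sum(cur for _, cur in pairs) / len(pairs)

    # Outcomes of at-bats that directly follow a hit.
    after_hit = [cur for prev, cur in pairs if prev == 1]
    if not after_hit or p_hit == 0:
        raise ValueError("lift is undefined for this sequence")

    p_hit_given_hit = sum(after_hit) / len(after_hit)
    return p_hit_given_hit, p_hit, p_hit_given_hit / p_hit


if __name__ == "__main__":
    p_cond, p_marg, lift = hit_lift([1, 0, 1, 1, 0, 0, 1, 0, 1, 1])
    print(f"P(hit | prev hit)={p_cond:.3f}  P(hit)={p_marg:.3f}  lift={lift:.3f}")
```

A formal independence test, as in the paper, would go further than this descriptive ratio.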
Analysis of the abstracts of research articles in food related to climate change using a text-mining algorithm
Bae, Kyu Yong ; Park, Ju-Hyun ; Kim, Jeong Seon ; Lee, Yung-Seop ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1429~1437
DOI : 10.7465/jkdi.2013.24.6.1429
Research articles in food related to climate change were analyzed by implementing a text-mining algorithm, one of the unstructured data analysis tools in big data analysis, with a focus on the frequencies of terms appearing in the abstracts. As a first step, a term-document matrix was established, followed by a hierarchical clustering algorithm based on dissimilarities among the selected terms and on expertise in the field, to classify the documents under consideration into a few labeled groups. Through this research, we were able to identify important topics appearing in the field of food related to climate change and their trends over past years. It is expected that the results of the article can be utilized in future research to make systematic responses and adaptations to climate change.
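The first step described above, building a term-document matrix, can be sketched as follows. This is a minimal illustration that assumes whitespace tokenization and lowercase folding; the function name term_document_matrix and the sample strings are hypothetical, and the paper's own term selection and clustering steps are not reproduced here.

```python
from collections import Counter


def term_document_matrix(docs):
    """Return (terms, matrix) where matrix[i][j] counts term i in document j."""
    # Count tokens per document (naive whitespace tokenization, an assumption).
    counts = [Counter(doc.lower().split()) for doc in docs]
    # Sorted vocabulary across all documents.
    terms = sorted(set().union(*counts))
    # One row per term, one column per document.
    matrix = [[c[term] for c in counts] for term in terms]
    return terms, matrix


if __name__ == "__main__":
    terms, matrix = term_document_matrix(
        ["climate change food", "food safety food"]
    )
    for term, row in zip(terms, matrix):
        print(term, row)
```

The rows of such a matrix are what a hierarchical clustering routine would compare when computing dissimilarities among terms.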
Derivation of error sum of squares of two stage nested designs and its application
Kim, Daehak ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1439~1448
DOI : 10.7465/jkdi.2013.24.6.1439
The analysis of variance for a randomized block design or two-way classification data is well known. In this paper, we consider the two-stage nested design, in which the levels of one factor are not identical for different levels of another factor. We investigate the structural properties of the two-stage nested design and the properties of the error sum of squares for the random effect model. As an application of the two-stage nested design, we consider the two-period crossover design, which is commonly used for equivalence tests of biosimilar products. The confidence interval estimation of the difference of two population means in the crossover design is discussed based on the statistical package SPSS.
The difference between two distribution functions
Hong, Chong Sun ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1449~1454
DOI : 10.7465/jkdi.2013.24.6.1449
There are many methods for measuring the difference between two location parameters. In this paper, statistics are proposed in order to estimate the difference of two location parameters. The statistics are designed not using the means, variances, signs and ranks, but with the cumulative distribution functions. Hence these are measured as the differences in the area between two univariate cumulative distribution functions. It is found that the difference in the area between two empirical cumulative distribution functions is the difference of two sample means, and its integral is also the difference of two population means.
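The stated identity between the area separating two empirical distribution functions and the difference of the sample means can be checked numerically. The sketch below integrates F_y(t) - F_x(t) exactly, using the fact that both step functions are constant between consecutive pooled data points; the function names and sample values are hypothetical.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample` as a function of t."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda t: bisect_right(ordered, t) / n


def signed_area_between_ecdfs(x, y):
    """Integrate F_y(t) - F_x(t) over the real line.

    Outside the range of the pooled data both ECDFs are 0 or both are 1,
    so only the intervals between pooled data points contribute.
    """
    fx, fy = ecdf(x), ecdf(y)
    grid = sorted(set(x) | set(y))
    area = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both ECDFs are constant on [left, right).
        area += (fy(left) - fx(left)) * (right - left)
    return area


if __name__ == "__main__":
    x = [2.0, 5.0, 9.0]
    y = [1.0, 3.0, 4.0, 8.0]
    mean_diff = sum(x) / len(x) - sum(y) / len(y)
    print(signed_area_between_ecdfs(x, y), mean_diff)
```

The sign convention (F_y minus F_x giving mean of x minus mean of y) is a choice made for this sketch.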
Bayesian analysis of an exponentiated half-logistic distribution under progressively type-II censoring
Kang, Suk Bok ; Seo, Jung In ; Kim, Yongku ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1455~1464
DOI : 10.7465/jkdi.2013.24.6.1455
This paper develops maximum likelihood estimators (MLEs) of unknown parameters in an exponentiated half-logistic distribution based on a progressively type-II censored sample. We obtain approximate confidence intervals for the MLEs by using asymptotic variance and covariance matrices. Using importance sampling, we obtain Bayes estimators and corresponding credible intervals with the highest posterior density and Bayes predictive intervals for unknown parameters based on progressively type-II censored data from an exponentiated half logistic distribution. For illustration purposes, we examine the validity of the proposed estimation method by using real and simulated data.
Goodness-of-fit tests for a proportional odds model
Lee, Hyun Yung ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1465~1475
DOI : 10.7465/jkdi.2013.24.6.1465
The chi-square type test statistic is the most commonly used statistic for testing the goodness-of-fit of a multinomial logistic regression model, whose data are either grouped (binomial) or ungrouped (binary) according to covariate pattern. The chi-square type statistic is not a satisfactory gauge, however, because the ungrouped Pearson chi-square statistic does not adhere well to the chi-square distribution and is also not a satisfactory form of measurement in itself. Currently, goodness-of-fit in the ordinal setting is often assessed using the Pearson chi-square statistic and deviance tests. These tests involve creating a contingency table in which rows consist of all possible cross-classifications of the model covariates, and columns consist of the levels of the ordinal response. I examined goodness-of-fit tests for a proportional odds logistic regression model, the most commonly used regression model for an ordinal response variable. Using a simulation study, I investigated the distribution and power properties of this test and compared these with those of three other goodness-of-fit tests. The new test had lower power than the existing tests; however, it was able to detect a greater number of the different types of lack of fit considered in this study. I illustrated the ability of the tests to detect lack of fit using a study of aftercare decisions for psychiatrically hospitalized adolescents.
Dependence structure analysis of KOSPI and NYSE based on time-varying copula models
Lee, Sangyeol ; Kim, Byungsoo ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1477~1488
DOI : 10.7465/jkdi.2013.24.6.1477
In this study, we analyze the dependence structure of KOSPI and NYSE indices based on a two-step estimation procedure. In the first step, we adopt ARMA-GARCH models with Gaussian mixture innovations for marginal processes. In the second step, time-varying copula parameters are estimated. By using these, we measure the dependence between the two returns with Kendall's tau and Spearman's rho. The two dependence measures for various copulas are illustrated.
Optimal thresholds criteria for ROC surfaces
Hong, C.S. ; Jung, E.S. ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1489~1496
DOI : 10.7465/jkdi.2013.24.6.1489
Consider the ROC surface, which is a generalization of the ROC curve for three-class diagnostic problems. In this work, we propose five criteria for the three-class ROC surface by extending the Youden index, the sum of sensitivity and specificity, the maximum vertical distance, the amended closest-to-(0,1) and the true rate. It may be concluded that these five criteria can be expressed as a function of two Kolmogorov-Smirnov statistics. A pair of optimal thresholds can be obtained simultaneously from the ROC surface. It is found that the paired optimal thresholds selected from the ROC surface are equivalent to the two optimal thresholds found from the two ROC curves.
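For orientation, the sketch below covers only the ordinary two-class case that the paper extends: it picks the threshold maximizing the Youden index J(c) = sensitivity(c) + specificity(c) - 1. When a score above c is called positive, J(c) equals F_neg(c) - F_pos(c), the gap between the two empirical distribution functions, which is why the maximum coincides with a Kolmogorov-Smirnov type statistic. The function name and the score lists are hypothetical, and the three-class criteria of the paper are not implemented here.

```python
from bisect import bisect_right


def youden_optimal_threshold(neg_scores, pos_scores):
    """Return (threshold, J) maximizing the two-class Youden index.

    A case is called positive when its score is strictly above the threshold.
    """
    neg = sorted(neg_scores)
    pos = sorted(pos_scores)

    def youden(c):
        specificity = bisect_right(neg, c) / len(neg)      # P(neg score <= c)
        sensitivity = 1 - bisect_right(pos, c) / len(pos)  # P(pos score > c)
        return sensitivity + specificity - 1

    # J(c) only changes at observed scores, so those are the candidates.
    candidates = sorted(set(neg) | set(pos))
    best = max(candidates, key=youden)
    return best, youden(best)


if __name__ == "__main__":
    threshold, j = youden_optimal_threshold(
        [0.1, 0.2, 0.3, 0.6], [0.4, 0.5, 0.7, 0.9]
    )
    print(threshold, j)
```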
On scaled cumulative residual Kullback-Leibler information
Hwang, Insung ; Park, Sangun ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1497~1501
DOI : 10.7465/jkdi.2013.24.6.1497
Cumulative residual Kullback-Leibler (CRKL) information is well defined on the empirical distribution function (EDF) and allows us to construct an EDF-based goodness-of-fit test statistic. However, we need to consider a scaled CRKL because CRKL is not scale invariant. In this paper, we consider several criteria for estimating the scale parameter in the scaled CRKL and compare the performances of the estimated CRKL in terms of both power and unbiasedness.
Diversification, performance and optimal business mix of insurance portfolios
Kim, Hyun Tae ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1503~1520
DOI : 10.7465/jkdi.2013.24.6.1503
For multi-line insurance companies, allocating the risk capital to each line is a widely accepted risk management exercise. In this article we consider several applications of the Euler capital allocation. First, we propose visual tools to present the diversification and the line-wise performance for a given loss portfolio so that risk managers can understand the interactions among the lines. Second, on the theoretical side, we prove that the Euler allocation is the directional derivative of the marginal or incremental allocation method, an alternative capital allocation rule in the literature. Lastly, we establish the equivalence between the mean-shortfall optimization and the RORAC optimization when the risk adjusted capital is the expected shortfall, and show how to construct the optimal insurance business mix that maximizes the portfolio RORAC. An actual loss sample of an insurance portfolio is used for numerical illustrations.
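When the risk measure is the expected shortfall, the Euler allocation to a line is commonly estimated from a loss sample as that line's average loss over the scenarios where the portfolio total is worst. The sketch below assumes this empirical estimator; the function name, the tail_size parameter (number of worst scenarios, used instead of a confidence level to avoid rounding issues) and the loss figures are hypothetical and are not the paper's data or its visual tools.

```python
def euler_es_allocation(scenarios, tail_size):
    """Empirical Euler allocation under expected shortfall.

    `scenarios` is a list of tuples, one loss per business line per scenario.
    Returns the per-line average loss over the `tail_size` scenarios with the
    largest total loss. The allocations sum to the empirical expected
    shortfall of the portfolio total.
    """
    if not 0 < tail_size <= len(scenarios):
        raise ValueError("tail_size must be between 1 and the sample size")

    # Worst scenarios first, ranked by total portfolio loss.
    worst = sorted(scenarios, key=sum, reverse=True)[:tail_size]

    n_lines = len(scenarios[0])
    return [sum(s[i] for s in worst) / tail_size for i in range(n_lines)]


if __name__ == "__main__":
    losses = [(1, 2), (3, 1), (5, 5), (2, 8), (0, 1)]
    allocation = euler_es_allocation(losses, tail_size=2)
    print(allocation, sum(allocation))
```

Because each allocation is a tail average of one line, the full-additivity property (allocations summing to the portfolio risk capital) holds by construction.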
Noninformative priors for the scale parameter in the generalized Pareto distribution
Kang, Sang Gil ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1521~1529
DOI : 10.7465/jkdi.2013.24.6.1521
In this paper, we develop noninformative priors for the generalized Pareto distribution when the scale parameter is of interest. We developed the first order and the second order matching priors. We revealed that the second order matching prior does not exist. It turns out that the reference prior and Jeffreys' prior do not satisfy a first order matching criterion, and Jeffreys' prior, the reference prior and the matching prior are different. A simulation study is performed and a real example is given.
Forecasting value-at-risk by encompassing CAViaR models via information criteria
Lee, Sangyeol ; Noh, Jungsik ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1531~1541
DOI : 10.7465/jkdi.2013.24.6.1531
This paper proposes a new method of VaR forecasting using the conditional autoregressive VaR (CAViaR) models and information criteria. Instead of using a single CAViaR model, we propose to utilize several candidate CAViaR models during a forecasting period. By adopting the Akaike and Bayesian information criteria for quantile regression, we can update not only parameter estimates but also the CAViaR specifications. We also propose extended CAViaR models with a constant location parameter. An empirical study is provided to examine the performance of the proposed method. The results suggest that our method shows more stable performance than those using a single specification.
Bayesian estimation for finite population proportion under selection bias via surrogate samples
Choi, Seong Mi ; Kim, Dal Ho ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1543~1550
DOI : 10.7465/jkdi.2013.24.6.1543
In this paper, we study Bayesian estimation for the finite population proportion in binary data under selection bias. We use a Bayesian nonignorable selection model to accommodate the selection mechanism. We compare four possible estimators of the finite population proportions based on data analysis as well as Monte Carlo simulation. It turns out that the nonignorable selection model might be useful for weakly biased samples.
Testing the exchange rate data for the parameter change based on ARMA-GARCH model
Song, Junmo ; Ko, Bangwon ;
Journal of the Korean Data and Information Science Society, volume 24, issue 6, 2013, Pages 1551~1559
DOI : 10.7465/jkdi.2013.24.6.1551
In this paper, we analyze the Korean Won/Japanese 100 Yen exchange rate data based on the ARMA-GARCH model, and perform a test for detecting parameter changes. As a test statistic, we employ the cumulative sum (CUSUM) test for the ARMA-GARCH model, which was introduced by Lee and Song (2008). Our empirical analysis indicates that the KRW/JPY exchange rate series experienced several parameter changes during the period from January 2000 to December 2012, which leads to the fitting of an AR-IGARCH model to the whole series.