Journal of the Korean Data and Information Science Society
Korean Data and Information Science Society
Detecting survival related gene sets in microarray analysis
Lee, Sun-Ho ; Lee, Kwang-Hyun ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 1~11
DOI : 10.7465/jkdi.2012.23.1.001
When microarray experiments were first developed, the main interest was limited to detecting differentially expressed genes associated with a phenotype of interest. However, because human diseases are thought to arise through the interactions of multiple genes within the same functional category, the unit of analysis has expanded to sets of genes. For the phenotype of censored survival time, Gene Set Enrichment Analysis (GSEA), the Global test, and the Wald-type test are widely used. In this paper, we modify the Wald-type test by adopting a normal score transformation of gene expression values and develop a parametric test that requires much less computation than the others. The proposed method is compared with other methods using a real ovarian cancer data set and a simulated data set.
Statistical analysis of actual living condition of the elderly and welfare need survey data
Hong, C.S. ; Jeong, C.H. ; Cho, M.H. ; Kim, H.J. ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 13~24
DOI : 10.7465/jkdi.2012.23.1.013
We need to recognize the overall changes in Korean society by understanding the characteristics and actual living conditions of the elderly, so as to prepare efficient and active policies for the current transition from an aging society to an aged society. Based on the 2008 national survey data on the actual living conditions and welfare needs of the elderly, we identify the living conditions of the elderly from household and individual information and statistically analyze the effects of these variables on life satisfaction. This paper may therefore help to establish realistic, valid, and suitable policies for an aged society and extend the academic understanding of Korean society.
A simulation study of rater agreement measures
Han, Kyung-Do ; Park, Yong-Gyu ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 25~37
DOI : 10.7465/jkdi.2012.23.1.025
Many statistics, such as Cohen's (1960) κ, Scott's (1955) π, and Park and Park's (2007) H, have been proposed as measures of agreement to represent inter-rater reliability. This study compared the bias, SE, MSE, and CV of these measures of agreement with nominal and ordinal categories under balanced marginal distributions, and with nominal categories in the two paradoxical situations. In all cases, AC1 and H had smaller SE and CV.
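As an illustration of the chance-corrected agreement measures named in this abstract, the following is a minimal sketch, not the authors' code, that computes Cohen's κ, Scott's π, and Gwet's AC1 from a two-rater contingency table using their standard textbook definitions. The function name `agreement_measures` and the example table are hypothetical; Park and Park's H is omitted because its definition is not given here.

```python
def agreement_measures(table):
    """Return (kappa, pi, ac1) for a square table of counts.

    table[i][j] = number of subjects placed in category i by rater A
    and category j by rater B (illustrative layout, an assumption).
    """
    k = len(table)
    n = sum(sum(row) for row in table)

    # Observed proportion of agreement: mass on the diagonal.
    p_o = sum(table[i][i] for i in range(k)) / n

    # Marginal proportions for each rater.
    row = [sum(table[i]) / n for i in range(k)]
    col = [sum(table[i][j] for i in range(k)) / n for j in range(k)]
    avg = [(r + c) / 2 for r, c in zip(row, col)]

    # Chance agreement under each measure's own assumption.
    pe_kappa = sum(r * c for r, c in zip(row, col))   # Cohen (1960)
    pe_pi = sum(a * a for a in avg)                   # Scott (1955)
    pe_ac1 = sum(a * (1 - a) for a in avg) / (k - 1)  # Gwet's AC1

    def corrected(p_e):
        return (p_o - p_e) / (1 - p_e)

    return corrected(pe_kappa), corrected(pe_pi), corrected(pe_ac1)


if __name__ == "__main__":
    kappa, pi, ac1 = agreement_measures([[20, 5], [10, 15]])
    print(round(kappa, 4), round(pi, 4), round(ac1, 4))
```

All three measures share the same observed agreement and differ only in how chance agreement is estimated, which is why they can diverge when the marginal distributions are unbalanced.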
Comparison of clustering methods of microarray gene expression data
Lim, Jin-Soo ; Lim, Dong-Hoon ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 39~51
DOI : 10.7465/jkdi.2012.23.1.039
Cluster analysis has proven to be a useful tool for investigating the association structure among genes and samples in a microarray data set. We applied several cluster validation measures to evaluate the performance of clustering algorithms for analyzing microarray gene expression data, including hierarchical clustering, K-means, PAM, SOM, and model-based clustering. The available validation measures fall into three general categories: internal, stability, and biological. The performance of the clustering algorithms is evaluated using simulated data and the SRBCT microarray data. Our results for the simulated data show that nearly every method performs well, recovering the same number of clusters as the number of classes in the original data. For the SRBCT data, the best choice for the number of clusters is less clear than for the simulated data. PAM, SOM, and the model-based method showed results similar to those for the simulated data under the Silhouette width internal measure, as did PAM and the model-based method under the biological measure, while model-based clustering had the best value of the stability measure.
An implementation of the sample size and the power for testing mean and proportion
Lee, Chang-Sun ; Kang, Hee-Mo ; Sim, Song-Yong ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 53~61
DOI : 10.7465/jkdi.2012.23.1.053
In some cases the sample size is determined based not only on the significance level but also on the power, or equivalently the type II error. In this paper, we implemented sample size and power calculations for testing means in normal distributions and proportions in binomial distributions when both the significance level and the power are specified. The implementation is available on a web site. Alternatively, we also calculate the power for a given effect size, type I error probability, and sample size.
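To make the two calculations in this abstract concrete, here is a minimal sketch, under stated assumptions and not the authors' web implementation, for a two-sided one-sample z-test of a normal mean with known standard deviation. The function names `sample_size_mean` and `power_mean` and the parameters `delta` (true mean shift) and `sigma` are hypothetical; the sample size uses the usual normal approximation that ignores the far rejection tail.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution


def sample_size_mean(delta, sigma, alpha=0.05, power=0.8):
    """Smallest n from the approximation n = ((z_{1-a/2} + z_{power}) * sigma / delta)^2."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    z_power = Z.inv_cdf(power)
    return ceil(((z_alpha + z_power) * sigma / delta) ** 2)


def power_mean(delta, sigma, n, alpha=0.05):
    """Exact power of the two-sided z-test when the true mean shift is delta."""
    z_alpha = Z.inv_cdf(1 - alpha / 2)
    shift = abs(delta) * sqrt(n) / sigma
    # Probability of rejecting in either tail.
    return Z.cdf(shift - z_alpha) + Z.cdf(-shift - z_alpha)


if __name__ == "__main__":
    n = sample_size_mean(delta=0.5, sigma=1.0)
    print(n, round(power_mean(0.5, 1.0, n), 4))
```

With a zero shift, `power_mean` returns the significance level itself, which is a quick sanity check on the formula.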
A case study on the selection of representative statistics for systematic management of administrative statistics
Lee, Kang-Jin ; Kim, Min-Kyoung ; Ahn, Jeong-Yong ; Choi, Kyoung-Ho ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 63~70
DOI : 10.7465/jkdi.2012.23.1.063
Despite growing demand for region-specific statistics, survey statistics are limited in their ability to meet it because of the rising cost of producing statistics, among other reasons. Administrative statistics could therefore be a feasible option. In this study, we selected "representative statistics", which are frequently used in establishing regional policy and reflect regional characteristics, from among Jeollabuk-do's administrative statistics, and we suggested a way to enhance the quality and credibility of administrative statistics through systematic management. As a result, we selected 45 statistics as Jeollabuk-do's "representative statistics". We raise the need for selecting "representative statistics" and specify the selection process in order to give guidance on the systematic management and efficient use of local governments' administrative statistics.
Estimations of the student numbers by nonlinear regression model
Yoon, Yong-Hwa ; Kim, Jong-Tae ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 71~77
DOI : 10.7465/jkdi.2012.23.1.071
This paper introduces projection methods based on nonlinear regression models. To predict student numbers, a log model and an involution model, both trend-extrapolation methods, are used. Empirical evidence, based on confidence interval estimates for the coefficients of determination, shows that projection by the log model is better than projection by the involution model.
Bandwidth selection for discontinuity point estimation in density
Huh, Jib ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 79~87
DOI : 10.7465/jkdi.2012.23.1.079
When a probability density function has a discontinuity point, Huh (2002) estimated the location and jump size of the discontinuity point based on the difference between the right and left kernel density estimators using a one-sided kernel function. In this paper, we consider a cross-validation criterion, built from the right and left maximum likelihood cross-validations, for selecting the bandwidth used to estimate the location and jump size of the discontinuity point. This method is motivated by the one-sided cross-validation of Hart and Yi (1998). The finite sample performance is illustrated by a simulated example.
Likelihood based inference for the ratio of parameters in two Maxwell distributions
Kang, Sang-Gil ; Lee, Jeong-Hee ; Lee, Woo-Dong ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 89~98
DOI : 10.7465/jkdi.2012.23.1.089
In this paper, the parameter of interest is the ratio of parameters in two independent Maxwell distributions. We propose test statistics based on the likelihood function that converge to the standard normal distribution. Because the exact distribution for testing the ratio is hard to obtain, we propose the signed log-likelihood ratio statistic and the modified signed log-likelihood ratio statistic for testing the ratio. Through simulation, we show that the modified signed log-likelihood ratio statistic converges to the standard normal distribution faster than the signed log-likelihood ratio statistic. We compare the two statistics in terms of type I error and power, and give an example using real data.
A study on log-density with log-odds graph for variable selection in logistic regression
Kahng, Myung-Wook ; Shin, Eun-Young ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 99~111
DOI : 10.7465/jkdi.2012.23.1.099
The log-density ratio of the conditional densities of the predictors given the response variable provides useful information for variable selection in the logistic regression model. In this paper, we consider which predictors are needed and how they should be included in the model. If the conditional distributions are skewed, they can be treated as gamma distributions. Under this assumption, linear and log terms are generally included in the model. The log-odds graph is a very useful graphical tool in this study. A graphical study is presented which shows that if the conditional distributions of x|y for the two groups overlap significantly, we need both the linear and quadratic terms. In contrast, if they are well separated, only the linear or log term is needed in the model.
The study of comparisons of standardization methods
Min, Dae-Kee ; Jung, Ji-Hyeon ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 113~120
DOI : 10.7465/jkdi.2012.23.1.113
When we evaluate prospective students in an interview process, we have to implement a system in which each student can be judged fairly. Standardization of the scores that interviewers assign based on a student's performance is implemented to ensure that each student receives a score that objectively reflects that performance. Although we do not know exactly how effective standardization is in many different cases, we investigated which of four standardization methods, STD, Range, MAD, and IQR, is the most stable and carries the minimum risk. These methods use as scales the standard deviation, range, median absolute deviation, and interquartile range, respectively.
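The four scalings named in this abstract can be sketched as follows. This is an illustrative example only: the abstract does not state which center each method is paired with, so the choices below (mean for STD and Range, median for MAD and IQR) and the function name `standardize` are assumptions, not the authors' definitions.

```python
from statistics import mean, median, quantiles, stdev


def standardize(scores, method="STD"):
    """Return (x - center) / scale for each score under the chosen method."""
    if method == "STD":
        center, scale = mean(scores), stdev(scores)
    elif method == "Range":
        center, scale = mean(scores), max(scores) - min(scores)
    elif method == "MAD":
        center = median(scores)
        # Median absolute deviation from the median.
        scale = median([abs(x - center) for x in scores])
    elif method == "IQR":
        q1, _, q3 = quantiles(scores, n=4)  # quartile cut points
        center, scale = median(scores), q3 - q1
    else:
        raise ValueError(f"unknown method: {method}")
    return [(x - center) / scale for x in scores]


if __name__ == "__main__":
    interviewer_scores = [70, 80, 85, 90, 100]  # made-up scores
    for m in ("STD", "Range", "MAD", "IQR"):
        print(m, [round(z, 3) for z in standardize(interviewer_scores, m)])
```

Scales based on the median and quartiles are less sensitive to a single extreme score than the standard deviation or range, which is one reason such methods are compared for stability.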
A study on association rule creation by marginally conditional variables
Cho, Kwang-Hyun ; Park, Hee-Chang ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 121~129
DOI : 10.7465/jkdi.2012.23.1.121
Association rule mining searches for interesting relationships among items in a given database. Constraint-based association rules are currently being studied by many researchers. Rule generation often produces a large number of rules. Among these, there are rules for which no direct relationship exists once marginally conditional variables (intervening variables, external variables) are taken into account. Such association rules can be considered insignificant. In this study, we investigate association rule creation using marginally conditional variables. The results make it possible to identify meaningless association rules and to understand the relationships between variables more precisely.
A study on the factors affecting the life satisfaction of the elderly
Choi, Hyun-Seok ; Ha, Jeong-Cheol ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 131~142
DOI : 10.7465/jkdi.2012.23.1.131
As Korea moves toward an aged society, social attention to the overall life satisfaction of the elderly is increasing. The purpose of this study is to find the factors affecting the life satisfaction of the elderly among the demographic characteristics of aged people, categorized satisfactions, and sources of income, based on the 2008 national survey data on the actual living conditions and welfare needs of the elderly. We found that many factors have a significant impact on the life satisfaction of the elderly, including demographic characteristics, the level of physical and mental health, and economic level.
Course evaluation model using standardized transformation by group in student evaluation of teaching
Lee, Jae-Man ; Cha, Young-Joon ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 143~150
DOI : 10.7465/jkdi.2012.23.1.143
Based on student evaluations of teaching from 'A' university, where students are graded by relative performance, we investigated the effect of course characteristics on student evaluation of teaching. The results show that evaluation scores tend to be higher for classes with a higher proportion of male students, higher grades, and smaller class sizes. From these results, we suggest an evaluation model that can control for the effects of grade and sex. We also illustrate the performance of the evaluation model with a case study.
Multi-currencies portfolio strategy using principal component analysis and logistic regression
Shim, Kyung-Sik ; Ahn, Jae-Joon ; Oh, Kyong-Joo ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 151~159
DOI : 10.7465/jkdi.2012.23.1.151
This paper develops a multi-currency portfolio strategy using principal component analysis (PCA) and logistic regression (LR) in the foreign exchange market. While there is a great deal of literature on the analysis of exchange markets, there is relatively little work on developing trading strategies in foreign exchange markets. This paper has two objectives. The first is to suggest a portfolio allocation method by applying PCA. The second is to determine market timing, that is, the strategy of making buy or sell decisions, using LR. The results show that the proposed model is a useful trading strategy in the foreign exchange market and can provide investors with important investment information.
Bagging consumer modeling for successive growth and establishment of bancassurance
Kim, Tae-Ho ; Jung, Jae-Hwa ; Kim, Jin-Soo ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 161~170
DOI : 10.7465/jkdi.2012.23.1.161
As insurance consumers' needs have diversified and subdivided, it is increasingly important to grasp their preferences by characteristics and properties. Although changes in sales channels and marketing conditions for insurance make it necessary to analyze what consumers take seriously when purchasing, it is difficult to devise marketing strategies because few concrete studies have been done in this field. A questionnaire survey was carried out to obtain detailed information about the basic dispositions and buying patterns of insurance consumers. By applying efficient statistical techniques and using a model for securing new customers, this study explores a plan for the rapid growth and successive establishment of bancassurance.
Web-based program development for clinical data management system establishment
Shin, Im-Hee ; Kim, Dal-Ho ; Kim, Sang-Gyung ; Sohn, Ki-Cheul ; Park, Chun-Woo ; Kwak, Sang-Gyu ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 171~177
DOI : 10.7465/jkdi.2012.23.1.171
Owing to the rapid development of computers, various phenomena can be expressed numerically and collected as data, and large data sets in particular are being collected in many fields. Information for final decisions is obtained by analyzing and interpreting the data, so managing the data matters as much as the data themselves. A system that stores data on a server and displays them in a web browser is therefore needed. We uploaded Excel files to a server database and developed a web-based program that displays the uploaded data through a web-based database. We used Oracle DB for uploading and web technologies such as HTML, Java, and JSP for querying the data. In this way, we developed a program for constructing a web-based clinical data management system.
Empowerment and motivation predicted by relationship between badminton coaches-athletes
Lee, Mi-Sook ; Kim, Hong-Gi ; Nam, Jung-Hoon ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 179~190
DOI : 10.7465/jkdi.2012.23.1.179
The purpose of this study was to verify how the coach-athlete relationship, as perceived by athletes, contributes to empowerment and sport motivation, on the basis of the relational characteristics between badminton coaches and athletes. The results were as follows. First, the coach-athlete relationship had a positive effect on the formation of empowerment in badminton athletes. Second, the coach-athlete relationship had a positive effect on the internal and external motivation components of sport motivation, while it had no effect on non-motivation. Third, the empowerment of badminton athletes had a positive effect on internal and external motivation, but no effect on non-motivation.
Development of web based system for statistical analysis of clinical data
Kim, Dal-Ho ; Shin, Im-Hee ; Choe, Jung-Youn ; Kim, Sang-Gyung ; Park, Chun-Woo ; Kwak, Sang-Gyu ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 191~198
DOI : 10.7465/jkdi.2012.23.1.191
Statistical analysis is a process that produces information for final decisions by gathering and summarizing data, and it is used for this purpose in various application fields. However, statistical software installed on a PC (personal computer) is restricted by time and place. To minimize these restrictions, we have developed a web-based system that performs statistical analysis in a web browser without any particular software.
H-likelihood approach for variable selection in gamma frailty models
Ha, Il-Do ; Cho, Geon-Ho ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 199~207
DOI : 10.7465/jkdi.2012.23.1.199
Recently, variable selection methods using penalized likelihood with a shrinkage penalty function have been widely studied in various statistical models, including generalized linear models and survival models. In particular, they select important variables and estimate the coefficients of covariates simultaneously. In this paper, we develop a penalized h-likelihood method for variable selection in gamma frailty models. For this we use the smoothly clipped absolute deviation (SCAD) penalty function, which has good properties for variable selection. The proposed method is illustrated using a simulation study and a practical data set.
Property of regression estimators in GEE models for ordinal responses
Lee, Hyun-Yung ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 209~218
DOI : 10.7465/jkdi.2012.23.1.209
The method of generalized estimating equations (GEEs) provides consistent estimates of the regression parameters in a marginal regression model for longitudinal data, even when the working correlation model is misspecified (Liang and Zeger, 1986). In this paper we compare the estimators of parameters in the GEE approach. We consider two aspects: coverage probabilities and efficiency. We adapted the results derived for binary outcomes to ordinal responses.
An approximate maximum likelihood estimator in a weighted exponential distribution
Lee, Jang-Choon ; Lee, Chang-Soo ;
Journal of the Korean Data and Information Science Society, volume 23, issue 1, 2012, Pages 219~225
DOI : 10.7465/jkdi.2012.23.1.219
We derive approximate maximum likelihood estimators of the two parameters of a weighted exponential distribution, derive the density function of the ratio Y/(X+Y) of two independent weighted exponential random variables X and Y, and then examine the skewness of the ratio density.