Journal of the Korean Data and Information Science Society
Korean Data and Information Science Society
Volume 25, Issue 1 - Jan 2014
New approximations of the ruin probability in a continuous time surplus process
Kwon, Cheonga ; Choi, Seung Kyoung ; Lee, Eui Yong ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 1~10
DOI : 10.7465/jkdi.2014.25.1.1
In this paper, we study approximations of the ruin probability in a continuous time surplus process. First, we introduce well-known approximation formulas for the ruin probability, such as Cramér's, Tijms' and De Vylder's methods. We then suggest two types of new approximation formulas that improve on the existing ones. One is of the Cramér and Tijms type, which makes use of the moment generating function of the claim size distribution; the other is of the De Vylder type, which makes use of a surplus process with exponential claims. Finally, we compare the newly suggested approximation formulas with the existing ones through numerical examples.
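As an illustration of the De Vylder idea mentioned above, the following minimal sketch implements the classical De Vylder approximation, which replaces the original surplus process by one with exponential claims whose first three moments match. It is not the new approximation proposed in the paper, whose formulas are not given in the abstract. The function names and the parameterization by the premium loading `theta` and the raw claim moments `m1`, `m2`, `m3` are assumptions made for this example.

```python
import math


def de_vylder_ruin_probability(u, theta, m1, m2, m3):
    """Classical De Vylder approximation to the ruin probability psi(u).

    u      : initial surplus (>= 0)
    theta  : relative premium loading of the original process (> 0)
    m1..m3 : first three raw moments of the claim size distribution
    """
    # Parameters of the exponential-claims process whose first three
    # moments match those of the original surplus process.
    beta_t = 3.0 * m2 / m3                              # exponential claim rate
    theta_t = 2.0 * m1 * m3 * theta / (3.0 * m2 ** 2)   # matched loading
    # Exact ruin probability of the matched exponential-claims process.
    return math.exp(-beta_t * theta_t * u / (1.0 + theta_t)) / (1.0 + theta_t)


def exact_exponential_ruin_probability(u, theta, mean_claim):
    """Exact ruin probability when claims are exponential with the given mean."""
    beta = 1.0 / mean_claim
    return math.exp(-beta * theta * u / (1.0 + theta)) / (1.0 + theta)
```

When the claims are themselves exponential, the matched process coincides with the original one, so the approximation reproduces the exact ruin probability. That makes a convenient sanity check before applying it to other claim distributions.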
A case study on verification of internet survey
Ryu, Gui-Yeol ; Moon, Young-Soo ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 11~18
DOI : 10.7465/jkdi.2014.25.1.11
The objective of this study is to verify the accuracy of an internet survey by comparing survey responses with database records. The internet survey was conducted in August 2012. Respondents were subscribers of KISTI NDSL. The variables were age and organization as demographic variables, and number of uses and period of use as attitude variables. The mismatch rates for age, organization, number of uses, and period of use are 7.5%, 5%, 92%, and 55%, respectively. After detailed verification, we estimate the mismatch rate for age as 3% from a pessimistic point of view and 1% from an optimistic point of view. The corresponding mismatch rates for organization are 4.5% and 2%. The mismatch rates for the frequency of use and the period of use are very high because of measurement error, recall problems, attitudes toward the internet, and so on. The implication of this study is that internet survey data could be reliable. Further research is needed on the verification of internet surveys.
A polychotomous regression model with tensor product splines and direct sums
Sim, Songyong ; Kang, Heemo ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 19~26
DOI : 10.7465/jkdi.2014.25.1.19
In this paper, we propose a polychotomous regression model for the case where the independent variables include both categorical and numerical variables. For categorical independent variables we use direct sums, and for continuous independent variables we use tensor product splines. We use BIC as the variable selection criterion. We implement the algorithm and apply it to real data. The use of direct sums and tensor products outperformed the usual multinomial logistic regression model.
Development of web-based system for dynamic statistical analysis of clinical data
Shin, Im Hee ; Kwak, Sang Gyu ; Park, Jun Woo ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 27~36
DOI : 10.7465/jkdi.2014.25.1.27
Statistical analysis provides information that can be applied to draw final decisions in many fields. However, statistical analysis programs for the PC (personal computer) are still restricted by time and space. To reduce this restriction, server-based PC statistical analysis programs that use the internet, as well as web-based systems that allow statistical analysis, have been continually developed. However, current web-based analysis systems are limited to data saved on the server. Data that are modified or newly inserted must go through a server administrator before they can be used in analysis. To solve this problem, we have developed a web-based system using HTML, Java, and JSP scripts that incorporates dynamic data without much restriction.
Home hospice palliative care service in Korea: Based on focus group interview
Koh, Su-Jin ; Kim, Yeol ; Song, Mi Ok ; Choi, Youngsim ; Choi, Sung Eun ; Jho, Hyun Jung ; Huh, Yun Jung ; Park, Myung-Hee ; Park, Seon Ju ; Kwon, So-Hi ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 37~52
DOI : 10.7465/jkdi.2014.25.1.37
The aim of this study was to understand the status and problems of home hospice care in Korea, and ultimately to develop a home hospice standard. This study was conducted as part of a study on the institutionalization of home hospice in Korea. A focus group interview was conducted with representatives of seven home hospice agencies that have provided home hospice service for years. All of the participants agreed on the essential components of home hospice service, including a 24-hour on-call service, multidisciplinary team visits, and periodic team meetings. Visits were made 1-3 times per week, mostly by nurses. The participants also agreed that an office for home visiting nurses, storage space, and home visiting bags are required. The obstacles to providing home hospice were 1) the absence of a reimbursement system, 2) difficulties in changing medication at home, and 3) a lack of inpatient beds for symptom control. Standardization of home hospice is critical to improving service quality and developing a reimbursement system. The findings of this study could be used as basic data for developing home hospice standards and guidelines.
A comparison study for accuracy of exit poll based on nonresponse model
Kwak, Jeongae ; Choi, Boseung ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 53~64
DOI : 10.7465/jkdi.2014.25.1.53
One of the major problems in forecasting elections, especially from surveys, is nonresponse. Forecasting results may differ depending on the imputation method. Handling nonresponse is even more important in a survey on a sensitive subject, such as a presidential election. In this research, we consider a model-based method of nonresponse imputation. A model-based imputation method must be constructed under an assumption about the nonresponse mechanism, and it may produce different results depending on that mechanism. The assumption about the nonresponse mechanism is therefore a very important precondition for accurate forecasts. However, there is no exact way to verify it. In this paper, we compare the accuracy of prediction under different assumptions about the nonresponse mechanism, based on the results of a presidential election exit poll. We use maximum likelihood estimation based on the EM algorithm to fit the nonresponse models. We also use the modified within precinct error proposed by Bautista (2007) to compare the predicted results.
A study on the forecasting models using housing price index
Lim, Seong Sik ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 65~76
DOI : 10.7465/jkdi.2014.25.1.65
Housing prices are influenced by external shock factors such as real estate policy or the economy. Thus, the intervention effect is important when developing a forecasting model for the housing price index. In this paper, we examine how strongly external shock factors affect forecasts of the housing price index and analyze time series models for forecasting it efficiently. Using real data, we show that intervention models forecast better than the other models according to the accuracy criteria.
The influence of expectations regarding aging on health-promoting behaviors
Bae, Hyeyoung ; Kim, Aranbyeol ; Nam, Soojin ; Youn, Jia ; Youn, Haeju ; Kim, Gayoung ; Jang, Daehyae ; Kim, Su Hyun ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 77~85
DOI : 10.7465/jkdi.2014.25.1.77
The purpose of the current study was to identify expectations regarding aging and health-promoting behaviors and to examine whether expectations regarding aging were associated with health-promoting behaviors among community-residing Korean adults. Data were collected from 233 adults dwelling in the community of Daegu and Kyungpook province. The influence of expectations regarding aging on health-promoting behaviors was analyzed through hierarchical multiple regression controlling for sociodemographic variables. The mean score of expectations regarding aging was significantly lower among those in their 40s and 50s than among those in their 20s and 30s. The participants had the lowest expectations regarding aging in the physical health domain and the highest expectations in the mental health domain. No significant differences were found in health-promoting behaviors among different age groups. After controlling for sociodemographic variables, expectations regarding aging were independently associated with health-promoting behaviors in adults in their 20s and 30s but not in those in their 40s and 50s. The findings suggest the need to encourage Korean adults, in particular those in their 20s and 30s, to hold a positive and active perspective on aging and higher expectations regarding aging as a health-promoting strategy.
Comparison study on kernel type estimators of discontinuous log-variance
Huh, Jib ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 87~95
DOI : 10.7465/jkdi.2014.25.1.87
In the regression model, Kang and Huh (2006) studied the estimation of the discontinuous variance function using the Nadaraya-Watson estimator with the squared residuals. The local linear estimator of the log-variance function, which may take any real value, was proposed by Huh (2013) based on the kernel weighted local likelihood of the $\chi^2$-distribution. Chen et al. (2009) estimated the continuous variance function using the local linear fit with the log-squared residuals. In this paper, we consider the estimator of the discontinuous log-variance function itself or its derivative using the estimator of Chen et al. (2009). Numerical studies investigate the performance of the estimators with simulated examples.
The proposition of cosine net confidence in association rule mining
Park, Hee Chang ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 97~106
DOI : 10.7465/jkdi.2014.25.1.97
Big data technology was developed to predict a diversified contemporary society more accurately, to operate it more efficiently, and to enable techniques that were impossible in the past. This technology can be utilized at the national level in various fields such as social science, economics, politics, the cultural sector, and science and technology. To analyze big data, it is a prerequisite to find valuable information with data mining techniques. Data mining techniques associated with big data include text mining, opinion mining, cluster analysis, association rule mining, and so on. The most widely used data mining technique is the exploration of association rules. This technique has been used to find the relationship between sets of items based on association thresholds such as support, confidence, lift, similarity measures, etc. This paper proposes cosine net confidence as an association threshold, checks the conditions for an interestingness measure proposed by Piatetsky-Shapiro, and examines various characteristics. Basic confidence, cosine similarity, and cosine net confidence are compared in a numerical example. The results show that cosine net confidence is better than basic confidence and cosine similarity because it captures the direction of the association.
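The standard thresholds named above (support, confidence, lift, and cosine similarity) can be computed directly from transaction counts. The sketch below covers only these well-known measures; the definition of cosine net confidence is not given in the abstract and is not reproduced here. The function name `rule_measures` and the toy transactions are hypothetical and exist only for illustration.

```python
import math


def rule_measures(transactions, antecedent, consequent):
    """Standard interestingness measures for the rule antecedent -> consequent.

    Assumes both item sets occur in at least one transaction.
    """
    x, y = set(antecedent), set(consequent)
    n = len(transactions)
    n_x = sum(1 for t in transactions if x <= set(t))         # contain X
    n_y = sum(1 for t in transactions if y <= set(t))         # contain Y
    n_xy = sum(1 for t in transactions if (x | y) <= set(t))  # contain both
    confidence = n_xy / n_x                                   # P(Y | X)
    return {
        "support": n_xy / n,                    # P(X and Y)
        "confidence": confidence,
        "lift": confidence / (n_y / n),         # P(Y | X) / P(Y)
        "cosine": n_xy / math.sqrt(n_x * n_y),  # P(XY) / sqrt(P(X) P(Y))
    }


# Toy data, made up for this example.
EXAMPLE_TRANSACTIONS = [{"a", "b"}, {"a", "b"}, {"a"}, {"b"}, {"c"}]
EXAMPLE_MEASURES = rule_measures(EXAMPLE_TRANSACTIONS, {"a"}, {"b"})
```

Confidence is asymmetric in the antecedent and consequent, while cosine similarity is symmetric, which is one reason the two can rank rules differently.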
Using cluster analysis and genetic algorithm to develop portfolio investment strategy based on investor information
Cheong, Donghyun ; Oh, Kyong Joo ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 107~117
DOI : 10.7465/jkdi.2014.25.1.107
The main purpose of this study is to propose a portfolio investment strategy based on information about investor types. To improve investment performance, artificial intelligence techniques are used to construct a portfolio. Among the many artificial intelligence techniques, cluster analysis is applied to select securities and a genetic algorithm is applied to assign the respective weights within the portfolio. Empirical experiments in the Korean stock market show that the proposed portfolio investment strategy is practicable and superior. This result implies that analysis of investors' trading behavior may help investors make investment decisions and achieve superior performance.
Lowess and outlier analysis of biological oxygen demand on Nakdong main stream river
Kim, Jong Tae ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 119~130
DOI : 10.7465/jkdi.2014.25.1.119
This paper is based on the water information system of NIER, the National Institute of Environmental Research. We used monthly water quality data from January 2013 to August 2013 from measuring point A (nbA) to measuring point N (nbN), located along the main stream of the Nakdong river. Statistical analysis of BOD (biological oxygen demand) by month, year, and measuring point is carried out in R. Based on the BOD measured at the Nakdong river measuring points, we used exploratory data analysis and locally weighted scatter plot smoother (Lowess) trend analysis, a nonparametric regression method, to analyze the long-term water quality trend and the distribution of water quality across points. We also identified the periods and measuring points with many outliers. As a result, compared with the BOD measured at nbM, located downstream in Busan, the BOD measured at nbG, located in Daegu, and nbI, located in Changwon, along the midstream indicated severe water pollution.
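To make the Lowess idea concrete, here is a minimal sketch of locally weighted linear regression with tricube weights. It is a simplified single-pass version without the robustness iterations of the full Lowess procedure, and it is not the R code used in the paper. The function name `lowess_fit` and the `frac` parameter are assumptions for this example.

```python
import math


def lowess_fit(x, y, frac=0.5):
    """Smooth y against x with tricube-weighted local linear fits.

    frac is the fraction of points used in each local fit.
    Returns the fitted value at every x[i].
    """
    n = len(x)
    k = min(n, max(3, int(math.ceil(frac * n))))  # neighbours per local fit
    fitted = []
    for x0 in x:
        # Bandwidth: distance from x0 to its k-th nearest neighbour.
        h = sorted(abs(xi - x0) for xi in x)[k - 1]
        sw = swd = swdd = swy = swdy = 0.0
        for xi, yi in zip(x, y):
            d = xi - x0
            if h > 0 and abs(d) < h:
                w = (1.0 - (abs(d) / h) ** 3) ** 3   # tricube weight
            elif h == 0 and d == 0:
                w = 1.0                               # all neighbours tied at x0
            else:
                w = 0.0
            sw += w
            swd += w * d
            swdd += w * d * d
            swy += w * yi
            swdy += w * d * yi
        denom = sw * swdd - swd ** 2
        if abs(denom) < 1e-12:
            fitted.append(swy / sw)                   # fall back to weighted mean
        else:
            slope = (sw * swdy - swd * swy) / denom
            fitted.append((swy - slope * swd) / sw)   # intercept at x0
    return fitted
```

Because each local fit is linear, data that lie exactly on a straight line are reproduced unchanged, which is a quick way to check an implementation before applying it to a real series.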
A financial projection model on defined benefit pension plan
Han, Jeonglim ; Lee, Hangsuck ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 131~153
DOI : 10.7465/jkdi.2014.25.1.131
The Korean pension plan market has grown recently, and pension plans are expected to play an important role in the retirement system as a complement to the national pension system. However, there are few research papers on actuarial projections of pension plans. This paper discusses a long-term financial projection of defined benefit pension plans using data on the workplace participants of the national pension. Previous research focused on company-based financial projections of pension plans. This paper instead considers all Korean pension participants and suggests a method for calculating future financial projections of all pension plans. Finally, this research presents several numerical results for normal costs, benefits, numbers of workers, etc.
The research on daily temperature using continuous AR model
Kim, Ji Young ; Jeong, Kiho ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 155~167
DOI : 10.7465/jkdi.2014.25.1.155
This study uses a continuous autoregressive (CAR) model to analyze the daily average temperature in six Korean metropolitan cities. The data cover 57 years, from Jan. 1, 1954 to Dec. 31, 2010. Using a relatively long time series reveals that the linear time trend components are statistically significant in all six cities, which was not shown in previous studies. In particular, the positive sign of the trend coefficient implies an effect of global warming on Korea. Unit-root tests indicate that the temperature time series are stationary, with no unit root. It turns out that CAR(3) is suitable for the stochastic component of the daily temperature. Since developing a suitable continuous stochastic model of the underlying weather-related variables is crucial in pricing weather derivatives, the results of this study will likely prove useful in future studies on pricing weather derivatives.
Derivation of the likelihood function for the counting process
Oh, Changhyuck ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 169~176
DOI : 10.7465/jkdi.2014.25.1.169
Counting processes, whose properties are determined by the intensity function, are widely used in many fields. For estimating the parameters of the intensity function when the process is observed continuously over a fixed interval, the likelihood function is of interest. However, the literature contains only heuristic derivations, and some of the results do not agree. In this note, we therefore derive the likelihood function of the counting process in a rigorous way, filling a gap in the derivation of the likelihood function.
Exploring interaction using 3-D residual plots in logistic regression model
Kahng, Myung-Wook ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 177~185
DOI : 10.7465/jkdi.2014.25.1.177
Under bivariate normal distribution assumptions, the interaction and quadratic terms are needed in the logistic regression model with two predictors. However, depending on the correlation coefficient and the variances of two conditional distributions, the interaction and quadratic terms may not be necessary. Although the need for these terms can be determined by comparing the two scatter plots, it is not as useful for interaction terms. We explore the structure and usefulness of the 3-D residual plot as a tool for dealing with interaction in logistic regression models. If predictors have an interaction effect, a 3-D residual plot can show the effect. This is illustrated by simulated and real data.
Sensitivity analysis in Bayesian nonignorable selection model for binary responses
Choi, Seong Mi ; Kim, Dal Ho ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 187~194
DOI : 10.7465/jkdi.2014.25.1.187
We consider a Bayesian nonignorable selection model to accommodate selection bias. Markov chain Monte Carlo methods are known to be very useful for fitting the nonignorable selection model. However, sensitivity to prior assumptions on the parameters of the selection mechanism is a potential problem. To quantify the sensitivity to prior assumptions, the deviance information criterion and the conditional predictive ordinate are used to compare the goodness-of-fit under two different prior specifications. It turns out that the 'MLE' prior gives a better fit than the 'uniform' prior in terms of the goodness-of-fit measures.
Goodness-of-fit test for the logistic distribution based on multiply type-II censored samples
Kang, Suk-Bok ; Han, Jun-Tae ; Cho, Young-Seuk ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 195~209
DOI : 10.7465/jkdi.2014.25.1.195
In this paper, we derive estimators of the location parameter and the scale parameter of a logistic distribution based on multiply type-II censored samples by the approximate maximum likelihood estimation method. Using the proposed approximate maximum likelihood estimators, we construct four modified empirical distribution function (EDF) type tests for the logistic distribution based on multiply type-II censored samples. We also propose the modified normalized sample Lorenz curve plot for the logistic distribution based on multiply type-II censored samples. For each test, Monte Carlo techniques are used to generate the critical values. The powers of these tests are also investigated under several alternative distributions.
Upgraded quadratic inference functions for longitudinal data with type II time-dependent covariates
Cho, Gyo-Young ; Dashnyam, Oyunchimeg ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 211~218
DOI : 10.7465/jkdi.2014.25.1.211
Qu et al. (2000) proposed the quadratic inference functions (QIF) method for marginal model analysis of longitudinal data as an improvement on generalized estimating equations (GEE). It yields a substantial improvement in efficiency for the estimators of regression parameters when the working correlation is misspecified. But for longitudinal data with time-dependent covariates, when the implicit full covariates conditional mean (FCCM) assumption is violated, the QIF cannot provide a more consistent and efficient estimator than GEE (Cho and Dashnyam, 2013). Lai and Small (2007) divided time-dependent covariates into three types and proposed the generalized method of moments (GMM) for longitudinal data with time-dependent covariates. They showed that their GMM type II and GMM moment selection methods can be more efficient than GEE with independence working correlation (GEE-ind) in the case of type II time-dependent covariates. We develop an upgraded QIF method for type II time-dependent covariates. We show that this upgraded QIF method can provide substantial gains in efficiency over QIF and GEE-ind in the case of type II time-dependent covariates.
Some properties of reliability, ratio, maximum and minimum in a bivariate exponential distribution with a dependence parameter
Lee, Jang Choon ; Kang, Jun Ho ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 219~226
DOI : 10.7465/jkdi.2014.25.1.219
In this paper, we derived estimators of reliability P(Y < X) and the distribution of ratio in the bivariate exponential density. We also considered the means and variances of M
Noninformative priors for the log-logistic distribution
Kang, Sang Gil ; Kim, Dal Ho ; Lee, Woo Dong ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 227~235
DOI : 10.7465/jkdi.2014.25.1.227
In this paper, we develop noninformative priors for the scale parameter and the shape parameter of the log-logistic distribution. We develop the first and second order matching priors. It turns out that the second order matching prior matches the alternative coverage probabilities and is a highest posterior density matching prior. We also show that the derived reference prior is a second order matching prior for both parameters, but Jeffreys' prior is not a second order matching prior. Through a simulation study, we show that the proposed reference prior matches the target coverage probabilities in a frequentist sense, and an example based on real data is given.
Some applications for the difference of two CDFs
Hong, Chong Sun ; Son, Yun Hwan ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 237~244
DOI : 10.7465/jkdi.2014.25.1.237
It is known that the difference in length between the location parameters of two random variables is equivalent to the difference in area between their two cumulative distribution functions. In this paper, we suggest two applications of the difference of distribution functions. The first is that the difference of the expectations of a certain function of two continuous random variables, such as the difference of two kth moments or of two moment generating functions, can be defined by using the difference between two univariate distribution functions. The other is that the difference in volume between two empirical bivariate distribution functions is derived. If their covariance is estimated to be zero, the difference in volume between two empirical bivariate distribution functions can be defined as the difference between two certain areas.
Stochastic simulation of daily precipitation: A copula approach
Choi, Changhui ; Ko, Bangwon ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 245~254
DOI : 10.7465/jkdi.2014.25.1.245
The traditional methods of simulating daily precipitation have paid little attention to the inherent dependence structure between the total precipitation amount and the precipitation frequency for a fixed period of time. To address this issue, we propose a new simulation algorithm that uses a copula to incorporate the dependence into the traditional methods. The algorithm consists of two parts. First, while reflecting the observed dependence, we generate the total precipitation amount (S) and the frequency (N) during the period of interest. Then we simulate the daily precipitation whose aggregation matches the pair (N, S) generated in the first part. Our results show that the proposed method substantially improves on the traditional methods.
A maximum likelihood estimation method for a mixture of shifted binomial distributions
Oh, Changhyuck ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 255~261
DOI : 10.7465/jkdi.2014.25.1.255
Many studies have estimated a mixture of binomial distributions. This paper considers an extension, a mixture of shifted binomial distributions, and the estimation of that distribution. The range of each component binomial distribution is first evaluated, and then, for each possible value of the shift parameters, the EM algorithm is employed to estimate the remaining parameters. From the set of possible values of the shift parameters and the corresponding estimated parameters of the distribution, the likelihood of the given data is determined. The simulation results verify the performance of the proposed method.
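The EM step at the core of this procedure can be sketched for the simpler, unshifted case. The code below fits a mixture of ordinary binomial distributions with a known common number of trials `n`. The search over shift parameters described in the abstract is not implemented, and all names and the toy data are assumptions made for illustration.

```python
import math


def binom_pmf(k, n, p):
    """Binomial probability mass function."""
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)


def log_likelihood(data, n, weights, probs):
    """Log-likelihood of the data under the binomial mixture."""
    return sum(
        math.log(sum(w * binom_pmf(k, n, p) for w, p in zip(weights, probs)))
        for k in data
    )


def em_binomial_mixture(data, n, weights, probs, iters=100):
    """EM for a binomial mixture; returns (weights, probs, loglik history).

    Assumes every component keeps positive total responsibility.
    """
    weights, probs = list(weights), list(probs)  # do not mutate the inputs
    history = []
    for _ in range(iters):
        # E-step: posterior probability of each component for each observation.
        resp = []
        for k in data:
            joint = [w * binom_pmf(k, n, p) for w, p in zip(weights, probs)]
            total = sum(joint)
            resp.append([j / total for j in joint])
        # M-step: update mixing weights and success probabilities.
        for j in range(len(weights)):
            r_j = sum(r[j] for r in resp)
            weights[j] = r_j / len(data)
            probs[j] = sum(r[j] * k for r, k in zip(resp, data)) / (n * r_j)
        history.append(log_likelihood(data, n, weights, probs))
    return weights, probs, history


# Toy counts out of n = 10 trials, made up for this example.
EXAMPLE_DATA = [1, 2, 1, 0, 2, 1, 8, 9, 8, 7, 9, 8]
EXAMPLE_FIT = em_binomial_mixture(EXAMPLE_DATA, 10, [0.5, 0.5], [0.3, 0.7])
```

One way to extend this toward the shifted case, under the same assumptions, would be to run the routine once per candidate shift combination and keep the combination with the largest final log-likelihood.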
Estimation using response probability when missing data happen on the second occasion
Park, Hyeonah ; Na, Seongryong ;
Journal of the Korean Data and Information Science Society, volume 25, issue 1, 2014, Pages 263~269
DOI : 10.7465/jkdi.2014.25.1.263
When samples are lost in repeated surveys, new samples can often replace the missing values. Estimators using response probability can be considered for repeated surveys on two occasions in which new samples are selected to replace missing data on the second occasion. We propose a new estimator that uses both the respondents and the new samples on the second occasion. The simulation setting assumes that missing values occur on the second occasion and are replaced by new samples. The proposed estimator is more efficient than the estimator that applies a weighting adjustment method to the respondents on the second occasion.