• Title/Summary/Keyword: imputation

Application of discrete Weibull regression model with multiple imputation

  • Yoo, Hanna
    • Communications for Statistical Applications and Methods / v.26 no.3 / pp.325-336 / 2019
  • In this article, we extend the discrete Weibull regression model to handle missing data. Discrete Weibull regression models can be adapted to data with various types of dispersion; however, they are not widely used. Recently, Yoo (Journal of the Korean Data and Information Science Society, 30, 11-22, 2019) adapted the discrete Weibull regression model using single imputation. We extend that study by using multiple imputation under several settings and compare the results. The purpose of this study is to demonstrate the merit of multiple imputation when discrete count data contain missing values. We analyzed the 2016 data of the seventh Korean National Health and Nutrition Examination Survey (KNHANES VII) to assess the factors influencing the variable 1-month hospital stay, and we compared the results of the discrete Weibull regression model with those of the Poisson, negative binomial, and zero-inflated Poisson regression models, which are widely used in count data analyses. The results showed that the discrete Weibull regression model using multiple imputation provided the best fit. We also performed simulation studies to show the accuracy of discrete Weibull regression using multiple imputation under both under- and over-dispersed distributions, as well as varying missing rates and sample sizes. Sensitivity analysis showed the influence of mis-specification and the robustness of the discrete Weibull model. Using imputation with discrete Weibull regression to analyze discrete data will increase explanatory power, and the approach is widely applicable to various types of dispersion data with a unified model.
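
For readers unfamiliar with the distribution named in this abstract, the sketch below evaluates the commonly used type I discrete Weibull form, P(X = x) = q^(x^beta) - q^((x+1)^beta) for x = 0, 1, 2, ..., and compares variance with mean for two parameter choices. The names `q` and `beta`, the truncation point `upper`, and the example values are assumptions for illustration; the abstract does not state the parameterization or the regression link used in the paper.

```python
def dw_pmf(x, q, beta):
    """Type I discrete Weibull pmf: q**(x**beta) - q**((x + 1)**beta)."""
    return q ** (x ** beta) - q ** ((x + 1) ** beta)


def dw_mean_var(q, beta, upper=500):
    """Approximate mean and variance by truncating the support at `upper`.

    `upper` is an illustrative cut-off; the tail beyond it is assumed
    negligible for the parameter values used here.
    """
    probs = [dw_pmf(x, q, beta) for x in range(upper)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var


if __name__ == "__main__":
    # beta = 1 reduces to a geometric distribution (variance above the mean);
    # a larger beta concentrates the mass (variance below the mean).
    for q, beta in [(0.5, 1.0), (0.9, 3.0)]:
        mean, var = dw_mean_var(q, beta)
        print(q, beta, mean, var)
```

A single shape parameter moving the variance above or below the mean is what lets one model cover both over- and under-dispersed counts.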

Missing Imputation Methods Using the Spatial Variable in Sample Survey (표본조사에서 공간 변수(SPATIAL VARIABLE)를 이용한 결측 대체(MISSING IMPUTATION)의 효율성 비교)

  • Lee Jin-Hee;Kim Jin;Lee Kee-Jae
    • The Korean Journal of Applied Statistics / v.19 no.1 / pp.57-67 / 2006
  • In sample surveys, nonresponse tends to occur inevitably. If we use information from respondents only, the estimates will be biased. To overcome this, various nonresponse imputation methods have been studied. If there are few auxiliary variables available for imputing missing values, or if spatial autocorrelation exists between respondents and nonrespondents, spatial autocorrelation can be used for imputation. In this paper, we apply several nonresponse imputation methods, including spatial imputation, to the 2002 farm household economy data of Gangwon-do as an example. Numerical simulations show that spatial imputation is more efficient than the other methods.
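
The abstract does not say which spatial estimator the authors used, so the following is only a generic illustration of borrowing information from nearby respondents: an inverse-distance-weighted average over the `k` nearest respondents. The function name, the coordinates, and the parameters `k` and `power` are all hypothetical.

```python
import math


def idw_impute(target_xy, respondents, k=3, power=2.0):
    """Impute a value at `target_xy` from the k nearest respondents.

    respondents: list of ((x, y), value) pairs for units that responded.
    Weights are 1 / distance**power (an assumed, generic weighting).
    """
    nearest = sorted(respondents, key=lambda r: math.dist(target_xy, r[0]))[:k]
    weights, values = [], []
    for xy, value in nearest:
        d = math.dist(target_xy, xy)
        if d == 0.0:
            # A respondent at exactly the same location: use its value.
            return value
        weights.append(1.0 / d ** power)
        values.append(value)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


if __name__ == "__main__":
    farms = [((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0), ((9.0, 9.0), 50.0)]
    print(idw_impute((1.0, 0.0), farms, k=2))
```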

A Comparison of BLS Non-Response Adjustment and Cross-Wave Regression Imputation Methods (BLS 무응답 보정법을 이용한 대체법과 이월대체법에 관한 연구)

  • Lee, Sang-Eun;Shin, Key-Il
    • The Korean Journal of Applied Statistics / v.23 no.5 / pp.909-921 / 2010
  • Cross-wave regression imputation and carry-over imputation are generally used in the analysis of panel data with missing values. Recently, the BLS non-response adjustment method has been shown to have good statistical properties. In this paper, we show that the BLS method can be regarded as an imputation method whose formula is similar to that of a ratio estimator. In addition, we show that carry-over imputation and BLS imputation are approximately the same under the assumption that the data follow a non-stationary process with drift. Small simulation studies and a real data analysis are performed. For the real data analysis, monthly labor statistics (2007) are used.
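
To make the two families mentioned in this abstract concrete, here is a minimal sketch of carry-over imputation next to a generic ratio-type imputation. The ratio version is a textbook form (previous value scaled by the growth observed among respondents) and is not claimed to be the BLS adjustment formula, which the abstract does not give; the function names and data are hypothetical.

```python
def carry_over_impute(prev, curr):
    """Replace a missing current value (None) with the unit's previous value."""
    return [p if c is None else c for p, c in zip(prev, curr)]


def ratio_impute(prev, curr):
    """Scale the previous value by the growth ratio seen among respondents.

    ratio = (sum of current values) / (sum of previous values), computed
    over units that responded in the current wave.
    """
    pairs = [(p, c) for p, c in zip(prev, curr) if c is not None]
    ratio = sum(c for _, c in pairs) / sum(p for p, _ in pairs)
    return [p * ratio if c is None else c for p, c in zip(prev, curr)]


if __name__ == "__main__":
    previous_wave = [100.0, 200.0, 300.0]
    current_wave = [110.0, None, 330.0]
    print(carry_over_impute(previous_wave, current_wave))
    print(ratio_impute(previous_wave, current_wave))
```

When the respondent growth ratio is close to 1, the two imputations nearly coincide.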

A New Method for Imputation of Missing Genotype using Linkage Disequilibrium and Haplotype Information (결측치가 존재하는 유전형 자료에서의 연관불균형과 일배체형을 사용한 결측치 대치 방법)

  • Park Yun-Ju;Kim Young-Jin;Park Jung-Sun;Kim Kuchan;Koh Insong;Jung Ho-Youl
    • Journal of KIISE:Software and Applications / v.32 no.2 / pp.99-107 / 2005
  • In this paper, we propose new missing-value imputation methods that minimize the loss of information: linkage disequilibrium-based and haplotype-based imputation methods, which estimate missing values based on the specific characteristics of Single Nucleotide Polymorphism (SNP) genotype data. An imputation method is needed to minimize the loss of information caused by experimentally missing data. In general, imputation of missing biological data has relied on the major allele imputation method, but this approach is not optimal. It has high error rates in estimating missing values because it does not take into consideration the specific structure of genotype data. In this paper, we present a comparative evaluation of our methods and the major allele imputation method for the estimation of missing values.
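
The baseline criticized in this abstract can be sketched in a few lines. The version below fills each missing call with the most frequent observed genotype at that SNP, which is a simplified stand-in for major allele imputation; the data layout (one row per individual, `None` for a missing call) and the function name are assumptions. It does not implement the linkage disequilibrium-based or haplotype-based methods the paper proposes.

```python
from collections import Counter


def impute_most_frequent(genotypes):
    """Fill missing calls (None) with the most frequent call at each SNP.

    genotypes: list of individuals, each a list of genotype calls per SNP.
    Returns a new matrix; the input is left unchanged.
    """
    filled = [row[:] for row in genotypes]
    for j in range(len(genotypes[0])):
        observed = [row[j] for row in genotypes if row[j] is not None]
        most_common = Counter(observed).most_common(1)[0][0]
        for row in filled:
            if row[j] is None:
                row[j] = most_common
    return filled


if __name__ == "__main__":
    calls = [["AA", "AG"], ["AA", None], [None, "GG"], ["AG", "GG"]]
    print(impute_most_frequent(calls))
```

Because each SNP is handled in isolation, this baseline ignores correlation between neighboring markers, which is the structure the proposed methods exploit.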

Comparison of imputation methods for item nonresponses in a panel study (패널자료에서의 항목무응답 대체 방법 비교)

  • Lee, Hyejung;Song, Juwon
    • The Korean Journal of Applied Statistics / v.30 no.3 / pp.377-390 / 2017
  • When conducting a survey, item nonresponse occurs if the respondent does not respond to some items. Since analysis based only on completely observed data may cause biased results, imputation is often conducted to analyze data in its complete form. The panel study is a survey method that examines changes of responses over time. In panel studies, there has been a preference for using information from response values of previous waves when the imputation of item nonresponses is performed; however, limited research has been conducted to support this preference. Therefore, this study compares the performance of imputation methods according to whether or not information from previous waves is utilized in the panel study. Among imputation methods that utilize information from previous responses, we consider ratio imputation, imputation based on the linear mixed model, and imputation based on the Bayesian linear mixed model approach. We compare the results from these methods against the results of methods that do not use information from previous responses, such as mean imputation and hot deck imputation. Simulation results show that imputation based on the Bayesian linear mixed model performs best and yields small biases and high coverage rates of the 95% confidence interval even at higher nonresponse rates.
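
The two reference methods that ignore previous waves, mean imputation and hot deck imputation, can be sketched as follows. This is a minimal version assuming a single imputation class and a simple random donor; the function names and the `seed` parameter are hypothetical, and the model-based methods from the abstract are not shown.

```python
import random


def mean_impute(values):
    """Replace each missing value (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]


def hot_deck_impute(values, seed=0):
    """Replace each missing value with a randomly drawn observed (donor) value."""
    rng = random.Random(seed)
    donors = [v for v in values if v is not None]
    return [rng.choice(donors) if v is None else v for v in values]


if __name__ == "__main__":
    income = [1.0, None, 3.0, None, 5.0]
    print(mean_impute(income))
    print(hot_deck_impute(income))
```

Mean imputation shrinks the variance of the completed data, while hot deck keeps imputed values within the set of observed responses.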

A comparison of imputation methods using nonlinear models (비선형 모델을 이용한 결측 대체 방법 비교)

  • Kim, Hyein;Song, Juwon
    • The Korean Journal of Applied Statistics / v.32 no.4 / pp.543-559 / 2019
  • Data often include missing values for various reasons. If the missing data mechanism is not MCAR, analysis based only on fully observed cases may produce biased estimates and decrease the precision of the estimates, since partially observed cases are excluded. Missing values cause more serious problems especially when the data include many variables. Many imputation techniques have been suggested to overcome this difficulty. However, imputation methods using parametric models may not fit well with real data that do not satisfy the model assumptions. In this study, we review imputation methods using nonlinear models, such as kernel, resampling, and spline methods, which are robust to violations of model assumptions. In addition, we suggest utilizing imputation classes to improve imputation accuracy, or adding random errors to correctly estimate the variance of the estimates, in nonlinear imputation models. The performances of imputation methods using nonlinear models are compared under various simulated data settings. Simulation results indicate that the performances of the imputation methods differ as the data settings change. However, imputation based on kernel regression or the penalized spline performs better in most situations. Utilizing imputation classes or adding random errors improves the performance of imputation methods using nonlinear models.
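
As an illustration of the kernel approach named in this abstract, the sketch below imputes a missing response with a Nadaraya-Watson estimate built from the observed cases. The Gaussian kernel, the fixed `bandwidth`, the single covariate, and the function names are assumptions made for brevity; the paper's estimators and its random-error step are not reproduced here.

```python
import math


def nw_predict(x0, xs, ys, bandwidth=1.0):
    """Nadaraya-Watson estimate at x0 using a Gaussian kernel."""
    weights = [math.exp(-0.5 * ((x0 - x) / bandwidth) ** 2) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def kernel_impute(xs, ys, bandwidth=1.0):
    """Fill missing responses (None) using observed (x, y) pairs only."""
    observed = [(x, y) for x, y in zip(xs, ys) if y is not None]
    obs_x = [x for x, _ in observed]
    obs_y = [y for _, y in observed]
    return [
        nw_predict(x, obs_x, obs_y, bandwidth) if y is None else y
        for x, y in zip(xs, ys)
    ]


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    y = [0.0, None, 4.0, 9.0]
    print(kernel_impute(x, y, bandwidth=0.8))
```

The imputed value is a smooth local average, so no parametric form for the relation between `x` and `y` has to be assumed.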

Application of Multiple Imputation Method in Analyzing Data with Missing Continuous Covariates

  • Ghasemizadeh Tamar, S.;Ganjali, M.
    • The Korean Journal of Applied Statistics / v.21 no.4 / pp.659-664 / 2008
  • Missing continuous covariates are pervasive in the use of generalized linear models for medical data. Multiple imputation is the most common and easiest method of dealing with missing covariate data. However, the method comes with serious caveats, and care is needed to make the imputed values more proper. In this paper, proper imputation from the posterior predictive distribution is developed for use with arbitrary priors. We use the empirical distribution of the posterior to approximate the posterior predictive distribution and to sample from it. This method is preferable to an imputation method we presented that uses a full model to impute missing values with available software. The proposed methods are applied to glucocorticoid data.
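
Whatever model generates the imputations, the m completed-data analyses are usually combined with Rubin's rules. The abstract does not describe its pooling step, so the sketch below shows only the standard rules: the pooled estimate is the average of the m estimates, and the total variance is W + (1 + 1/m) B, where W is the average within-imputation variance and B is the between-imputation variance. The function name and the numbers are hypothetical.

```python
def pool_rubin(estimates, variances):
    """Combine m completed-data estimates with Rubin's rules.

    estimates: point estimates from each imputed dataset (m >= 2).
    variances: their squared standard errors.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return pooled, total


if __name__ == "__main__":
    coef = [0.42, 0.47, 0.39, 0.45, 0.44]
    se2 = [0.010, 0.011, 0.009, 0.010, 0.012]
    print(pool_rubin(coef, se2))
```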

Fully Efficient Fractional Imputation for Incomplete Contingency Tables

  • Kang, Shin-Soo
    • Journal of the Korean Data and Information Science Society / v.15 no.4 / pp.993-1002 / 2004
  • Imputation procedures such as fully efficient fractional imputation (FEFI) or multiple imputation (MI) can be used to construct complete contingency tables from samples with partially classified responses. Variances of FEFI estimators of population proportions are derived. Simulation results, when data are missing completely at random, reveal that FEFI provides more efficient estimates of population proportions than either MI based on data augmentation or complete-case (CC) analysis, but neither FEFI nor MI provides an improvement over CC analysis with respect to the accuracy of estimation of some parameters for association between two variables, such as $\theta_{i+}\theta_{+j}-\theta_{ij}$ and the log odds-ratio.
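
The idea of fractional imputation for a two-way table can be illustrated with a simplified one-pass allocation: each partially classified count (row known, column missing) is split across the columns in proportion to the fully classified counts in that row. This is a sketch under that assumption only and is not claimed to be the FEFI estimator or its variance formula from the paper; the function name and the counts are hypothetical.

```python
def fractional_allocate(full, row_only):
    """Spread row-only counts across columns using observed row proportions.

    full[i][j]: count of fully classified units in cell (i, j).
    row_only[i]: count of units whose row is known but column is missing.
    Returns the completed table of (possibly fractional) counts.
    """
    completed = []
    for i, row in enumerate(full):
        row_total = sum(row)
        completed.append([c + row_only[i] * c / row_total for c in row])
    return completed


if __name__ == "__main__":
    fully_classified = [[30, 10], [20, 40]]
    column_missing = [8, 6]
    print(fractional_allocate(fully_classified, column_missing))
```

Every partially classified unit contributes a fraction to each possible cell, so no random draw is needed and the row totals are preserved.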

Comparing Imputation Methods for Doubly Censored Data

  • Yoo, Han-Na;Lee, Jae-Won
    • The Korean Journal of Applied Statistics / v.22 no.3 / pp.607-616 / 2009
  • In many epidemiological studies, the occurrence times of the event of interest are right-censored or interval-censored. In certain situations, such as AIDS data, however, the incubation period, which is the time between HIV infection and the diagnosis of AIDS, is usually doubly censored. In this paper, we impute the interval-censored HIV infection time using three imputation methods. Mid-point imputation, conditional mean imputation, and the approximate Bayesian bootstrap are implemented to obtain right-censored data, and then a Gibbs sampler is used to estimate the coefficient of the covariate for the incubation period. With the Bayesian approach, flexible modeling and the use of prior information are possible. We applied both parametric and semi-parametric methods for estimating the effect of the covariate and compared the imputation results incorporating prior information for the covariate effects.
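
The simplest of the three imputation methods listed here, mid-point imputation, reduces doubly censored data to right-censored data as sketched below. The tuple layout, the time units, and the function names are assumptions for illustration; conditional mean imputation, the approximate Bayesian bootstrap, and the Gibbs sampler are not shown.

```python
def midpoint_impute(left, right):
    """Impute an interval-censored infection time by the interval midpoint."""
    return (left + right) / 2.0


def to_right_censored(records):
    """Convert doubly censored records to right-censored incubation times.

    records: list of (infection_left, infection_right, end_time, diagnosed),
    where end_time is the diagnosis time if diagnosed is True and the
    censoring time otherwise.
    Returns a list of (incubation_time, diagnosed) pairs.
    """
    result = []
    for left, right, end_time, diagnosed in records:
        infection = midpoint_impute(left, right)
        result.append((end_time - infection, diagnosed))
    return result


if __name__ == "__main__":
    patients = [(0.0, 4.0, 10.0, True), (2.0, 6.0, 9.0, False)]
    print(to_right_censored(patients))
```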

Technical Trends of Time-Series Data Imputation (시계열 데이터 결측치 처리 기술 동향)

  • Kim, E.D.;Ko, S.K.;Son, S.C.;Lee, B.T.
    • Electronics and Telecommunications Trends / v.36 no.4 / pp.145-153 / 2021
  • Data imputation is a crucial issue in data analysis because data quality is highly correlated with the performance of AI models. In particular, it is difficult to collect quality time-series data in uncertain situations (for example, electricity blackouts or delays caused by network conditions). Thus, it is necessary to research effective methods of time-series data imputation. Studies on time-series data imputation can be divided into 5 categories, including statistical-based, matrix-based, regression-based, and deep learning-based (RNN and GAN) methodologies. This study reviews and organizes these methodologies. Recently, deep learning-based imputation methods have been developed and show excellent performance. However, they are associated with computational problems that make them difficult to use in real-time systems. Thus, the direction of future work is to develop imputation methods with low computational cost but high performance for application in the field.
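
As a point of reference for the statistical-based category, the sketch below fills gaps in an evenly spaced series by linear interpolation, with gaps at either end taking the nearest observed value. It is a generic baseline chosen for illustration, not a method taken from the review; the function name and the sample series are hypothetical.

```python
def interpolate_gaps(series):
    """Fill missing points (None) in an evenly spaced series.

    Interior gaps are linearly interpolated between the nearest observed
    neighbors; gaps at either end take the nearest observed value.
    """
    known = [i for i, v in enumerate(series) if v is not None]
    if not known:
        raise ValueError("series has no observed values")
    filled = list(series)
    for i, v in enumerate(series):
        if v is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = series[right]
        elif right is None:
            filled[i] = series[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = series[left] + frac * (series[right] - series[left])
    return filled


if __name__ == "__main__":
    load = [None, 1.0, None, None, 4.0, None]
    print(interpolate_gaps(load))
```

A baseline like this runs in negligible time, which is the trade-off the abstract raises against deep learning-based methods for real-time systems.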