Investigations into Coarsening Continuous Variables
Jeong, Dong-Myeong; Kim, Jay-J.
Protection against disclosure of survey respondents' identifiable and/or sensitive information is a prerequisite for statistical agencies that release microdata files from their sample surveys. Coarsening is one of the popular methods for protecting the confidentiality of the data. Grouped data can be released in the form of microdata or tabular data. Compared with releasing the data in tabular form only, making microdata available to the public with interval codes and their representative values greatly enhances the utility of the data. It allows researchers to compute covariances between variables, build statistical models, and run a variety of statistical tests on the data. It may be conjectured that the variance of the interval data is lower than that of the ungrouped data, since the coarsened data lack the within-interval variance. This conjecture is investigated using the uniform and triangular distributions. Traditionally, the midpoint is used to represent all the values in an interval. This approach implicitly assumes that the data are uniformly distributed within each interval. However, this assumption may not hold, especially in the last interval of economic data. In this paper, we use three distributional assumptions for the last interval (uniform, Pareto, and lognormal) and either the midpoint or the median for the other intervals. We apply these approaches to the wage and food cost variables of Statistics Korea's 2006 Household Income and Expenditure Survey (HIES) data and compare them in terms of the first two moments.
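The conjecture about the missing within-interval variance can be illustrated with a small simulation. The following is a minimal sketch under stated assumptions, not the authors' procedure: the uniform range, interval edges, sample size, seed, and the helper name `coarsen_to_midpoints` are all hypothetical choices made for illustration.

```python
import random
import statistics


def coarsen_to_midpoints(values, edges):
    """Replace each value by the midpoint of the interval that contains it.

    Intervals are [edges[i], edges[i+1]); the last one is closed on the
    right so that a value equal to the upper bound is still assigned.
    """
    coarsened = []
    for v in values:
        for lo, hi in zip(edges[:-1], edges[1:]):
            if lo <= v < hi or (hi == edges[-1] and v == hi):
                coarsened.append((lo + hi) / 2)
                break
        else:
            raise ValueError(f"value {v} lies outside the interval edges")
    return coarsened


# Assumed setup: 10,000 draws from a uniform distribution on [0, 100],
# grouped into five equal-width intervals.
random.seed(0)
raw = [random.uniform(0, 100) for _ in range(10_000)]
edges = [0, 20, 40, 60, 80, 100]
grouped = coarsen_to_midpoints(raw, edges)

# Population variances of the ungrouped and the coarsened data.
var_raw = statistics.pvariance(raw)
var_grouped = statistics.pvariance(grouped)

print(f"variance of ungrouped data: {var_raw:.2f}")
print(f"variance of midpoint-coded data: {var_grouped:.2f}")
print(f"difference (roughly the within-interval variance): {var_raw - var_grouped:.2f}")
```

In this uniform setting the midpoint-coded data keep only the between-interval variation, so their variance comes out smaller than that of the ungrouped data. For skewed data, and in particular for an open-ended last interval, the midpoint need not be close to the interval mean, which is the motivation for the alternative distributional assumptions considered in the paper.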
Jeong, D. M. (2008). Schemes for masking the household income and expenditures survey data, Internal Memorandum, Statistics Korea.
Johnson, N. L. and Kotz, S. (1970). Distributions in Statistics: Continuous Univariate Distributions-1, John Wiley and Sons.
Kim, J. J. (2008). Probability of falling in intervals and sum of squares, Internal Memorandum, U.S. National Center for Health Statistics.
Kim, J. J., Katzoff, M., Gonzalez, J. F., Jr. and Cox, L. H. (2004). Effects of grouping on first and second distribution moments, 2004 Proceedings of the Survey Research Methods Section, American Statistical Association, 3808-3815.
Statistics Korea (2006). Household Income and Expenditure Survey.
Sturges, H. A. (1926). The choice of a class interval, Journal of the American Statistical Association, 21, 65-66.