A Statistical Matching Method with k-NN and Regression

  • Published : 2007.11.30

Abstract

Statistical matching is a data integration method for data sources that do not share the same units. It can rapidly produce a large amount of new information at low cost and reduce the response burden, which affects data quality. This paper proposes a statistical matching technique that combines k-NN (k-nearest neighbor) and regression methods. For a given observation in the recipient file, we select the k records in the donor file whose values of the common variable are most similar to that observation's value, and we estimate an imputation value for the recipient file using a regression model fitted in the donor file. An empirical comparison study is conducted to show the properties of the proposed method.
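
The procedure outlined in the abstract might be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single numeric common variable, absolute difference as the similarity measure, and a simple least-squares line fitted to the k selected donor records only. The function name knn_regression_match and its parameters are hypothetical.

```python
import numpy as np


def knn_regression_match(donor_x, donor_y, recipient_x, k=5):
    """Impute a target value for each recipient record (illustrative sketch).

    donor_x     : common-variable values in the donor file
    donor_y     : target-variable values in the donor file
    recipient_x : common-variable values in the recipient file
    k           : number of nearest donor records used per recipient
    """
    donor_x = np.asarray(donor_x, dtype=float)
    donor_y = np.asarray(donor_y, dtype=float)
    recipient_x = np.asarray(recipient_x, dtype=float)

    if donor_x.shape != donor_y.shape:
        raise ValueError("donor_x and donor_y must have the same length")
    if not 1 <= k <= donor_x.size:
        raise ValueError("k must be between 1 and the number of donor records")

    imputed = np.empty(recipient_x.size)
    for i, x0 in enumerate(recipient_x):
        # Assumption: similarity is the absolute difference in the common variable.
        nearest = np.argsort(np.abs(donor_x - x0), kind="stable")[:k]
        xs, ys = donor_x[nearest], donor_y[nearest]

        if k < 2 or xs.max() == xs.min():
            # A line cannot be fitted here; fall back to the neighbors' mean.
            imputed[i] = ys.mean()
        else:
            # Assumption: a straight-line least-squares fit on the k neighbors.
            slope, intercept = np.polyfit(xs, ys, 1)
            imputed[i] = intercept + slope * x0
    return imputed
```

Fitting the regression only on the k neighbors makes the imputation local to each recipient value; fitting on the whole donor file and using the neighbors in some other way would be an equally plausible reading of the abstract.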

Keywords