Integration and imputation of survey data in R: the StatMatch package

Marcello D’Orazio
Italian National Institute of Statistics (Istat)


Statistical matching methods make it possible to integrate two or more data sources in order to investigate the relationship between variables that are not jointly observed. These methods have recently received much attention as a valid alternative for producing new statistical outputs.
This paper provides an overview of the statistical matching methods implemented in the StatMatch package for the R environment, focusing on the most widely used methods and on how they have been improved. Particular attention is devoted to hot deck matching methods, which are closely related to those developed for the imputation of missing values. The corresponding functions in StatMatch are very powerful and flexible enough to be used for imputing missing values in a survey. The paper also tackles the problem of matching data from complex sample surveys, a very important topic for National Statistical Institutes. Finally, it describes the concept of uncertainty that characterizes the statistical matching framework and how this alternative approach can be exploited for different purposes.

Romanian Statistical Review 2/2015