Selective Editing Using Contamination Model

Ieva Burakauskaitė (ieva.burakauskaite@stat.gov.lt)
Statistics Lithuania
Vilma Nekrašaitė-Liegė (vilma.nekrasaite-liege@stat.gov.lt, vilma.nekrasaite-liege@vilniustech.lt)
Statistics Lithuania, Vilnius Gediminas Technical University

Abstract

This paper presents the results of an outlier detection study with a focus on selective editing. The aim of selective editing is to identify observations affected by errors that have a major impact on the quality of sample estimates. The data editing process can then be concentrated on those observations, which frees up human resources and reduces time costs while maintaining the quality of sample estimates. These objectives are especially important for national statistical institutions such as Statistics Lithuania that seek to optimize the data editing process.
Several versions of selective editing were applied to the data editing process of the quarterly statistical survey on service enterprises (turnover indicator) of Statistics Lithuania. Predictions of the target variable were obtained using the contamination model. The impact of a potential error on a sample estimate was evaluated using a score function with a standard structure: the difference between the observed value of the target variable and its prediction, multiplied by a sample weight and a suspicion component. Two types of suspicion component (discrete and continuous) were used, and the effect of the suspicion component on the effectiveness of selective editing was investigated. The continuous suspicion component proved more efficient than the discrete one and therefore turned out to be a major factor in optimizing the data editing process.
Keywords: selective editing; contamination model; data validation; statistical survey; official statistics.
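
The score function described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The absolute value of the difference, the relative-deviation rule used for the discrete suspicion, the field names (y, y_hat, w, p_err) and the example figures are all assumptions made for illustration. In particular, p_err stands in for a continuous suspicion in [0, 1], such as an error probability that a contamination model might supply.

```python
def local_score(observed, predicted, weight, suspicion):
    """Score = suspicion * sample weight * |observed - predicted| (assumed form)."""
    return suspicion * weight * abs(observed - predicted)


def discrete_suspicion(observed, predicted, rel_threshold=0.5):
    """Hypothetical 0/1 flag: 1 if the relative deviation exceeds rel_threshold."""
    if predicted == 0:
        return 1.0 if observed != 0 else 0.0
    rel_dev = abs(observed - predicted) / abs(predicted)
    return 1.0 if rel_dev > rel_threshold else 0.0


def rank_for_review(units, suspicion_of):
    """Return unit ids ordered from highest to lowest score."""
    scored = [
        (local_score(u["y"], u["y_hat"], u["w"], suspicion_of(u)), u["id"])
        for u in units
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [unit_id for _, unit_id in scored]


# Made-up example units: observed value, prediction, weight, error probability.
units = [
    {"id": "A", "y": 120.0, "y_hat": 100.0, "w": 10.0, "p_err": 0.10},
    {"id": "B", "y": 1000.0, "y_hat": 100.0, "w": 2.0, "p_err": 0.90},
    {"id": "C", "y": 101.0, "y_hat": 100.0, "w": 50.0, "p_err": 0.05},
]

# Continuous suspicion: use the (assumed) error probability directly.
continuous_order = rank_for_review(units, lambda u: u["p_err"])

# Discrete suspicion: flag units whose relative deviation exceeds 50%.
discrete_order = rank_for_review(
    units, lambda u: discrete_suspicion(u["y"], u["y_hat"])
)
```

In this toy example the continuous suspicion yields a full ranking of the units, whereas the discrete flag assigns a zero score to every unflagged unit and so cannot prioritize among them. This is only meant to show the structural difference between the two variants, not to reproduce the study's findings.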


Romanian Statistical Review 1/2022