Two main uses of R in Statistics Portugal: Estimation and confidentiality

Pedro Miguel Cardoso de Sousa (pedro.sousa@ine.pt)
Statistics Portugal, Instituto Nacional de Estatística – Delegação Porto
Inês Rodrigues (ines.rodrigues@ine.pt)
Statistics Portugal, Instituto Nacional de Estatística – Sede
Maria da Conceição Ferreira (maria.ferreira@ine.pt)
Statistics Portugal, Instituto Nacional de Estatística – Delegação Porto
Pedro Campos (pedro.campos@ine.pt)
Statistics Portugal, Instituto Nacional de Estatística – Delegação Porto

Abstract 

R has been used at Statistics Portugal for more than 15 years, and its use is now widespread throughout the organization. In this paper, we focus on the use of R within the Statistical Methods Unit, where it serves two main purposes: estimation and statistical disclosure control.
For many of our estimation procedures, R is the primary tool: we use packages such as RODBC for database access and survey for analyzing data from complex survey samples. In statistical disclosure control, R is used intensively at Statistics Portugal, owing to recent developments in packages for protecting the confidentiality of microdata and tabular data. The R package sdcMicro has been a valuable tool for estimating disclosure risk under different intruder scenarios quickly and in a user-friendly way. R has also played a central role in studying and developing techniques for producing Public Use Files for the Household Budget Survey: parametric and non-parametric methods have been compared with regard to their capacity to generate safe and useful synthetic data. For Census data, perturbative methods for table protection have been developed; this work included writing R functions to check the two properties given priority when assessing the usefulness of protected tables: consistency and additivity. Besides applying R within the Unit, we encourage its use across Statistics Portugal through regular four-day courses covering basic commands and intermediate features. These courses enable a growing number of users to manage, analyze and visualize data using R.

Keywords: R software, official statistics, estimation, confidentiality
JEL Classification: C80, C83, C89

Romanian Statistical Review 4/2018