Bogdan OANCEA, Ph.D. (email: bogdan.oancea@faa.unibuc.ro)
University of Bucharest
David SALGADO, Ph.D. (email: david.salgado.fernandez@ine.es)
Dept. Methodology and Development of Statistical Production, INE, Spain
Antoniade Ciprian ALEXANDRU, Ph.D. (email: alexcipro@yahoo.com)
Ecological University of Bucharest and National Statistics Institute of Romania
Luis SANGUIAO, Ph.D. (email: luis.sanguiao.sande@ine.es)
Dept. Methodology and Development of Statistical Production, INE, Spain
Abstract
Integrating mobile phone data into the production of official statistics was one of the main goals of the ESSnet Big Data project. To this end, we developed an R package, pestim, that computes population estimates following a methodology inspired by ecological sampling techniques. The methodology uses a Bayesian approach that is computationally intensive but lends itself to straightforward parallelization. Because some functions of the package are computationally very demanding, we implemented them in C++ and integrated them with the rest of the package using Rcpp. Moreover, the C++ code is itself parallelized, for which we chose the RcppParallel package. The estimation procedure combines mobile phone data sets with another data source, such as a population register, and produces estimates for each territorial division of a geographical area and for each of a sequence of time instants for which data from Mobile Network Operators are available. One of the hypotheses of the underlying mathematical model is the independence of the estimates between different cells, which allows us to perform the computations for each cell in parallel. Besides population estimates, the package provides a set of accuracy indicators. pestim was developed with portability in mind and can be used on both Windows and Unix-like operating systems. We tested the package and showed that it scales well, an essential characteristic when processing very large data sets.
Keywords: R, big data, mobile phone data
JEL Codes: C11, C15, C55, C63, C88