Online Job Advertisements for Labour Market Statistics using R

Eurostat, European Commission, Luxembourg
Gabriele MARCONI
Sogeti, Luxembourg
Eurostat, European Commission, Luxembourg
Fernando REIS
Eurostat, European Commission, Luxembourg


This paper presents the R implementation of the methodology used to calculate a labour market concentration index (the Herfindahl-Hirschman index) for European urban areas, based on a database of over 100 million online job advertisements. After introducing the broader context and the motivation for the analysis, the authors describe the overall processing workflow. The paper then presents in more detail the solutions to two main challenges encountered: improving computational efficiency through parallel computing and cloud data querying, and classifying a variable that is important for the study (company name) with a custom-built machine learning model. Finally, the paper discusses the main reasons for using R and for sharing the code in a public repository.
Keywords: R, Big data, Online Job Advertisements, Labour market

Romanian Statistical Review 1/2022