Estimation of Number of Persons Per Household Based on Characteristics of Consumption Items: Utilization of Big Data to Improve the Consumption Trend Index in Japan

Anri Mutoh
National Statistics Center, Japan
Masayo Yamashita
National Statistics Center, Japan
Yoshiyasu Tamura
National Statistics Center, Japan
Masahiro Matsumoto
National Statistics Center, Japan


This article suggests the possibility of utilizing big data held by companies by integrating it with official statistics data. Official statistics agencies in Japan have sought to develop a Consumption Trend Index (CTI) in cooperation with academic researchers and with companies that provide the big data. One of the important roles of the CTI is to indicate the consumption trend of one-person households more accurately, so the big data is expected to reinforce existing official micro-data, especially for one-person households. However, the obtainable big data seldom includes the number of household members, so this missing value must be imputed. We therefore estimate the number of members in each household from the characteristics of the consumption items recorded in the traditional Japanese household expenditure survey. For the analysis we used logistic regression with an L1 penalty (Lasso regression), with household type as the response variable and purchased items as the explanatory variables. As a result, one-person households and households of two or more persons can be distinguished by their purchasing tendencies, which makes the characteristics of each household type evident.
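The Lasso logistic regression described above can be illustrated with a short sketch. This is a minimal example under stated assumptions, not the authors' implementation. The paper used the R package glmnet. This Python version minimises the same form of objective, -(1/n)·log-likelihood + λ·Σ|β_j| with an unpenalised intercept, but it uses proximal gradient descent (ISTA) instead of glmnet's coordinate descent, and it does not standardise features as glmnet does by default. The data are hypothetical: each household is a row of binary indicators for whether each of five items was purchased, and the response is 1 for a one-person household. The names `make_households`, `fit_lasso_logistic` and `prob_one_person` are illustrative only.

```python
import math
import random


def soft_threshold(z, thr):
    """Proximal operator of thr*|.|: shrink z toward zero by thr."""
    if z > thr:
        return z - thr
    if z < -thr:
        return z + thr
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_households(n, seed=0):
    """HYPOTHETICAL synthetic data: rows of 5 binary purchase indicators.

    Item 0 is bought mostly by one-person households (y = 1), item 1 mostly
    by households of two or more persons (y = 0); items 2-4 are pure noise.
    """
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        single = 1 if rng.random() < 0.4 else 0
        p0, p1 = (0.8, 0.1) if single else (0.2, 0.7)
        row = [int(rng.random() < p0), int(rng.random() < p1)]
        row += [int(rng.random() < 0.5) for _ in range(3)]
        X.append(row)
        y.append(single)
    return X, y


def fit_lasso_logistic(X, y, lam, n_iter=2000):
    """L1-penalised logistic regression via proximal gradient descent (ISTA).

    Minimises -(1/n) * loglik(b0, beta) + lam * sum(|beta_j|);
    the intercept b0 is left unpenalised.
    """
    n, p = len(X), len(X[0])
    # Step size 1/L, using L <= ||[1, X]||_F^2 / (4n) as a safe Lipschitz bound.
    fro2 = n + sum(v * v for row in X for v in row)
    step = 4.0 * n / fro2
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals p_i - y_i under the current coefficients.
        r = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
             for row, yi in zip(X, y)]
        g0 = sum(r) / n
        g = [sum(ri * row[j] for ri, row in zip(r, X)) / n for j in range(p)]
        b0 -= step * g0  # plain gradient step: intercept is not penalised
        # Gradient step followed by the L1 proximal (soft-threshold) step.
        beta = [soft_threshold(b - step * gj, step * lam)
                for b, gj in zip(beta, g)]
    return b0, beta


def prob_one_person(b0, beta, row):
    """Fitted probability that a household with purchase vector row has one person."""
    return sigmoid(b0 + sum(b * x for b, x in zip(beta, row)))


X, y = make_households(200, seed=1)
b0, beta = fit_lasso_logistic(X, y, lam=0.02)
print("coefficients:", [round(b, 3) for b in beta])
# Impute for a record without household size: buys item 0 but not item 1.
print("P(one-person):", round(prob_one_person(b0, beta, [1, 0, 0, 1, 0]), 3))
```

The fitted probability could then serve as a basis for imputing household size in big-data records that lack it. Raising `lam` drives more coefficients to exactly zero. That sparsity is what lets the selected items be read as characteristic purchases of each household type.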

Keywords: Consumption trend, household accounts, statistical imputation, logistic regression, LASSO, R package ‘glmnet’
JEL classification: D13, D16, D90, P44, Z13


Romanian Statistical Review 4/2019