Yukako Toko (ytoko@nstac.go.jp)
National Statistics Center, Japan
Mika Sato-Ilic (mika@risk.tsukuba.ac.jp)
Institute of Systems and Information Engineering, University of Tsukuba, Japan
Abstract
Autocoding plays an essential role in editing official statistics data, and we have proposed a classification method that is fundamental to a hybrid autocoding system. The essential feature of this system is that it combines a rule-based classification method with a machine learning-based classification method to handle the coding task of the Family Income and Expenditure Survey. Shopping receipt image data are known to be difficult to code with only the rule-based part of the proposed system because of the complexity of the data. Evaluating the classification results for shopping receipt images under a variety of criteria is therefore essential for obtaining correct data. For this reason, this paper presents the results of evaluating the hybrid autocoding system on shopping receipt image data under various criteria. We found that the machine learning-based classification element of the autocoding system does most of the work in handling the shopping receipt image data. Moreover, because the quantity of shopping receipt image data has recently increased, the importance of the machine learning-based classification method is growing. In addition, the results under the various criteria revealed a dynamic sensitivity over time. This finding may guide future developments in the machine learning-based classification method, such as the fuzzy clustering-based support vector machine currently in development (Toko and Sato-Ilic, 2021; Toko and Sato-Ilic, 2022).
Keywords: Coding, Evaluation metrics, Fuzzy logic, Text classification
JEL Classification: C38