Abstract:
|
Occupation coding is the assignment of respondents' textual survey answers to categories of an official occupational classification. Done manually, it is time-consuming and expensive, so several algorithms have been suggested to automate the process. To overcome deficiencies of existing techniques and to provide probabilistic predictions, we introduce a new method that combines training data from previous studies with job titles from a coding index. Using data from various German surveys, we compare our method with some of the main algorithms described in the literature, including regularized logistic regression, gradient boosting, nearest neighbors, memory-based reasoning, and string similarity. We discuss the strengths and weaknesses of each algorithm.
|