Abstract Details

Activity Number: 356 - Contributed Poster Presentations: Survey Research Methods Section
Type: Contributed
Date/Time: Tuesday, August 1, 2017 : 10:30 AM to 12:20 PM
Sponsor: Survey Research Methods Section
Abstract #323072
Title: Three Methods for Occupation Coding Based on Statistical Learning
Author(s): Matthias Schonlau* and Hyukjun Gweon and Lars Kaczmirek and Michael Blohm and Stefan Steiner
Companies: University of Waterloo and and GESIS and GESIS and University of Waterloo
Keywords: Occupation coding ; machine learning ; statistical learning ; ISCO-88 ; ALLBUS
Abstract:

Occupation coding, an important task in official statistics, refers to assigning a respondent's text answer to one of many hundreds of occupation codes. To date, occupation coding is still at least partially conducted manually, at great expense. We propose three methods for automatic coding: (1) combining separate models for the detailed occupation codes and for aggregate occupation codes, (2) a hybrid method that combines a duplicate-based approach with a statistical learning algorithm, and (3) a modified nearest-neighbor approach. Using data from the German General Social Survey (ALLBUS), we show that the proposed methods improve on both the coding accuracy of the underlying statistical learning algorithm and the coding accuracy of duplicates where duplicates exist. Further, we find that defining duplicates based on n-gram variables (a concept from text mining) is preferable to defining them based on exact string matches.
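
The hybrid idea and the n-gram definition of duplicates can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `ngram_key` and `HybridCoder`, the choice of word unigrams, the majority vote among duplicates, the placeholder codes "A" and "B", and the toy answers are all assumptions made for illustration. The fallback stands in for whatever statistical learning algorithm would be used in practice.

```python
import re
from collections import Counter, defaultdict


def ngram_key(text, n=1):
    """Return the set of word n-grams in `text` as a hashable key.

    Two answers with the same key count as duplicates here, even if
    they differ in case, punctuation, or (for n=1) word order.
    """
    tokens = re.findall(r"\w+", text.lower())
    return frozenset(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


class HybridCoder:
    """Use the codes of duplicates when they exist, else a fallback model."""

    def __init__(self, fallback, key=ngram_key):
        self.fallback = fallback          # hypothetical learner: text -> code
        self.key = key                    # how duplicates are defined
        self.codes_by_key = defaultdict(Counter)

    def fit(self, texts, codes):
        # Count how often each code was assigned to each duplicate group.
        for text, code in zip(texts, codes):
            self.codes_by_key[self.key(text)][code] += 1
        return self

    def predict(self, text):
        counts = self.codes_by_key.get(self.key(text))
        if counts:
            # A duplicate exists in the training data: take its most common code.
            return counts.most_common(1)[0][0]
        # No duplicate: defer to the statistical learning algorithm.
        return self.fallback(text)


# Toy training data (invented; "A" and "B" are placeholder codes).
train_texts = ["nurse, hospital", "Hospital nurse", "truck driver"]
train_codes = ["A", "A", "B"]

coder = HybridCoder(fallback=lambda text: "fallback").fit(train_texts, train_codes)

duplicate_prediction = coder.predict("hospital NURSE")
novel_prediction = coder.predict("software developer")
```

In this sketch, "hospital NURSE" matches no training answer as an exact string, but it shares its unigram set with "nurse, hospital", so it is coded from the duplicates. An answer with no duplicate, such as "software developer", is passed to the fallback.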


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association