All Times EDT

Abstract Details

Activity Number: 40 - Survey Weighting, Imputation, and Estimation
Type: Contributed
Date/Time: Monday, August 3, 2020 : 10:00 AM to 2:00 PM
Sponsor: Government Statistics Section
Abstract #313878
Title: Augmented CPS Data on Industry and Occupation
Author(s): Peter Meyer* and Kendra Asher
Companies: U.S. Bureau of Labor Statistics and U.S. Bureau of Labor Statistics
Keywords: imputation; prediction; random forest; CPS; industry; occupation
Abstract:

The Current Population Survey (CPS) classifies the jobs of respondents into hundreds of detailed industry and occupation categories. The classification systems change periodically, creating breaks in time series. Standard concordances bridge the periods, but often leave empty cells or introduce spurious sharp changes in the time series. They also usually assume that the relationships observed in one period are representative, at more aggregate levels, of various other historical periods.
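
As a rough illustration of the concordance problem, the sketch below applies a fixed-share crosswalk to counts recorded under an old classification. Every code, share, and count here is invented for illustration; none comes from the CPS or from any published concordance. The point is only that fixed shares cannot populate a category that has no counterpart in the old system, and that they carry one period's proportions back to all earlier periods.

```python
from collections import defaultdict

# Hypothetical crosswalk: the share of each old code assigned to each new code.
# These shares are fixed, i.e. assumed to hold in every historical period.
CROSSWALK = {
    "old_A": {"new_A1": 0.6, "new_A2": 0.4},
    "old_B": {"new_B": 1.0},
}


def bridge(old_counts, crosswalk):
    """Redistribute counts in old codes to new codes using fixed shares."""
    new_counts = defaultdict(float)
    for old_code, count in old_counts.items():
        for new_code, share in crosswalk[old_code].items():
            new_counts[new_code] += count * share
    return dict(new_counts)


# Invented employment counts before and after the classification change.
old_1995 = {"old_A": 1000, "old_B": 500}
observed_2005 = {"new_A1": 300.0, "new_A2": 800.0, "new_B": 450.0, "new_C": 120.0}

bridged_1995 = bridge(old_1995, CROSSWALK)

# New categories that the crosswalk never fills for the earlier period.
empty_cells = sorted(set(observed_2005) - set(bridged_1995))

print(bridged_1995)
print("empty cells before the break:", empty_cells)
```

In this toy case the bridged series has no value for `new_C` before the break, and the split of `old_A` is the same in every earlier year regardless of who actually held those jobs.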

For each employed CPS respondent classified under an earlier classification system, we apply prediction algorithms, principally random forests, to impute standardized industry, occupation, and related variables. The imputations use microdata about each individual and large training data sets about the population. In some of the training data sets, specialists have classified industry and occupation under two category systems; that is, the records are dual-coded. To handle the change in classification between the 1990s and 2000s, we train random forest classifiers largely on the dual-coded data set and apply them to the full CPS and IPUMS-CPS to impute several variables, including industry and occupation. For other changes in classification, for example when an industry or occupation splits, we train the algorithms on observations coded after the split to predict how the historical observations would have been classified. We test the industries and occupations in the resulting augmented data sets for smooth population proportions and wage levels, and for how well they match known trends, benchmarks, and alternative data sources. Augmented data sets of this kind can serve research on many topics.
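
The imputation step might look roughly like the following minimal sketch, which assumes Python with scikit-learn; the abstract does not say which software the authors used. The records are synthetic, and the field names (`old_occ`, `educ_years`, `age`), the codes (`A`, `B`, `A1`, `A2`, `B1`), and the rule that generates the "specialist" labels are all hypothetical stand-ins for real dual-coded data.

```python
import random

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline

random.seed(0)


def make_record(old_occ):
    """Build one synthetic respondent; all fields are hypothetical."""
    return {
        "old_occ": old_occ,                       # code under the old system
        "educ_years": random.randint(10, 20),
        "age": random.randint(18, 65),
    }


def specialist_new_code(rec):
    """Synthetic stand-in for the second code assigned by specialists."""
    if rec["old_occ"] == "A":                     # old code A splits in two
        return "A1" if rec["educ_years"] >= 16 else "A2"
    return "B1"                                   # old code B maps one-to-one


# Dual-coded training set: each record has an old code and a new code.
dual_coded = [make_record(random.choice(["A", "B"])) for _ in range(400)]
new_codes = [specialist_new_code(rec) for rec in dual_coded]

# DictVectorizer one-hot encodes string fields and passes numbers through.
model = make_pipeline(
    DictVectorizer(sparse=False),
    RandomForestClassifier(n_estimators=100, random_state=0),
)
model.fit(dual_coded, new_codes)

# Historical records carry only the old code; impute the new code for each.
historical = [make_record(random.choice(["A", "B"])) for _ in range(50)]
imputed = model.predict(historical)
train_accuracy = model.score(dual_coded, new_codes)

print(list(imputed[:5]), round(train_accuracy, 3))
```

Unlike a fixed-share concordance, the classifier assigns each historical record individually from its own characteristics, so the split of an old code can vary over time as the composition of the workforce changes.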


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program