Online Program

Saturday, February 17
PS3 Poster Session 3 and Continental Breakfast Sat, Feb 17, 8:00 AM - 9:15 AM
Salons F-I

Thematic Feature Selection for Research Support (303677)


*Thealexa Becker, Federal Reserve Bank of Kansas City 

Keywords: big data, microdata, R, machine learning, feature selection, variable selection, social science, public-use data

Social scientists often use large datasets like the Current Population Survey (CPS) and the Survey of Income and Program Participation (SIPP) for research on topics such as labor markets, demographics, and macroeconomics. Selecting appropriate variables (features) for inclusion in statistical models is a critical step in social science research, but it is painstaking and time-consuming. Typical approaches to feature selection are either (1) parsimonious selection of a minimal number of features a priori, based on subject-matter expertise, or (2) inclusion of all or nearly all variables, with little attention paid to selection. Both approaches have shortcomings, and research using large datasets such as the CPS and the SIPP is vulnerable to those limitations. This paper therefore draws on data science techniques to build a structured application for feature selection on large datasets. Using R, these feature selection methods will be evaluated on their ability to select important and relevant features and on their impact on model performance.
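The abstract does not name the specific data science techniques the paper evaluates, and the paper's own work is done in R. As one hedged illustration of a data-driven middle ground between hand-picking a few variables and including nearly all of them, the Python sketch below implements a simple filter method: it ranks candidate variables by the absolute Pearson correlation with an outcome and keeps the top k. The dataset is synthetic, and the names `x0` through `x4`, `make_synthetic_data`, and `select_top_k` are hypothetical; they are not CPS or SIPP fields or any part of the author's application.

```python
"""Filter-style feature selection: rank candidate variables by the absolute
Pearson correlation with an outcome and keep the top k.

Illustrative sketch only. The data are synthetic and the variable names are
hypothetical, not CPS or SIPP fields.
"""
import math
import random


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear information
    return cov / (sx * sy)


def select_top_k(features, outcome, k):
    """Score each feature (dict: name -> values) by |r| with the outcome; return the k best names."""
    scores = {name: abs(pearson(values, outcome)) for name, values in features.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


def make_synthetic_data(n=500, seed=42):
    """Build a toy dataset where only two of five candidate features drive the outcome."""
    rng = random.Random(seed)
    features = {f"x{j}": [rng.gauss(0, 1) for _ in range(n)] for j in range(5)}
    outcome = [
        3.0 * features["x0"][i] + 1.5 * features["x1"][i] + rng.gauss(0, 1)
        for i in range(n)
    ]
    return features, outcome


if __name__ == "__main__":
    features, outcome = make_synthetic_data()
    selected, scores = select_top_k(features, outcome, k=2)
    for name in sorted(scores, key=scores.get, reverse=True):
        print(f"{name}: |r| = {scores[name]:.3f}")
    print("selected:", selected)
```

Filter methods like this are cheap to run on wide survey extracts, but they score each variable in isolation and so can keep redundant, highly correlated variables. Wrapper methods (for example, stepwise search scored by model fit) and embedded methods (for example, the lasso) account for the other selected features at a higher computational cost.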