Abstract Details

Activity Number: 145 - Big Data Statistical Challenges and Opportunities in Industry
Type: Invited
Date/Time: Monday, July 30, 2018 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Consulting
Abstract #326586
Title: Feature Engineering from Scratch
Author(s): Andrew Smith*
Companies: Google
Keywords: Feature Engineering

Regression and classification methods commonly assume as a starting point an N x p data matrix, where p is a fixed number of predictors. In many industrial applications of these methods, however, the starting point looks considerably different: for each user or customer, we have a variable-length set of events (e.g., credit card or banking transactions, or web activity events). Thus, there is an implicit first step in the model-building process of aggregating or summarizing these events in order to define useful predictors, and this step commonly receives little attention compared to variable selection and model tuning. In this talk, we discuss the challenges inherent to this aggregation step, and present real-world examples where careful attention to aggregation has yielded surprising improvements to model performance.
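
As a rough illustration of the aggregation step described above, the sketch below collapses a variable-length event log into one fixed-length row per user, which is the N x p shape that regression and classification methods expect. It is not taken from the talk: the event layout, the sample values, the function name aggregate_events, and the five summary features are all assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log: (user_id, event_date, amount). All values are made up.
EVENTS = [
    ("u1", date(2018, 7, 1), 20.0),
    ("u1", date(2018, 7, 15), 40.0),
    ("u1", date(2018, 7, 28), 30.0),
    ("u2", date(2018, 6, 10), 100.0),
]

# Illustrative choice of p = 5 predictors; a real application would pick its own.
FEATURE_NAMES = ["n_events", "total_amount", "mean_amount", "max_amount", "days_since_last"]


def aggregate_events(events, as_of):
    """Collapse each user's variable-length event list into one fixed-length row."""
    by_user = defaultdict(list)
    for user_id, event_date, amount in events:
        # Keep only events on or before the reference date, so that no
        # information from after the prediction point enters the features.
        if event_date <= as_of:
            by_user[user_id].append((event_date, amount))

    rows = {}
    for user_id, user_events in by_user.items():
        amounts = [amount for _, amount in user_events]
        last_date = max(event_date for event_date, _ in user_events)
        rows[user_id] = [
            len(amounts),                 # n_events
            sum(amounts),                 # total_amount
            sum(amounts) / len(amounts),  # mean_amount
            max(amounts),                 # max_amount
            (as_of - last_date).days,     # days_since_last
        ]
    return rows


features = aggregate_events(EVENTS, as_of=date(2018, 7, 30))
for user_id in sorted(features):
    print(user_id, dict(zip(FEATURE_NAMES, features[user_id])))
```

Every row has the same five columns regardless of how many events a user generated. Users with no events on or before the reference date do not appear in the output at all, so a real pipeline would need to decide how to represent them. Which summaries to compute, and over which time window, is the kind of decision the abstract argues deserves more attention.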

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program