Abstract:
|
Regression and classification methods commonly assume as a starting point an N x p data matrix, where p is a fixed number of predictors. In many industrial applications of these methods, however, the starting point looks considerably different: for each user or customer, we have a variable-length set of events (e.g., credit card or banking transactions, or web activity events). Thus, there is an implicit first step in the model-building process of aggregating or summarizing these events in order to define useful predictors, and this step commonly receives little attention compared to variable selection and model tuning. In this talk, we discuss the challenges inherent in this aggregation step, and present real-world examples where careful attention to aggregation has yielded surprising improvements in model performance.
|