Abstract Details

Activity Number: 341 - SPEED: Classification and Data Science
Type: Contributed
Date/Time: Tuesday, July 31, 2018, 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #330933
Title: Efficient Big Data Model Selection with Applications to Fraud Detection
Author(s): Gregory Vaughan*
Companies: Bentley University
Keywords: big data; stagewise estimation; sub-sampling; clustered data; fraud detection

As the volume and complexity of data continue to grow, more attention is being focused on solving so-called big data problems. One field where this focus is pertinent is credit card fraud detection, where model selection approaches can identify the predictors most useful for preventing fraud. Stagewise selection is a classic model selection technique that has seen renewed interest because of its computational simplicity and flexibility. Through a sequence of small learning steps, stagewise techniques build a path of candidate models that is less greedy than the one produced by stepwise selection.
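The stagewise idea can be made concrete with a small sketch. It assumes incremental forward stagewise regression with squared-error loss, the textbook form of the method; the abstract does not say which loss or step size the author uses, and fraud detection would normally call for a classification loss. The function names `standardize` and `forward_stagewise`, the step size `eps`, and the simulated data are hypothetical illustrations.

```python
import random


def standardize(columns):
    """Center each predictor column and scale it to unit (population) variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        out.append([(v - mean) / sd for v in col])
    return out


def forward_stagewise(columns, y, eps=0.01, n_steps=1500):
    """Incremental forward stagewise regression under squared-error loss (assumed).

    columns: p standardized predictor columns, each of length n.
    Returns the coefficient path: one coefficient vector per step.
    """
    p, n = len(columns), len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # start from the intercept-only model
    beta = [0.0] * p
    path = []
    for _ in range(n_steps):
        # Inner product of each predictor with the current residual
        # (proportional to their correlation, since predictors are standardized).
        corr = [sum(x * r for x, r in zip(col, resid)) for col in columns]
        # Pick the single predictor most aligned with the residual ...
        j = max(range(p), key=lambda k: abs(corr[k]))
        # ... and nudge its coefficient by a tiny fixed amount instead of a full refit.
        delta = eps if corr[j] > 0 else -eps
        beta[j] += delta
        resid = [r - delta * x for r, x in zip(resid, columns[j])]
        path.append(list(beta))
    return path


# Hypothetical simulated example: only the first two of five predictors matter.
rng = random.Random(42)
n, p = 300, 5
raw = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * raw[0][i] - 2 * raw[1][i] + rng.gauss(0, 0.5) for i in range(n)]
X = standardize(raw)
path = forward_stagewise(X, y)
final = path[-1]
print([round(b, 2) for b in final])
```

Because each step moves only one coefficient by `eps`, stopping early at different points along `path` gives a sequence of progressively larger candidate models. Stepwise selection, by contrast, fully refits each variable as soon as it enters, which is what makes it greedier.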

This paper introduces a new stochastic stagewise technique that integrates sub-sampling into the stagewise framework, yielding a simple model selection tool for big data. Simulation studies demonstrate that the proposed technique offers a reasonable trade-off between computational cost and predictive performance. We illustrate the approach on synthetic credit card fraud data.
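The abstract does not spell out how sub-sampling enters the stagewise updates. One plausible reading, offered only as a hypothetical sketch and not as the author's algorithm, is to evaluate the selection criterion at each step on a random subset of rows, so the per-step cost scales with the subsample size rather than with the full number of rows n. The names `stochastic_stagewise` and `subsample`, the squared-error loss, and the simulated data are all assumptions.

```python
import random


def standardize(columns):
    """Center each predictor column and scale it to unit (population) variance."""
    out = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        out.append([(v - mean) / sd for v in col])
    return out


def stochastic_stagewise(columns, y, eps=0.01, n_steps=2000, subsample=100, seed=0):
    """Hypothetical sub-sampled stagewise regression (squared-error loss assumed).

    Each step scores the predictors on `subsample` rows drawn without replacement,
    so a step costs O(subsample * p) instead of O(n * p).
    """
    rng = random.Random(seed)
    p, n = len(columns), len(y)
    y_mean = sum(y) / n
    beta = [0.0] * p
    path = []
    for _ in range(n_steps):
        rows = rng.sample(range(n), subsample)
        # Residuals of the current model, computed only on the sampled rows.
        resid = {i: y[i] - y_mean - sum(beta[k] * columns[k][i] for k in range(p))
                 for i in rows}
        # Subsample estimate of each predictor's inner product with the residual.
        corr = [sum(columns[k][i] * resid[i] for i in rows) for k in range(p)]
        j = max(range(p), key=lambda k: abs(corr[k]))
        # Same tiny stagewise step as the full-data version, on the chosen predictor.
        beta[j] += eps if corr[j] > 0 else -eps
        path.append(list(beta))
    return path


# Hypothetical simulated example: only the first two of five predictors matter.
data_rng = random.Random(7)
n, p = 2000, 5
raw = [[data_rng.gauss(0, 1) for _ in range(n)] for _ in range(p)]
y = [3 * raw[0][i] - 2 * raw[1][i] + data_rng.gauss(0, 0.5) for i in range(n)]
X = standardize(raw)
path = stochastic_stagewise(X, y)
final = path[-1]
print([round(b, 2) for b in final])
```

Smaller values of `subsample` make each step cheaper but the choice of predictor noisier, which is the kind of computational cost versus predictive performance trade-off the abstract refers to.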

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program