Abstract Details

Activity Number: 43 - SPEED: Statistics in Sports; Physical Activity/Sleep Studies, and Nonparametrics Part 1
Type: Contributed
Date/Time: Sunday, July 28, 2019 : 2:00 PM to 3:50 PM
Sponsor: Section on Nonparametric Statistics
Abstract #306525
Title: Faint Galaxies Detection: An Example of Guided Follow-Up with Imbalanced Data Sets
Author(s): Niccolo Dalmasso* and Ann B. Lee and Rafael Izbicki
Companies: Carnegie Mellon University and Carnegie Mellon University and Federal University of Sao Carlos
Keywords: Classification; Imbalanced data; Follow-up strategies; Nonparametric statistics; Astrostatistics

Our science goal is to identify very faint galaxies that occur at low redshifts (where redshift is a proxy for distance to the observer). Given candidate galaxies, astronomers then aim to follow up with higher-resolution spectroscopy. Because these follow-up studies are expensive and limited in size by experimental design, there is a tradeoff between cost and recall. Beyond this tradeoff, there are two statistical challenges: correctly classifying objects in imbalanced data with very few actual positives, and quantifying uncertainty in settings with low signal-to-noise ratio and degenerate solutions. In this work, we develop algorithm-agnostic, budget-aware strategies for selecting follow-up candidates, as well as strategies for data augmentation and nonparametric conditional density estimation for classification with imbalanced data. Although our main application is in astronomy, the proposed methods apply generally to detection problems with few actual positives and a limited (monetary or time) budget for collecting new data, such as credit card fraud detection and medical diagnosis.
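The cost-recall tradeoff described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: it assumes some classifier has already produced a score per candidate, and it uses the simplest budget-aware rule (follow up on the highest-scoring candidates until the budget is spent). The function names `select_followup` and `recall_at_budget` and the toy data are hypothetical.

```python
import numpy as np


def select_followup(scores, budget):
    """Return indices of the `budget` highest-scoring candidates.

    Assumption: `scores` come from any classifier (the rule is
    algorithm-agnostic) and each follow-up costs one unit of budget.
    """
    scores = np.asarray(scores, dtype=float)
    k = max(0, min(int(budget), scores.size))
    # Sort ascending, reverse to descending, keep the first k indices.
    return np.argsort(scores, kind="stable")[::-1][:k]


def recall_at_budget(scores, labels, budget):
    """Fraction of true positives recovered when following up `budget` objects."""
    labels = np.asarray(labels, dtype=bool)
    n_pos = labels.sum()
    if n_pos == 0:
        return float("nan")  # recall is undefined without positives
    chosen = select_followup(scores, budget)
    return float(labels[chosen].sum() / n_pos)


# Toy, made-up data: 6 candidates, only 2 actual positives (imbalanced).
scores = [0.9, 0.1, 0.8, 0.3, 0.2, 0.05]
labels = [1, 0, 0, 1, 0, 0]

for budget in (0, 2, 4):
    print(budget, recall_at_budget(scores, labels, budget))
```

Sweeping the budget in this way traces out a cost-recall curve, which is one way to compare follow-up strategies under a fixed number of available observations.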

Authors who are presenting talks have a * after their name.
