All Times EDT

Abstract Details

Activity Number: 427 - Intelligent Systems and Decision Support
Type: Contributed
Date/Time: Wednesday, August 10, 2022 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #323051
Title: A Statistical Learning Oracle to Support Decisions in Inference
Author(s): Lucas Koepke* and Mary Gregg and Michael Frey
Companies: National Institute of Standards and Technology and National Institute of Standards and Technology and National Institute of Standards and Technology
Keywords: machine learning; synthetic data; statistical analysis; parameter estimation; isotonic regression
Abstract:

Before a parametric inference, more than one procedural path (involving data smoothing, imputation, outlier removal, etc.) may be available for preparing the data, with little certainty about which is best for the given data and planned statistical model. We propose a statistical learning decision rule, an "oracle", to recommend the best procedure in such cases. This oracle is methodically tailored to the observed data, the chosen statistical model, and the set of procedures available to prepare the data. We use artificial neural networks (ANNs) to learn the decision boundary between the competing procedures; the ANNs are trained on an innovative synthetic data set constructed solely from model parameters with high posterior probability, with no additional assumptions. The oracle's performance is studied in two estimation problems: slope estimation in simple linear regression (SLR) and change-point estimation in continuous piecewise-linear regression (CPLR). In each example the regression response is known to be increasing, and the oracle must decide whether the pool-adjacent-violators algorithm should be applied before fitting the model. An intuitive measure of potential performance, called oracle headroom, is used to explore comprehensively the oracle's potential for reducing estimation standard error in the SLR and CPLR problems. For specific problem configurations, we find both that the oracle's headroom is high and that, in statistical experiments, the oracle's empirical performance is near its headroom, offering clear benefit.
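
For readers unfamiliar with the pool-adjacent-violators algorithm named in the abstract, the following is a minimal sketch of the two competing data-preparation paths in the SLR example: fit the slope to the raw response, or first project the response onto nondecreasing sequences and then fit. The function names `pava` and `slr_slope` are illustrative assumptions, not the authors' code, and the oracle itself (the trained ANN that chooses between the two paths) is not shown.

```python
def pava(y, w=None):
    """Pool-adjacent-violators: weighted least-squares nondecreasing fit to y."""
    n = len(y)
    if w is None:
        w = [1.0] * n
    # Each block holds [weighted mean, total weight, number of points pooled].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool backwards while the last two blocks violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            wt = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / wt, wt, c1 + c2])
    # Expand the pooled blocks back to one fitted value per observation.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


def slr_slope(x, y):
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical toy data: an increasing trend observed with noise.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [0.1, 1.4, 0.9, 3.2, 3.9]

slope_raw = slr_slope(x, y)         # path 1: fit the raw response
slope_iso = slr_slope(x, pava(y))   # path 2: apply PAVA, then fit
```

In the setting the abstract describes, a decision rule would pick one of `slope_raw` or `slope_iso` for the data at hand; which is better depends on the data and model, which is the question the oracle is meant to answer.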


Authors who are presenting talks have a * after their name.
