Abstract:
|
Several data-preparation procedures (e.g., smoothing, imputation, or outlier removal) may be available before a parametric inference, with little certainty about which is best for the given data and the planned statistical model. We propose a statistical learning decision rule, an "oracle", to recommend the best procedure in such cases. The oracle is tailored to the observed data, the chosen statistical model, and the set of procedures available to prepare the data. We use artificial neural networks (ANNs) to learn the decision boundary between the competing procedures; the ANNs are trained on a novel synthetic data set constructed solely from model parameters with high posterior probability, with no additional assumptions. We study the oracle's performance in two estimation problems: slope estimation in simple linear regression (SLR) and change-point estimation in continuous piecewise-linear regression (CPLR). In both problems the regression response is known to be increasing, and the oracle must decide whether the pool-adjacent-violators algorithm (PAVA) should be applied before fitting the model. We use an intuitive measure of potential performance, called oracle headroom, to explore comprehensively the oracle’s potential to reduce the standard error of estimation in the SLR and CPLR problems. For specific problem configurations, we find both that the oracle’s headroom is high and that, in statistical experiments, the oracle’s empirical performance approaches its headroom, offering a clear benefit.
|
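To make the two competing procedures in the SLR problem concrete, the sketch below fits the slope either to the raw responses or to their pool-adjacent-violators (PAVA) fit, the isotonic least-squares estimate under a nondecreasing constraint. It is a minimal illustration assuming equally weighted observations with responses already sorted by increasing x; the helper names `pava` and `ols_slope` are hypothetical and are not the authors' implementation. The ANN oracle that chooses between the two paths is not shown.

```python
from typing import List, Sequence


def pava(y: Sequence[float]) -> List[float]:
    """Least-squares nondecreasing fit to y (equal weights), via pool-adjacent-violators.

    Assumes y is ordered by increasing predictor value.
    """
    # Each block stores [sum of values, count]; its fitted value is the block mean.
    blocks: List[List[float]] = []
    for v in y:
        blocks.append([float(v), 1])
        # Pool adjacent blocks while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, n = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += n
    # Expand each pooled block back to one fitted value per observation.
    fit: List[float] = []
    for s, n in blocks:
        fit.extend([s / n] * int(n))
    return fit


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope estimate for simple linear regression."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


if __name__ == "__main__":
    x = [0.0, 1.0, 2.0, 3.0]
    y = [0.0, 1.2, 0.9, 3.1]  # hypothetical data with one order violation
    print("slope on raw y:  ", ols_slope(x, y))
    print("slope after PAVA:", ols_slope(x, pava(y)))
```

Which of the two slope estimates has the smaller standard error depends on the data and the noise level; in the abstract's framing, that per-dataset choice is the decision the oracle is trained to make.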