Abstract:
|
Model selection can be used to identify key features of data in a variety of fields. When many candidate variables are available, it is of interest to incorporate cost constraints when selecting a model and planning future data collection. Cost may refer to the financial, logistical, computational, or ethical burden of collecting data and fitting a model with particular variables. We develop Bayesian model selection with cost constraints for linear and hierarchical models by adjusting model priors based on available cost and other constraint information. We provide Bayesian posterior model probabilities that can be used in model averaging and classification problems. We empirically explore the tradeoff between the cost of labeling additional data points and the cost of adding covariates to an analysis. We study properties of model priors that aim to incorporate cost constraints, and we use simulations, data cloning, and distance measures to assess how increasing sample size affects the performance of cost constraints and priors. We apply cost-constrained model priors to a data set pertaining to the presence of heart disease, and we compare the results to those obtained using uniform model priors.
|