Abstract:
|
Health care cost data exhibit features such as heteroscedasticity and severe skewness that make statistical analysis challenging. When the goal is to model the mean cost, it is therefore desirable to make no assumptions about the density function or higher-order moments. At the same time, when the number of covariates is large, variable selection is needed to balance prediction accuracy against model simplicity. We propose spike-or-slab priors for Bayesian variable selection, which are consistent as long as the assumption on the mean cost is satisfied. The method offers three advantages simultaneously: robustness (it avoids assumptions about the density function or higher-order moments), parsimony (through variable selection), and expressiveness (its Bayesian nature allows candidate models to be compared by their posterior probabilities). In addition, ranking the Z-statistics reduces the scope of the model search and thereby improves computational efficiency. We apply the method to the Medical Expenditure Panel Survey dataset.
|