Abstract:
|
The posterior predictive distribution (the distribution of data simulated from a model) has been used to flag model-data discrepancies in the Bayesian literature. The approach taken here differs from others both conceptually and in its implementation. It compares the "distance" between the data and the model (as represented by pseudo-data simulated from the model) with the "distance" within the model. The latter is calculated by generating pseudo-data from the fitted model, using each set of these pseudo-data to re-estimate the model, and then generating pseudo-data from each re-estimated model. "Distances" are calculated as the log of sums-of-squares, after ranking, of the original data vs. pseudo-data or of pseudo-data vs. pseudo-data. The test compares the mean data-model distance to a "null" distribution of mean distances between pseudo-data. The power of this method compares favorably with that of t-tests, and the method can be used for most models in the GLMM framework, whether estimated using traditional or Bayesian methods. A new kind of plot, in which the distribution of the ranked pseudo-data is compared to the original data at each ranked datum, is useful for determining the region of the data where the model fails.
|
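The procedure summarized in the abstract can be sketched in a few lines of Python. This is only an illustrative reading of the abstract, not the authors' implementation: it assumes a simple normal model fitted by sample mean and standard deviation as a stand-in for a GLMM, takes the "distance" to be the log of the sum of squared differences between sorted samples, and uses a one-sided Monte Carlo p-value. All function names and the parameters `n_null`, `n_pseudo`, and `seed` are hypothetical.

```python
import numpy as np


def rank_distance(a, b):
    """Log of the sum of squared differences between two sorted samples."""
    return np.log(np.sum((np.sort(a) - np.sort(b)) ** 2))


def fit_normal(y):
    """Stand-in model fit (assumption): normal with sample mean and sd."""
    return y.mean(), y.std(ddof=1)


def simulate(params, n, rng):
    """Draw one pseudo-data set of size n from the fitted normal model."""
    mu, sigma = params
    return rng.normal(mu, sigma, size=n)


def mean_distance(y, n_pseudo, rng):
    """Fit the model to y, then average the distances between y and
    n_pseudo pseudo-data sets simulated from that fit."""
    params = fit_normal(y)
    dists = [rank_distance(y, simulate(params, y.size, rng))
             for _ in range(n_pseudo)]
    return float(np.mean(dists))


def discrepancy_test(y, n_null=200, n_pseudo=20, seed=0):
    """Compare the mean data-model distance with a null distribution of
    mean distances obtained from pseudo-data treated as if they were data."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    observed = mean_distance(y, n_pseudo, rng)

    # Null: simulate from the original fit, re-estimate the model on each
    # pseudo-data set, and measure pseudo-data vs. pseudo-data distances.
    params = fit_normal(y)
    null = np.array([
        mean_distance(simulate(params, y.size, rng), n_pseudo, rng)
        for _ in range(n_null)
    ])

    # One-sided Monte Carlo p-value (an assumed convention).
    p_value = (1 + np.sum(null >= observed)) / (n_null + 1)
    return observed, null, float(p_value)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    skewed = rng.exponential(1.0, size=200)  # data the normal model fits poorly
    observed, null, p_value = discrepancy_test(skewed)
    print(f"observed mean distance: {observed:.2f}, p-value: {p_value:.3f}")
```

In this sketch, swapping `fit_normal` and `simulate` for the fitting and simulation routines of another model is the only change needed to apply the same comparison elsewhere.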