A common problem with self-reported data is "heaping", in which "round" numbers appear more often than typical distributional assumptions would predict. In smoking cessation studies, for example, daily cigarette counts commonly show excess reports of 10, 15, 20, and so on.
To draw valid inferences about the underlying unheaped values from heaped data, one needs to know the heaping mechanism, that is, the probability distribution of heaped values given unheaped values. We have a smoking cessation dataset in which subjects' daily counts were recorded both by time-line follow-back (TLFB, delayed recall) and by ecological momentary assessment (EMA, recording events as they occur). Using this dataset, we have robustly estimated a parametric "proximity/gravity" model of the heaping mechanism.
In this talk we describe methods for using the estimated proximity/gravity model to estimate the distribution of the underlying unheaped data in a clinical trial dataset where only heaped values are available. We compare a method that assumes only the form of the model with one that uses the actual model estimates.
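A minimal sketch of the general idea, under stated assumptions: if the heaping mechanism K[h, y] = P(report h | true count y) is known, the heaped distribution is the linear transform K @ pi of the unheaped distribution pi. In that case pi can be estimated from heaped frequencies alone by EM, treating each subject's true count as missing data. The mechanism below is a deliberately simple hypothetical stand-in (exact report with probability 1 - p, otherwise rounding to the nearest multiple of 5), not the proximity/gravity model. The Poisson(15) "true" distribution and all function names are illustrative choices.

```python
import math

import numpy as np


def heaping_matrix(max_count, p_round=0.5, base=5):
    """Hypothetical heaping mechanism: K[h, y] = P(report h | true count y).

    Illustrative assumption only, NOT the proximity/gravity model: a true count
    is reported exactly with probability 1 - p_round, and otherwise rounded to
    the nearest multiple of `base` (capped at max_count).
    """
    n = max_count + 1
    K = np.zeros((n, n))
    for y in range(n):
        heap = min(base * round(y / base), max_count)
        K[y, y] += 1.0 - p_round
        K[heap, y] += p_round
    return K


def heap_pmf(K, true_pmf):
    """Heaped pmf implied by an unheaped pmf: P(H=h) = sum_y K[h, y] P(Y=y)."""
    return K @ true_pmf


def unheap_em(K, heaped_freq, n_iter=5000):
    """EM estimate of the unheaped pmf from heaped relative frequencies, K known."""
    f = np.asarray(heaped_freq, dtype=float)
    n = K.shape[1]
    pi = np.full(n, 1.0 / n)  # start from a uniform pmf
    for _ in range(n_iter):
        q = K @ pi  # heaped pmf implied by the current estimate
        ratio = np.divide(f, q, out=np.zeros_like(q), where=q > 0)
        # E-step and M-step combined: each reported value h is split among the
        # true counts y that could have produced it, in proportion to K[h, y] * pi[y].
        pi = pi * (K.T @ ratio)
    return pi


if __name__ == "__main__":
    max_count = 40
    k = np.arange(max_count + 1)
    # Illustrative "true" daily-count pmf: Poisson(15) truncated to 0..40.
    log_p = k * math.log(15.0) - 15.0 - np.array([math.lgamma(i + 1) for i in k])
    true_pmf = np.exp(log_p) / np.exp(log_p).sum()

    K = heaping_matrix(max_count, p_round=0.5)
    heaped = heap_pmf(K, true_pmf)
    print("P(true = 20):", round(float(true_pmf[20]), 4))
    print("P(reported = 20):", round(float(heaped[20]), 4))

    recovered = unheap_em(K, heaped)
    print("max abs error of EM estimate:", float(np.abs(recovered - true_pmf).max()))
```

Because the exact heaped pmf is passed in here, the EM estimate should closely reproduce the true pmf. With real data one would pass observed relative frequencies instead, and K would come from a fitted mechanism rather than this toy rounding rule.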