Abstract:
|
A problem sometimes seen in self-reported data is "heaping", where certain numerals (often multiples of 2, 5, 10 or 12) appear more frequently than would be expected. In smoking cessation studies, for example, daily cigarette count data commonly show excess counts at 10, 20, 30, etc. A smoking cessation dataset in which both time-line follow-back (TLFB; delayed recall) and ecological momentary assessment (EMA; recording events as they occur) were used to record subjects' cigarettes smoked per day presents a unique opportunity to examine and model the mechanism of heaping. This double-coded dataset allows us to see the impact of heaping, a well-known problem with the TLFB method, and to attempt to model the heaped data (TLFB) based on the "real" data (EMA). In the novel "proximity and gravity" model, the conditional probability of reporting a given value is based on a measure of proximity (of the true value to the reported value) and a vector of gravities (defining the intrinsic attractiveness of each possible reported value). In this project, we explore variants of the proximity-gravity model to determine which provides the best fit to the TLFB-EMA data.
|
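To make the proximity-gravity idea concrete, here is a minimal sketch of one possible variant. The abstract does not specify a functional form, so everything below is an assumption made for illustration: the multiplicative combination of gravity and proximity, the exponential-decay proximity kernel, the `scale` parameter, the function name `report_probabilities`, and the example gravity values are all hypothetical.

```python
import numpy as np

def report_probabilities(true_count, gravities, scale=2.0):
    """Conditional probability of each reported value given the true count.

    Assumed form (not taken from the source):
        P(report = r | truth = t) is proportional to gravity[r] * proximity(t, r)
    with proximity(t, r) = exp(-|r - t| / scale).
    """
    gravities = np.asarray(gravities, dtype=float)
    values = np.arange(len(gravities))            # candidate reported values 0..K
    proximity = np.exp(-np.abs(values - true_count) / scale)
    weights = gravities * proximity               # unnormalized attractiveness
    return weights / weights.sum()                # normalize to a distribution

# Hypothetical gravity vector over reports 0..40: multiples of 5 are
# twice as attractive as other values, multiples of 10 five times.
values = np.arange(41)
gravities = np.ones(41)
gravities[values % 5 == 0] = 2.0
gravities[values % 10 == 0] = 5.0

# Distribution of the reported (TLFB-style) count when the true
# (EMA-style) count is 18 cigarettes.
probs = report_probabilities(18, gravities)
most_likely_report = int(np.argmax(probs))
```

Under these assumed settings, a true count of 18 is most likely to be reported as 20, because the higher gravity of 20 outweighs its slightly lower proximity. Other variants could swap in a different proximity kernel or estimate the gravities from data.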