Abstract:
|
Estimating the number of drug users is an important but difficult statistical task. Knowing how many drug users there are is essential for monitoring trends in drug use prevalence over time and for designing efficient intervention programs. But how can we estimate the size of a hidden population? A common approach is capture-recapture modeling, in which several lists of drug users (e.g. from health and criminal records) are matched and compared. This makes it possible to estimate the total population size, including drug users who appear on none of the lists. However, capture-recapture modeling yields different estimates depending on which variables are included in the model. A range of estimates is of little use to policymakers, who need a single best estimate on which to base decisions. Choosing which model to rely on is therefore both unavoidable and essential in this context. We discuss several approaches to variable selection, including selection based on an information criterion, stability selection, bagging and model averaging. We apply these methods to Danish administrative data on drug users and use them to estimate the size of the unknown drug user population in Denmark.
|