Abstract Details

Activity Number: 127 - SPEED: Statistical Learning and Data Science Speed Session 1, Part 1
Type: Contributed
Date/Time: Monday, July 29, 2019 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #305177
Title: To Select or Not to Select? Variable Selection in the Estimation of Drug Use Prevalence in Denmark
Author(s): Anne Helby Petersen* and Niels Keiding
Companies: University of Copenhagen and University of Copenhagen
Keywords: capture-recapture; hidden populations; variable selection; model averaging; drug users; applied statistics

Estimating the number of drug users is both an important and a difficult statistical task. Knowing how many drug users there are is essential for monitoring trends in drug use prevalence over time and for designing efficient intervention programs. But how do we estimate the size of a hidden population? A common approach is capture-recapture modeling, in which several lists of drug users (e.g. from health and criminal records) are matched and compared; the overlap between the lists makes it possible to estimate the size of the unknown population. However, the capture-recapture strategy produces different estimates depending on which variables are included in the model, and a range of competing estimates is of little use to policy makers, who need a single best estimate on which to base decisions. Choosing which model to rely on is therefore both unavoidable and essential in this context. We discuss several approaches to variable selection, including selection based on an information criterion, stability selection, bagging and model averaging. The methods are applied to Danish administrative data on drug users, allowing us to estimate the size of the unknown drug user population in Denmark.
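
As a rough illustration of the ideas named in the abstract, the sketch below shows the simplest capture-recapture setting (two lists, no covariates) and one common form of model averaging (Akaike weights). It is not the authors' method: the abstract does not specify their models, and multi-list analyses with covariates are typically more involved. The function names and the example counts are hypothetical and chosen only for illustration.

```python
import math


def lincoln_petersen(n1, n2, m):
    """Two-list estimate of population size: N is approximately n1 * n2 / m.

    n1, n2: number of individuals on list 1 and list 2
    m:      number of individuals found on both lists
    """
    if m <= 0:
        raise ValueError("at least one individual must appear on both lists")
    return n1 * n2 / m


def chapman(n1, n2, m):
    """Chapman's variant of the two-list estimator, which is less biased
    in small samples and remains defined when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1


def akaike_weights(aics):
    """Convert AIC values of candidate models into weights that sum to 1."""
    best = min(aics)
    raw = [math.exp(-0.5 * (a - best)) for a in aics]
    total = sum(raw)
    return [r / total for r in raw]


def model_average(estimates, aics):
    """Combine per-model population estimates into a single estimate,
    weighting each model by its Akaike weight."""
    weights = akaike_weights(aics)
    return sum(w * e for w, e in zip(weights, estimates))
```

With made-up counts of 200 and 150 individuals on two lists and 30 on both, `lincoln_petersen(200, 150, 30)` gives 1000.0. Given estimates from several candidate models and their AIC values, `model_average` returns one combined figure, which is the kind of single estimate the abstract says policy makers need.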

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program