350 – Administrative Data, Record Linkage, and Latent Class Models
Combining Cluster Sampling and Link-Tracing Sampling: Estimating the Size of a Hidden Population in the Presence of Heterogeneous Link-Probabilities Modeled by a Latent-Class Model
Jesus Armando Dominguez-Molina
Autonomous University of Sinaloa
Martin Felix-Medina
Autonomous University of Sinaloa
In this work we propose estimators of the size of a hidden population, such as sex workers or drug users. Specifically, we derive unconditional and conditional maximum likelihood estimators to be used with the variant of link-tracing sampling proposed by Felix-Medina and Thompson (Jour. Official Stat., 2004). In this variant, a sampling frame is constructed from sites where members of the population can be found with high probability, such as bars and parks. The frame is not assumed to cover the population completely. An initial simple random sample of sites is then selected from the frame. The people in the sampled sites are identified and asked to name other members of the population. We say that there is a link between a site and a person if that person is named by at least one element in the site. Following an idea used by Pledger (Biometrics, 2000) in the context of capture-recapture, we derive the maximum likelihood estimators under the assumption that the elements in the population can be grouped into a number of classes according to their susceptibility to being linked to a site in the initial sample. Elements in the same class have the same probability of being linked to a particular site, while elements in different classes have different link probabilities. This assumption allows us to model the heterogeneity of the link probabilities. The unconditional maximum likelihood estimator is obtained by the ordinary maximum likelihood approach, whereas the conditional maximum likelihood estimator is obtained by an approach proposed by Sanathanan (Annals of Math. Stat., 1972). The results of a simulation study indicate that the proposed estimators require relatively large sampling fractions to perform satisfactorily; otherwise, they suffer from high variability and numerical instability.
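The conditional likelihood idea can be illustrated with a minimal sketch. This is not the authors' estimator. It assumes a heavily simplified setting: every sampled site links to a person in class k with the same probability, people found inside the sampled sites are ignored, and only people linked to at least one site are observed. The function names, the two-class default, and the EM fitting routine are illustrative assumptions. In the spirit of Sanathanan's approach, the class parameters are fitted from the observed (zero-truncated) link counts, and the population size is then estimated as the number of observed people divided by the fitted probability of being observed.

```python
import random


def simulate_link_counts(N, n_sites, class_probs, link_probs, seed=0):
    """Simulate how many of n_sites sampled sites link to each of N people.

    Hypothetical setup: a person in class k is linked to every site with
    probability link_probs[k]. People with zero links are never observed,
    so only positive counts are returned.
    """
    rng = random.Random(seed)
    observed = []
    for _ in range(N):
        k = rng.choices(range(len(class_probs)), weights=class_probs)[0]
        x = sum(rng.random() < link_probs[k] for _ in range(n_sites))
        if x > 0:
            observed.append(x)
    return observed


def fit_conditional_mle(counts, n_sites, n_classes=2, n_iter=500):
    """EM for a zero-truncated binomial mixture, then N_hat = r / (1 - P0).

    counts: positive link counts of the r observed people.
    Returns (N_hat, class weights, per-class link probabilities).
    """
    r = len(counts)
    freq = {}
    for x in counts:
        freq[x] = freq.get(x, 0) + 1

    # Crude starting values spread over (0, 1).
    pi = [1.0 / n_classes] * n_classes
    p = [(k + 1) / (n_classes + 1) for k in range(n_classes)]

    def prob_unobserved(pi, p):
        # Probability that a random person is linked to no sampled site.
        return sum(pi[k] * (1 - p[k]) ** n_sites for k in range(n_classes))

    for _ in range(n_iter):
        n_hat = r / (1 - prob_unobserved(pi, p))
        # E-step, unobserved part: expected number of unseen people per class.
        size = [n_hat * pi[k] * (1 - p[k]) ** n_sites for k in range(n_classes)]
        links = [0.0] * n_classes
        # E-step, observed part: class responsibilities for each link count.
        for x, f in freq.items():
            w = [pi[k] * p[k] ** x * (1 - p[k]) ** (n_sites - x)
                 for k in range(n_classes)]
            total = sum(w)
            for k in range(n_classes):
                size[k] += f * w[k] / total
                links[k] += f * x * w[k] / total
        # M-step: update class weights and link probabilities.
        pi = [size[k] / n_hat for k in range(n_classes)]
        p = [links[k] / (n_sites * max(size[k], 1e-12))
             for k in range(n_classes)]

    return r / (1 - prob_unobserved(pi, p)), pi, p


# Usage with made-up values: 2000 people, 10 sampled sites, two classes.
counts = simulate_link_counts(2000, 10, [0.6, 0.4], [0.1, 0.5], seed=1)
n_hat, pi_hat, p_hat = fit_conditional_mle(counts, n_sites=10)
print(len(counts), round(n_hat), pi_hat, p_hat)
```

With few sites or poorly separated classes, the fitted probability of being unobserved can be close to one and the estimate becomes very unstable, which is consistent with the abstract's remark about small sampling fractions.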