Abstract:
|
In order to facilitate research into specific subgroups, survey statisticians use oversampling to disproportionally select individuals belonging to the subgroups of interest. Oversampling allows for an increased sample size of the subgroup and thus more precise estimation. However, it is often unknown whether an individual in a sample belongs to the subgroup of interest. In this case, we can build a probabilistic logistic regression model to estimate the probability of group membership, and use the estimated probabilities to oversample. In this paper, we outline a case study where we apply regression-based probabilistic sampling to oversample for individuals who own mutual funds. Using AmeriSpeak data, we develop a probabilistic model by regressing demographic characteristics that are predictive of an individual being a mutual fund owner. These characteristics include age, gender, income, education, race, and others. This paper documents the specifications and accuracy of the probabilistic model and the sampling results. After oversampling based on these estimated probabilities, we were able to successfully achieve the desired proportion of mutual fund owners in the sample.
|