Abstract:
|
Ideally, inferences about a population can be drawn from a single survey. This requires a well-developed survey design, so that the responses obtained are representative of the population of interest. Unfortunately, obtaining such single-source data can be challenging, and in these cases data from several sources must be integrated. Methods exist for combining information sources when each source is representative of the population; however, when some datasets are not representative, data integration becomes considerably more complicated. In this talk, we discuss methods for integrating categorical data when some sources are representative and others are only conditionally representative, i.e., they can be regarded as representative samples from some conditional distribution involving the survey variables. Our method uses Dirichlet process mixtures of products of multinomials to model the dependencies among variables parsimoniously and flexibly. We present simulation studies that illustrate the problems arising when all data sources are treated as representative, and we show how these problems can be alleviated.
|
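To make the model class named in the abstract concrete, here is a minimal sketch of the generative process of a truncated Dirichlet process mixture of products of multinomials (DPMPM), written in Python with NumPy. It is not the authors' implementation: it only simulates records from the prior and performs neither posterior inference nor the data integration discussed in the talk. The truncation level `H`, the concentration `alpha`, the uniform Dirichlet priors on the multinomial probabilities, and the function names `stick_breaking_weights` and `simulate_dpmpm` are all illustrative assumptions. Within each latent class the variables are independent multinomials; dependence among variables arises only through mixing over classes, which is what keeps the model parsimonious.

```python
import numpy as np


def stick_breaking_weights(alpha, H, rng):
    """Truncated stick-breaking (illustrative): V_h ~ Beta(1, alpha), with V_H = 1."""
    v = rng.beta(1.0, alpha, size=H)
    v[-1] = 1.0  # truncation: the last stick takes whatever length remains
    # remaining[h] = prod_{g < h} (1 - v_g), the stick length left before break h
    remaining = np.concatenate(([1.0], np.cumprod(1.0 - v[:-1])))
    return v * remaining  # mixture weights pi_h; they sum to 1


def simulate_dpmpm(n, levels, alpha=1.0, H=20, seed=0):
    """Draw n records of len(levels) categorical variables from a truncated DPMPM prior.

    levels[j] is the number of categories of variable j (hypothetical example input).
    """
    rng = np.random.default_rng(seed)
    pi = stick_breaking_weights(alpha, H, rng)
    # phi[j][h] is the multinomial probability vector for variable j in class h
    phi = [rng.dirichlet(np.ones(d), size=H) for d in levels]
    z = rng.choice(H, size=n, p=pi)  # latent class of each record
    X = np.empty((n, len(levels)), dtype=int)
    for j, d in enumerate(levels):
        for i in range(n):
            # given its class, each variable is drawn independently
            X[i, j] = rng.choice(d, p=phi[j][z[i]])
    return X, z, pi


X, z, pi = simulate_dpmpm(n=500, levels=[2, 3, 4], alpha=1.0, H=20)
print(X.shape)  # (500, 3)
```

Fitting such a model to data would typically use a Gibbs sampler over `z`, `pi`, and `phi`; handling conditionally representative sources would require additional modeling beyond this prior sketch.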