Abstract:
|
In studies of geographic variation in diabetes prevalence, small area estimation (SAE) techniques are used to estimate parameters of interest when sample sizes are too small to provide adequate direct estimates. To increase the effective sample size, SAE models borrow strength across areas using individual-level information on the outcome of interest and area-specific auxiliary variables (e.g., sex, age, income, or education). However, between-area heterogeneity in the auxiliary variables can introduce large variability into the models and reduce the accuracy of the estimates. To address this issue, we incorporated a k-nearest-neighbor (kNN) algorithm into a Bayesian hierarchical model (SAE-kNN). The SAE-kNN model identifies the k nearest neighbors of each small area based on the similarity of area-specific auxiliary variables, and estimates the parameters of interest by borrowing strength across those neighbors. We applied the approach to estimate municipio-level diabetes prevalence in Puerto Rico. The SAE-kNN method reduced the standard error of the estimates by 23.8% on average. The approach could be generalized to estimate county-level prevalence nationwide.
|
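The neighbor-selection step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the similarity measure, so the sketch assumes z-score standardization of the auxiliary variables and Euclidean distance, and the function name `k_nearest_areas` and its arguments are hypothetical.

```python
import numpy as np


def k_nearest_areas(aux, k):
    """For each area, return the indices of its k most similar areas.

    aux: (n_areas, n_variables) array of area-level auxiliary variables.
    Similarity is assumed to be Euclidean distance on z-scored variables.
    """
    aux = np.asarray(aux, dtype=float)
    n_areas = aux.shape[0]
    if not 1 <= k < n_areas:
        raise ValueError("k must satisfy 1 <= k < number of areas")

    # Standardize each variable so that no single variable dominates
    # the distance because of its scale (an assumed preprocessing choice).
    std = aux.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns unscaled
    z = (aux - aux.mean(axis=0)) / std

    # Pairwise Euclidean distances between all areas.
    diff = z[:, None, :] - z[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))

    # An area is not its own neighbor.
    np.fill_diagonal(dist, np.inf)

    # Indices of the k smallest distances in each row.
    return np.argsort(dist, axis=1, kind="stable")[:, :k]
```

In a model of the kind the abstract describes, the returned index sets would determine which areas each small area borrows strength from; how that pooling enters the Bayesian hierarchical model is not specified here and is left out of the sketch.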