Abstract:
|
Accurate state-level surveillance of diabetes and prediabetes is paramount, but most states lack a single definitive data source for estimating prevalence, especially for undiagnosed cases. We present a two-stage approach for combining estimates from multiple sources, including nationally representative surveys, state-representative surveys, non-representative surveys, and administrative and clinical archives. Challenges posed by these data sources include non-representativeness, non-overlapping frames, and data missing not at random. First, we use techniques including raking and propensity score weighting to reduce the bias of each data source. Then we create a composite estimate in which source estimates are weighted inversely proportional to their mean squared error. The variance of our final estimate accounts for both sampling error and the estimated unknown biases, ensuring that the combined estimate is not overwhelmed by large, unrepresentative data sources. Using California as a case study, our estimate of self-reported diabetes prevalence is 7.6%, compared with the BRFSS California estimate of 10.2%. We estimate undiagnosed diabetes prevalence at 3.6%, resulting in a total diabetes prevalence estimate of 11.2%.
|
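The second stage described in the abstract can be illustrated with a minimal sketch. It assumes that each source supplies a point estimate, a sampling variance, and an estimated bias, that the sources are independent, and that each source's mean squared error is its variance plus its squared bias. The function name `composite_estimate` and all input numbers are hypothetical and chosen only for illustration; they are not taken from the study, and the sketch is not a reproduction of the authors' implementation.

```python
def composite_estimate(estimates, variances, biases):
    """Combine source estimates with weights proportional to 1 / MSE.

    Assumption (not from the source): MSE_i = variance_i + bias_i ** 2,
    and sources are treated as independent.
    """
    if not (len(estimates) == len(variances) == len(biases)):
        raise ValueError("all inputs must have the same length")

    # Mean squared error of each source: sampling variance plus squared bias.
    mse = [v + b * b for v, b in zip(variances, biases)]

    # Inverse-MSE weights, normalized so they sum to 1.
    inverse_mse = [1.0 / m for m in mse]
    total = sum(inverse_mse)
    weights = [x / total for x in inverse_mse]

    # Weighted combination of the source estimates.
    combined = sum(w * e for w, e in zip(weights, estimates))

    # Error measure of the combination, treating each source's MSE as a
    # variance-like term; with these weights it equals 1 / sum(1 / MSE_i).
    combined_mse = sum(w * w * m for w, m in zip(weights, mse))
    return combined, combined_mse, weights


# Hypothetical inputs: a small representative survey (larger variance,
# no estimated bias) and a large unrepresentative archive (tiny variance,
# sizeable estimated bias).
estimates = [0.10, 0.08]
variances = [1e-4, 1e-6]
biases = [0.0, 0.03]

combined, combined_mse, weights = composite_estimate(estimates, variances, biases)
print(weights, combined, combined_mse)
```

In this made-up example the archive has a far smaller sampling variance, yet its squared bias dominates its mean squared error, so the representative survey receives most of the weight. This is the mechanism by which a large, unrepresentative source is kept from overwhelming the combined estimate.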