Abstract:
|
Stratified sampling designs are commonly used to support inferences about rare subpopulations that might be missed under simple random sampling. For analysis, however, the sampling scheme must be accounted for appropriately to ensure valid and efficient inferences about population parameters. In many circumstances, stratum definitions are effectively measured without error, so generalizing from the sample to the population using survey-weighting techniques is common. However, when the sampling frame and stratum definitions are based on electronic health records (EHRs) that are measured with error, the stratum definitions, and therefore the target population itself, are unclear. The eMERGE CERC Survey Study seeks to understand patient characteristics associated with agreeing to participate in biobank research studies; however, sampling from this population was based on strata that were defined, in part, using EHR data. We propose analytical approaches that acknowledge errors in stratum definitions and, because the population itself is unknown, sensitivity analyses that examine the extent to which these errors affect inferences about population targets.
|