81 – Geologic, Atmospheric, and Weather-Related Events
Simplified Census Edit and Imputation Based on Statistical Principles
Robert Sands
U.S. Census Bureau
The U.S. Census system for editing and imputing the demographic characteristics of persons in households has worked admirably since the 1960 Census. The current study describes a statistically principled random imputation of the relationship, age, and sex (ras) characteristics (items) of 2010 Census household (hh) persons. This system imputes the ras items for all persons in a hh simultaneously. The imputation is independent of any set of edit rules. After the full set of edit rules is identified, the hh vectors (the ras values of every person in a hh) that pass all edits are identified. Probability distributions of valid hh vectors are produced for all completely classified U.S. hhs of up to eight persons. Next, the EM algorithm allocates the partially classified hh vectors across the completely classified distribution to produce maximum likelihood estimates. To impute the missing items in a hh, a random draw is made from the distribution of valid vectors that match the reported variables of the partially classified hh. A truth deck of persons who were matched between the census and the CCM survey, and who have missing census items but reported CCM survey items, is used to calculate measures of agreement.
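The EM allocation and the random draw described above can be illustrated with a minimal Python sketch. It is not the production Census system: the function names (`matches`, `em_estimate`, `impute`), the three toy two-person hh vectors, the category labels, and all counts are hypothetical and chosen only for illustration. Missing items are coded as `None`, and the list `valid` stands in for the set of hh vectors that pass all edits.

```python
import random
from collections import Counter

def matches(partial, full):
    """True if every reported (non-None) item in `partial` equals the item in `full`."""
    return all(p is None or p == f for p, f in zip(partial, full))

def em_estimate(complete, partial, valid_vectors, n_iter=50):
    """EM estimate of the probability of each valid hh vector.

    complete: counts of fully reported hh vectors
    partial:  counts of hh vectors with some items missing (None)
    """
    # Uniform start so that no valid vector begins with zero probability.
    theta = {v: 1.0 / len(valid_vectors) for v in valid_vectors}
    for _ in range(n_iter):
        # E-step: start from the observed complete counts, then spread each
        # partial count over its compatible valid vectors in proportion to theta.
        expected = {v: float(complete.get(v, 0)) for v in valid_vectors}
        for pvec, count in partial.items():
            compat = [v for v in valid_vectors if matches(pvec, v)]
            mass = sum(theta[v] for v in compat)
            if mass == 0:
                continue  # no valid vector agrees with the reported items
            for v in compat:
                expected[v] += count * theta[v] / mass
        # M-step: renormalize the expected counts into probabilities.
        total = sum(expected.values())
        theta = {v: expected[v] / total for v in valid_vectors}
    return theta

def impute(pvec, theta, rng):
    """Draw one valid hh vector at random among those matching the reported items."""
    compat = [v for v in theta if matches(pvec, v) and theta[v] > 0]
    if not compat:
        raise ValueError("no valid vector matches the reported items")
    weights = [theta[v] for v in compat]
    return rng.choices(compat, weights=weights, k=1)[0]

# Hypothetical two-person hh vectors: (rel, age, sex) for person 1, then person 2.
A = ("householder", "adult", "F", "spouse", "adult", "M")
B = ("householder", "adult", "F", "child", "minor", "M")
C = ("householder", "adult", "M", "child", "minor", "F")
valid = [A, B, C]

complete = Counter({A: 6, B: 2, C: 4})           # made-up complete counts
missing = ("householder", "adult", "F", None, None, "M")
partial = Counter({missing: 4})                  # made-up partial counts

theta = em_estimate(complete, partial, valid)
imputed = impute(missing, theta, random.Random(0))
```

In this toy setup the partially classified hh is compatible with vectors `A` and `B` only, so the draw always returns one of those two, with probabilities proportional to their estimated shares. All items of the hh are filled in at once, which mirrors the simultaneous imputation described in the abstract.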