Online Program

Use of auxiliary information at the sampling and estimation stages in surveys

*David Haziza, Université de Montréal

Auxiliary information plays an essential role in surveys. It can be used to reduce sampling errors as well as nonsampling errors (such as coverage and nonresponse errors). This course focuses on sampling errors. When an auxiliary variable is available for all population units prior to sampling, it can be used at the design stage to stratify the population or as a size measure in sampling designs with inclusion probabilities proportional to size. When an auxiliary variable is observed only for the sampled units but its population total (or mean) is known, it can be used at the estimation stage to derive a set of weights that incorporate the auxiliary information.
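As an illustration of using a size measure at the design stage, the sketch below computes inclusion probabilities proportional to a size variable x for a fixed sample size n, pi_k = n x_k / sum(x). Units whose probability would exceed 1 are made take-all units and the remaining probabilities are recomputed. It then computes the Horvitz-Thompson estimator of a total. This is a minimal Python sketch, not the course's SAS code; the function names and the data are hypothetical.

```python
import numpy as np

def pps_inclusion_probs(x, n):
    """Inclusion probabilities proportional to size x for a fixed sample size n.

    Units whose probability would exceed 1 become take-all units (pi_k = 1);
    the probabilities of the other units are recomputed until none exceed 1.
    """
    x = np.asarray(x, dtype=float)
    pi = np.zeros_like(x)
    take_all = np.zeros(x.size, dtype=bool)
    while True:
        rest = ~take_all
        n_rest = n - take_all.sum()              # sample size left for non-take-all units
        pi[rest] = n_rest * x[rest] / x[rest].sum()
        pi[take_all] = 1.0
        over = rest & (pi > 1.0)
        if not over.any():
            return pi
        take_all |= over                         # cap these units and recompute

def horvitz_thompson(y_sample, pi_sample):
    """Design-unbiased estimator of a population total: sum of y_k / pi_k over the sample."""
    return float(np.sum(np.asarray(y_sample, dtype=float) / np.asarray(pi_sample, dtype=float)))

# Hypothetical population of 5 units with one very large unit, n = 2.
pi = pps_inclusion_probs([1, 2, 3, 4, 40], n=2)
```

Here the large unit is selected with certainty, and the other four units share the remaining sample slot in proportion to their sizes, giving probabilities 0.1, 0.2, 0.3 and 0.4.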

The course will be divided into two parts. In the first part, we consider the use of auxiliary variables at the design stage through balanced sampling, which consists of selecting random samples that satisfy appropriate balancing equations, that is, samples whose Horvitz-Thompson estimates of the totals of the auxiliary variables equal (or nearly equal) their known population totals. We first show that commonly used sampling designs can be viewed as special cases of balanced sampling. We then present two families of procedures for selecting a balanced sample: rejective sampling and the cube algorithm.
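A minimal sketch of these ideas, assuming simple random sampling without replacement (SRSWOR) as the initial design. `balance_gap` measures how far a sample is from satisfying the balancing equations. A stratified SRSWOR sample satisfies them exactly when the auxiliary variables are the stratum indicators, which is one way to see stratification as a special case. `rejective_srs` keeps drawing SRSWOR samples until the balancing equation on a single variable x holds within a relative tolerance. The tolerance, function names and data are hypothetical. This simple rejective scheme does not, in general, preserve the original inclusion probabilities exactly, and the cube algorithm is not shown.

```python
import numpy as np

def balance_gap(X, pi, sample):
    """Horvitz-Thompson estimates of the column totals of X minus the true totals.

    The sample is balanced on X when every entry is (close to) zero.
    """
    X = np.asarray(X, dtype=float)
    pi = np.asarray(pi, dtype=float)
    sample = np.asarray(sample)
    return (X[sample] / pi[sample][:, None]).sum(axis=0) - X.sum(axis=0)

def rejective_srs(x, n, rel_tol, rng, max_draws=100_000):
    """Rejective sampling: draw SRSWOR samples of size n until the HT estimate
    of the total of x is within rel_tol (relative) of the known total."""
    x = np.asarray(x, dtype=float)
    N = x.size
    pi = np.full(N, n / N)                       # SRSWOR inclusion probabilities
    total = x.sum()
    for _ in range(max_draws):
        s = rng.choice(N, size=n, replace=False)
        if abs(balance_gap(x[:, None], pi, s)[0]) <= rel_tol * total:
            return s                             # accept the balanced sample
    raise RuntimeError("no balanced sample found; loosen rel_tol")

# Stratified SRSWOR: 2 hypothetical strata of 3 units, 1 unit drawn per stratum.
X_strata = np.array([[1, 0], [1, 0], [1, 0], [0, 1], [0, 1], [0, 1]])
pi_strat = np.full(6, 1 / 3)
gap_strat = balance_gap(X_strata, pi_strat, [0, 4])   # exactly balanced on stratum indicators

# Rejective sampling on a hypothetical size variable 1..100.
rng = np.random.default_rng(1)
x = np.arange(1, 101)
s = rejective_srs(x, n=10, rel_tol=0.01, rng=rng)
```

Tightening `rel_tol` improves balance but increases the number of rejected draws, which is one motivation for more efficient procedures such as the cube algorithm.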

In the second part, we present the calibration approach, which has been a central weighting technique in statistical agencies over the last few decades. Calibration consists of reweighting the sampled units so that survey estimates of totals (or means) coincide with known population totals available from external sources. We discuss two distinct calibration methods: (i) the minimum distance method and (ii) the instrument vector method. For method (i), the main ingredients are the vector of auxiliary information attached to each sampled unit and the distance function used to measure the proximity between the initial (pre-calibration) weights and the calibrated weights. For method (ii), the main ingredients are the vector of auxiliary information attached to each sampled unit and a vector of instrumental variables.
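A minimal sketch of both methods, under stated assumptions. For (i), it assumes the chi-square distance sum_k (w_k - d_k)^2 / (q_k d_k), where d_k are the initial weights and q_k hypothetical positive constants. Other distance functions, such as the one leading to raking, need an iterative solver and are not shown. For (ii), it assumes the linear form w_k = d_k (1 + z_k' lambda) with instrument vector z_k. In both cases lambda is chosen so that the calibration equations sum_k w_k x_k = t_x hold. Minimizing the chi-square distance yields w_k = d_k (1 + q_k x_k' lambda), so method (i) with this distance is method (ii) with z_k = q_k x_k. The function names and data are hypothetical.

```python
import numpy as np

def calibrate_instrument(d, X, Z, t_x):
    """Instrument vector calibration: w_k = d_k (1 + z_k' lam), with lam solving
    sum_k w_k x_k = t_x. Z must have as many columns as X."""
    d = np.asarray(d, dtype=float)
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    T = (X * d[:, None]).T @ Z                   # sum_k d_k x_k z_k'  (p x p)
    lam = np.linalg.solve(T, np.asarray(t_x, dtype=float) - X.T @ d)
    return d * (1.0 + Z @ lam)

def calibrate_chi_square(d, X, t_x, q=None):
    """Minimum distance calibration with the chi-square distance
    sum_k (w_k - d_k)^2 / (q_k d_k); equivalent to z_k = q_k x_k."""
    X = np.asarray(X, dtype=float)
    q = np.ones(X.shape[0]) if q is None else np.asarray(q, dtype=float)
    return calibrate_instrument(d, X, q[:, None] * X, t_x)

# Hypothetical sample of 4 units with initial weight 10; x = (1, x_k),
# known totals: N = 40 units and a total of 220 for x_k.
d = np.full(4, 10.0)
X = np.column_stack([np.ones(4), [2.0, 4.0, 6.0, 8.0]])
t_x = np.array([40.0, 220.0])
w_chi = calibrate_chi_square(d, X, t_x)
```

With these data the calibrated weights are 7, 9, 11 and 13: they sum to 40 and reproduce the total 220 exactly. Chi-square calibration can produce negative weights in less favourable samples, which is one reason other distance functions are sometimes preferred.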

In establishment surveys, the sample is typically selected by stratified simple random sampling without replacement or by a stratified two-stage design. In the latter case, the businesses may be the primary sampling units and their employees the secondary sampling units. Calibration is an important part of the weighting process in establishment surveys; the auxiliary information typically comes from variables available on the Business Register (BR) and from administrative data. Throughout the course, there will be several numerical illustrations using the SAS software. In particular, an example using data from the Workplace and Employee Survey will be discussed.
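The design weights that enter calibration follow directly from these designs. The sketch below assumes SRSWOR of n_h businesses out of N_h in stratum h, and, for the two-stage case, SRSWOR of m_i employees out of the M_i employees of business i. It is written in Python rather than SAS, and the stratum labels, sizes and function names are hypothetical.

```python
from statistics import mean

def stratified_srswor_weights(N_h, n_h):
    """Design weight N_h / n_h for every sampled unit of stratum h."""
    return {h: N_h[h] / n_h[h] for h in N_h}

def two_stage_weight(N_h, n_h, M_i, m_i):
    """Weight of a sampled employee: (N_h / n_h) * (M_i / m_i), i.e. the inverse of
    P(business i selected) * P(employee selected | business i selected)."""
    return (N_h / n_h) * (M_i / m_i)

def stratified_total(samples, N_h):
    """Horvitz-Thompson estimate of a total under stratified SRSWOR:
    sum over strata of N_h times the stratum sample mean."""
    return sum(N_h[h] * mean(samples[h]) for h in samples)

# Hypothetical strata of small and large businesses.
N_h = {"small": 500, "large": 40}
n_h = {"small": 50, "large": 20}
weights = stratified_srswor_weights(N_h, n_h)
```

These design weights would typically serve as the initial weights d_k that calibration then adjusts to the Business Register or administrative totals.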

About the instructor:

David Haziza is an Associate Professor at Université de Montréal in the Department of Mathematics and Statistics. Previously he worked as a methodologist at Statistics Canada, where he still collaborates part-time as a consultant. His recent work focuses on doubly robust inference in the presence of missing data and robust inference in the presence of influential units.