Keywords: Cost structures, Data quality, Management of complex supply chains, Non-designed data, Organic data, Risk management, Total survey error model
Government agencies and other statistical organizations have historically used sample surveys as the primary basis for producing statistical series that provide important information in areas such as demographics, economics, public health and education. In recent years, cost constraints and growing demand for highly granular statistical information have led to increased consideration of statistical production based on the integration of surveys with non-survey data sources (sometimes described as “non-designed data,” “organic data” or “big data”). This paper outlines a relatively general approach to the design of the resulting production process, with emphasis on four elements:
(1) Objective functions that account for multiple dimensions of quality, risk and cost that are important for the long-term performance of a statistical organization;
(2) Multiple design factors that affect the distributions of the objective functions in (1), including data sources, methodology, production systems and management;
(3) Environmental features that may also affect the objective functions in (1), but are largely outside the control of the statistical organization; and
(4) Availability of information related to each of elements (1)-(3).
Special emphasis is placed on ways in which traditional concepts from survey design and experimental design can be extended to the broader context defined by elements (1)-(4). The main ideas of this paper are motivated by, and illustrated with, two examples.