ICES V is sponsoring two student contests: a data collection contest and an imputation (missing data treatment) contest. The student contests will create interest and innovation in the establishment survey field by inspiring students and the faculty they work with to create interesting and challenging applications that test their technical skill and creativity. The winner of each contest will be funded to attend the conference and present his/her research.
Congratulations to the winners of the ICES-V Student Contest:
Department of Statistics
Iowa State University
“Multivariate Regression Imputation Approach to the Analysis of Item Nonresponse in a Retail Trade Survey Data”
Zhonglei Wang and Hejian Sang
Department of Statistics
Iowa State University
“Nonparametric Bootstrap to Generate Synthetic Population to Handle Complex Missing Data Problems”
Julien Miron and Audrey-Anne Vallée
Institut de Statistique
Université de Neuchâtel
“Imputation Procedure and Inference In Presence of Imputed Data: Application To Industries”
The three papers will be presented by the authors in a dedicated session entitled “Imputing Multivariate Data from Two Business Populations: Results of the ICES-5 Student Contest” on Thursday, 23 June 2016 from 10:40 – 12:00.
Our objective in establishing the contest was to create interest and innovation in the establishment survey field by inspiring students and the faculty they work with. Participants were provided with two simulated datasets that were incomplete due to item nonresponse and were challenged to complete both data sets using some form of imputation. Submissions were judged on a variety of factors, including theoretical soundness, originality and effectiveness of methods, and clarity of explanation. All of the contest submissions were excellent, which made it difficult to choose the winners. We thank all of the students who participated. Finally, we thank our panel of judges (listed below) for their careful review and insightful comments on each report and acknowledge their major contribution to ensuring the success and fairness of the contest.
Dr. David Haziza, Département de mathématiques et de statistique, Université de Montréal
Dr. Hang J. Kim, Department of Mathematical Science, University of Cincinnati
Dr. Pierre Lavallée, Statistics Canada
Dr. Michael Sinclair, Mathematica Policy Research
Ms. Katherine Jenny Thompson, U.S. Census Bureau
As the fifth in the series of international conferences on establishment surveys, ICES-V is designed to look at key issues and challenges pertaining to establishment surveys. For this conference, we are introducing two student contests: a data collection contest and an imputation contest. The winner of each contest will present his/her research at ICES-V. The conference registration fee is waived for the contest winners, and the conference will provide airfare (or other transportation costs), transport card, lodging, meals, and incidental expenses.
- You must be a current undergraduate or graduate student at any level, or a recent graduate who has held a degree for fifteen months or less as of June 2016.
- You may perform your research independently or with a group of up to 5 students. However, travel expenses and registration will only be awarded to one author.
- Students are to carry out this assignment autonomously. Contributions from faculty advisors, if any, should be made clear in the report, along with their expertise.
- The paper must be presented at the conference by one of the student participants.
- You must declare your intention to participate in one of the ICES-V student contests by completing the Student Contest Enrollment Form (Fillable) and submitting it to firstname.lastname@example.org. The deadline for enrollment is April 30, 2015. The ICES-V Program Committee will not consider submissions from participants who have not enrolled by this deadline.
- You must submit your report by September 30, 2015. The report is limited to 6,000 words, excluding tables, figures, and references. Supporting materials (e.g., figures, tables, appendices) are limited to up to ten pages. The report must be submitted in English, the official conference language. The report (and the programming code, as applicable) must be sent to email@example.com with the subject line “Student Contest: Data Collection” or “Student Contest: Missing Data Treatment.”
The winners will be announced in February 2016. You may not submit the paper to any other 2016 student/young investigator award competition until this decision is made public. If you have questions concerning these contests, you can contact firstname.lastname@example.org. Use the subject line “Student Contest: Inquiry.”
We encourage the contest winners to produce a paper based on the report that could be submitted to an ICES-V special edition journal. It should be noted that submission does not guarantee publication, as the report will go through the normal review procedures.
Criteria for Judging Submissions
The reports will be reviewed by a panel of international experts in establishment survey designs chosen by the ICES-V Program Committee.
Establishment surveys supply data for economic statistics and for economic research, so it is important that high-quality data are collected in a cost-efficient way. For this contest, participants will develop an establishment web survey (business, farm, or other organization) that is designed to achieve high response rates and good data quality through communication strategies and questionnaire design, and carry out a pretest of their communication strategy and questionnaire design. Both the communication strategy and the questionnaire design affect the survey results. The survey communication strategy is aimed at delivering the questionnaire, as well as getting timely, accurate, and complete data back. It includes motivating respondents and facilitating response, using contact materials like pre-notifications, letters, flyers, emails, telephone calls, reminders, etc. The questionnaire concerns the communication with respondents about the kind of data that are requested, and is aimed at getting relevant, valid, and reliable data on the survey topic. It includes respondent understanding of the task and questions, and actual completion of the questionnaire, as well as usability issues. Pretesting a communication strategy and questionnaire design is an important step in the survey research process.
The topic of the survey questionnaire for this contest is: How do establishments decide whether to respond to a survey? This topic is relevant to the field of establishment survey methodology due to the difficulty survey organizations are experiencing in obtaining and maintaining high response rates. For a discussion of factors affecting survey participation, see Snijkers et al. (2013): chapters 2 (The Business Context: pp. 39-82), 6 (Response Burden: pp. 219-252), and 9 (Business Survey Communication: pp. 359-430).
The focus of this contest is thus twofold: to develop a survey design directed towards getting qualitatively good data, and to carry out a pretest to investigate whether the communication and questionnaire design is effective. For the design of a questionnaire, a communication strategy, and pretesting, a number of methods and procedures can be used that are described in the literature:
- For a discussion on Quality Issues in Business Surveys, see Snijkers et al. (2013): chapter 3 (pp. 83-125). See also De Leeuw et al. (2008), and Groves et al. (2009).
- For an overview of Business Survey Development, Testing and Evaluation Methods, see Snijkers et al. (2013): chapter 7 (pp. 253-301). See also Willimack et al. (2004: chapter 19 in Presser et al., pp. 385-407), and Willis (2004).
- For designing a Business Survey Communication Strategy, see Snijkers et al. (2013): chapter 9 (pp. 359-430). See also chapter 12 in Dillman et al. (2009: pp. 402-439).
- For designing Business Survey Questionnaires (and Questionnaire Communication), see Snijkers et al. (2013): chapter 8 (pp. 303-357). See also Couper (2008), and Dillman et al. (2014).
In order to develop a survey design, contest participants are additionally encouraged to utilize theories from a number of scientific disciplines. The disciplines and theories can include, but are not limited to: survey methodology, behavioural economics, marketing, social psychology, organisational psychology, decision making theories, (mass) communication sciences, theories on influence and resistance, and motivation theories. The objective of this contest is not merely to reproduce what is already known in the literature, but to get actionable conclusions that will inform methodological or outcome improvements more broadly.
Participants will submit a report describing the establishment survey communication and questionnaire design strategies, pretest procedures and findings, and recommendations for a final design. It will include, for example, the design background and theory (e.g., motivation for why the chosen strategies are thought to be effective and reasons behind decisions on choices and courses of action taken), and a description of how the pretest findings influenced the final communication strategy and questionnaire design.
Students may work individually, but we recommend working in small groups (of up to 5 students) to carry out this assignment. In practice, survey researchers often work in project groups to design and conduct a survey. Students are to carry out this assignment autonomously; contributions from faculty advisors, if any, should be made clear in the report.
Send the report to email@example.com with the subject line “Student Contest: Data Collection” by September 30, 2015. You must declare your intention to participate in one of the ICES-V student contests by completing the Student Contest Enrollment Form and submitting it to firstname.lastname@example.org. The deadline for enrollment is April 30, 2015. The ICES-V Program Committee will not consider submissions from participants who have not enrolled by this deadline.
Topic of the questionnaire: How do establishments decide whether to respond to a survey? The questionnaire should contain around ten questions on the topic.
Mode: The primary mode for the questionnaire is Web.
Tools: One of the following two free online services may be used to implement the web questionnaire:
Sample: A relatively small sample will be sufficient for pretesting (see the literature on pretesting). A convenience sample from any general list is allowed; for example, you may use an online address list or visit establishments to develop your sample.
Your submission will be a written report that could serve as the basis for a scientific paper to be published in a journal. The report is limited to no more than 6,000 words, excluding tables, figures, appendices, and references, and should contain, among other things:
- A description of the selected communication strategy(ies) and questionnaire design, including rationale for the selected method(s) and the theoretical background concerning their effectiveness, and bibliography;
- A description of the pretest sample(s), method(s) and findings;
- A discussion on how the pretesting findings were used to modify contact procedures, questions, and questionnaire design;
- A description of recommended final contact procedures and questionnaire design.
- The Appendix should contain all contact materials, the survey questionnaire, and other relevant information;
- The Appendix should contain a list with the names, scientific disciplines and levels of all student participants (bachelor/undergraduate, master level/graduate, Ph.D./graduate), as well as – in the case of their involvement – the names, expertise, role, and contributions of faculty advisors.
Criteria for Judging Submissions
- Soundness of theoretical underpinnings of the survey design, good application of the design methods and pretesting procedures, and fulfillment of standards of scientific reporting;
- Clarity of presentation, especially motivation for chosen theoretical and practical approaches and their implementation for both communication strategies and questionnaire design;
- Originality, as well as effectiveness at achieving good survey results, practicability, and cost-effectiveness of the final communication strategy and questionnaire design;
- Quality of the pretesting, clarity of explanation behind pretest methods and how they were implemented, description of the pretest results and how results were integrated into final design/materials;
- Soundness and clarity of explanation on how the final questionnaire, strategies and procedures are theorized to influence response.
Note that these references offer guidance on the topic, but they are not meant to be a comprehensive list.
Establishment/Business Survey Methodology
Snijkers, G., Haraldsen, G., Jones, J., and Willimack, D. (2013). Designing and Conducting Business Surveys. Hoboken, N.J.: John Wiley and Sons. (This book provides an overview of the business survey process and the survey design, including the business context, response process, questionnaire design (for paper and web modes), pre-testing and evaluation methods, survey communication design, data collection, response burden, and factors affecting survey participation.)
Pre-Testing Establishment Survey Questionnaires
Willimack, D. K., Lyberg, L., Martin, J., Japec, L., and Whitridge, P. (2004). Evolution and adaptation of questionnaire development, evaluation, and testing methods for establishment surveys, in Presser, S., Rothgeb, J. M., Couper, M. P., Lessler, J. T., Martin, E., Martin, J., and Singer, E., eds., Methods for Testing and Evaluating Survey Questionnaires, Wiley, Hoboken, NJ, pp. 385–407.
Survey Methodology in General (Overview of Survey Design and Data Collection):
Couper, M. (2008). Designing Effective Web Surveys. New York: Cambridge University Press.
De Leeuw, E.D., Hox J.J. and D.A. Dillman (Eds.) (2008). The International Handbook of Survey Methodology. New York/London: Erlbaum/Taylor & Francis. (This book is focussed on social surveys.)
Dillman, D., Smyth, J.D., and L.M. Christian (2009). Internet, Mail, and Mixed-Mode Surveys: The Tailored Design Method. Third Edition. Hoboken, N.J.: John Wiley and Sons. (This book also contains a chapter on establishment surveys: chapter 12, pp. 402-439.)
Dillman, D., Smyth, J.D., and L.M. Christian (2014). Internet, Phone, Mail, and Mixed-Mode Surveys: The Tailored Design Method. Fourth Edition. Hoboken: Wiley.
Groves, R. M., Fowler, F. J., Jr., Couper, M. P., Lepkowski, J. M., Singer, E., and R. Tourangeau, (2009). Survey Methodology. Second Edition. Hoboken: Wiley. (This book is focussed on social surveys.)
Presser, S., Rothgeb, J, Couper, M., Lessler, J, Martin, E., Martin, J., and Singer, E. (Eds.) (2004). Methods for Testing and Evaluating Survey Questionnaires. Hoboken, N.J.: John Wiley and Sons.
Willis, G. (2004.) Cognitive Interviewing: A Tool for Improving Questionnaire Design. Thousand Oaks, CA: Sage.
Business Survey Response Process and Response Burden
Dale, T. and Haraldsen, G., eds. (2007). Handbook for Monitoring and Evaluating Business Survey Response Burdens. Luxembourg: Eurostat. Available at: http://ec.europa.eu/eurostat/documents/64157/4374310/12-handbook-for-monitoring-and-evaluating-business-survey-resonse-burden.pdf
Deliverables 2.1 (inventory of published research), 2.2 (Report on existing practices of NSIs concerning business burden reduction and motivation enhancement), 3.1 (Report on the business use of NSIs’ statistics based on external sources), and 3.2 (Final report integrating findings on business perspectives related to NSIs’ statistics) from the BLUE-ETS project (2010/2011), BLUE-Enterprise and Trade Statistics, available at www.blue-ets.istat.it/index.php?id=7 (Parts of this project studied response burden in depth.)
Establishment and Business Surveys in General
Proceedings of past International Conferences on Establishment Surveys: www.amstat.org/ASA/Meetings/ICES.aspx.
Journal of Official Statistics (JOS), December 2014, Special Issue on Establishment Surveys: www.degruyter.com/view/j/jos.2014.30.issue-4/issue-files/jos.2014.30.issue-4.xml
Cox, B.G., Binder, D.A., Chinnappa, B.N., Christianson, A., Colledge, M.J., and Kott, P.S. (Eds.) (1995). Business Survey Methods. New York: John Wiley & Sons.
This contest provides simulated data modeled on two industries in a monthly retail trade survey (Industry XXX1 and Industry XXX2). Each industry sample is a stratified simple random sample selected without replacement (SRS-WOR) with six strata: one certainty (take-all) stratum and five noncertainty strata. Industry sample sizes were determined using Neyman allocation to achieve a fixed coefficient of variation (cv) of 0.01 on the unbiased estimates of the frame measure of size (MOS), assuming complete response. See Cochran (1977, Chapter 5, p. 99).
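The allocation just described can be sketched as follows. This is an illustrative reconstruction under stated assumptions (stratum standard deviations of the MOS as the S_h, sample size set against a target cv on the estimated total), not the code actually used to build the contest samples.

```python
import numpy as np

def neyman_allocation(N_h, S_h, Y_total, target_cv):
    """Neyman allocation for a stratified SRS-WOR design, sized to hit a
    target cv on the estimated total (cf. Cochran 1977, Ch. 5)."""
    N_h = np.asarray(N_h, float)   # stratum population sizes
    S_h = np.asarray(S_h, float)   # stratum standard deviations of the MOS
    V_target = (target_cv * Y_total) ** 2                    # required variance of the total
    n = (N_h * S_h).sum() ** 2 / (V_target + (N_h * S_h ** 2).sum())
    n_h = np.ceil(n * N_h * S_h / (N_h * S_h).sum()).astype(int)
    return np.minimum(n_h, N_h.astype(int))                  # certainty strata: take all
```

For example, two strata of 100 units each with S_h = 10, a population total of 10,000, and a target cv of 0.05 allocate 8 units to each stratum.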
The industry data sets that we have provided are incomplete due to item nonresponse. Your challenge is to complete both data sets, providing values for sales (sales00) and inventories (inventories00) using some form of imputation. Ideally, the completed data sets yield tabulated estimates of sales00 and inventories00 that satisfy all (or most) of the following criteria:
- The tabulated estimates should be close to the survey estimates obtained from the complete data set in which no item values are missing, presented in the table below. Assume that you do not know these statistics; they are provided for validation purposes only.
- The selected imputation method allows for variance estimation (i.e., provide a variance estimate for each total that incorporates the variance due to imputation as well as the sampling error). Ideally, the fully imputed dataset will meet the required publication cv’s of 0.05 for sales and 0.10 for inventories. Coverage properties of interval estimates can likewise be considered, as can maintenance of multivariate properties (e.g., the correlation between sales00 and inventories00).
- As a methodologist, you choose the imputation method(s). The short (and incomplete) reference list provides a few options, including single imputation (deterministic and random), fractional imputation, multiple imputation, and balanced imputation. Composite imputation is acceptable.
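As one concrete instance of the deterministic single-imputation option named above, a weighted ratio imputation that borrows the prior-month value might look like the following sketch. The function name and interface are hypothetical, and nothing here is a prescribed method:

```python
import numpy as np

def ratio_impute(current, prior, weights):
    """Deterministic ratio imputation: fill missing current-month values
    with R * prior, where R is the weighted ratio of current to prior
    totals among respondents (a common choice for monthly trade data)."""
    current, prior, weights = (np.asarray(a, float) for a in (current, prior, weights))
    resp = ~np.isnan(current)                                # respondent flags
    R = (weights * current)[resp].sum() / (weights * prior)[resp].sum()
    return np.where(resp, current, R * prior), R

# e.g. sales00 = [10, NaN, 30] with sales01 = [5, 10, 15] and unit weights
# gives R = 2, so the missing value is imputed as 20.
```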
Download the datasets. Missing values for sales00 and inventories00 are indicated by a blank value. Industry XXX1 has 579 records; Industry XXX2 has 505 records. See the “Data” section below for descriptive population statistics (the “truth”) and information on each dataset variable.
Complete the datasets. Apply your proposed methodology to the missing data.
All programming code used to perform imputation and to develop evaluation statistics must be provided along with the report. Send the report to email@example.com with the subject line “Student Contest: Imputation” by September 30, 2015. You must declare your intention to participate in one of the ICES-V student contests by completing the Student Contest Enrollment Form and submitting it to firstname.lastname@example.org. The deadline for enrollment is April 30, 2015. The ICES-V Program Committee will not consider submissions from participants who have not enrolled by this deadline.
Population Statistics (“Truth”)
|Industry XXX1||Industry XXX2|
|Variable||Description|
|Industry Code||XXX1 or XXX2|
|MOS||Frame measure of size|
|Strata||Sampling Strata indexed by 1 – 6 with Stratum 6 containing only certainty units|
|SamplingWeight||Sampling (design) weight|
|Sales00||Current month sales for unit (may be missing)|
|Asales00||Current month administrative data value for sales|
|Sales01||Prior month sales for unit|
|Inventories00||Current month inventories for unit (may be missing)|
|Ainventories00||Current month administrative data value for inventories|
|Inventories01||Prior month inventories for unit|
Your submission will be a written report that could serve as the basis for a scientific paper to be published in a journal. The report is limited to no more than 6,000 words, excluding tables, figures, and references, and must:
- Describe your proposed imputation approach. Provide imputation models and assumptions.
- Describe the variance estimation procedure used, including information on how you estimate the nonresponse (imputation) variance component (if computed). A few references are provided as a starting point.
- Describe and demonstrate your model validation or sensitivity analysis procedures.
- Include the estimates and cv’s from the fully imputed dataset for sales00 and inventories00 from both industries.
- Contain a list with the names, scientific disciplines, and levels of all student participants (bachelor/undergraduate, master level/graduate, Ph.D./graduate), as well as – in the case of their involvement – the names, expertise, role, and contributions of faculty advisors.
- Programming Code: Provide the programming code used to perform imputation and to compute estimates and variance estimates. SAS and R are acceptable programming languages. The code must be submitted separately from the report and does not count against the word count or page limit count for the report. Failure to provide the programming code will result in disqualification.
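Although submitted code must be in SAS or R, the requirement to report estimates and cv’s can be illustrated with a short sketch (shown here in Python). Note that it deliberately treats imputed values as if observed, so it omits the imputation variance component that your submission must account for; it also assumes equal weights within each stratum.

```python
import numpy as np

def stratified_total_cv(y, w, strata):
    """Weighted (Horvitz-Thompson) total and its naive stratified SRS-WOR
    cv. Understates the true cv because the imputation variance component
    is omitted (see, e.g., Rao 1996; Shao and Steel 1999)."""
    y, w, strata = (np.asarray(a, float) for a in (y, w, strata))
    total = float((w * y).sum())
    var = 0.0
    for h in np.unique(strata):
        m = strata == h
        n_h, w_h = m.sum(), w[m][0]        # assumes equal weights in stratum
        N_h = n_h * w_h                    # implied stratum population size
        if n_h > 1 and w_h > 1:            # certainty strata add no variance
            var += N_h * (N_h - n_h) * y[m].var(ddof=1) / n_h
    return total, var ** 0.5 / total
```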
Criteria for Judging Submissions
- Well-founded theoretical basis for the proposed method(s), as well as compliance with general scientific standards
- Clarity of explanation of the proposed method(s) and of the implementation approach
- Originality, as well as effectiveness at achieving good survey results in terms of reproducing population totals (as closely as possible) and minimizing the level of variance due to imputation
- Ease of interpretation of the visual presentation of the results
- Method of validating proposed imputation model(s). Specifically address how the procedures achieve the desired imputation properties listed in the instructions (and other objectives of importance to the authors)
- Ideas for future improvements or other applications
Note that these references offer some guidance on the topic, but they are not meant to be a comprehensive list.
Andridge, R.R. and Little, R.J.A. (2010). A Review of Hot Deck Imputation for Survey Non-response. International Statistical Review, 78(1), pp. 40-64.
Cochran, W.G. (1977). Sampling Techniques. New York: Wiley.
Chauvet, G., Deville, J.-C., and Haziza, D. (2011). On Balanced Random Imputation in Surveys. Biometrika, 98(2), pp. 459-471.
Kalton, G., and Kasprzyk, D. (1986). The Treatment of Missing Survey Data. Survey Methodology, 12, pp. 1-16.
Kim, J.K. and Fuller, W. (2004). Fractional Hot Deck Imputation. Biometrika, 91, pp. 559-578.
Little, R.J.A. and Rubin, D.B. (2002). Statistical Analysis with Missing Data (2nd Ed). New York: Wiley.
Rao, J.N.K. (1996). On Variance Estimation with Imputed Survey Data. Journal of the American Statistical Association, 91(434), pp. 499-506.
Sande, I.G. (1982). Imputation in Surveys: Coping with Reality. The American Statistician, 36(3), pp. 145-152.
Särndal, C.-E., and Lundström, S. (2005). Estimation in Surveys with Nonresponse. New York: John Wiley & Sons, Inc.
Shao, J., and Steel, P. (1999). Variance Estimation for Survey Data with Composite Imputation and Nonnegligible Sampling Fractions. Journal of the American Statistical Association, 94, pp. 254-265.
Zhang, P. (2003). Multiple Imputation: Theory and Methods. International Statistical Review, 71(3), pp. 581-592.
For information on the procedures used to create the synthetic industry populations of sales00, inventories00, sales01, and inventories01 prior to sample selection and induced item nonresponse, see:
Mulry, M., Oliver, B., and Kaputa, S. (2014). Detecting and Treating Verified Influential Values in a Monthly Retail Trade Survey. JOS, 30, pp. 721-747.
For information on the procedures used to create Asales00 and Ainventories00, see:
Steel, P. and Fay, R.E. (1995). Variance Estimation for Finite Populations with Imputed Data. Proceedings of the Section on Survey Research Methods, American Statistical Association, Alexandria, VA, pp. 374-379.