Abstract Details

Activity Number: 255 - Contributed Poster Presentations: Section for Statistical Programmers and Analysts
Type: Contributed
Date/Time: Monday, July 30, 2018 : 2:00 PM to 3:50 PM
Sponsor: Section for Statistical Programmers and Analysts
Abstract #329393
Title: Using the SAS Hash Object for Sample Allocation Procedures with Large Data Sets/Big Data
Author(s): Julia Batishev* and Michael Yang
Companies: National Opinion Research Center (NORC) and NORC
Keywords: SAS hash object; Sample allocation; domain; Big Data; Macro SAS; sample size

Sample allocation procedures for complex sample designs are usually implemented in multiple steps. For example, we may need to allocate the sample across strata to meet precision targets for analysis domains, where the domains may be defined hierarchically. In addition, higher-level domains may not be defined for all records in the frame, or we may choose not to target them. We may also want to repeat the allocation procedure many times, adjusting the conditions and simulating results under different allocation schemes and sample sizes. Allocating the sample under many restrictions is a challenging task in itself. When the allocation procedures must be applied to big data, such as U.S. household frames or Medicaid beneficiary files, we also face problems with elapsed time, CPU usage, and memory usage. In this paper, we compare the SAS hash object, traditional SAS DATA step processing, and PROC SQL for complex sample allocation tasks, and present the advantages and tradeoffs of using the hash object.
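
To make the keyed-lookup idea concrete, here is a minimal sketch in Python, not the authors' SAS code. A Python dictionary stands in for the SAS hash object: stratum counts are accumulated in memory in a single pass over the frame and then looked up by key. Simple proportional allocation with largest-remainder rounding is assumed here as a stand-in for the precision-target allocation described in the abstract. The function name, the frame layout, and the stratum labels are all hypothetical.

```python
from collections import Counter


def proportional_allocation(frame, total_n):
    """Allocate total_n sample units across strata in proportion to frame counts.

    frame is an iterable of (unit_id, stratum) pairs (a hypothetical layout).
    """
    # One pass over the frame builds an in-memory table keyed by stratum,
    # which is the role a hash object would play in a SAS DATA step.
    counts = Counter(stratum for _, stratum in frame)
    frame_size = sum(counts.values())

    # Exact (fractional) share for each stratum, then its integer floor.
    quotas = {s: total_n * c / frame_size for s, c in counts.items()}
    alloc = {s: int(q) for s, q in quotas.items()}

    # Hand out the units lost to rounding, largest fractional remainder first.
    leftover = total_n - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda s: quotas[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


# Toy frame: 600 units in stratum A, 300 in B, 100 in C.
frame = (
    [(i, "A") for i in range(600)]
    + [(i, "B") for i in range(600, 900)]
    + [(i, "C") for i in range(900, 1000)]
)
alloc = proportional_allocation(frame, 50)
```

Because the lookup table holds one entry per stratum rather than one per frame record, rerunning the allocation for a different total sample size only repeats the cheap allocation arithmetic on the counts, not a sort or join of the full frame. That is the kind of saving the comparison with DATA step and PROC SQL processing is concerned with, though this sketch makes no claim about the actual timings.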

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program