Online Program
Thursday, February 19
PS1 Poster Session 1 & Opening Mixer
Thu, Feb 19, 5:30 PM - 7:00 PM
Napoleon AB
Simulating Confidential Epidemiological Data Sets (303043)
*Ragheed Fadhil Al-Dulaimi, Hunter College; Levi Waldron, City University of New York
Keywords: Simulation, R programming, secure data, epidemiological data
Epidemiological data sets containing personally identifiable information often must be stored in secure, tightly controlled environments to protect subject confidentiality. These data sets may be complex in structure and may not be fully available until final collection and cleaning, which delays code development and data analysis. Furthermore, collaboration across multiple research centers may make it difficult to develop a detailed data analysis plan, especially when data access is limited to one site. We present an R package, “episim,” that generates simulations of such complex data sets while mimicking their summary statistics and idiosyncrasies. The package generates categorical variables with matching prevalences, continuous variables with matching quantiles, missing data, transformed variables such as discretized versions of continuous variables, and categorical variables with re-aggregated bins. Taking a simple Excel spreadsheet as input, it facilitates simulation of a wide range of study designs and variable types by users with minimal programming skills.
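The simulation steps listed in the abstract can be illustrated with a minimal sketch. This is not the episim package or its interface (episim is an R package driven by a spreadsheet); it is a Python illustration under stated assumptions. All function names, variable names, prevalences, quantiles, and the missingness rate below are hypothetical, and continuous values are drawn by linear interpolation between the supplied quantiles, which is one simple choice among several.

```python
import random
from bisect import bisect_right


def sim_categorical(levels, prevalences, n, rng):
    """Draw n labels so that each level appears with its target prevalence."""
    return rng.choices(levels, weights=prevalences, k=n)


def sim_continuous(probs, quantiles, n, rng):
    """Inverse-CDF sampling: interpolate linearly between known quantiles.

    probs must be strictly increasing, e.g. [0, 0.25, 0.5, 0.75, 1].
    """
    out = []
    for _ in range(n):
        u = rng.uniform(probs[0], probs[-1])
        i = min(bisect_right(probs, u), len(probs) - 1)
        p0, p1 = probs[i - 1], probs[i]
        q0, q1 = quantiles[i - 1], quantiles[i]
        out.append(q0 + (u - p0) / (p1 - p0) * (q1 - q0))
    return out


def add_missing(values, rate, rng):
    """Replace each value with None independently with probability rate."""
    return [None if rng.random() < rate else v for v in values]


def discretize(values, cutpoints, labels):
    """Bin a continuous variable; len(labels) must be len(cutpoints) + 1."""
    return [None if v is None else labels[bisect_right(cutpoints, v)]
            for v in values]


# Hypothetical summary statistics standing in for the confidential data.
rng = random.Random(42)
n = 1000
sex = sim_categorical(["F", "M"], [0.6, 0.4], n, rng)
age = sim_continuous([0, 0.25, 0.5, 0.75, 1], [18, 30, 45, 60, 90], n, rng)
age = add_missing(age, 0.10, rng)
age_group = discretize(age, [30, 60], ["<30", "30-59", "60+"])
```

Only summary statistics enter the sketch, so the simulated records carry no subject-level information; analysis code written against them could later be run unchanged on the secured data.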