
Abstract Details

Activity Number: 126 - SPEED: New Methods in Statistical Genomics and Genetics Part 1
Type: Contributed
Date/Time: Monday, July 29, 2019 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistics in Genomics and Genetics
Abstract #307327 Presentation
Title: On Simulating Ultra High-Dimensional Multivariate Data
Author(s): Alfred Schissler*
Companies: University of Nevada, Reno
Keywords: high-dimensional data; multivariate; Monte Carlo; RNA-sequencing; negative binomial; covariance

In this era of big data, realistic simulation of data is critical to conducting informative Monte Carlo studies. This is often problematic when data are inherently multivariate and, at the same time, (ultra-)high-dimensional. This situation arises frequently in observational data found online and in high-throughput biomedical experiments (e.g., RNA-sequencing). Because simulating realistic correlated data points is difficult, researchers often resort to simulation designs that posit independence, greatly diminishing the insight into the empirical operating characteristics of any proposed methodology. A major challenge lies in the computational complexity involved in simulating these massive multivariate constructions. In this paper, we first review approaches to high-dimensional multivariate simulation and discuss their relative merits. Then we propose a fairly general procedure to simulate high-dimensional multivariate distributions with pre-specified marginal characteristics and a covariance matrix. Finally, we apply our method to simulate RNA-sequencing data sets (dimension > 20,000) with heterogeneous negative binomial marginals.
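
The abstract does not say how the proposed procedure works, so the following is only an illustrative sketch of one common way to obtain correlated counts with specified negative binomial marginals: a Gaussian copula (sometimes called NORTA), not necessarily the author's method. All function names, the 3 x 3 correlation matrix, and the marginal parameters are hypothetical. Note that the matrix here is the correlation of the latent normal vector; the correlation of the resulting counts is generally somewhat weaker, and matching a pre-specified count covariance would need an additional adjustment step that is not shown. Pure Python is used for self-containment and would not scale to dimensions above 20,000.

```python
import math
import random
from statistics import NormalDist


def cholesky(matrix):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower


def nb_quantile(u, size, prob, max_count=100000):
    """Smallest k with P(X <= k) >= u, where X counts failures before
    `size` successes, each success having probability `prob`."""
    pmf = prob ** size  # P(X = 0)
    cdf = pmf
    k = 0
    while cdf < u and k < max_count:
        k += 1
        # Recurrence: P(X = k) = P(X = k - 1) * (k - 1 + size) / k * (1 - prob)
        pmf *= (k - 1 + size) / k * (1.0 - prob)
        cdf += pmf
    return k


def simulate_correlated_nb(n_samples, corr, sizes, probs, seed=0):
    """Draw correlated negative binomial vectors through a Gaussian copula.

    `corr` is the correlation matrix of the latent normal vector (an
    assumption of this sketch), not the correlation of the counts.
    """
    rng = random.Random(seed)
    lower = cholesky(corr)
    std_normal = NormalDist()
    d = len(corr)
    samples = []
    for _ in range(n_samples):
        # Independent standard normals, then correlate them with the factor.
        z_indep = [rng.gauss(0.0, 1.0) for _ in range(d)]
        z = [sum(lower[i][k] * z_indep[k] for k in range(i + 1)) for i in range(d)]
        # Map to uniforms; clamp below 1 so the quantile search terminates quickly.
        u = [min(std_normal.cdf(v), 1.0 - 1e-12) for v in z]
        # Apply each marginal's inverse CDF.
        samples.append([nb_quantile(u[j], sizes[j], probs[j]) for j in range(d)])
    return samples


# Hypothetical three-dimensional example with heterogeneous marginals.
corr = [[1.0, 0.6, 0.3],
        [0.6, 1.0, 0.2],
        [0.3, 0.2, 1.0]]
sizes = [2.0, 5.0, 10.0]
probs = [0.2, 0.5, 0.7]
counts = simulate_correlated_nb(2000, corr, sizes, probs, seed=1)
```

Under this parameterization each marginal has mean size * (1 - prob) / prob, so the sample means of the simulated columns can be compared against those values as a basic sanity check.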

Authors who are presenting talks have a * after their name.
