All Times EDT

Abstract Details

Activity Number: 398 - Beyond Traditional Approaches: Evolving Artificial Intelligence and Machine Learning to Advance Clinical Research and Drug Development
Type: Topic Contributed
Date/Time: Wednesday, August 5, 2020 : 1:00 PM to 2:50 PM
Sponsor: Biometrics Section
Abstract #313679
Title: “Data Nuggets” Tools for Analyzing Big Data
Author(s): Javier Cabrera*
Companies: Rutgers University
Keywords: Big data; nuggets

Big data has created new challenges for data analysis because of the sheer size of the datasets: millions of observations and/or thousands of variables are typical of many medical and business applications. Standard clustering algorithms require pairwise distance calculations, which limits them to around 100K observations on a basic computer. A common workaround is to analyze a random sample of the dataset, and a recent proposal is to replace the random sample with likelihood-based methods such as “data squashing”, “principal points”, or “support vectors”. The pitfall of these solutions is that they are not guaranteed to capture the structure of the dataset well, particularly at its tails or edges. I will present a more geometric solution for analyzing large datasets through the concept of “data nuggets”. Data nuggets reduce a very large dataset to a small collection of nuggets, each described by a center, a weight, and a scale parameter. Once the data are re-expressed as data nuggets, we may apply algorithms such as clustering, PCA, and linear models. The approach is illustrated with a cell flow cytometry example.
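
To make the reduction concrete: with 100K observations, a full pairwise distance matrix holds about 5 × 10^9 distinct distances, roughly 40 GB at 8 bytes each. The Python sketch below is an illustration only and is not the author's algorithm. It assumes that nuggets are seeded with randomly chosen observations, that each observation joins its nearest seed, and that the scale is the root-mean-square distance to the nugget center; the function name make_nuggets and all of its parameters are hypothetical.

```python
import numpy as np


def make_nuggets(X, n_nuggets, seed=0, chunk=10_000):
    """Reduce X (n x p) to (centers, weights, scales), one row per nugget.

    Illustrative assumptions, not the published method: random seeding,
    nearest-seed assignment, RMS distance to the center as the scale.
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Assumption: seed the nuggets with randomly chosen observations.
    seeds = X[rng.choice(n, size=n_nuggets, replace=False)]

    # Assign each observation to its nearest seed, chunk by chunk, so only
    # a (chunk x n_nuggets) distance block is held in memory at a time.
    labels = np.empty(n, dtype=np.int64)
    for start in range(0, n, chunk):
        block = X[start:start + chunk]
        d2 = ((block[:, None, :] - seeds[None, :, :]) ** 2).sum(axis=2)
        labels[start:start + chunk] = d2.argmin(axis=1)

    # Weight: number of observations absorbed by each nugget.
    weights = np.bincount(labels, minlength=n_nuggets)
    keep = weights > 0  # drop any seed that attracted no observations

    # Center: mean of the observations in each nugget.
    centers = np.zeros((n_nuggets, p))
    np.add.at(centers, labels, X)
    centers[keep] /= weights[keep, None]

    # Scale: root-mean-square distance from the observations to their center.
    sq = ((X - centers[labels]) ** 2).sum(axis=1)
    sums = np.bincount(labels, weights=sq, minlength=n_nuggets)
    scales = np.sqrt(sums[keep] / weights[keep])

    return centers[keep], weights[keep], scales


# Small synthetic demonstration (simulated data, not from the talk).
rng = np.random.default_rng(1)
X = rng.normal(size=(20_000, 3))
centers, weights, scales = make_nuggets(X, n_nuggets=50)

# Weighted summaries of the nuggets stand in for summaries of the raw data.
weighted_mean = np.average(centers, axis=0, weights=weights)
```

Because each center is the mean of its own observations, the weighted mean of the centers reproduces the mean of the full dataset. Weighted versions of clustering, PCA, or linear models could be run on the (center, weight) pairs in the same spirit, with the scales carrying information about the spread within each nugget.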

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program