Abstract Details

Activity Number: 352 - Contributed Poster Presentations: Korean International Statistical Society
Type: Contributed
Date/Time: Tuesday, August 1, 2017 : 10:30 AM to 12:20 PM
Sponsor: Korean International Statistical Society
Abstract #322861
Title: Big Data Noise Accumulation in Classification with Random Forest
Author(s): Dongseok Choi* and Miriam R Elman
Companies: Oregon Health & Science University and Oregon Health & Science University
Keywords: Noise accumulation ; Big Data ; High dimensional data ; Classification ; Random Forest ; Signal to noise ratio
Abstract:

Noise accumulation may occur when heterogeneous data and individual terms aggregate, increasing the error that arises from simultaneously estimating or testing many parameters. Such error can concentrate, obscuring the true values of model parameters. In conventional statistical settings, where the sample size exceeds the number of predictors, noise accumulation has less impact on estimation. High dimensional data - that is, data in which the number of predictors is much larger than the sample size - have been said to be especially susceptible to noise accumulation because of the large number of parameters. Little work has investigated noise accumulation or characterized its properties. We assessed the impact of noise accumulation in high dimensional settings by evaluating the ability of random forest (RF) to discriminate between two groups in simulated data. To evaluate different levels of noise, we explored scenarios that varied the number of predictors and the amount of signal, and we also examined the effect of increasing the sample size.
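
The kind of simulation described above might be sketched as follows. This is only an illustration, not the authors' design: the abstract does not state how the data were generated or how discrimination was measured, so the Gaussian predictors, the mean shift on a fixed set of signal predictors, the AUC metric, the scikit-learn implementation, and the names simulate_two_groups and rf_test_auc are all assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score


def simulate_two_groups(n_per_group, n_predictors, n_signal, shift, rng):
    """Draw two Gaussian groups whose means differ only on the first n_signal predictors.

    Assumed design: independent standard normal predictors, with group 1
    shifted by `shift` on the signal predictors. All other predictors are noise.
    """
    X = rng.standard_normal((2 * n_per_group, n_predictors))
    y = np.repeat([0, 1], n_per_group)
    X[y == 1, :n_signal] += shift
    return X, y


def rf_test_auc(n_per_group, n_predictors, n_signal=10, shift=1.0, seed=0):
    """Fit a random forest on one simulated sample and return its AUC on a second."""
    rng = np.random.default_rng(seed)
    X_train, y_train = simulate_two_groups(n_per_group, n_predictors, n_signal, shift, rng)
    X_test, y_test = simulate_two_groups(n_per_group, n_predictors, n_signal, shift, rng)
    forest = RandomForestClassifier(n_estimators=200, random_state=seed)
    forest.fit(X_train, y_train)
    # Probability of membership in group 1 serves as the classification score.
    scores = forest.predict_proba(X_test)[:, 1]
    return roc_auc_score(y_test, scores)


if __name__ == "__main__":
    # Hold the signal fixed and add more pure-noise predictors.
    for n_predictors in (10, 100, 1000, 5000):
        print(n_predictors, round(rf_test_auc(50, n_predictors), 3))
```

Holding n_signal fixed while n_predictors grows isolates the effect of added noise predictors. Varying shift or n_per_group in the same loop would mirror the signal and sample size scenarios the abstract mentions.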


Authors who are presenting talks have a * after their name.

Back to the full JSM 2017 program

 
 
Copyright © American Statistical Association