Abstract:
|
Noise accumulation can occur when many parameters are estimated or tested simultaneously: errors in individual terms, together with heterogeneity in the data, aggregate into a larger overall error. This accumulated error can obscure the true values of model parameters. In conventional statistical settings, where the sample size exceeds the number of predictors, noise accumulation has less impact on estimation. High-dimensional data, in which the number of predictors is much larger than the sample size, are thought to be especially susceptible to noise accumulation because of the large number of parameters involved. However, little work has investigated noise accumulation or characterized its properties. We assessed the impact of noise accumulation in high-dimensional settings by evaluating the ability of random forest (RF) to discriminate between two groups in simulated data. To evaluate different levels of noise, we explored scenarios that varied the number of predictors and the amount of signal, and we also examined the effect of increasing the sample size.
|
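The kind of simulation described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: to stay dependency-free, a simple nearest-centroid classifier stands in for random forest, and every name and value here (`simulate_accuracy`, `delta`, the sample sizes, the seed) is a hypothetical choice made for illustration. Two groups are drawn from standard normal predictors; only the first `s` predictors carry signal (a mean shift of `delta` in the second group), and the remaining `p - s` are pure noise.

```python
import random

def simulate_group(rng, n, p, s, shift):
    """Draw n samples with p unit-variance normal predictors.
    The first s predictors have mean `shift`; the rest have mean 0."""
    return [[rng.gauss(shift if j < s else 0.0, 1.0) for j in range(p)]
            for _ in range(n)]

def centroid(rows):
    """Per-predictor mean of a list of samples."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def simulate_accuracy(p, s, n_train=20, n_test=200, delta=1.0, seed=0):
    """Test accuracy of a nearest-centroid classifier (an illustrative
    stand-in for RF) with p predictors, of which s carry signal.
    n_train and n_test are per-group sample sizes (assumed values)."""
    rng = random.Random(seed)
    c0 = centroid(simulate_group(rng, n_train, p, s, 0.0))
    c1 = centroid(simulate_group(rng, n_train, p, s, delta))
    # Assigning x to the nearer centroid is equivalent to checking
    # the sign of w . (x - mid), with w = c1 - c0 and mid = (c0 + c1) / 2.
    w = [b - a for a, b in zip(c0, c1)]
    mid = [(a + b) / 2 for a, b in zip(c0, c1)]
    correct = 0
    for label, shift in ((0, 0.0), (1, delta)):
        for x in simulate_group(rng, n_test, p, s, shift):
            score = sum(wj * (xj - mj) for wj, xj, mj in zip(w, x, mid))
            correct += int((1 if score > 0 else 0) == label)
    return correct / (2 * n_test)

if __name__ == "__main__":
    # Hold the signal fixed (s = 10) and add pure-noise predictors.
    for p in (10, 100, 1000):
        print(p, simulate_accuracy(p=p, s=10, seed=1))
```

Holding `s` fixed while increasing `p` adds only noise predictors, so any drop in accuracy in this sketch reflects estimation error accumulated across uninformative dimensions. Raising `n_train` is one way to explore the effect of a larger sample size under the same assumptions.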