Abstract:
|
Noise accumulation can occur when a model includes many weak or unassociated predictors. The noise from these predictors can build up, obscuring the true signal and hindering estimation of the corresponding parameters. High-dimensional data, in which the number of predictors is much larger than the sample size, are especially susceptible to noise accumulation. Classification, a type of supervised learning, is a common prediction problem in machine learning. In this presentation, we propose using the Total Signal Index (TSI) to measure noise accumulation in the classification of high-dimensional data. We present theoretical computations of TSI for various scenarios, along with corresponding simulation results.
|