Abstract:
Noise accumulation is characteristic of high-dimensional data. One consequence is that conventional statistical techniques discriminate poorly in classification problems. The primary objective of this study is to explore the discrimination ability of four classification methods (linear discriminant analysis, random forest, boosting, and support vector machine) in high-dimensional sparse settings, using simulations of varying signal strength. Data from two classes, N(µ₀, I₀) and N(µ₁, I₁), are considered for d = 2 to 5,000 predictors, where µ₀ = 0 and µ₁ is sparse, with only its first j elements non-zero. Each class is generated with 100 observations, and scenarios are explored for j = 2, 6, and 10 non-zero elements set to 1 or 3 in the training and test datasets. Classifiers are fitted for each method using the training data and then evaluated on the test data. The simulations are repeated 100 times, and the discriminative power of the four methods is assessed by median classification error.
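
The simulation design described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's code: the function names (`make_data`, `centroid_rule_error`, `median_error`) are hypothetical, both covariance matrices are taken to be the identity, and a simple nearest-centroid rule stands in for the four classifiers actually compared in the study.

```python
import numpy as np


def make_data(d, j, signal, n_per_class, rng):
    """Draw n_per_class points from N(0, I) and from N(mu1, I).

    Assumption: both classes share an identity covariance matrix.
    """
    mu1 = np.zeros(d)
    mu1[:j] = signal  # only the first j coordinates carry signal
    x0 = rng.standard_normal((n_per_class, d))
    x1 = rng.standard_normal((n_per_class, d)) + mu1
    x = np.vstack([x0, x1])
    y = np.repeat([0, 1], n_per_class)
    return x, y


def centroid_rule_error(x_train, y_train, x_test, y_test):
    """Test error of a nearest-centroid rule.

    This is a stand-in classifier chosen for brevity; it is not one of
    the four methods named in the abstract.
    """
    m0 = x_train[y_train == 0].mean(axis=0)
    m1 = x_train[y_train == 1].mean(axis=0)
    # Assign each test point to the class with the closer estimated mean.
    dist0 = ((x_test - m0) ** 2).sum(axis=1)
    dist1 = ((x_test - m1) ** 2).sum(axis=1)
    pred = (dist1 < dist0).astype(int)
    return float(np.mean(pred != y_test))


def median_error(d, j, signal, n_per_class=100, n_reps=100, seed=0):
    """Median test error over n_reps independent train/test replications."""
    rng = np.random.default_rng(seed)
    errors = []
    for _ in range(n_reps):
        x_tr, y_tr = make_data(d, j, signal, n_per_class, rng)
        x_te, y_te = make_data(d, j, signal, n_per_class, rng)
        errors.append(centroid_rule_error(x_tr, y_tr, x_te, y_te))
    return float(np.median(errors))
```

Calling `median_error(d, j, signal)` over a grid of `d` values for a fixed `j` and `signal` traces how the error of this stand-in rule changes as uninformative predictors are added, which is the noise-accumulation effect the abstract refers to.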
ASA Meetings Department
732 North Washington Street, Alexandria, VA 22314
(703) 684-1221 • meetings@amstat.org
Copyright © American Statistical Association.