Abstract:
|
In this era of big(ger) data, the perennial question of how to detect the features that are truly relevant for an outcome of interest becomes paramount. As the number of variables in data increases, the degree to which field knowledge is incorporated in the analysis decreases. As more and more automated machine learning methods for handling and extracting information from data become easily accessible, an overview of the qualities and potential pitfalls of contemporary, extensively used methods is pertinent. Here we present the results of a simulation study assessing the performance of different methods for 'blindfolded' or 'field knowledge free' feature selection. We consider Lasso, forward selection, Elastic Net, Simplified Relaxed Lasso and two ad hoc methods. We discuss what constitutes good performance and how to assess it, and compare the methods on a variety of assessment measures, both new and existing.
|