Abstract:
|
This paper proposes genetic algorithms (GAs) for feature selection. Although the focus is on linear regression, the approach can be extended to other machine learning methods. The GA is tailored to regression models and then compared with traditional feature selection by stepwise selection and the lasso. The emphasis is on finding the best feature subset among all possible combinations, as measured by the Bayesian Information Criterion (BIC). The approach is illustrated with a case study on fracked oil wells and with simulations. The conclusion is that GA selection offers substantial benefits when machine learning is applied to problems with many nuisance features. GA selection is more likely to find the best model among all possible subsets, and constraints arising from model restrictions, data transformations, and data encoding are incorporated naturally into the algorithm. Although GA selection takes longer to find its best solution than shrinkage methods do, in most cases the extra time is acceptable given the improved selection and the greater confidence in the selected features.
|
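To make the idea concrete, the following is a minimal sketch of GA-based subset selection scored by BIC for ordinary least squares. It is not the paper's tailored algorithm. The function names (`bic`, `ga_select`), the operators (tournament selection, uniform crossover, bit-flip mutation, single-individual elitism), and all parameter defaults are illustrative assumptions. The BIC is computed as n·ln(RSS/n) + k·ln(n), which assumes Gaussian errors and drops an additive constant.

```python
import numpy as np


def bic(X, y, mask):
    """BIC (up to a constant) of an OLS fit with intercept on the masked columns."""
    n = len(y)
    design = np.column_stack([np.ones(n), X[:, mask]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + design.shape[1] * np.log(n)


def ga_select(X, y, pop_size=40, generations=60, mutation_rate=0.05, seed=0):
    """Search feature subsets (boolean masks) for low BIC with a simple GA.

    All operator choices and defaults here are assumptions for illustration.
    """
    rng = np.random.default_rng(seed)
    p = X.shape[1]
    pop = rng.random((pop_size, p)) < 0.5  # random initial subsets
    best_mask, best_score = None, np.inf

    for _ in range(generations):
        scores = np.array([bic(X, y, ind) for ind in pop])
        i = int(np.argmin(scores))
        if scores[i] < best_score:
            best_mask, best_score = pop[i].copy(), float(scores[i])

        # Tournament selection: the lower-BIC individual of each random pair wins.
        pairs = rng.integers(pop_size, size=(pop_size, 2))
        first_wins = scores[pairs[:, 0]] < scores[pairs[:, 1]]
        parents = pop[np.where(first_wins, pairs[:, 0], pairs[:, 1])]

        # Uniform crossover between each parent and a randomly assigned mate.
        mates = parents[rng.permutation(pop_size)]
        take_parent = rng.random((pop_size, p)) < 0.5
        children = np.where(take_parent, parents, mates)

        # Bit-flip mutation: toggle each feature with a small probability.
        children ^= rng.random((pop_size, p)) < mutation_rate

        # Elitism: carry the best subset found so far into the next generation.
        children[0] = best_mask
        pop = children

    return best_mask, best_score
```

Calling `ga_select(X, y)` on an `(n, p)` NumPy feature matrix and a length-`n` response returns a boolean mask over the columns and its BIC. Constraints such as keeping dummy-encoded columns together could be enforced by repairing or rejecting masks before scoring, although how the paper handles them is not specified in the abstract.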