All Times EDT

Abstract Details

Activity Number: 356 - Statistical Learning: Methods and Applications
Type: Contributed
Date/Time: Wednesday, August 5, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #313379
Title: Genetic Algorithms for Feature Selection
Author(s): Huanjun Zhang* and Edward Jones
Companies: Texas A&M University and Texas A&M University
Keywords: Big Data; DEAP; Feature Selection; Genetic Algorithm; LASSO; Python

This paper proposes using genetic algorithms (GA) for feature selection. Although the focus is on linear regression, the approach can be extended to other machine learning methods. The GA approach is tailored to regression models and then compared with traditional feature selection using stepwise selection and the LASSO. The emphasis is on finding the best feature subset among all possible combinations, as judged by the Bayesian Information Criterion (BIC). The approach is illustrated using simulations and a case study of fracking oil wells. The conclusion is that GA selection offers substantial benefits when applying machine learning in applications with many nuisance features: it is more likely to find the best model among all possible subsets, and constraints arising from model restrictions, data transformations, and data encoding are naturally incorporated into the algorithm. Although GA selection takes longer to find the best solution than shrinkage methods do, in most cases the extra time is acceptable given the improved selection and greater confidence in the selected features.
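
The idea of searching feature subsets with a GA scored by BIC can be sketched as follows. This is a minimal illustration in plain NumPy, not the authors' implementation (the keywords indicate they used DEAP). The function names `bic` and `ga_select`, the operator choices (binary tournament selection, uniform crossover, bit-flip mutation, elitism), the parameter defaults, and the synthetic data are all assumptions made for the example. BIC is computed as n ln(RSS/n) + k ln(n), the usual form for Gaussian errors up to an additive constant.

```python
import numpy as np

def bic(X, y, mask):
    """BIC of an OLS fit (with intercept) on the columns flagged in `mask`."""
    n = len(y)
    design = np.column_stack([np.ones(n), X[:, mask]])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1]  # number of fitted coefficients, intercept included
    return n * np.log(rss / n) + k * np.log(n)

def ga_select(X, y, pop_size=40, generations=30, p_mut=0.05, seed=0):
    """Search boolean feature masks for a low-BIC subset (illustrative GA)."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]
    pop = rng.random((pop_size, n_feat)) < 0.5  # random initial masks
    best_mask, best_score = None, np.inf
    for _ in range(generations):
        scores = np.array([bic(X, y, ind) for ind in pop])
        i = int(scores.argmin())
        if scores[i] < best_score:
            best_mask, best_score = pop[i].copy(), float(scores[i])
        # Binary tournament: draw two individuals, keep the lower-BIC one.
        a = rng.integers(pop_size, size=pop_size)
        b = rng.integers(pop_size, size=pop_size)
        parents = pop[np.where(scores[a] < scores[b], a, b)]
        # Uniform crossover between each parent and its neighbour.
        partners = np.roll(parents, 1, axis=0)
        take_first = rng.random((pop_size, n_feat)) < 0.5
        children = np.where(take_first, parents, partners)
        # Bit-flip mutation: toggle each gene with probability p_mut.
        children ^= rng.random((pop_size, n_feat)) < p_mut
        children[0] = best_mask  # elitism: carry the best mask forward
        pop = children
    return best_mask, best_score

# Synthetic data (an assumption for illustration): 3 real features, 7 nuisance.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 3] + 1.5 * X[:, 7] + rng.normal(size=200)

mask, score = ga_select(X, y)
print("selected features:", np.flatnonzero(mask), "BIC:", round(score, 2))
```

Constraints of the kind the abstract mentions (for example, keeping all dummy columns of one encoded variable together) could be enforced in a sketch like this by repairing each child mask after mutation; that step is omitted here.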

Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program