Abstract:
|
This study compared the performance of supervised machine learning models in predicting textile recycling behavior (recycles textiles or does not recycle textiles) and identified the factors most important to that prediction. Secondary data from a survey of 1,054 participants were analyzed. Six aspects of the analysis were varied: feature scaling, cross-validation technique, sampling technique, number of folds, hyperparameters, and feature importance measure. Five algorithms were compared: decision tree, linear support vector classifier (linear SVC), K-nearest neighbors (KNN), gradient boosting decision trees (GBDT), and random forest. The hyperparameters tuned were the impurity measure for decision tree and random forest, the number of nearest neighbors for KNN, and the learning rate for GBDT. Based on the F1 score, the best-performing model was random forest trained on oversampled data. The model's feature importance ranked zip code, gender, and ethnicity as the top three features. Zip code may rank highly because of its high cardinality rather than genuine predictive value. Under permutation feature importance, the top three features were type of dwelling, gender, and ethnicity. Implications for textile and apparel survey researchers are given.
|