Abstract:
|
Traffic crashes are typically classified into multiple severity levels, where fatal crashes are usually far less frequent than minor-injury crashes. Traditional parametric and non-parametric methods do not perform well on such imbalanced data. AdaBoost combines many weak classifiers into a strong classifier, which makes it effective for imbalanced-data classification. Resampling methods, such as Random Under-Sampling (RUS) and the Synthetic Minority Over-sampling Technique (SMOTE), are also commonly used when classifying imbalanced data. This study uses the 2015 Iowa traffic crash injury data. Five AdaBoost variants, i.e. AdaBoost.M1, AdaBoost.M2, SAMME, RUSBoost, and SMOTEBoost, where the last two combine RUS and SMOTE with AdaBoost respectively, are compared. A multinomial logit (MNL) model is built as the baseline. The results show that the AdaBoost methods classify significantly better than the MNL model, and that RUSBoost and SMOTEBoost outperform the other AdaBoost variants. Driver age, driver gender, AADT, lane width, and speed limit are found to be major factors influencing traffic injury severity.
|