
All Times EDT

Abstract Details

Activity Number: 288 - SLDS CSpeed 5
Type: Contributed
Date/Time: Wednesday, August 11, 2021 : 1:30 PM to 3:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #319178
Title: Classification Accuracy Evaluation for Five Machine-Learning Classification Methods in Identifying Rare Cases in Education Assessment
Author(s): Chi Chang* and Harlan McCaffery
Companies: Michigan State University and University of Michigan
Keywords: K-Nearest Neighbors; Classification Methods; Assessment Methods; Decision Tree; Random Forest; Support Vector Machines
Abstract:

This study evaluates machine-learning methods for classifying rare cases. As a demonstration, we used pass/fail outcomes on the USMLE Step 2 CK examination at one medical school. The data are characterized by a small sample size and an extremely small number of failed cases. A commonly used classification method, the logistic model, suffers from biased estimates and unreliable results when one of the categories has fewer than 20 cases. The example data set contains 360 cases, and the dependent variable includes 14 failed (3.89%) and 346 passed (96.11%) cases. The predictors are students’ standard scores on other exams. We evaluated five commonly used machine-learning classification methods: Logistic Regression, Decision Tree, Random Forest, K-Nearest Neighbors (kNN), and Support Vector Machines. Classification accuracy criteria, including training error, testing error, and the area under the ROC curve (AUC), were used to compare the performance of the methods. The results showed that evaluation metrics accounting for both the true-positive rate and the false-positive rate should be chosen, and that the nonparametric kNN method should be considered a better classification method for identifying rare cases.
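The point about evaluation metrics can be made concrete with the class counts reported above. The following is a minimal Python sketch, not the authors' code; the helper names `confusion_rates`, `auc`, and `knn_score` are hypothetical. A trivial rule that predicts "pass" for all 360 students reaches roughly 96.11% accuracy yet detects none of the 14 failures, and its AUC of 0.5 shows it is no better than chance. The `knn_score` helper is a plain Euclidean k-nearest-neighbor vote, included only as one assumed way to produce the continuous scores that AUC needs; the abstract does not state the study's k, distance metric, or preprocessing.

```python
import math


def confusion_rates(y_true, y_pred, positive=1):
    """Return (accuracy, true-positive rate, false-positive rate)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # share of failures caught
    fpr = fp / (fp + tn) if fp + tn else 0.0  # share of passes wrongly flagged
    return accuracy, tpr, fpr


def auc(y_true, scores, positive=1):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def knn_score(train_x, train_y, x, k=5, positive=1):
    """Hypothetical kNN scorer: fraction of the k nearest training points
    (Euclidean distance) that belong to the positive class."""
    nearest = sorted((math.dist(xi, x), yi) for xi, yi in zip(train_x, train_y))
    return sum(1 for _, yi in nearest[:k] if yi == positive) / k


if __name__ == "__main__":
    # Class counts from the abstract: 14 failed (coded 1), 346 passed (coded 0).
    y_true = [1] * 14 + [0] * 346
    # Majority-class rule: predict "pass" for everyone, constant failure score.
    y_pred = [0] * 360
    scores = [0.0] * 360
    acc, tpr, fpr = confusion_rates(y_true, y_pred)
    # prints accuracy=0.9611 TPR=0.00 FPR=0.00 AUC=0.50
    print(f"accuracy={acc:.4f} TPR={tpr:.2f} FPR={fpr:.2f} AUC={auc(y_true, scores):.2f}")
```

Accuracy alone rewards ignoring the rare class, whereas the TPR and the AUC, which trades TPR against FPR across thresholds, expose it. This is consistent with the abstract's recommendation to prefer metrics that consider both rates.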


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program