
All Times EDT

Abstract Details

Activity Number: 115 - Advances in Clustering and Classification
Type: Contributed
Date/Time: Monday, August 8, 2022 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Science
Abstract #323516
Title: Instance Selection with Threshold Clustering for Support Vector Machines
Author(s): Michael Higgins* and Tahany Basir
Companies: Kansas State University and Kansas State University
Keywords: Support Vector Machines; Instance Selection; Threshold Clustering; Computational Complexity; Big Data

Support vector machines (SVMs) are a powerful supervised learning method for classification. However, training an SVM may be computationally infeasible for large datasets. In this case, methods for instance selection (IS), in which a subset of representative units is selected for training, may be used. We propose the use of threshold clustering (TC), a recently developed, efficient clustering method, for IS when training SVMs. Given a fixed size threshold t, TC forms clusters of t or more units while ensuring that the maximum within-cluster dissimilarity is small. Unlike most traditional clustering methods, TC is designed to form many small clusters of units, making it well suited to IS. Our proposed method begins by performing TC on each class in the training set. The centroids of all clusters are then computed, and these centroids form a reduced training set. TC may be repeated if the data reduction after this first step is insufficient. We show, via simulation and application to datasets, that TC efficiently reduces the size of training sets without sacrificing the prediction accuracy of SVMs. Moreover, it often outperforms competing methods for IS in terms of both runtime and prediction accuracy.
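
The per-class clustering and centroid steps described above can be illustrated with a minimal sketch. It is not the authors' implementation. The function names (threshold_clusters, centroid, select_instances) are hypothetical, Euclidean distance is an assumed dissimilarity, and the clustering routine is a simplified nearest-neighbor-graph construction that only guarantees clusters of at least t units. The brute-force neighbor search is chosen for readability and would not scale to large datasets.

```python
import math
from collections import defaultdict


def threshold_clusters(points, t):
    """Simplified threshold clustering (illustrative, not the authors' code).

    Returns a list of clusters, each a list of indices into `points`,
    with every cluster containing at least t points when len(points) >= t.
    """
    n = len(points)
    if n == 0:
        return []
    if n < t:
        return [list(range(n))]  # too few units: keep them as one cluster

    # Step 1: build the (t-1)-nearest-neighbor graph and make it undirected.
    nbrs = [set() for _ in range(n)]
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: math.dist(points[i], points[j]))
        for j in order[:t - 1]:
            nbrs[i].add(j)
            nbrs[j].add(i)

    # Step 2: greedily pick seeds so that no two seeds are within two steps.
    blocked = [False] * n
    seeds = []
    for i in range(n):
        if blocked[i]:
            continue
        seeds.append(i)
        blocked[i] = True
        for j in nbrs[i]:
            blocked[j] = True
            for k in nbrs[j]:
                blocked[k] = True

    # Step 3: each seed and its graph neighbors start one cluster.
    label = [None] * n
    for c, s in enumerate(seeds):
        label[s] = c
        for j in nbrs[s]:
            label[j] = c

    # Step 4: attach each remaining unit to the cluster of its closest
    # neighbor that was labeled in step 3 (one always exists).
    base = list(label)
    for i in range(n):
        if base[i] is None:
            j = min((j for j in nbrs[i] if base[j] is not None),
                    key=lambda j: math.dist(points[i], points[j]))
            label[i] = base[j]

    clusters = [[] for _ in seeds]
    for i, c in enumerate(label):
        clusters[c].append(i)
    return clusters


def centroid(points):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    dim = len(points[0])
    return tuple(sum(p[d] for p in points) / len(points) for d in range(dim))


def select_instances(X, y, t):
    """Cluster each class separately and return cluster centroids with labels."""
    by_class = defaultdict(list)
    for x, lab in zip(X, y):
        by_class[lab].append(x)
    X_red, y_red = [], []
    for lab, pts in by_class.items():
        for cluster in threshold_clusters(pts, t):
            X_red.append(centroid([pts[i] for i in cluster]))
            y_red.append(lab)
    return X_red, y_red
```

Under these assumptions, a call such as select_instances(X, y, 3) shrinks each class by a factor of at least 3, and calling it again on its own output would correspond to repeating TC when the first reduction is insufficient. The reduced set could then be passed to any SVM trainer.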

Authors who are presenting talks have a * after their name.

Back to the full JSM 2022 program