Abstract Details

Activity Number: 432
Type: Contributed
Date/Time: Tuesday, August 2, 2016 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #319320
Title: L-CC Classification and Variable Selection for Multi-Label Data Sets
Author(s): Monika Stupalova du Toit* and Sarel Steel
Companies: Stellenbosch University and Stellenbosch University
Keywords: multi-label ; classification ; variable selection ; random forests

Multi-label (ML) classification extends binary and multi-class classification to scenarios where each data case can be assigned several labels simultaneously. Applications include image annotation, musical instrument recognition and text classification. Variable selection is an important part of ML data analysis, but it has received little attention in the literature. ML variable selection is more complex than variable selection for binary classification, mainly because there is more than one response variable.

We propose an approach called L-CC. The method is a compromise between the simple classifier chains (CC) procedure and the ensemble of classifier chains (ECC) procedure. L-CC uses an ensemble of classifier chains with a semi-random chain structure and random forests (RF) as base learners. The specific structural assumptions of L-CC allow for variable selection based on the output from the RF. The results from L-CC include ML predictions and a matrix of variable importance values. We illustrate our proposal by applying it to simulated datasets and to a direct marketing dataset obtained from a South African credit bureau.
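
The general idea of an ensemble of classifier chains that yields both ML predictions and a label-by-variable importance matrix can be sketched as follows. This is not the L-CC procedure itself: the abstract does not specify the semi-random chain structure, so the sketch assumes fully random label orders; it substitutes single-split decision stumps for random forests to stay dependency-free; and it counts how often each variable is chosen as a crude stand-in for RF variable importance. All function names and the toy data are hypothetical.

```python
import random

def fit_stump(X, y):
    """Return the (feature, threshold, left_label, right_label) split with fewest training errors."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            l_lab = int(2 * sum(left) >= len(left))  # majority label on the left side
            r_lab = int(2 * sum(right) >= len(right)) if right else l_lab
            errors = sum(v != l_lab for v in left) + sum(v != r_lab for v in right)
            if best is None or errors < best[0]:  # ties keep the earliest split found
                best = (errors, j, t, l_lab, r_lab)
    return best[1:]

def predict_stump(stump, row):
    j, t, l_lab, r_lab = stump
    return l_lab if row[j] <= t else r_lab

def fit_chain(X, Y, order):
    """One classifier chain: each label's model also sees the labels earlier in the chain."""
    aug = [list(row) for row in X]
    stumps = []
    for label in order:
        y = [yrow[label] for yrow in Y]
        stumps.append(fit_stump(aug, y))
        for row, yi in zip(aug, y):
            row.append(yi)  # the true label becomes an input for later links
    return stumps

def predict_chain(stumps, order, row, n_labels):
    aug, out = list(row), [0] * n_labels
    for label, stump in zip(order, stumps):
        out[label] = predict_stump(stump, aug)
        aug.append(out[label])  # at prediction time, pass predicted labels down the chain
    return out

def fit_ensemble(X, Y, n_chains=10, seed=0):
    """Fit chains with random label orders (an assumption, not the L-CC structure)."""
    rng = random.Random(seed)
    n_feat, n_labels = len(X[0]), len(Y[0])
    # importance[l][k]: k < n_feat is input variable k; k >= n_feat is label k - n_feat.
    importance = [[0] * (n_feat + n_labels) for _ in range(n_labels)]
    chains = []
    for _ in range(n_chains):
        order = list(range(n_labels))
        rng.shuffle(order)
        stumps = fit_chain(X, Y, order)
        chains.append((order, stumps))
        for label, stump in zip(order, stumps):
            j = stump[0]
            col = j if j < n_feat else n_feat + order[j - n_feat]
            importance[label][col] += 1  # split count as a crude importance measure
    return chains, importance

def predict_ensemble(chains, row, n_labels):
    """Majority vote over the chains, label by label."""
    votes = [0] * n_labels
    for order, stumps in chains:
        pred = predict_chain(stumps, order, row, n_labels)
        votes = [v + p for v, p in zip(votes, pred)]
    return [int(2 * v >= len(chains)) for v in votes]

# Toy data: label 1 depends on x1, label 0 is a noisy function of x0, label 2 copies label 0.
X = [(0.1, 0.2), (0.2, 0.8), (0.3, 0.4), (0.4, 0.9),
     (0.6, 0.1), (0.7, 0.7), (0.8, 0.3), (0.9, 0.6)]
Y = [(0, 0, 0), (0, 1, 0), (1, 0, 1), (0, 1, 0),
     (1, 0, 1), (1, 1, 1), (1, 0, 1), (1, 1, 1)]
chains, importance = fit_ensemble(X, Y, n_chains=10, seed=0)
```

Each row of `importance` corresponds to one label and each column to an input variable or to another label used as an input, so a dependence between labels (here, labels 0 and 2) shows up in the label columns rather than in the input-variable columns.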

Authors who are presenting talks have a * after their name.

Back to the full JSM 2016 program

Copyright © American Statistical Association