Multi-label (ML) classification extends binary and multi-class classification to scenarios where each data case can be assigned several labels simultaneously. Applications include image annotation, musical instrument recognition and text classification. Variable selection is an important part of ML data analysis, but it has received little attention in the literature. Variable selection in the ML setting is more complex than in binary classification, mainly because there is more than one response variable.
We propose an approach called L-CC, which is a compromise between the simple classifier chains (CC) procedure and the ensemble of classifier chains (ECC) procedure. The L-CC approach uses an ensemble of classifier chains with a semi-random chain structure and random forests (RF) as base learners. The specific structural assumptions of the L-CC method allow for variable selection based on the output from the RF. The output of L-CC consists of ML predictions and a matrix of variable importance values. We illustrate our proposal by applying it to simulated datasets and to a direct marketing dataset obtained from a South African credit bureau.
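To make the general idea concrete, the following is a minimal sketch of an ensemble of classifier chains with RF base learners that also collects a variables-by-labels importance matrix. It is not the L-CC procedure itself: the semi-random chain structure is not specified here, so the sketch assumes fully random label orders as a stand-in, and it assumes that averaging the impurity-based RF importances over chains is an acceptable summary. The function names `fit_chain_ensemble` and `predict_chain_ensemble`, the simulated data and all parameter values are hypothetical and chosen for illustration only.

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import ClassifierChain


def fit_chain_ensemble(X, Y, n_chains=3, n_trees=25, seed=0):
    """Fit several RF-based classifier chains with random label orders.

    Assumption: random orders stand in for the semi-random chain
    structure, which is not specified in the text above.
    Returns the fitted chains and a (variables x labels) importance matrix.
    """
    rng = np.random.default_rng(seed)
    n_features, n_labels = X.shape[1], Y.shape[1]
    importance = np.zeros((n_features, n_labels))
    chains = []
    for _ in range(n_chains):
        order = [int(j) for j in rng.permutation(n_labels)]
        base = RandomForestClassifier(
            n_estimators=n_trees, random_state=int(rng.integers(1_000_000))
        )
        chain = ClassifierChain(base, order=order).fit(X, Y)
        # estimators_[k] predicts label order_[k] from the original
        # variables plus the k labels that precede it in the chain.
        for position, label in enumerate(chain.order_):
            rf = chain.estimators_[position]
            # Keep only the importances of the original variables;
            # the trailing entries belong to earlier labels in the chain.
            importance[:, label] += rf.feature_importances_[:n_features]
        chains.append(chain)
    return chains, importance / n_chains


def predict_chain_ensemble(chains, X, threshold=0.5):
    """Average the 0/1 votes of all chains and threshold them."""
    votes = np.mean([chain.predict(X) for chain in chains], axis=0)
    return (votes >= threshold).astype(int)


# Simulated data, purely for illustration.
X, Y = make_multilabel_classification(
    n_samples=200, n_features=10, n_classes=3, random_state=0
)
chains, importance = fit_chain_ensemble(X, Y)
Y_hat = predict_chain_ensemble(chains, X)
print(importance.shape, Y_hat.shape)
```

Each column of `importance` ranks the original variables for one label, so variables with uniformly small entries across all columns are natural candidates for removal. Because the importance attributed to preceding labels in a chain is discarded, a column need not sum to one.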