Abstract Details

Activity Number: 305 - Bayesian Modeling and Variable Selection Methods
Type: Contributed
Date/Time: Tuesday, July 30, 2019 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Computing
Abstract #306412 Presentation
Title: Variable Selection Techniques for Model-Based Clustering of Directional Data
Author(s): Semhar Michael* and Damon Bayer
Companies: South Dakota State University and South Dakota State University
Keywords: variable selection; finite mixture models; unit hypersphere; model-based clustering; von Mises-Fisher distribution; directional data

Directional data are data in which the direction of a vector carries more relevant information than its magnitude. Text documents can be expressed as directional data by normalizing the word frequencies in each document, which places each document on a unit hypersphere. Mixtures of von Mises-Fisher distributions have proven to be an effective model for clustering data on a unit hypersphere, but variable selection for these models remains an important and challenging problem. We derive two variants of the expectation-maximization framework, each of which is used to identify a specific type of variable that is irrelevant for clustering. The first type is noise variables, which are not useful for separating any pair of clusters. The second type is redundant variables, which may be useful for separating pairs of clusters but do not enable any separation beyond that already provided by other variables. Removing these irrelevant variables is shown to improve cluster quality on simulated as well as benchmark datasets.
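
As an illustration of the normalization step mentioned in the abstract, the sketch below maps word-count vectors onto the unit hypersphere. It assumes L2 (Euclidean) normalization, which is the usual choice for von Mises-Fisher models but is not stated in the abstract. The function name `to_unit_vector` and the toy corpus are hypothetical, and the sketch does not reproduce the authors' clustering or variable selection procedure.

```python
import math


def to_unit_vector(counts):
    """Scale a word-count vector to unit Euclidean length (assumed L2 norm)."""
    norm = math.sqrt(sum(c * c for c in counts))
    if norm == 0:
        raise ValueError("cannot normalize an empty document (all counts are zero)")
    return [c / norm for c in counts]


# Hypothetical toy corpus: 3 documents over a 4-term vocabulary.
word_counts = [
    [2, 0, 1, 0],
    [4, 0, 2, 0],  # same word proportions as the first document, twice as long
    [0, 3, 0, 4],
]

unit_docs = [to_unit_vector(doc) for doc in word_counts]

# After normalization only the direction remains, so the first two
# documents coincide even though their lengths differed.
for doc in unit_docs:
    print([round(x, 3) for x in doc])
```

Documents that differ only in length map to the same point on the hypersphere, which is the sense in which direction rather than magnitude carries the information.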

Authors who are presenting talks have a * after their name.

Back to the full JSM 2019 program