Abstract:
|
The unprecedented amount of data at our fingertips offers a potential wealth of knowledge but also raises concerns about ethical collection and use. One such concern stems from the potential real-life consequences of injecting anomalous data into our models. We develop the Cauchy-Net Mixture Model (CNMM), a framework for simultaneously clustering observations, making predictions, and identifying anomalies. The CNMM extends the flexibility of a Dirichlet Process Mixture Model (DPMM) by mixing a DPMM with an additional Cauchy-distributed component, which we refer to as the Cauchy-Net (CN). The intuition is to leverage the heavy tails of the CN to capture observations that do not fit into the well-defined clusters, thereby removing their influence on cluster formation and prediction. We demonstrate the usefulness of the CNMM in a variety of experimental settings and apply the model to predict housing prices in Fairfax County, Virginia.
|
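The heavy-tail intuition can be illustrated with a small sketch. This is not the CNMM itself: it assumes one-dimensional data, replaces the DPMM with a fixed two-component Gaussian mixture, and treats all parameters as known rather than inferred. The function names, the weight `net_weight`, and all numeric values are hypothetical choices made for illustration only.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def cauchy_pdf(x, loc, scale):
    """Density of a Cauchy distribution at x (heavy tails)."""
    z = (x - loc) / scale
    return 1.0 / (math.pi * scale * (1.0 + z * z))


def anomaly_probability(x, clusters, net_weight, net_loc, net_scale):
    """Posterior probability that x belongs to the Cauchy component.

    clusters: list of (weight, mean, sd) tuples whose weights sum to 1.
    This finite Gaussian mixture is an assumed stand-in for the DPMM.
    net_weight: assumed prior mass given to the Cauchy component.
    """
    cluster_density = sum(w * normal_pdf(x, m, s) for w, m, s in clusters)
    net_part = net_weight * cauchy_pdf(x, net_loc, net_scale)
    cluster_part = (1.0 - net_weight) * cluster_density
    return net_part / (net_part + cluster_part)


# Two tight, well-separated clusters (illustrative values).
clusters = [(0.5, -2.0, 0.5), (0.5, 3.0, 0.5)]

# A point at a cluster centre versus a point far from both clusters.
p_inlier = anomaly_probability(3.0, clusters, 0.05, 0.0, 5.0)
p_outlier = anomaly_probability(15.0, clusters, 0.05, 0.0, 5.0)
print(f"inlier:  {p_inlier:.4f}")
print(f"outlier: {p_outlier:.4f}")
```

Under these assumptions, the point at the cluster centre is assigned almost entirely to its cluster, while the distant point is assigned almost entirely to the Cauchy component, because the Gaussian tails decay far faster than the Cauchy tails.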