Clustering with SAS Software — Professional Development Computer Technology Workshop
ASA, SAS
Clustering is the process of discovering groups within unlabeled data. For example, clustering based on limited and indirect information could identify individuals at high risk of a disease or potential customers for a new product. The investigator might also be interested in how strongly the clusters are distinguished from one another. Some clustering techniques are computationally expensive, and the investigator might need approximation techniques for larger data sets. Many clustering methods can serve the investigator’s goals while respecting practical and theoretical constraints. This workshop introduces several of these methods as they are implemented in SAS software, including k-means clustering, Gaussian mixture models, and hierarchical clustering. The workshop also illustrates techniques for estimation, model fitting, and scoring, and discusses the expectation-maximization, nearest-neighbors, and variational Bayes approaches. Finally, it demonstrates the advantages and limitations of these techniques in different applications. Attendees should have a basic familiarity with estimation. At the conclusion of the workshop, attendees will have a broad understanding of clustering techniques and will be able to use a variety of SAS procedures and products to apply these techniques.