Abstract Details

Activity Number: 254 - Contributed Poster Presentations: Section on Statistical Learning and Data Science
Type: Contributed
Date/Time: Monday, July 30, 2018 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #329838
Title: An Application of Clustering Method on EHR Data Phenotyping and Prediction
Author(s): Shu Wang* and Joyce Chung-Chou H Chang and Christopher W. Seymour and Jason Kennedy and Zhongying Xu
Companies: University of Pittsburgh and University of Pittsburgh and University of Pittsburgh and University of Pittsburgh and University of Pittsburgh
Keywords: Clustering; EHR; Phenotyping

Clustering is widely used for high-dimensional data, and interest is growing in how it can support pattern discovery in EHR data. The data set we worked with is SENECA (Sepsis ENdotyping in Emergency CAre), which contains sepsis encounters collected from 12 UPMC health systems from 2010-2012. As with most clinical data, we did not observe a natural clustering structure, so a partitioning algorithm was more appropriate in our case. The algorithm we chose is consensus k-means, a partitioning method that selects the number of clusters and performs the clustering at the same time. Applying consensus k-means to SENECA identified 4 clusters, and some clinical endpoints (in-hospital mortality, etc.) differed across the 4 clusters. Furthermore, the centers of these 4 clusters were used to predict cluster assignments for several external data sets, including EHR data collected from 2013-2014 with all clinical variables assessed at hour 6 and data from the ProCESS (Protocolized Care for Early Septic Shock) trial. Several approaches to cluster visualization were also explored.
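The step of predicting cluster assignments from fixed cluster centers can be illustrated with a minimal sketch. It assumes that each external encounter is assigned to the center with the smallest squared Euclidean distance, and that the external variables have already been put on the same scale as the centers. The abstract does not specify the distance or the preprocessing, so both are assumptions here. The function names and all numbers below are hypothetical toy values, not SENECA results.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_to_nearest_center(
    points: Sequence[Sequence[float]],
    centers: Sequence[Sequence[float]],
) -> List[int]:
    """Return, for each point, the index of the closest center.

    Assumption: points are already scaled the same way as the centers.
    Ties go to the center with the lowest index.
    """
    labels = []
    for point in points:
        distances = [squared_distance(point, center) for center in centers]
        labels.append(distances.index(min(distances)))
    return labels


# Hypothetical toy centers for 4 clusters in 2 scaled variables.
toy_centers = [[0.0, 0.0], [5.0, 5.0], [0.0, 5.0], [5.0, 0.0]]

# Hypothetical external encounters to be labeled.
toy_encounters = [[0.2, -0.1], [4.8, 5.3], [0.4, 4.1], [4.9, 0.6]]

toy_labels = assign_to_nearest_center(toy_encounters, toy_centers)
print(toy_labels)
```

Because the centers stay fixed, the external data never influence the cluster definitions, so the same labeling rule applies to every new data set.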

Authors who are presenting talks have a * after their name.

Back to the full JSM 2018 program