Abstract:
|
Clustering has been widely used for high-dimensional data, and there is growing interest in how it can help discover patterns in electronic health record (EHR) data. The data set we worked with is SENECA (Sepsis ENdotyping in Emergency CAre), which contains sepsis encounters collected from 12 UPMC health systems from 2010 to 2012. Owing to the nature of most clinical data, we did not observe a natural clustering structure, so a partitioning algorithm is more appropriate in our case. We chose consensus k-means, a partitioning method that selects the number of clusters and performs the clustering at the same time. Applying consensus k-means to SENECA identified 4 clusters, and some clinical endpoints, such as in-hospital mortality, differed across the 4 clusters. Furthermore, the centers of these 4 clusters were used to predict cluster assignments for external data sets, including EHR data collected from 2013 to 2014 with all clinical variables assessed at hour 6 and data from the ProCESS (Protocolized Care for Early Septic Shock) trial. We also explored several ways of visualizing the clusters.
|
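The abstract does not say how consensus k-means chooses the number of clusters, so the following is only a minimal sketch under stated assumptions. It assumes a Monti-style resampling consensus: k-means is rerun on random subsamples, a consensus matrix records how often each pair of encounters lands in the same cluster, and K is chosen by the proportion of ambiguous clustering (PAC). It also assumes that external data are assigned to the nearest SENECA cluster center by Euclidean distance. The functions `consensus_matrix`, `pac`, `select_k`, `fit_centers`, and `assign_to_centers` are hypothetical helpers written for illustration. They are not the authors' code.

```python
import numpy as np
from sklearn.cluster import KMeans


def consensus_matrix(X, k, n_resamples=50, subsample=0.8, seed=0):
    """Fraction of resamples in which each pair of rows is co-clustered,
    counted only over resamples where both rows were drawn (assumed Monti-style scheme)."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    m = int(subsample * n)
    for _ in range(n_resamples):
        idx = rng.choice(n, size=m, replace=False)
        km = KMeans(n_clusters=k, n_init=10, random_state=int(rng.integers(1 << 31)))
        labels = km.fit_predict(X[idx])
        same = labels[:, None] == labels[None, :]
        together[np.ix_(idx, idx)] += same   # pairs placed in the same cluster
        sampled[np.ix_(idx, idx)] += 1       # pairs drawn together in this resample
    return np.divide(together, sampled, out=np.zeros_like(together), where=sampled > 0)


def pac(C, lower=0.1, upper=0.9):
    """Proportion of ambiguous clustering: share of off-diagonal consensus
    values strictly between `lower` and `upper` (lower means more stable)."""
    vals = C[np.triu_indices_from(C, k=1)]
    return float(np.mean((vals > lower) & (vals < upper)))


def select_k(X, ks=range(2, 7), **kwargs):
    """Pick the K with the smallest PAC; ties go to the smallest K."""
    scores = {k: pac(consensus_matrix(X, k, **kwargs)) for k in ks}
    best = min(scores, key=scores.get)
    return best, scores


def fit_centers(X, k, seed=0):
    """Final k-means fit on the full training data; returns the cluster centers."""
    return KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X).cluster_centers_


def assign_to_centers(X_new, centers):
    """Assign each new row to its nearest center (squared Euclidean distance)."""
    d = ((X_new[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)
```

A typical call would be `best, scores = select_k(X_train)` followed by `centers = fit_centers(X_train, best)` and `assign_to_centers(X_external, centers)`. Nearest-center assignment is only meaningful if the external data (for example, the hour-6 EHR data or ProCESS) are preprocessed and standardized exactly as the training data were.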