Abstract:
We consider the problem of clustering data at multiple resolutions when the units are organized in a hierarchy that can be described by a tree: higher-resolution entities are nested within lower-resolution ones. A motivating example is modeling crime in urban environments at different spatial resolutions: US cities are divided into census tracts, which are divided into census block groups, which are further split into census blocks. We want to partition a city into regions with similar crime frequencies at each resolution while sharing information between the partitions at different resolutions. The Dirichlet Process makes it possible to partition data when the number of clusters is unknown. If we knew the partition at higher levels of the tree, such as the census tract level, Hierarchical Dirichlet Processes would be an appropriate model. Nested Dirichlet Processes, by contrast, can model partitions at multiple levels but would restrict block group clusters to be nested within census tract clusters. In this work, we combine Nested and Hierarchical Dirichlet Processes to allow more flexible partitions that do not have this constraint. We apply this method to crime frequencies in Philadelphia.
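The claim that a Dirichlet Process partitions data without fixing the number of clusters in advance can be illustrated with its sequential representation, the Chinese Restaurant Process. The sketch below is a generic illustration under that assumption, not the combined Nested/Hierarchical model proposed here; the function name `crp_partition` and its parameters are hypothetical. Item `i` joins an existing cluster `k` with probability proportional to that cluster's size, or opens a new cluster with probability proportional to the concentration `alpha`.

```python
import random
from typing import List


def crp_partition(n: int, alpha: float, seed: int = 0) -> List[int]:
    """Sample cluster labels for n items from a Chinese Restaurant Process.

    Hypothetical illustration: labels are integers 0, 1, 2, ... assigned in
    order of first appearance; the number of clusters is not fixed in advance.
    """
    if n < 0 or alpha <= 0:
        raise ValueError("n must be non-negative and alpha must be positive")
    rng = random.Random(seed)
    labels: List[int] = []
    counts: List[int] = []  # counts[k] = number of items currently in cluster k
    for _ in range(n):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1  # join an existing cluster
        labels.append(k)
    return labels


if __name__ == "__main__":
    labels = crp_partition(100, alpha=1.0, seed=42)
    print("number of clusters:", len(set(labels)))
```

Larger values of `alpha` tend to produce more clusters; for a fixed `alpha`, the expected number of clusters grows roughly logarithmically with `n`.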