Abstract:
|
Clustering methods are an essential part of exploratory analysis of genomic high-throughput sequencing data. K-means and hierarchical clustering are frequently applied to summary statistics computed over a predefined 'window' (e.g. a gene or a read-count peak), using a dissimilarity measure such as Euclidean distance. Although traditional clustering methods can use higher-resolution information within these windows, such as read counts per base pair, they cannot model the spatial structure of the data. We consider several functional clustering methods that account for spatial structure through Bayesian wavelet-based modeling. Wavelet-based approaches have been shown to model high-throughput sequencing data effectively using a sparse representation. We investigate their performance in multiple applications to ATAC-seq, which measures genome-wide chromatin accessibility. One example is clustering diverse patterns of (co-)transcription factor binding. We also explore improving functional clustering methods in the genomics context by incorporating gene annotation and sequence information.
|