Abstract:
|
We address the problem of discovering functional motifs, i.e., typical "shapes" that may recur several times in a set of (multidimensional) curves and that capture important local characteristics of these curves. We formulate probabilistic K-mean with local alignment, a novel algorithm that leverages ideas from Functional Data Analysis (joint clustering and alignment of curves), Bioinformatics (local alignment through the extension of high-similarity "seeds") and fuzzy clustering (curves may belong to more than one cluster if they contain more than one typical "shape"). The algorithm identifies shared curve portions, which represent candidate functional motifs in the set of curves under consideration, and it can employ various dissimilarity measures to capture different shape characteristics. After demonstrating the performance of the algorithm on simulated data, we apply it to discover functional motifs in "Omics" signals related to mutagenesis and genome dynamics, exploring high-resolution profiles of different mutation rates in regions of the human genome where these rates are globally elevated.
|