Abstract:
|
Graph-based methods have been successful in solving many types of semi-supervised learning problems by optimizing a graph smoothness criterion. This criterion states that data instances nearby in a given graph are likely to have similar properties. However, a graph smoothness criterion cannot be directly incorporated into a generative unsupervised model, for two reasons. First, it is usually not clear what probabilistic process generated the data instances with respect to the graph. Second, incorporating the graph directly into a factorizable model (e.g. a time-series model such as an HMM) would break the model's factorizable structure, making exact inference methods (e.g. belief propagation) intractable. The method described here, called entropic graph-based posterior regularization (EGPR), provides a way to incorporate graph-based information into a probabilistic model by defining a regularization term on an auxiliary posterior distribution variable. We applied this approach to regulatory genomics data sets from the human genome, leading to the discovery of a new type of regulatory domain.
|