Abstract:
|
Automatic summarization by keywords and phrases creates a summary of a document from keywords or phrases that retain the essential content of the original. A traditional method extracts a summary from a single document by examining the relative importance of its words. In this talk, we present a method that instead learns the summarization process from a variety of documents to provide more efficient summarization. In particular, we introduce a loss that measures the discrepancy between predicted and actual tag sets, expressed as a weighted sum of pairwise margins between tags, with each weight given by the degree of similarity between the two tags. On this basis, we construct a regularized empirical loss that incorporates certain linguistic knowledge, and identify a tagger that maximizes the separations between the pairwise margins. As a result, the proposed method is capable of detecting novel tags absent from the training sample by exploring similarity among existing tags. Computational and theoretical aspects of the proposed method will also be discussed.
|
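The abstract does not give the loss in closed form, so the following is only a minimal sketch of one possible reading of "a weighted sum of pairwise margins between two tags, weighted by the degrees of similarity between them". The function name `pairwise_margin_loss`, the hinge form of the margin, the weight `1 - similarity`, and the toy tags and scores are all assumptions made for illustration, not the authors' definition.

```python
from itertools import product


def pairwise_margin_loss(scores, true_tags, similarity, margin=1.0):
    """Hypothetical similarity-weighted pairwise hinge loss.

    scores:     dict mapping each candidate tag to a real-valued score
    true_tags:  set of tags that actually belong to the document
    similarity: function (tag, tag) -> value in [0, 1]
    """
    positives = [t for t in scores if t in true_tags]
    negatives = [t for t in scores if t not in true_tags]
    total = 0.0
    for pos, neg in product(positives, negatives):
        # Assumption: confusing two similar tags costs less than
        # confusing two unrelated ones.
        weight = 1.0 - similarity(pos, neg)
        # Hinge penalty when the true tag does not beat the other tag
        # by at least `margin`.
        violation = max(0.0, margin - (scores[pos] - scores[neg]))
        total += weight * violation
    return total


# Toy similarity table (made-up values); unlisted pairs count as dissimilar.
_SIM = {frozenset({"summarization", "keyword"}): 0.8}


def toy_similarity(a, b):
    return _SIM.get(frozenset({a, b}), 0.0)


scores = {"summarization": 1.0, "keyword": 0.5, "weather": 0.5}
loss = pairwise_margin_loss(scores, {"summarization"}, toy_similarity)
print(loss)
```

In this toy case both incorrect tags trail the true tag by the same score gap, but the pair involving the closely related tag contributes less to the loss than the pair involving the unrelated one. That is the kind of behavior a similarity-weighted margin is meant to encode under the assumptions above.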