Activity Number: 229
Type: Contributed
Date/Time: Monday, August 3, 2009, 2:00 PM to 3:50 PM
Sponsor: SSC
Abstract - #304478
Title: A Stylometric Analysis Method with Semiparametric Bayesian Approach
Author(s): Paramjit S. Gill*+
Companies: The University of British Columbia
Address: I K Barber School of Arts & Sciences, Okanagan, Kelowna, BC, V1V 1V7, Canada
Keywords: Author attribution; Dirichlet process; Clustering; Bayesian modeling
Abstract:
The statistical analysis of literary style draws on many methods, each based on quantitative measurements extracted from the text. We propose a semiparametric Bayesian approach for clustering text documents by their function-word frequencies, so that documents with similar frequency profiles are grouped together. The methodology is based on Dirichlet process priors. An important advantage of this approach is that the number of clusters need not be specified in advance. Part of the model output is a posterior probability that measures the similarity of a pair of documents. This probability can then be used to infer how similar the writing styles of the two documents are and to help resolve questions of authorship. The method is tested on some well-known examples of authorship disputes.
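As an illustration of the pairwise posterior probability mentioned in the abstract, the sketch below estimates, for each pair of documents, the fraction of posterior draws in which the two documents fall in the same cluster. This is a common summary for Dirichlet process mixture models, but the abstract does not say that the author computes the probability this way, so treat it as an assumption. The function name `coclustering_probability` and the toy `label_draws` array are hypothetical; in practice the draws would come from an MCMC sampler fitted to function-word frequencies.

```python
import numpy as np

def coclustering_probability(label_draws):
    """Estimate P(documents i and j share a cluster) from posterior draws.

    label_draws: array-like of shape (n_draws, n_documents), where each row
    holds the cluster labels assigned to the documents in one posterior draw.
    Label values are arbitrary; only equality within a draw matters.
    """
    draws = np.asarray(label_draws)
    # same[s, i, j] is True when documents i and j share a label in draw s
    same = draws[:, :, None] == draws[:, None, :]
    # Averaging over draws gives the estimated co-clustering probability
    return same.mean(axis=0)

# Hypothetical draws for 3 documents; the number of clusters varies by draw,
# reflecting that it is not fixed in advance under a Dirichlet process prior.
label_draws = [
    [0, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [0, 1, 2],
]
similarity = coclustering_probability(label_draws)
```

Here `similarity[0, 1]` is 0.75 because documents 0 and 1 share a cluster in three of the four toy draws. A high value for a disputed text paired with a text of known authorship would count as evidence of similar style under this summary.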