JSM Preliminary Online Program
This is the preliminary program for the 2009 Joint Statistical Meetings in Washington, DC.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.






Activity Number: 229
Type: Contributed
Date/Time: Monday, August 3, 2009 : 2:00 PM to 3:50 PM
Sponsor: SSC
Abstract - #304478
Title: A Stylometric Analysis Method with Semiparametric Bayesian Approach
Author(s): Paramjit S. Gill*+
Companies: The University of British Columbia
Address: I K Barber School of Arts & Sciences, Okanagan, Kelowna, BC V1V 1V7, Canada
Keywords: Author attribution; Dirichlet process; Clustering; Bayesian modeling
Abstract:

The statistical analysis of literary style draws on many methods, each based on quantitative measurements extracted from the text. We propose a semiparametric Bayesian approach to clustering text documents by the frequencies of the function words they contain, so that documents are grouped according to a similarity criterion. The method uses Dirichlet process priors. An important advantage of this approach is that the number of clusters need not be specified in advance. Among the model outputs is a posterior probability measuring the similarity of each pair of documents. This probability can then be used to infer whether the two documents were written in a similar style and so to help resolve questions of authorship. The method is tested on several well-known authorship disputes.
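Two ideas in the abstract can be illustrated with a minimal sketch, under assumptions that are not taken from the abstract. First, a Dirichlet process prior induces a Chinese restaurant process (CRP) over partitions, which is why the number of clusters is left random rather than fixed. Second, a pairwise similarity can be read off posterior output as the fraction of sampled partitions in which two documents share a cluster. The function names `crp_partition` and `coclustering_probabilities` are hypothetical, the concentration `alpha` is an assumed parameter, and the posterior label samples are assumed to come from some MCMC sampler for a Dirichlet process mixture; the abstract specifies neither the sampler nor the likelihood for the word-frequency data, so none is shown here.

```python
import random
from typing import List, Sequence


def crp_partition(n: int, alpha: float, rng: random.Random) -> List[int]:
    """Draw cluster labels for n objects from a Chinese restaurant process,
    the partition distribution induced by a Dirichlet process prior with
    concentration alpha. The number of clusters is random, not fixed."""
    labels: List[int] = []
    counts: List[int] = []  # counts[k] = number of objects already in cluster k
    for i in range(n):
        # Existing cluster k has weight counts[k]; a new cluster has weight alpha.
        # The counts sum to i, so rng.choices normalizes these to the CRP
        # probabilities counts[k] / (i + alpha) and alpha / (i + alpha).
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # object i opens a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels


def coclustering_probabilities(samples: Sequence[Sequence[int]]) -> List[List[float]]:
    """Estimate, for every pair (i, j), the posterior probability that
    documents i and j fall in the same cluster, as the fraction of sampled
    label vectors (assumed to come from an MCMC sampler) in which they share
    a label."""
    m = len(samples)
    n = len(samples[0])
    together = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [[count / m for count in row] for row in together]
```

For example, four hypothetical posterior label vectors `[[0, 0, 1], [0, 0, 0], [0, 1, 1], [0, 0, 1]]` put documents 0 and 1 together in three of the four samples, giving an estimated probability of 0.75. Because only label equality is compared, the estimate is unaffected by the arbitrary numbering of clusters across samples (label switching).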


  • The address information is for the authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.



For information about JSM 2009, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised September, 2008