The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Abstract Details
Activity Number: 311
Type: Contributed
Date/Time: Tuesday, August 2, 2011 : 8:30 AM to 10:20 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #303334
Title: Authorship Discrimination and Topic Modeling: The Federalist Papers
Author(s): Mario Andres Morales*+
Companies: Hunter College and Polytechnic Institute of NYU
Address: Dept of Chemical and Biological Sciences Bioinformatics, Brooklyn, NY, 11201
Keywords: Text Mining ; Topic Modelling ; Latent Dirichlet Allocation ; Federalist Papers
Abstract:
Forty-seven years after the publication of Mosteller and Wallace's seminal work on the use of Bayesian reasoning to assign authorship of the disputed Federalist Papers, many other approaches have been used to replicate similar results based on the features described in that analysis. In this paper we reviewed the authorship problem, cleaned the Federalist corpus using desktop tools for natural language processing in Python and the statistical programming language R, and, for the first time, estimated a topic model using the Latent Dirichlet Allocation model of Blei et al., with the goal of differentiating authorship based on the estimated topics.
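The abstract does not give implementation details, so the following is only a minimal sketch of the general idea: tokenize each document, fit a Latent Dirichlet Allocation model, and read off per-document topic proportions that could then be compared across authors. Everything here is an assumption made for illustration: the collapsed Gibbs sampler (the abstract does not say which estimation method was used), the function names `tokenize` and `fit_lda`, the hyperparameters, and the toy sentences, which are invented and are not text from the Federalist Papers.

```python
import random


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only (a crude stand-in for real cleaning)."""
    return "".join(c.lower() if c.isalpha() else " " for c in text).split()


def fit_lda(docs, n_topics=2, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA with a collapsed Gibbs sampler (hypothetical helper, not the author's code).

    docs is a list of token lists. Returns (theta, vocab, topic_word), where
    theta[d][k] is the smoothed proportion of topic k in document d.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    n_words = len(vocab)
    word_id = {w: i for i, w in enumerate(vocab)}

    doc_topic = [[0] * n_topics for _ in docs]             # tokens in doc d assigned to topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # times word v is assigned to topic k
    topic_total = [0] * n_topics                           # total tokens assigned to topic k
    assignments = []

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    # Resample each token's topic from its conditional distribution given all other tokens.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    theta = [
        [(count + alpha) / (len(doc) + n_topics * alpha) for count in row]
        for row, doc in zip(doc_topic, docs)
    ]
    return theta, vocab, topic_word


# Invented toy sentences, used only to show the call pattern.
raw_docs = [
    "The union requires a strong national government and a standing army.",
    "A national army under the union government protects the states.",
    "Commerce and taxation depend upon trade among the several states.",
    "Taxation of trade funds commerce between the states.",
]
docs = [tokenize(text) for text in raw_docs]
theta, vocab, topic_word = fit_lda(docs, n_topics=2)
for d, row in enumerate(theta):
    print(d, [round(p, 3) for p in row])
```

Each row of `theta` is a topic mixture for one document. One plausible way to use it for authorship discrimination, again as an assumption rather than the method reported in the talk, is to treat these mixtures as features and compare the disputed papers against papers of known authorship.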
The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.