JSM 2011 Online Program

The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.

Abstract Details

Activity Number: 311
Type: Contributed
Date/Time: Tuesday, August 2, 2011 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #303334
Title: Authorship Discrimination and Topic Modeling: The Federalist Papers
Author(s): Mario Andres Morales*+
Companies: Hunter College and Polytechnic Institute of NYU
Address: Dept of Chemical and Biological Sciences Bioinformatics, Brooklyn, NY, 11201
Keywords: Text Mining ; Topic Modelling ; Latent Dirichlet Allocation ; Federalist Papers
Abstract:

In the forty-seven years since Mosteller and Wallace published their seminal work on using Bayesian reasoning to assign authorship of the disputed Federalist Papers, many other approaches have been used to obtain similar results from the features described in that analysis. In this paper we reviewed the authorship problem and cleaned the Federalist corpus using desktop tools for natural language processing in Python and the statistical programming language R. For the first time, we estimated a topic model, the Latent Dirichlet Allocation model of Blei et al., with the goal of differentiating authorship based on the estimated topics.
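The abstract does not describe the implementation, so the following is only a minimal sketch of how a Latent Dirichlet Allocation model might be estimated on tokenized documents, assuming a collapsed Gibbs sampler written with the Python standard library. The function name `fit_lda`, the hyperparameters `alpha` and `beta`, and the toy documents are all hypothetical; they are not the authors' code or data, and the toy tokens are not taken from the Federalist corpus.

```python
import random


def fit_lda(docs, n_topics, n_iter=200, alpha=0.1, beta=0.01, seed=0):
    """Estimate per-document topic proportions with collapsed Gibbs sampling.

    docs is a list of token lists. Returns one list of n_topics proportions
    per document. Hyperparameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic = [[0] * n_topics for _ in docs]             # tokens in doc d assigned to topic k
    topic_word = [[0] * n_words for _ in range(n_topics)]  # times word v is assigned to topic k
    topic_total = [0] * n_topics                           # total tokens assigned to topic k
    assignments = []                                       # current topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v = word_id[w]
                k = assignments[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Conditional weight of each topic given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_words * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed topic proportions for each document.
    theta = []
    for d, doc in enumerate(docs):
        denom = len(doc) + n_topics * alpha
        theta.append([(doc_topic[d][t] + alpha) / denom for t in range(n_topics)])
    return theta


# Invented toy documents, for illustration only.
docs = [
    "commerce trade navy commerce revenue trade".split(),
    "trade revenue commerce navy trade".split(),
    "senate judiciary powers senate tenure".split(),
    "judiciary powers tenure senate judiciary".split(),
]
theta = fit_lda(docs, n_topics=2)
for row in theta:
    print([round(p, 3) for p in row])
```

Each row of `theta` is a document's estimated mix of topics. One way to pursue the goal stated in the abstract would be to compare these rows across papers of known and disputed authorship, for example by passing them as features to a classifier; that step is an assumption here, not something the abstract specifies.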


The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.

