JSM 2005 - Minneapolis

Abstract #303460

This preliminary program for the 2005 Joint Statistical Meetings in Minneapolis, Minnesota, currently includes the technical program (the schedule of invited, topic-contributed, regular contributed, and poster sessions); Continuing Education courses (August 7-10, 2005); and Committee and Business Meetings. This online program will be updated frequently to reflect the most current revisions.




The views expressed here are those of the individual authors and not necessarily those of the ASA or its board, officers, or staff.


In the program, each meeting room name is preceded by letters that designate the facility in which the room is located:

  • Minneapolis Convention Center = “MCC”
  • Hilton Minneapolis Hotel = “H”
  • Hyatt Regency Minneapolis = “HY”




Activity Number: 129
Type: Topic Contributed
Date/Time: Monday, August 8, 2005 : 10:30 AM to 12:20 PM
Sponsor: Section on Statisticians in Defense and National Security
Abstract - #303460
Title: Communication Graphs and Text Analysis of Email
Author(s): Elizabeth Leeds*+ and David Marchette
Companies: Naval Surface Warfare Center, Dahlgren Division and Naval Surface Warfare Center, Dahlgren Division
Address: 17320 Dahlgren Rd, Dahlgren, VA, 22448, United States
Keywords: text processing; spectral graph analysis; random forests; Enron email dataset
Abstract:

The Enron email dataset is a collection of more than half a million email messages from more than 150 Enron employees, from 1999 through 2002. Using this dataset, one can construct communication graphs by placing an edge between two employees when they exchange an email. Alternatively, one can construct a text intersection graph, in which an edge joins two employees when the content of their emails is judged to be similar. We compare the communication graphs to the text intersection graphs and argue that the difference between them illustrates the difference between who people know and what people know. We use spectral graph techniques to explore the data, and we represent the data using both intersection graphs and random forests. The dissimilarity matrices that result from these representations capture two views of the relationships in the data, and studying them gives insight into its structure.
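The two graph constructions described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' method: the messages are hypothetical toy data rather than Enron email, and "judged to be similar" is assumed here to mean a Jaccard overlap of at least 0.5 between the sets of words each person sent. The helper names (`communication_graph`, `text_intersection_graph`, `hop_dissimilarity`) are hypothetical. Shortest-path hop counts stand in for a dissimilarity matrix; the spectral analysis and random-forest representation from the abstract are not implemented.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy messages: (sender, recipients, body). Not Enron data.
MESSAGES = [
    ("alice", ["bob"], "gas pipeline contract pricing"),
    ("bob", ["carol"], "quarterly budget review meeting"),
    ("carol", ["alice"], "budget review for quarterly meeting"),
    ("dave", ["erin"], "pipeline contract pricing update"),
]


def communication_graph(messages):
    """Undirected edge {u, v} whenever u sends an email to v."""
    edges = set()
    for sender, recipients, _ in messages:
        for r in recipients:
            if r != sender:
                edges.add(frozenset((sender, r)))
    return edges


def term_sets(messages):
    """Assumed text representation: the set of lowercase words each person sent."""
    terms = {}
    for sender, _, body in messages:
        terms.setdefault(sender, set()).update(body.lower().split())
    return terms


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def text_intersection_graph(messages, threshold=0.5):
    """Edge {u, v} when the vocabularies of u and v overlap enough.
    The Jaccard measure and the 0.5 threshold are illustrative assumptions."""
    terms = term_sets(messages)
    return {
        frozenset((u, v))
        for u, v in combinations(sorted(terms), 2)
        if jaccard(terms[u], terms[v]) >= threshold
    }


def hop_dissimilarity(nodes, edges):
    """Breadth-first shortest-path hop count for every ordered pair of nodes;
    float('inf') when the pair lies in different connected components."""
    adj = {n: set() for n in nodes}
    for e in edges:
        u, v = tuple(e)
        adj[u].add(v)
        adj[v].add(u)
    dist = {}
    for src in nodes:
        seen = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen[w] = seen[u] + 1
                    queue.append(w)
        for dst in nodes:
            dist[(src, dst)] = seen.get(dst, float("inf"))
    return dist


if __name__ == "__main__":
    people = sorted({s for s, _, _ in MESSAGES} | {r for _, rs, _ in MESSAGES for r in rs})
    comm = communication_graph(MESSAGES)
    text = text_intersection_graph(MESSAGES)
    # Overlap between the "who people know" and "what people know" edge sets.
    print(jaccard(comm, text))
    d_comm = hop_dissimilarity(people, comm)
    d_text = hop_dissimilarity(people, text)
    # alice and dave never exchange email, but they write about the same topic.
    print(d_comm[("alice", "dave")], d_text[("alice", "dave")])
```

In this toy data, alice and dave are disconnected in the communication graph yet adjacent in the text intersection graph. That is the kind of "who people know" versus "what people know" contrast the abstract describes. Comparing the two dissimilarity matrices entry by entry is one simple way to surface such pairs.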


  • The address information is for the authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.


JSM 2005. For information, contact jsm@amstat.org or phone (888) 231-3473. For questions about the Continuing Education program, please contact the Education Department.
Revised March 2005