The views expressed here are those of the individual authors and not necessarily those of the JSM sponsors, their officers, or their staff.
Online Program Home
Abstract Details
Activity Number: 338
Type: Contributed
Date/Time: Tuesday, July 31, 2012, 10:30 AM to 12:20 PM
Sponsor: Section on Statistical Learning and Data Mining
Abstract - #305924
Title: Text Classification and Big Data
Author(s): David Afshartous*+ and George Michailidis
Companies: Vanderbilt University and University of Michigan
Address: Department of Biostatistics, Nashville, TN, 37232-2158, United States
Keywords: machine learning; distributed computing; text analytics; data mining; text categorization
Abstract:
The problem of text classification is central to many businesses in the information age, where massive amounts of relevant data are readily available. Classic examples include e-mail spam filtering, customer sentiment analysis, diagnosis from electronic medical records (EMRs), and document classification in legal discovery. Text classification may be viewed as a multi-step process that begins with transforming unstructured data into a structured format, e.g., a document-term matrix in which rows represent documents and columns represent document features (typically terms). Supervised learning methods may then be applied. The subsequent steps require many decisions, and the guidance for making them often differs between research domains. These decisions include feature selection, dimensionality reduction, training set size, algorithm selection, and error analysis. In this paper, we consider text classification from the perspective of big data, where data size depends on both the number of documents and the number of features employed by the learning algorithm. We assess the impact of big data on each step of text classification and offer suggestions in the context of both open-source and commercial software options.
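As a rough illustration of the first step described in the abstract, the Python sketch below turns a few raw strings into a sparse document-term matrix, with one row per document and one column per term. The regular-expression tokenizer, the toy documents, and the min_df document-frequency cutoff (a simple stand-in for feature selection) are illustrative assumptions, not the authors' method. Storing only the nonzero cells of each row shows why both the number of documents and the number of retained features drive the size of the data.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on runs of non-alphanumeric ASCII characters.

    Simplifying assumption: real pipelines usually need a richer tokenizer.
    """
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_dtm(docs, min_df=1):
    """Build a sparse document-term matrix.

    Rows are documents and columns are terms. Each row is a dict mapping
    column index -> term count, so only nonzero cells are stored.
    Terms found in fewer than min_df documents are dropped (a crude,
    hypothetical form of feature selection).
    """
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    vocab = sorted(term for term, n in df.items() if n >= min_df)
    col = {term: j for j, term in enumerate(vocab)}
    rows = []
    for toks in tokenized:
        # Count only terms that survived the min_df filter.
        rows.append(dict(Counter(col[t] for t in toks if t in col)))
    return rows, vocab


docs = [
    "Win a free prize now",
    "Meeting agenda for Tuesday",
    "Free entry: win cash now",
]
rows, vocab = build_dtm(docs, min_df=2)
n_docs, n_terms = len(rows), len(vocab)
nonzeros = sum(len(r) for r in rows)
print(vocab)
print(rows)
print(f"shape={n_docs}x{n_terms}, dense cells={n_docs * n_terms}, stored nonzeros={nonzeros}")
```

With min_df=1 the toy corpus yields eleven columns; with min_df=2 only "free", "now", and "win" remain, and the second document's row becomes empty. This is a small reminder that feature selection interacts with the documents themselves, and that at scale the dense cell count grows with the product of documents and features while a sparse representation grows with the number of nonzero entries.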
The address information is for the authors who have a + after their name.
Authors who are presenting talks have a * after their name.