JSM 2005 - Minneapolis

Abstract #302599

This is the preliminary program for the 2005 Joint Statistical Meetings in Minneapolis, Minnesota. It currently includes the "technical" program (the schedule of invited, topic-contributed, regular contributed, and poster sessions); Continuing Education courses (August 7-10, 2005); and Committee and Business Meetings. This online program will be updated frequently to reflect the most current revisions.




The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.


In the program, each meeting room name is preceded by letters designating the facility in which the room is located:

  • Minneapolis Convention Center = “MCC”
  • Hilton Minneapolis Hotel = “H”
  • Hyatt Regency Minneapolis = “HY”




Activity Number: 298
Type: Invited
Date/Time: Tuesday, August 9, 2005 : 2:00 PM to 3:50 PM
Sponsor: IMS
Abstract - #302599
Title: Unlabeled Data in Statistical Language Processing
Author(s): David D. Lewis*+
Companies: David D. Lewis Consulting & Ornarose, Inc.
Address: 858 W. Armitage Ave., #296, Chicago, IL 60614
Keywords: unsupervised ; semisupervised ; bootstrapping ; document classification ; information retrieval ; computational linguistics
Abstract:

Statistical approaches, particularly supervised learning methods, are the dominant paradigm in computer processing of natural language text and speech. Manually annotating data with desired machine outputs, and tuning parameters to that data, often leads to cheaper and more effective systems than directly encoding linguistic intuitions. However, annotating data is a nontrivial cost, and there have been many attempts to leverage unlabeled or partially labeled data to reduce this expense. I will review some of the major approaches to using unlabeled data in language processing tasks, including pseudofeedback, transduction, colearning, and active learning. While sometimes providing considerable benefits, the effectiveness of these approaches varies unpredictably across tasks. Most of these techniques lack a formal statistical footing, so foundational work might lead to both better understanding and practical improvements.


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


JSM 2005: For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2005