JSM 2005 Online Program

Abstract #302931

This is the preliminary program for the 2005 Joint Statistical Meetings in Minneapolis, Minnesota. It currently includes the technical program (the schedule of invited, topic-contributed, regular contributed, and poster sessions); Continuing Education courses (August 7-10, 2005); and committee and business meetings. This online program will be updated frequently to reflect the most current revisions.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.

The program prefixes each meeting room name with letters that indicate the facility in which the room is located:

Minneapolis Convention Center = “MCC”
Hilton Minneapolis Hotel = “H”
Hyatt Regency Minneapolis = “HY”

Activity Number:	521
Type:	Contributed
Date/Time:	Thursday, August 11, 2005 : 10:30 AM to 12:20 PM
Sponsor:	Section on Bayesian Statistical Science
Abstract - #302931
Title:	Naive Bayes Classifier for Noisy Medical Information Dataset
Author(s):	Xiaowei Yang*+ and Yirong Yang
Companies:	BayesSoft, Inc. and BayesSoft, Inc.
Address:	3641 Midvale Ave, Los Angeles, CA, 90034, United States
Keywords:	naive Bayes classifier ; noisy data ; classification ; Bayesian network
Abstract:	Classification is one of the major tasks in knowledge discovery and data mining. The naive Bayes classifier, in spite of its simplicity, has proven surprisingly effective in many practical applications. In real datasets, noise is inevitable because of imprecise measurement or privacy-preserving concerns. In this paper, we develop a new approach for learning the underlying naive Bayes classifier from noisy observations. Our method, based on systems of linear equations and statistical analysis, reconstructs the underlying probability distributions of the noise-free dataset from the observed noisy data. By incorporating the noise model into the learning process, we improve classification accuracy. Furthermore, because the reconstructed model is an estimate of the underlying naive Bayes classifier for the noise-free dataset, it can easily be combined with new observations corrupted at different noise levels to obtain good predictive accuracy. We apply our approach to both synthetic and real datasets, especially medical information datasets.
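
The abstract does not spell out the noise model or the equations used, so the following is only a minimal sketch of the general idea of recovering a noise-free distribution by solving a linear system. It assumes (this is not stated in the abstract) that a single categorical attribute is corrupted by a known noise matrix, where noise[i][j] is the probability of observing category j when the true category is i. The function names solve_linear_system and reconstruct_distribution are hypothetical and are not taken from the authors' work.

```python
from collections import Counter


def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] as floats.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("noise matrix is singular; cannot reconstruct")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def reconstruct_distribution(noisy_values, categories, noise):
    """Estimate the noise-free distribution of one categorical attribute.

    Assumption (not from the abstract): noise[i][j] is the known probability
    of observing categories[j] when the true value is categories[i].
    """
    counts = Counter(noisy_values)
    total = len(noisy_values)
    observed = [counts[c] / total for c in categories]
    k = len(categories)
    # observed[j] = sum_i true[i] * noise[i][j], i.e. noise^T @ true = observed.
    transposed = [[noise[i][j] for i in range(k)] for j in range(k)]
    estimate = solve_linear_system(transposed, observed)
    # Sampling error can push estimates slightly below zero: clip, renormalize.
    clipped = [max(p, 0.0) for p in estimate]
    norm = sum(clipped)
    if norm == 0.0:
        raise ValueError("reconstructed distribution is degenerate")
    return [p / norm for p in clipped]


if __name__ == "__main__":
    # Hypothetical example: each value is flipped with probability 0.2.
    noise = [[0.8, 0.2], [0.2, 0.8]]
    noisy = ["neg"] * 62 + ["pos"] * 38
    # The estimate is close to [0.7, 0.3], the distribution before flipping.
    print(reconstruct_distribution(noisy, ["neg", "pos"], noise))
```

In a naive Bayes setting, a reconstruction of this kind would be applied to each attribute within each class to estimate the conditional probabilities, under the further assumption that the class labels themselves are observed without noise.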

The address information is for the authors that have a + after their name.
Authors who are presenting talks have a * after their name.
