JSM 2005 Online Program

JSM Activity #CE_21C

This is the preliminary program for the 2005 Joint Statistical Meetings in Minneapolis, Minnesota. It currently includes the "technical" program (the schedule of invited, topic-contributed, regular contributed, and poster sessions), Continuing Education courses (August 7-10, 2005), and Committee and Business Meetings. This online program will be updated frequently to reflect the latest revisions.

To View the Program:
You may choose to view all activities of the program or just parts of it at any one time. All activities are arranged by date and time.

The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.

The program labels each meeting room with a letter prefix that indicates the facility in which the room is located:

Minneapolis Convention Center = "MCC"
Hilton Minneapolis Hotel = "H"
Hyatt Regency Minneapolis = "HY"




CE_21C	Tue, 8/9/05, 8:00 AM - 12:00 PM	MCC-L100 I
Statistical Data Mining - Continuing Education - Course
ASA
Instructor(s): Edward J. Wegman, George Mason University
This short course focuses on statistical methods applied to data mining. Although there are probably as many definitions of data mining as there are people who claim to practice it, I prefer to think of data mining as an extension of exploratory data analysis with essentially the same goal: the discovery of unknown and unanticipated structure in the data. The chief distinction between the two lies in the size and dimensionality of the data sets involved. Data mining generally deals with far larger data sets, for which highly interactive analysis is not fully feasible. We begin the course with a discussion of computational complexity and the scalability of algorithms. We continue with data preparation, including compression by quantization and dimension reduction. With scalability in mind, we then discuss the traditional data mining method of market basket analysis using association rules, including the probabilistic interpretation of the support and confidence of association rules. Next we cover statistical methods including density estimation, cluster analysis, and artificial neural networks. Subsequently we discuss visual data mining techniques, with a number of examples. We then address some text mining problems and conclude with a discussion of streaming data and some of the challenges and approaches for this class of data. The short course will be based on material found in the optional text; however, participants do not need to buy the text.

Optional Text: Rao, C. R., Wegman, E. J., and Solka, J. L. (eds.) (2005). Handbook of Statistics, Vol. 24: Data Mining and Data Visualization. Elsevier/North-Holland, Amsterdam.
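As a minimal illustration of the probabilistic reading of association rules mentioned in the course description (this is not material from the course itself): for a rule A => B, support can be read as an estimate of P(A and B), and confidence as an estimate of P(B | A) = P(A and B) / P(A). The Python sketch below assumes that transactions are represented as sets of item names; the function names and the toy basket data are hypothetical.

```python
def count(transactions, itemset):
    """Number of transactions that contain every item in itemset."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def support(transactions, antecedent, consequent):
    """Empirical estimate of P(A and B): fraction of transactions containing A union B."""
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Empirical estimate of P(B | A) = count(A union B) / count(A)."""
    n_a = count(transactions, antecedent)
    if n_a == 0:
        # P(B | A) is undefined when A never occurs in the data.
        raise ValueError("antecedent never occurs in the transactions")
    both = set(antecedent) | set(consequent)
    return count(transactions, both) / n_a


# Hypothetical toy market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
    {"bread", "milk", "beer"},
]

print(support(baskets, {"bread"}, {"milk"}))     # 0.6
print(confidence(baskets, {"bread"}, {"milk"}))  # 0.75
```

In this toy data, {bread, milk} appears in three of five baskets and bread in four, so the rule bread => milk has support 3/5 = 0.6 and confidence 3/4 = 0.75. Confidence is computed from counts rather than by dividing two supports, which avoids an extra floating-point rounding step.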