JSM 2004 - Toronto

Abstract #300303

This is the preliminary program for the 2004 Joint Statistical Meetings in Toronto, Canada. It currently includes the "technical" program, i.e., the schedule of invited, topic contributed, regular contributed, and poster sessions; Continuing Education courses (August 7-10, 2004); and Committee and Business Meetings. This online program will be updated frequently to reflect the most current revisions.


The views expressed here are those of the individual authors
and not necessarily those of the ASA or its board, officers, or staff.





Activity Number: 64
Type: Invited
Date/Time: Monday, August 9, 2004 : 8:30 AM to 10:20 AM
Sponsor: Section on Statistical Graphics
Abstract - #300303
Title: Data Cleansing and Preparation at the Gates: A Data Streaming Perspective
Author(s): Don Faxon*+ and R. Duane King and John T. Rigsby and Steve Bernard
Companies: George Mason University and George Mason University and Naval Surface Warfare Center Dahlgren Division and George Mason University
Address: Fairfax, VA, 22030-4444
Keywords: data cleaning; streaming data; massive datasets; data preparation; data collection
Abstract:

Collection of internet traffic data at the gates of a large enterprise necessarily involves data cleaning, integration, selection, and transformation, especially if data-streaming strategies are employed. The huge quantities of packets that typically cross the enterprise gateway make multiple passes through the data cost-prohibitive. Data cleansing, customarily perceived as the removal of noise and inconsistent data, is instead treated here as a flagging and tagging procedure that facilitates detection of malformed or corrupted IP packets associated with malicious intrusion, or of subtle reconnaissance activity that may be a precursor to a massive attack on the enterprise computing infrastructure. Because real-time or near-real-time data analysis built on such innovative concepts as data streaming and evolutionary graphics must keep pace with the incoming traffic, fast in-line data cleansing and preparation is required. This paper discusses and illustrates the strategies we have incorporated into our data collection and analysis.
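The abstract does not give implementation details. As a hedged illustration of treating cleansing as flagging and tagging in a single pass, the Python sketch below inspects each IPv4 header once as packets stream by, attaches flags to malformed or corrupted packets, and passes every packet along instead of discarding it. The specific checks (version, header length, RFC 791 header checksum, total-length mismatch), the flag names, and the helpers `tag_stream` and `make_packet` are assumptions chosen for illustration, not the authors' method.

```python
import struct
from collections import Counter
from dataclasses import dataclass, field
from typing import Iterable, Iterator, List


@dataclass
class TaggedPacket:
    """A packet passed through unchanged, plus any quality flags attached to it."""
    raw: bytes
    flags: List[str] = field(default_factory=list)  # hypothetical flag names


def ipv4_checksum(header: bytes) -> int:
    """One's-complement sum of 16-bit words (RFC 791 / RFC 1071).

    Over a header whose checksum field is filled in, 0 means the header
    verifies; over a header with that field zeroed, the result is the
    checksum value to store.
    """
    if len(header) % 2:
        header += b"\x00"
    total = sum(word for (word,) in struct.iter_unpack("!H", header))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tag_stream(packets: Iterable[bytes]) -> Iterator[TaggedPacket]:
    """Single pass: inspect each packet once, attach flags, never drop it."""
    for raw in packets:
        pkt = TaggedPacket(raw)
        if len(raw) < 20:  # shorter than a minimal IPv4 header
            pkt.flags.append("TRUNCATED_HEADER")
            yield pkt
            continue
        version = raw[0] >> 4
        ihl = (raw[0] & 0x0F) * 4  # header length in bytes
        (total_length,) = struct.unpack("!H", raw[2:4])
        if version != 4:
            pkt.flags.append("BAD_VERSION")
        if ihl < 20 or ihl > len(raw):
            pkt.flags.append("BAD_IHL")
        elif ipv4_checksum(raw[:ihl]) != 0:
            pkt.flags.append("BAD_CHECKSUM")
        if total_length != len(raw):
            pkt.flags.append("LENGTH_MISMATCH")
        yield pkt


def make_packet(payload: bytes = b"") -> bytes:
    """Hypothetical demo helper: build a well-formed IPv4 packet (protocol 6)."""
    hdr = bytearray(struct.pack(
        "!BBHHHBBH4s4s", 0x45, 0, 20 + len(payload), 1, 0, 64, 6, 0,
        bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2])))
    hdr[10:12] = struct.pack("!H", ipv4_checksum(bytes(hdr)))
    return bytes(hdr) + payload


good = make_packet(b"abcd")
corrupted = bytearray(good)
corrupted[8] ^= 0xFF  # flip the TTL bits so the header checksum no longer verifies
stream = [good, bytes(corrupted), good[:10]]
counts = Counter(flag for pkt in tag_stream(stream) for flag in pkt.flags)
print(counts)  # Counter({'BAD_CHECKSUM': 1, 'TRUNCATED_HEADER': 1})
```

Because `tag_stream` is a generator, memory use stays constant and each packet is touched exactly once, which is the property the single-pass constraint calls for. Keeping flagged packets in the stream, rather than deleting them, leaves suspicious traffic available to downstream intrusion analysis.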


  • The address information is for the authors who have a + after their name.
  • Authors who are presenting talks have a * after their name.


For information, contact jsm@amstat.org or phone (888) 231-3473. If you have questions about the Continuing Education program, please contact the Education Department.
Revised March 2004