Abstract #300462



JSM 2002 Abstract #300462
Activity Number: 138
Type: Topic Contributed
Date/Time: Monday, August 12, 2002 : 2:00 PM to 3:50 PM
Sponsor: Section on Government Statistics*
Title: Click Stream Data Quality
Author(s): Stephen Eick*+
Affiliation(s): Visual Insights
Address: 215 Shuman Blvd, Suite 200, Naperville, Illinois, 60563-8495
Keywords: website; clickstream; browser behavior; robots; spiders
Abstract:

It is fairly easy to instrument a Web site to collect an extensive warehouse of click stream data. These datasets record every visitor click, page view, cache hit, ad impression, and referral, and may even contain transaction information. The data quality problem is that this rich and valuable data source is highly contaminated with uninteresting machine-generated traffic from robots, spiders, and Web bots, and is cluttered with errors. On smaller sites the machine-generated traffic may be as much as 50% of the total. To exploit this data source for understanding visitor behavior, we must overcome three significant analysis problems. First, the huge volume of data easily overwhelms conventional analysis tools. Second, the data must be cleaned and transformed to avoid making improper inferences based on machine-generated traffic and log errors. Third, effective analysis that creates value involves correlating visitor behavior with other factors that can be used to influence it.
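
The cleaning step described above (the second problem) could be sketched roughly as follows. This is only an illustrative heuristic, not the author's method: it assumes the logs are in the common "combined" Web server log format, treats any host that requests /robots.txt or sends a crawler-like user-agent string as machine-generated, and counts unparseable lines as log errors. The names BOT_PATTERN, LOG_LINE, parse_line, and split_traffic are hypothetical.

```python
import re

# Assumption: substrings that often appear in crawler user-agent strings.
# A real deployment would need a maintained list; this one is illustrative.
BOT_PATTERN = re.compile(r"bot|spider|crawl|slurp", re.IGNORECASE)

# Assumption: log lines follow the "combined" log format:
# host ident user [time] "method path protocol" status size "referrer" "agent"
LOG_LINE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return a dict of fields, or None if the line is malformed."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None


def split_traffic(lines):
    """Split log lines into (human, machine, malformed_count).

    A host is flagged as machine-generated if any of its requests asks for
    /robots.txt or carries a crawler-like user agent. All requests from a
    flagged host are then treated as machine traffic.
    """
    lines = list(lines)
    records = [r for r in map(parse_line, lines) if r is not None]
    robot_hosts = {
        r["host"]
        for r in records
        if r["path"] == "/robots.txt" or BOT_PATTERN.search(r["agent"])
    }
    human = [r for r in records if r["host"] not in robot_hosts]
    machine = [r for r in records if r["host"] in robot_hosts]
    malformed = len(lines) - len(records)
    return human, machine, malformed
```

Calling split_traffic on the lines of a log file returns the records to keep for visitor analysis, the records attributed to robots, and a count of lines that failed to parse. Flagging by host is a deliberately coarse choice: it also catches robots that disguise their user agent but fetch /robots.txt, at the cost of misclassifying human visitors who share an address with a robot.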


  • The address information is for the authors that have a + after their name.
  • Authors who are presenting talks have a * after their name.


Revised March 2002