Activity Number: 549
Type: Contributed
Date/Time: Thursday, August 2, 2007 : 10:30 AM to 12:20 PM
Sponsor: Section on Statistics in Epidemiology

Abstract - #308447

Title: Identifying Patterns of Longitudinal Data Set with Meaningful Inflated Missing Values: A Case Study Combining Data Mining and Statistical Techniques
Author(s): Hua Fang*+ and Kimberly A. Espy and Maria L. Rizzo and Honggang Wang
Companies: University of Nebraska-Lincoln and University of Nebraska-Lincoln and Bowling Green State University and University of Nebraska-Lincoln
Address: 539 N24th Street Apt 13, Lincoln, NE, 68503
Keywords: inflated missing data; longitudinal study; two-part mixture model; clustering; data mining; growth pattern

Abstract:
Techniques for handling missing data have developed separately in data mining and in statistics, and methods for identifying patterns of inflated missing data in longitudinal studies are rare. This research illustrates an integrated approach using a real observational data set in which three types of missing data co-exist and account for a substantial portion of the overall sample. Rather than relying on imputation, the approach introduces a two-part mixture model to model the inflated missing data and to estimate the growth curve of each subject over time. A clustering method is then applied to the individual growth parameter estimates and their auxiliary feature attributes to identify the growth patterns. The combined approach demonstrates the practical value of using statistical and data mining techniques together in current and future quantitative analyses.
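
The two-stage idea in the abstract (summarize each subject's missingness and growth, then cluster the per-subject estimates) can be illustrated with a small sketch. This is not the authors' two-part mixture model: as a simplifying assumption, the first part is replaced by each subject's observed missing rate, the second part by an ordinary least-squares line through the observed values, and the clustering step by plain k-means. The toy data and the names `fit_subject` and `kmeans` are hypothetical.

```python
from statistics import mean

def fit_subject(times, values):
    """Return (missing_rate, intercept, slope) for one subject.

    Missing observations are encoded as None. The missing rate stands in
    for the "missingness" part and an OLS line for the "growth" part
    (a simplification, not a fitted two-part mixture model).
    """
    observed = [(t, y) for t, y in zip(times, values) if y is not None]
    missing_rate = 1 - len(observed) / len(values)
    if len(observed) < 2:
        return None  # too few observed points to estimate a line
    t_bar = mean(t for t, _ in observed)
    y_bar = mean(y for _, y in observed)
    sxx = sum((t - t_bar) ** 2 for t, _ in observed)
    slope = sum((t - t_bar) * (y - y_bar) for t, y in observed) / sxx
    return (missing_rate, y_bar - slope * t_bar, slope)

def sq_dist(p, q):
    """Squared Euclidean distance between two feature tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Plain Lloyd's k-means; centers start at the first k points."""
    centers = list(points[:k])
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centers[c]))
            groups[nearest].append(p)
        # Recompute each center as the mean of its group (keep it if empty).
        centers = [
            tuple(mean(col) for col in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]

# Toy data (hypothetical): four subjects measured at four occasions.
times = [0, 1, 2, 3]
subjects = [
    [1.0, 2.0, None, 4.0],   # growing, one value missing
    [1.2, None, 3.2, 4.2],   # growing, one value missing
    [5.0, 5.0, None, None],  # flat, two values missing
    [5.1, None, 5.1, None],  # flat, two values missing
]

features = [fit_subject(times, v) for v in subjects]
labels = kmeans(features, k=2)
print(labels)
```

In this sketch the features are clustered on their raw scales; with real data one would likely standardize them first, and the choice of the number of clusters would need its own justification.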