The first and seemingly simplest analytical step in data mining is to describe the data. But the standard exploratory data techniques of graphing and summarizing each variable take too long when dealing with hundreds of candidate predictors.
Moreover, data description alone cannot provide an action plan. You must build a predictive model based on patterns determined from known results, then test that model on results outside the original sample. In classical data analysis, the exploratory phase usually precedes the model selection phase. It's seen as a necessary preliminary for understanding the data before beginning to think about how to model it. But in data mining, sometimes we start with a preliminary model just to narrow down the set of potential predictors. This exploratory data modeling (EDM) seems to be at odds with standard statistical practice, but, in fact, it's simply using models as a new exploratory tool.
In this talk, we'll take a brief tour of the current state of data mining algorithms and use several case studies to explain how EDM can narrow the search for a predictive model and increase the chances of producing useful and meaningful results.
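As a rough illustration of the EDM idea (not one of the talk's case studies), the sketch below screens many candidate predictors with a deliberately crude preliminary model and keeps only a shortlist for closer study. Everything in it is an assumption made for illustration: the data are synthetic, the names `r_squared` and `screen_predictors` are hypothetical, and a one-variable least-squares fit stands in for whatever preliminary model an analyst would actually choose.

```python
import random


def r_squared(x, y):
    """R^2 of a one-predictor least-squares fit (the squared Pearson correlation)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column explains nothing
    return (sxy * sxy) / (sxx * syy)


def screen_predictors(columns, target, keep):
    """Score every candidate with a quick one-variable fit and keep the top `keep`."""
    scores = {name: r_squared(values, target) for name, values in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep], scores


# Synthetic data (an assumption for illustration): 50 candidate predictors,
# of which only x3 and x7 actually drive the response.
rng = random.Random(0)
n_rows, n_cols = 500, 50
columns = {
    f"x{j}": [rng.gauss(0, 1) for _ in range(n_rows)] for j in range(n_cols)
}
target = [
    3 * columns["x3"][i] - 2 * columns["x7"][i] + rng.gauss(0, 1)
    for i in range(n_rows)
]

shortlist, scores = screen_predictors(columns, target, keep=5)
print(shortlist)
```

The shortlist is a starting point for conventional exploration and modeling, not a final model. Marginal screening of this kind can miss predictors that matter only in combination with others, so a tree or other multivariable preliminary model may be a better choice in practice, and any resulting model still needs to be tested on data outside the original sample.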