Abstract:
|
  As is typical in data mining, this paper studies a problem in which statistical and computational issues are entangled. A problem that arises in insurance and in finance is that of non-linear regression in the presence of asymmetric noise with heavy tails. Traditional robust regression methods obtain reduced variance by downweighting outliers, which is appropriate when the noise is symmetric. However, in many important applications (e.g., claim amounts in insurance, asset returns in finance) the outliers lie on only one side of the distribution, yielding heavily biased estimators. We study new approaches based on combinations of models which are individually biased but whose combination is unbiased. Because the data involve unknown high-order dependencies, we apply these ideas using artificial neural networks as the building blocks. These methods have been applied to very large datasets (millions of examples), which raises additional, more computational issues. Methods that require quadratic training time (e.g., SVMs) must be ruled out. To address this issue, we present results on new divide-and-conquer learning algorithms that yield apparently linear training time.
|