Abstract:
|
Strong correlations among features are a well-known hurdle for existing selection and screening methods, yet they are common across many domains. We explore several properties of ZCA whitening, a pre-processing step for transforming features that we and others have shown can greatly improve accuracy in certain selection procedures. However, this whitening method induces complete decorrelation at the cost of similarity with the original set of predictors and, thus, interpretability. We propose a more general technique, ORTHOMAP, that allows one to directly control the level of collinearity permitted among features in order to strengthen the mapping between original and transformed variables. We show that this approach can be formulated as a second-order cone program (SOCP) and describe its connection with ZCA. We demonstrate the benefits and drawbacks of ORTHOMAP, along with other decorrelation procedures, through numerical experiments and a real data application concerning COVID-19 mortality curves in regions across Italy. These experiments also highlight an important aspect of ZCA and ORTHOMAP: their ability to be used across different modeling techniques and/or response structures.
|
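For readers unfamiliar with the pre-processing step named in the abstract, the following is a minimal sketch of standard ZCA whitening, which multiplies the centered features by the symmetric inverse square root of their sample covariance. It is not the authors' code and does not implement ORTHOMAP. The function name `zca_whiten`, the stabilizer `eps`, and the simulated equicorrelated design are assumptions made only for illustration.

```python
import numpy as np

def zca_whiten(X, eps=1e-8):
    """Return ZCA-whitened features and the whitening matrix (illustrative helper).

    X is an (n samples x p features) array; eps is a small assumed stabilizer
    that guards against division by near-zero eigenvalues.
    """
    Xc = X - X.mean(axis=0)                       # center each feature
    cov = np.cov(Xc, rowvar=False)                # p x p sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)        # symmetric eigendecomposition
    # Symmetric inverse square root of the covariance: U diag(1/sqrt(lambda)) U^T
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W, W

# Hypothetical strongly correlated design: pairwise correlation 0.9 among 4 features
rng = np.random.default_rng(0)
n, p, rho = 500, 4, 0.9
Sigma = rho * np.ones((p, p)) + (1 - rho) * np.eye(p)
X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

Z, W = zca_whiten(X)
cov_Z = np.cov(Z, rowvar=False)                   # close to the identity matrix
```

After the transform, the sample covariance of `Z` is approximately the identity, which is the complete decorrelation the abstract refers to. Because `W` is symmetric, each transformed column is a weighted combination of all original features.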