Abstract:
|
We combine two important extensions of ordinary least squares regression: regularization and optimal scaling. The latter uses splines to transform continuous predictors and step functions to quantify categorical predictors, within the same prediction framework. Both splines and step functions can be restricted to be monotonic, preserving the ordinal information in the data. In addition, they can be combined with regularization methods such as the Lasso and the Elastic Net. Predictor variables in high-dimensional data, for example in metabolomics, are usually highly correlated. We will show how optimal scaling can reduce a predictor's own predictability from the other predictors, increasing its conditional independence and improving the condition of the correlation matrix as a whole, as measured by the Log Determinant Divergence. We will discuss the interaction between regularization and optimal scaling, and finally, other options for regularizing the regression coefficients and the category quantifications/spline coefficients will be proposed. Applications will be presented in the context of metabolomics. This is joint work with Anita Van der Kooij and Thomas Hankemeijer.
|