Abstract:
|
Large numbers of features can be derived from radiomic analysis of imaging scans and used for prediction and diagnosis. However, the same feature can be constructed under varying parameters, resulting in a complex feature selection problem with highly correlated and redundant features. With the goal of an interpretable prediction model, we aimed to develop a feature selection approach that provides good predictive accuracy while accounting for the high correlations among features. The approach first used hierarchical clustering on each set of features that share the same image extraction equation but differ in parameters. The unsupervised clustering returned prototypical features for each cluster, and penalized regression using the lasso then selected, from these prototypes, the features for the logistic regression model. Predictive accuracy of the logistic regression model was measured by the area under the ROC curve (AUC), using 5-fold cross-validation. This feature selection method was applied to high-dimensional feature data from CT scans for the characterization of renal cancers and compared with alternative feature selection approaches such as random forests and neural networks.
|
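The two-stage selection described in the abstract can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the authors' implementation. The abstract does not specify the distance metric, linkage, cut threshold, prototype rule, or lasso penalty strength, so the choices here (distance 1 - |correlation|, average linkage, a cut at 0.3, prototype chosen as the member most correlated with its own cluster, `C=1.0`) are all assumptions, as are the names `cluster_prototypes` and `families`.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def cluster_prototypes(X, cols, threshold=0.3):
    """Cluster one feature family and return one prototype column per cluster.

    Hypothetical helper: distance, linkage, threshold and prototype rule
    are illustrative assumptions, not taken from the abstract.
    """
    if len(cols) == 1:
        return list(cols)
    corr = np.corrcoef(X[:, cols], rowvar=False)
    # Distance between features: 1 - |Pearson correlation|.
    dist = np.clip(1.0 - np.abs(corr), 0.0, None)
    dist = (dist + dist.T) / 2.0
    np.fill_diagonal(dist, 0.0)
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=threshold, criterion="distance")
    prototypes = []
    for label in np.unique(labels):
        members = np.where(labels == label)[0]
        # Prototype: member with the highest mean |correlation| to its cluster.
        mean_corr = np.abs(corr[np.ix_(members, members)]).mean(axis=1)
        prototypes.append(cols[members[np.argmax(mean_corr)]])
    return prototypes


# Synthetic stand-in for radiomic data: 3 families of 10 columns each,
# every family holding two groups of 5 near-duplicate features.
rng = np.random.default_rng(0)
n = 200
latent = rng.normal(size=(n, 6))
X = np.hstack([latent[:, [k]] + 0.1 * rng.normal(size=(n, 5)) for k in range(6)])
families = {f"family_{i}": list(range(10 * i, 10 * (i + 1))) for i in range(3)}
y = (1.5 * latent[:, 0] - 1.5 * latent[:, 3] + rng.normal(size=n) > 0).astype(int)

# Stage 1: unsupervised clustering within each family (labels are not used).
selected = [c for cols in families.values() for c in cluster_prototypes(X, cols)]

# Stage 2: L1-penalized (lasso) logistic regression on the prototypes,
# scored by AUC with 5-fold cross-validation.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=1.0),
)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X[:, selected], y, cv=cv, scoring="roc_auc")
print("prototype columns:", selected)
print("mean cross-validated AUC:", scores.mean())
```

Because the clustering stage never sees the labels, running it once on the full data set does not leak outcome information into the cross-validated AUC. The lasso fit and the feature scaling, by contrast, sit inside the pipeline so that they are refit on each training fold.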