Abstract:
|
Model selection methods can be applied to find a small submodel containing the most relevant features in high-dimensional regression settings. However, it is often unclear whether the selected set of features contains many false positives. For the (non-grouped) sparse setting, the knockoff filter creates "knockoff copies" of each variable to act as a control group, detecting whether the model selection is successfully controlling the false discovery rate (FDR). In this work, we propose the group knockoff filter, a method for FDR control in a linear regression setting where the features are grouped. By considering true and false discoveries at the group level, this method gains power relative to sparse regression methods. We also apply our method to the multitask regression problem, where multiple response variables share similar sparsity patterns across the set of possible features. On both simulated and real data, the group knockoff filter successfully controls false discoveries at the group level in both settings, and it makes substantially more discoveries by leveraging the group structure.
|