Abstract:
|
Many high-throughput genomic applications involve a large set of candidate covariates and a response that is frequently measured on an ordinal scale, and it is crucial to identify which variables are truly associated with the response. Controlling the false discovery rate (FDR) without sacrificing power is a major challenge in variable selection. This study considers two variable selection procedures, model-X knockoffs (Candès 2018) and reference distribution variable selection (RDVS, Linkletter 2006), both of which use artificial variables as benchmarks to control Type-I error. Model-X knockoffs constructs a “knockoff” copy of each covariate that mimics the covariance structure of the original covariates, whereas RDVS generates only one null variable and builds a reference distribution from repeated model fits. We propose novel statistics for ordinal responses that fit into the two procedures, and we compare the procedures in terms of observed FDR, power, and computational efficiency on simulated datasets. We also apply the methods to multiple gene expression and methylation datasets, identifying important genes related to certain ordinal outcomes.
|