Abstract:
|
In genetic and genomic data, neighboring single nucleotide polymorphisms (SNPs) are correlated, so it is important to select representative SNP blocks when predicting survival outcomes and constructing biological pathways. In this setting, controlling the familywise error rate is too restrictive, so we focus on controlling the false discovery rate (FDR) instead. We propose using a generative model, a variational autoencoder, to generate knockoffs for controlled group variable selection. We also evaluate the reproducibility of the feature selection algorithm by sub-sampling and compare it with alternative methods. Simulations show that the proposed method achieves comparatively low group FDR and high power. Finally, we apply the method to 1000 Genomes Project data to select SNP blocks for predicting human leukocyte antigen allele haplotypes.
|