Abstract:
|
Variable selection for genomic data centers on the "large p, small n" problem, where the number of variables far exceeds the number of samples. Metabolomics data share this structure but pose additional challenges. The main challenges in processing metabolomics data are uncertainty in peak identification, uncertainty in assigning each peak to a specific compound of interest, and a lack of consensus on how best to estimate quantities when the assay is untargeted (e.g., peak area vs. peak height). In addition, adducts and isomers introduce dependence among features, and variances are heterogeneous across metabolites because of the metabolites' correlation structure and unequal sample weights. We explore how variable selection methods perform under these complex conditions, discuss each challenge in detail, and present example results for the different approaches.
|