Abstract:
|
High missing rates are commonly observed in proteomics experiments, and imputation is generally performed before analysis. In this study, we evaluated the performance of different imputation methods in the context of data integration. Motivated by the study of associations between phosphoproteomics quantification and activity-based protein profiling (ABPP) for kinase expression, we simulated phosphotyrosine (pY) and ABPP datasets with different combinations of missingness pattern (missing at random, missing at the low end, or a mixture of the two), sample size, and strength of correlation. We compared imputation by minimum value, mean, K-nearest neighbors, and probabilistic PCA with no imputation and with a left-censored accelerated failure time (LAFT) model. Spearman correlation and LAFT were used to evaluate pairwise association. LAFT tends to have high false positive rates. When the sample size is very low and the missing proportion is high, no imputation performs reasonably well. Minimum-value imputation outperforms the other methods when the sample size is reasonable and values are missing at the low end.
|
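The comparison described in the abstract can be illustrated with a minimal sketch. It is not the study's simulation code: the Gaussian data model, the sample size of 50, the correlation of 0.6, the 30% missing fraction, and every function name below are assumptions chosen for illustration only. The sketch censors the lowest values of two correlated variables (missing at the low end), then computes the Spearman correlation after minimum-value imputation and after dropping incomplete pairs (no imputation).

```python
import random


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def censor_low_end(values, fraction):
    """Mark the lowest `fraction` of values as missing (None)."""
    n_missing = int(len(values) * fraction)
    lowest = set(sorted(range(len(values)), key=lambda i: values[i])[:n_missing])
    return [None if i in lowest else v for i, v in enumerate(values)]


def impute_minimum(values):
    """Replace missing entries with the smallest observed value."""
    observed_min = min(v for v in values if v is not None)
    return [observed_min if v is None else v for v in values]


def complete_pairs(x, y):
    """Keep only the samples observed in both variables (no imputation)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    return [a for a, _ in pairs], [b for _, b in pairs]


# Hypothetical simulation settings (assumptions, not taken from the study).
rng = random.Random(0)
n, rho, missing_fraction = 50, 0.6, 0.3

py_full = [rng.gauss(0, 1) for _ in range(n)]
abpp_full = [rho * v + (1 - rho ** 2) ** 0.5 * rng.gauss(0, 1) for v in py_full]

py_obs = censor_low_end(py_full, missing_fraction)
abpp_obs = censor_low_end(abpp_full, missing_fraction)

rho_full = spearman(py_full, abpp_full)
rho_min = spearman(impute_minimum(py_obs), impute_minimum(abpp_obs))
rho_none = spearman(*complete_pairs(py_obs, abpp_obs))

print(f"complete data:      {rho_full:.3f}")
print(f"minimum imputation: {rho_min:.3f}")
print(f"no imputation:      {rho_none:.3f}")
```

A single run like this says nothing about false positive rates or power; those would require repeating the simulation many times under each combination of settings.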