Abstract:
|
High-throughput technology makes it possible to monitor metabolites across different experiments and has been widely used to detect differences in metabolite levels in many areas of biomedical research. Mass spectrometry has become one of the main analytical techniques for profiling a wide array of compounds in biological samples. Missing values occur widely in metabolomics datasets and can arise from different sources, both technical and biological. Most often, a missing value is substituted by the minimum value, and this substitution may lead to different results in downstream analyses. In this study we propose a modified version of the K-nearest neighbor (KNN) approach, called KNN truncation (KNN-TN), which accounts for truncation at the minimum value. We compare the imputation results of KNN-TN with those of other KNN approaches, namely KNN based on correlation (KNN-CR) and KNN based on Euclidean distance (KNN-EU). The proposed approach assumes that the data follow a truncated normal distribution with the truncation point at the limit of detection (LOD). The results of KNN-TN, KNN-CR and KNN-EU were evaluated using the root mean square error (RMSE).
|
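To make the comparison concrete, the following is a minimal, dependency-free sketch of plain Euclidean-distance KNN imputation (in the spirit of KNN-EU) together with the RMSE measure. It is an illustration under stated assumptions, not the authors' implementation: the function names `knn_impute_euclidean` and `rmse`, the use of `None` for missing entries, the unweighted neighbor average, and the rows-as-metabolites layout are all hypothetical choices. The truncated-normal step that distinguishes KNN-TN is not shown.

```python
import math

def knn_impute_euclidean(rows, k=3):
    """Fill None entries in `rows` (assumed: one list per metabolite, one column per sample)."""
    filled = [list(r) for r in rows]
    for i, target in enumerate(rows):
        missing = [c for c, v in enumerate(target) if v is None]
        if not missing:
            continue
        # Distance from row i to every other row, using only columns both rows observe.
        dists = {}
        for j, other in enumerate(rows):
            if j == i:
                continue
            shared = [(a, b) for a, b in zip(target, other)
                      if a is not None and b is not None]
            if shared:
                dists[j] = math.sqrt(sum((a - b) ** 2 for a, b in shared) / len(shared))
        for c in missing:
            # Only neighbors that actually observed column c can donate a value.
            donors = sorted((d, j) for j, d in dists.items() if rows[j][c] is not None)[:k]
            if donors:
                # Assumption: unweighted mean of the k nearest donors.
                filled[i][c] = sum(rows[j][c] for _, j in donors) / len(donors)
    return filled

def rmse(true_values, imputed_values):
    """Root mean square error between held-out true values and their imputations."""
    pairs = list(zip(true_values, imputed_values))
    return math.sqrt(sum((t - p) ** 2 for t, p in pairs) / len(pairs))

# Toy example: the first metabolite is missing its third sample.
data = [[1.0, 2.0, None],
        [1.0, 2.0, 3.0],
        [10.0, 20.0, 30.0]]
imputed = knn_impute_euclidean(data, k=1)
error = rmse([3.0], [imputed[0][2]])
```

In an evaluation of this kind, values would typically be masked from a complete dataset, imputed, and then compared against the held-out truth with `rmse`; lower values indicate imputations closer to the original measurements.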