Keywords: imputation, neural networks, metabolomics
The analysis of metabolomics data often requires several preprocessing steps, and choosing a method to handle missing values is an important part of this preparation. Numerous techniques are currently available for imputing metabolomics data, ranging in complexity from simple mean imputation to more advanced machine learning methods such as random forests. This project proposes the use of feedforward neural networks to impute missing values in metabolomics studies. Each network is trained on observations with no missing values to predict the concentration of a single metabolite from the concentrations of the other metabolites. The proposed method is evaluated using a metabolomics dataset from a diabetes study containing 130 metabolites measured for 198 patients. Simulation studies are conducted on these real data by randomly removing values whose true concentrations are known, so that imputed values can be compared against the truth. Results show that the neural network imputation approach performs favorably when compared with other state-of-the-art imputation procedures.
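
The following is a minimal sketch of the per-metabolite idea described above, not the authors' implementation. The abstract does not specify the network architecture, the training procedure, or how other missing predictors in the same row are handled, so everything here is an assumption: a single hidden layer of tanh units trained by full-batch gradient descent, standardized inputs, and mean filling of any other missing predictors before prediction. The function names `train_mlp` and `impute_with_networks` are hypothetical, and the data are synthetic rather than the diabetes dataset.

```python
import numpy as np

def train_mlp(X, y, hidden=8, lr=0.05, epochs=300, seed=0):
    """Fit a one-hidden-layer feedforward network (assumed architecture)
    by full-batch gradient descent on squared error."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), hidden)
    b2 = 0.0
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)               # hidden activations
        err = H @ W2 + b2 - y                  # prediction residuals
        dH = np.outer(err, W2) * (1.0 - H**2)  # backpropagate through tanh
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return lambda Xn: np.tanh(Xn @ W1 + b1) @ W2 + b2

def impute_with_networks(data):
    """Impute NaNs column by column, training each network on complete rows only."""
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    complete = data[~missing.any(axis=1)]      # rows with no missing values
    mu, sd = complete.mean(axis=0), complete.std(axis=0)
    sd[sd == 0] = 1.0
    Zc = (complete - mu) / sd
    # Assumption: other missing predictors are set to the column mean (0 after scaling).
    Z_in = np.where(missing, 0.0, (data - mu) / sd)
    out = data.copy()
    for j in np.where(missing.any(axis=0))[0]:
        others = np.delete(np.arange(data.shape[1]), j)
        predict = train_mlp(Zc[:, others], Zc[:, j])
        rows = missing[:, j]
        out[rows, j] = predict(Z_in[rows][:, others]) * sd[j] + mu[j]
    return out

# Simulation on synthetic, correlated data: hide known values at random, then score.
rng = np.random.default_rng(1)
latent = rng.normal(size=(60, 2))
truth = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(60, 6))
mask = rng.random(truth.shape) < 0.05          # entries removed completely at random
observed = np.where(mask, np.nan, truth)
imputed = impute_with_networks(observed)
rmse = np.sqrt(np.mean((imputed[mask] - truth[mask]) ** 2))
print(f"RMSE on held-out entries: {rmse:.3f}")
```

Because the true values of the removed entries are retained in `truth`, the same masked positions can be scored for any competing method (for example, column-mean imputation) to give a like-for-like comparison. Note that training on complete rows only requires enough fully observed samples, which becomes harder to satisfy as the number of metabolites or the missingness rate grows.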