Abstract:
High-dimensional OMICS datasets typically contain substantial systematic variation arising from the various steps of experimental processing. Failure to account properly for this variation may lead to misleading biological conclusions. Normalization is therefore a necessary preprocessing step to reduce unwanted variation and improve the accuracy of all downstream quantitative analyses. However, because the many available normalization procedures address systematic noise in different ways, it is unclear which method should be preferred. We conducted a comprehensive comparison of the nine normalization methods currently available in our MVAPACK software package, using simulated and previously published NMR datasets modified with Gaussian noise and random dilution factors. At modest levels of signal variance, most normalization methods performed equally well, with the exception of histogram matching. Probabilistic quotient normalization and constant sum normalization performed best at recovering true peak intensities and at reproducing the true classifying features identified by an OPLS-DA model. Furthermore, our findings suggest that a valid NMR dataset should have a noise level of approximately 20% or less.
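For readers unfamiliar with the two best-performing methods, the sketch below illustrates them in NumPy on a simulated samples-by-points intensity matrix. It is a minimal illustration under stated assumptions, not MVAPACK's own code. The PQN variant shown uses a common formulation: constant-sum normalization followed by division by the median quotient against the median spectrum. The simulation parameters (gamma-distributed "true" intensities, dilution factors drawn uniformly from 0.5 to 2, Gaussian noise SD of 0.05) are hypothetical and chosen only for illustration. Both methods are invariant to a per-sample dilution factor. PQN additionally resists distortion when a few large peaks change, because it takes a median over all points rather than relying on the total intensity.

```python
import numpy as np


def constant_sum(X, total=1.0):
    """Constant sum normalization: scale each spectrum (row) to the same total."""
    return total * X / X.sum(axis=1, keepdims=True)


def pqn(X):
    """Probabilistic quotient normalization (one common formulation).

    1. Constant-sum normalize every spectrum.
    2. Use the median spectrum across samples as the reference.
    3. Divide each spectrum by the median of its point-wise quotients
       against the reference, i.e. its most probable dilution factor.
    """
    Xc = constant_sum(X)
    ref = np.median(Xc, axis=0)
    mask = ref > 0  # skip points with non-positive reference intensity
    factors = np.median(Xc[:, mask] / ref[mask], axis=1)
    return Xc / factors[:, None]


# Hypothetical simulation: true spectra, random dilution, additive Gaussian noise.
rng = np.random.default_rng(0)
n_samples, n_points = 20, 500
true = rng.gamma(shape=2.0, scale=1.0, size=(n_samples, n_points))
dilution = rng.uniform(0.5, 2.0, size=n_samples)
observed = true * dilution[:, None] + rng.normal(0.0, 0.05, size=true.shape)

# Compare each method's output on perturbed data with its output on clean data.
errors = {}
for name, method in [("constant sum", constant_sum), ("PQN", pqn)]:
    ref_out = method(true)
    errors[name] = np.abs(method(observed) - ref_out).mean() / np.abs(ref_out).mean()
    print(f"{name}: mean relative error {errors[name]:.4f}")
```

Raising the noise standard deviation in this sketch is one way to explore how recovery degrades as noise grows. The specific numbers it prints depend on the assumed parameters and are not results from the study.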