Abstract:
|
Untargeted small molecule measurement has become widely available through analytical techniques such as gas or liquid chromatography coupled with mass spectrometry (GC-MS and LC-MS), and it is often used in studies for purposes such as biomarker discovery. The utility of these techniques can be limited because studies commonly involve large numbers of samples whose processing and instrumental analysis are spread over time. Inevitably, variations in sample handling, temperature fluctuations, and other factors result in systematic errors or biases in the measured abundances between batches. Batch correction plays an indispensable role in controlling and accounting for the signal variations inherent to small molecule profiling; however, few accepted guidelines on this topic exist. We propose an improved batch correction methodology for small molecule experiments that combines machine learning (ML) with the chemical and structural properties of biomolecules. We compare our method to other commonly used models, such as COMBAT, and show that the combination of domain knowledge and ML provides improved correction and downstream statistical inference.
|