Abstract:
|
In this paper, Multiple Factor Analysis (MFA), an unsupervised multivariate approach, is applied to jointly analyze the three QSTAR datasets: chemical structure, bioactivity, and gene expression data. By interrelating chemistry, phenotype, and 'omics' data, the functional manifestations (on-target and off-target effects) of drug action on living cells can be explored for a set of candidate compounds in the early phase of drug discovery. MFA can be used to analyze similarities between datasets. It consists of two steps: (1) normalization of each dataset by weighting it with the inverse of its first singular value; and (2) Principal Component Analysis (PCA) of the concatenated normalized datasets. This allows the identification of the main structures shared by the datasets, where a structure is defined as a subset of genes, bioassays, and chemical substructures that share similar patterns across a subset of compounds. Using the EGFR project as a case study, we identified chemical substructures that are possibly linked to the on-target biological response in the cell, as represented by the bioactivity read-outs and the transcriptional effects of the compounds.
|
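
The two MFA steps named in the abstract can be illustrated with a minimal NumPy sketch. It is not the implementation used in the paper. The function name `mfa`, the choice to column-center without scaling to unit variance, and the random placeholder matrices are all assumptions made for illustration. Each block is assumed to have the same compounds as rows.

```python
import numpy as np


def mfa(blocks, n_components=2):
    """Illustrative MFA sketch (hypothetical helper, not the paper's code).

    blocks: list of 2-D arrays that share the same rows (compounds).
    Returns compound scores, variable loadings, and the weighted blocks.
    """
    weighted = []
    for X in blocks:
        # Assumption: column-center only; scaling to unit variance is optional.
        Xc = X - X.mean(axis=0)
        # Step 1: weight the block by the inverse of its first singular value.
        first_sv = np.linalg.svd(Xc, compute_uv=False)[0]
        weighted.append(Xc / first_sv)

    # Step 2: PCA (computed via SVD) of the concatenated normalized blocks.
    Z = np.hstack(weighted)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # compound coordinates
    loadings = Vt[:n_components].T                   # variable contributions
    return scores, loadings, weighted


# Placeholder data: 10 compounds with 20 structural features,
# 5 bioassays, and 50 genes (random values, not real QSTAR data).
rng = np.random.default_rng(0)
blocks = [
    rng.normal(size=(10, 20)),
    rng.normal(size=(10, 5)),
    rng.normal(size=(10, 50)),
]
scores, loadings, weighted = mfa(blocks)
```

After step 1, every block has a first singular value of 1, so no single dataset dominates the leading components merely because it has more variables or larger variance.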