Abstract:
|
Rapid advances in technology have produced large volumes of high-dimensional, high-throughput data in genetic and epigenetic studies, creating a pressing need for data-mining methods that integrate information from multiple sources. We propose a novel clustering approach built upon genetic and epigenetic data, including single nucleotide polymorphisms (SNPs), DNA methylation, and gene expression, for a set of genes. The method is developed under the K-means framework. We formulate a novel Euclidean-distance-based metric to assess distances between clustering objects; this metric takes into account the complex joint effects of SNPs and DNA methylation on the expression of a gene. Simulations demonstrated high sensitivity, specificity, and accuracy with respect to cluster assignment. We apply the method to a data set from a birth cohort on the Isle of Wight, UK, which includes SNPs, DNA methylation, and gene expression, to identify clusters of children across eczema-related genes and to examine the genetic and epigenetic patterns of each cluster.
|