Abstract:
|
Understanding the genetic underpinnings of disease is important for screening, treatment, drug development, and basic biological insight. Genome- and epigenome-wide association studies, in which individual (epi)genetic markers or sets of markers are systematically scanned for association with disease, offer one window into disease processes. Naively, these associations can be found with a simple statistical test. However, a wide variety of confounders lie hidden in the data and, if not properly addressed, lead to both spurious and missed associations. These confounders include population structure, family relatedness, and cell-type heterogeneity. I will discuss state-of-the-art statistical approaches, based on linear mixed models, for conducting these analyses. In these approaches, confounding factors are automatically inferred from the data and then corrected for. Challenges include efficient computation and model optimization for increased power. Finally, I will discuss how insights from these areas can be leveraged to tackle the problem of uncovering latent sub-phenotypes, that is, hidden case clusters for imprecisely defined phenotypes such as depression and type 2 diabetes.
|