Abstract:
|
Epidemiologic research has traditionally been guided by the premise that certain diseases share an underlying etiology, or cause. However, with the rise of molecular and genomic profiling, attention has increasingly focused on identifying subtypes of disease. As subtypes are identified, it is natural to ask whether they arise from distinct sets of risk factors, a concept known as etiologic heterogeneity. In earlier work we developed a strategy for identifying disease subtypes that differ maximally with respect to the collective influences of known risk factors. This strategy uses k-means clustering of the disease markers, followed by calculation of a scalar measure of etiologic heterogeneity to identify the clustering solution with the greatest degree of heterogeneity. Individual risk factor effects can then be tested for differences across subtypes using polytomous logistic regression. The statistical properties of this method have been evaluated previously using simulation studies. Here we present results from an application of this method to a large breast cancer case-control study with available gene expression data for the cases.
|