Abstract:
|
As molecular and genomic profiling of tumors has become increasingly common, the focus of cancer epidemiologic research has shifted away from the study of risk factors for disease as a single entity and toward the identification of subtypes of disease. A number of statistical methods have been proposed for the study of risk factor differences across disease subtypes, a concept known as etiologic heterogeneity. While available statistical methods perform well when the number of characteristics that combine to form disease subtypes is not too large, there is a need for approaches that focus on dimension reduction in this context. One approach is to reduce up front, through variable selection, the number of individual tumor characteristics available for study; an alternative is to use clustering techniques to reduce dimension by identifying disease subtypes based on all available characteristics. We compare and contrast these approaches to dimension reduction and seek to determine whether they can be profitably combined to identify the most etiologically distinct subtypes of disease.
|