Abstract:
|
Genomic studies often seek to explore the association between molecular markers and biological phenotype(s) to gain insight into the molecular basis of health and disease. However, patient-level heterogeneity often obscures the relationship between molecular markers and a phenotype of interest (POI), since the same phenotype can be the product of completely different biological pathways. While Component-wise Sparse Mixture Regression (CSMR), a recently proposed regression-based clustering method, shows promise in detecting heterogeneous relationships between molecular markers and a POI, it sometimes yields inconsistent results when applied to high-dimensional data because of its inherent feature selection and regularization method. We explored different regularized regression methods within the CSMR framework and evaluated the internal consistency and accuracy of the proposed modifications in an extensive set of simulation studies. Across simulation scenarios in which CSMR yielded inconsistent clusters, adaptive lasso improved cluster consistency and accuracy. Our modification of the CSMR method improves its ability to handle high-dimensional data, which are common in genomic studies.
|