Abstract:
|
The increasing size and complexity of biomedical data could dramatically enhance basic scientific discovery and prediction for clinical applications. Realizing this potential requires novel statistical analysis algorithms that are both interpretable and predictive. We introduce the Union of Intersections (UoI) method, a flexible, modular, and scalable paradigm for regression and classification based on iterative resampling. UoI satisfies two criteria at once: it accurately recovers a small number of interpretable features while maintaining high prediction accuracy. We describe UoI and summarize new theoretical results on its mechanics. We evaluate UoI on synthetic and real biomedical data, demonstrating its superior performance. On real data, we demonstrate extraction of interpretable functional networks for the human brain, accurate prediction of phenotypes from genotype-phenotype data using fewer features, and improved prediction parsimony on several benchmark biomedical data sets for regression and classification. These results suggest that UoI could improve interpretation in data-driven discovery and prediction across scientific and medical fields.
|