Abstract:
|
Growing availability of healthcare databases presents opportunities to investigate how patients' responses to treatments vary across subgroups. Even with a large cohort size found in these databases, however, low incidence rates make it difficult to identify causes of treatment effect heterogeneity among a large number of clinical covariates. Sparse regression provides a potential solution. The Bayesian approach is particularly attractive in our setting, where signals are weak and heterogeneity across databases are substantial. Applications of Bayesian sparse regression to large-scale data, however, have been hampered by the lack of scalable computational techniques. We deploy advanced numerical linear algebra techniques to tackle the critical bottleneck in the posterior computation. We apply our algorithm to a clinically relevant large-scale observational study with n = 72,489 and p = 22,175, designed to assess the relative risk of intracranial hemorrhage from two alternative blood anti-coagulants. Our algorithm demonstrates an order of magnitude speed-up, opening up opportunities to carry out more ambitious hierarchical analysis of a network of healthcare databases.
|