The SEER cancer database contains survival data for the U.S. population. The goal of this analysis is to discover spatial variation in breast cancer mortality rates in New Mexico while adjusting for known confounders such as race, age and tumor grade.
In semiparametric Bayesian analyses of large databases such as SEER, Dirichlet process (DP) models exploit the richness of the dataset and provide information about breast cancer prognosis. However, DP models can be prohibitively expensive to fit for even a few hundred individuals. A cost-effective MCMC strategy is applied to perform a fully Bayesian analysis of the SEER data.
The posterior distributions of several model parameters are highly non-normal. While a parametric model would make simplifying assumptions, the semiparametric DP model flexibly adapts to arbitrary features of intersubject variation such as skewness and multimodality. There is strong evidence that, after accounting for known indicators of disease prognosis, individual variability in breast cancer survival is non-normal and multimodal. These findings demonstrate the value of the DP mixture model and the proposed fast MCMC algorithm.