Modeling the volume-outcome relationship using high-dimensional patient-level covariate data across many hospitals
Theodore J. Iwashyna, University of Michigan Department of Internal Medicine 
*Edward H. Kennedy, VA Center for Clinical Management Research 
Anne E. Sales, VA Inpatient Evaluation Center (IPEC) 
Wyndy L. Wiitala, VA Center for Clinical Management Research 

Keywords: volume-outcome relationship, correlated data, risk-adjustment, sepsis, quality of care, mortality

Numerous studies have suggested that hospitals with higher patient volume yield better outcomes for a wide array of medical conditions; however, the statistical methodology behind such studies varies widely and can be fraught with complications (Shahian & Normand 2003). In this work we focus on three main issues in modeling the volume-outcome relationship: the choice of marginal versus conditional models, adjustment for confounding (i.e., risk or case-mix adjustment), and the choice of functional form for the relationship between volume and outcome. In particular, we consider these issues as they arise when high-dimensional patient-level data are available for a large number of hospitals. Using data from the Department of Veterans Affairs (VA) Inpatient Evaluation Center (IPEC), we present a variety of methods for exploring the volume-outcome relationship among patients with sepsis, a deadly, common, and costly medical condition. For the first issue, we examine and compare marginal (generalized estimating equation, or GEE) and conditional (mixed-effects and fully Bayesian) modeling approaches. For the second, we compare standard direct risk-adjustment, score-based risk-adjustment using the prognostic and propensity scores, and model selection. For the third, we model the functional volume-outcome relationship using non-linear transformations, quantiles, splines, and other nonparametric techniques (e.g., generalized additive models). Although we give results and discussion in the context of modeling a volume-outcome relationship, our work applies more generally to any setting with high-dimensional correlated data.
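As a rough illustration of why the choice between marginal and conditional models matters, the sketch below takes a random-intercept logistic model and computes the population-averaged (marginal) log odds ratio it implies. Everything in it is an assumption made for illustration and none of it comes from the IPEC data: the binary high-volume indicator, the parameter values, and the helper names are all hypothetical. The integral over the hospital effect is approximated with a simple trapezoid rule.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    """Log odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def marginal_prob(linear_predictor, sigma, n_grid=2001, width=8.0):
    """Average expit(linear_predictor + b) over b ~ Normal(0, sigma^2).

    Uses a trapezoid rule on [-width * sigma, width * sigma]. With
    sigma == 0 there is no hospital effect to average over.
    """
    if sigma == 0:
        return expit(linear_predictor)
    lo = -width * sigma
    step = (2.0 * width * sigma) / (n_grid - 1)
    norm_const = sigma * math.sqrt(2.0 * math.pi)
    total = 0.0
    for i in range(n_grid):
        b = lo + i * step
        density = math.exp(-0.5 * (b / sigma) ** 2) / norm_const
        weight = 0.5 if i in (0, n_grid - 1) else 1.0
        total += weight * density * expit(linear_predictor + b)
    return total * step


def marginal_log_odds_ratio(beta0, beta_volume, sigma):
    """Population-averaged log odds ratio for high vs. low volume.

    beta_volume is the conditional (hospital-specific) log odds ratio in
    logit P(death | b) = beta0 + beta_volume * high_volume + b.
    """
    p_low = marginal_prob(beta0, sigma)
    p_high = marginal_prob(beta0 + beta_volume, sigma)
    return logit(p_high) - logit(p_low)


# Hypothetical parameter values, chosen only for illustration.
BETA0 = -1.5        # baseline log odds of death
BETA_VOLUME = -0.5  # conditional log odds ratio for high volume
SIGMA = 1.0         # SD of the hospital random intercept

if __name__ == "__main__":
    marginal = marginal_log_odds_ratio(BETA0, BETA_VOLUME, SIGMA)
    print("conditional log OR:", BETA_VOLUME)
    print("marginal log OR:   ", round(marginal, 3))
```

Under these assumed values the marginal coefficient is closer to zero than the conditional one, so a GEE estimate and a mixed-effects estimate of the "same" volume effect target different quantities and should not be expected to agree numerically.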