Abstract:
|
Accurate risk modeling with electronic health record (EHR) data is challenging, partly because baseline risk and risk predictors vary across patient subgroups. Such heterogeneity in risk, if left unrecognized, can lead to unfair and less accurate predictions. The data for individual subgroups may not be rich enough to support separate subgroup-specific models, particularly when the number of candidate predictors is large. To overcome this limitation, we propose a novel algorithm for fitting subgroup-specific models that leverages the sharing of a common predictor among subgroups while performing variable selection for subgroup-specific predictors. Building upon an existing fusion technique, the proposed method encourages similarity among the subgroup-specific parameters for the common predictor. We derive an upper bound on the ℓ2-norm error of local optima of the estimators. Results from extensive simulation studies show that our method greatly improves model calibration and accurately identifies subgroup-specific risk predictors. We apply the proposed method to structured data extracted from the University of Pennsylvania Health System EHRs.
|