Abstract:
|
The concept of integrating data from disparate sources to accelerate scientific discovery has generated tremendous excitement in many fields. The potential benefits of data integration, however, may be compromised by uncertainty arising from imperfect record linkage and missing data. In a suicide risk study, survival data with uncertain event records arise from integrated real-world hospital discharge data. To address this problem, we develop an integrative Cox regression, in which the uncertainty in event times is modeled probabilistically. Numerical studies demonstrate that our method outperforms several competing approaches, including multiple imputation. A marginal screening analysis using the proposed method is performed to identify diagnostic codes associated with death following suicide-related hospitalization in Connecticut. Extensions to the cure model setup and to variable selection will be discussed. This study is a first step towards building a data-driven suicide prediction/prevention framework. We will also discuss other aspects of our proposal, including data unification, data fusion, and joint feature selection and predictive modeling.
|