Abstract:
|
Administrative databases in healthcare are a source of big data but are rarely constructed for research purposes. The issues that arise from this must be addressed before such data can inform clinical and policy decisions. Through a case study in chronic diseases, we illustrate challenges typically associated with drawing inferences from administrative databases and propose solutions based on flexible, data-driven statistical approaches. For example, because the data are collected in uncontrolled settings, the information available for each individual varies in both time and volume, which leads to issues such as complex censoring, missing data, and violations of the distributional assumptions of parametric models. We illustrate how our data-driven approaches leverage machine learning methods and flexible semi-parametric models to draw valid inferences from the information actually observed in administrative healthcare databases. Furthermore, through simulation studies, we highlight situations in clinical prediction studies where the strengths of non-parametric algorithms and parametric models can be combined to build a data-driven and reproducible model.
|