Abstract:
|
Applying directed acyclic graph (DAG) models to proteogenomic data has been shown effective for detecting causal biomarkers of complex diseases. However, there remain unsolved challenges in DAG learning to jointly model binary clinical outcome variables and continuous biomarker measurements. In this paper, we propose a new tool, DAGBagM, to learn DAGs with both continuous and binary nodes. By using appropriate models, DAGBagM allows for either continuous or binary nodes to be parents or children nodes. It employs a bootstrap aggregating strategy to reduce false positives in edge inference. At the same time, the aggregation procedure provides a flexible framework to robustly incorporate prior information on edges. Through extensive simulation experiments, we demonstrate that DAGBagM has superior performance compared to several existing DAG structure learning algorithms. In addition, DAGBagM is computationally more efficient than the widely used bnlearn, as well as than mDAG. When applying DAGBagM to proteomic datasets from ovarian cancer studies, we identify potential protein biomarkers for platinum refractory/resistant response in ovarian cancer.
|