Abstract:
|
As data sets from related studies become more easily accessible, combining data sets from similar studies is often undertaken in practice to achieve a larger sample size and higher power. A major challenge arising from data integration pertains to data heterogeneity in terms of population, study coordination, or experimental protocols. Ignoring such heterogeneity in data analysis may result in biased estimation and misleading inference. Traditional remedies for data heterogeneity include the use of interactions and random effects, which are inadequate for achieving desirable statistical power or providing an intuitive interpretation, especially when a large number of smaller data sets are combined. In this paper, we propose a regularized fusion method that allows us to identify and merge inter-study homogeneous parameter clusters in regression analysis, without the use of hypothesis testing. Using the fused lasso, we establish a computationally efficient procedure to deal with large-scale integrated data. Incorporating the estimated parameter ordering in the fused lasso improves computing speed with no loss of statistical power.
|