Abstract:
|
We consider the estimation of densities in multiple subpopulations, where the available sample size varies greatly across subpopulations. For example, in epidemiology, different diseases may share a similar generating mechanism but differ in prevalence. We propose a fully data-driven approach to estimate the density of a quantity of interest in each subpopulation without the need to specify a parametric form for the density families. The idea is to map the density functions into a Hilbert space and then apply functional data analytic methods to derive low-dimensional approximations. Subpopulation densities are then fitted within the low-dimensional families using likelihood-based methods, where information is borrowed across subpopulations through shrinkage. Further, the approximation via exponential families is computationally efficient. The proposed methods are illustrated through simulations and applications to electronic medical record (MIMIC) data and climate data, showcasing interpretable estimates and favorable performance.
|