Abstract:
|
The mixture of experts, or mixture of learners in general, is a popular and powerful machine learning model in which each expert learns to handle a different region of the covariate space. However, choosing an appropriate number of experts is crucial to avoid overfitting or underfitting. We introduce a group fused lasso penalty that pulls the coefficients of the experts and the gating network toward one another. By varying the penalty strength, we simultaneously prevent overspecialization of individual experts and select the number of experts. We develop an efficient optimization algorithm based on block-wise coordinate descent applied to the dual problem. Numerical results on simulated and real-world datasets show that the penalized model outperforms the unpenalized one and performs on par with many well-known machine learning models.
|