Abstract:
|
Many existing semi-supervised techniques are effective learners only if strong smoothness assumptions hold. These assumptions typically involve a partitioning of the feature space into non-overlapping and possibly non-elliptical clusters. Unlike supervised learners, these semi-supervised techniques essentially predict with the plain arithmetic average or class majority within each cluster. Although this intuitive cluster assumption is often too strong a requirement in practice, its failure does not imply that the unlabeled data have no value for training. It implies only that unlabeled data used in this fashion degrade performance. Safe semi-supervised learning addresses this issue by adapting "purely semi-supervised" predictions like those described above toward a supervised alternative as needed. The proposed safe semi-supervised semi-parametric modeling (S4PM) approach compromises between a supervised and a purely semi-supervised learner, typically boosts performance over these pure alternatives, and is practical for application on real data.
|