Keywords: Directed and undirected hypergraphs, natural language processing, sentiment analysis, regularization, nonconvex minimization
Numerical embedding is a dimension reduction tool that maps data to a numerical vector, and it has become crucial for analyzing unstructured data that cannot be expressed in a predefined fashion. A downstream learning method applied after embedding, referred to as a two-stage method, is applicable to unstructured data, although the embedding itself is unsupervised and constructed by transfer learning. In this article, we introduce a principle of embedding learning that integrates embedding into the learning process to achieve higher learning accuracy. In particular, we introduce the concept of a sufficient embedding with respect to a specific learning task, and we seek an optimal sufficient embedding that maximizes the learning accuracy subject to an embedding loss constraint. Theoretically, we develop a learning theory that quantifies the benefits of embedding learning and sheds light on why it is expected to outperform the two-stage method. Moreover, specializing the general framework to classification, we develop a graph embedding classifier whose input is a hyperlink tensor representing multiple hypergraphs, directed or undirected, that characterize multiway relations among unstructured data units. Numerically, we implement linear graph embedding classification through blockwise coordinate descent and nonlinear graph embedding classification through a deep neural network, for the identification of interactions in network analysis and for sentiment analysis. Finally, we demonstrate the proposed classifier on two benchmark examples in network analysis and movie reviews.
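
As a schematic reading of the constrained formulation described above, the contrast between embedding learning and the two-stage method can be written as follows. The symbols are illustrative assumptions and not the article's notation: $e$ denotes a candidate embedding, $f$ a downstream learner, $\ell$ a task loss, $L_{\mathrm{emb}}$ an embedding loss, and $c \ge 0$ a tolerance.

```latex
% Embedding learning (schematic): embedding and learner are chosen jointly,
% with the embedding loss acting as a constraint.
\min_{f,\, e}\; \mathbb{E}\,\ell\bigl(Y, f(e(X))\bigr)
\quad \text{subject to} \quad L_{\mathrm{emb}}(e) \le c .

% Two-stage method (schematic): the embedding is fixed first, without
% reference to the response Y, and the learner is fitted afterwards.
\hat{e} \in \arg\min_{e}\; L_{\mathrm{emb}}(e),
\qquad
\hat{f} \in \arg\min_{f}\; \mathbb{E}\,\ell\bigl(Y, f(\hat{e}(X))\bigr).
```

Under this reading, the two-stage solution is one feasible choice in the joint problem whenever $L_{\mathrm{emb}}(\hat{e}) \le c$, which is one informal way to see why the joint formulation is expected to do no worse in task loss.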