Abstract:
|
Automatic assessment of word similarity has long been recognized as one of the major challenges in the development of Artificial Intelligence. Many methods have been proposed, taking either a knowledge-based or a corpus-based approach. However, most word-similarity prediction methods attempt only to estimate the average score given by human raters. The distribution of similarity scores for each word pair is methodologically neglected, which limits the downstream applications of these methods in Natural Language Processing. Here, using information from Wikipedia Categories (which are intended to group together pages on similar subjects), we present a method that models the similarity between a pair of words as a probability distribution. We showcase examples that our method predicts well and, using our interactive webpage, explain why it performs poorly on some comparisons. We then use our method together with Liquid Association to provide different insights into knowledge-based and corpus-based methods.
|