
Abstract Details

Activity Number: 574 - Recent Advances in Software
Type: Contributed
Date/Time: Wednesday, July 31, 2019 : 2:00 PM to 3:50 PM
Sponsor: Section on Statistical Computing
Abstract #306492
Title: Language Modeling Using SAS
Author(s): JeeHyun Hwang* and Xu Yang and Haipeng Liu
Companies: SAS Institute Inc. and SAS Institute Inc. and SAS Institute Inc.
Keywords: N-gram Models; Long Short-Term Memory Network; Recurrent Neural Network; Deep Learning; Perplexity; Word Prediction

Language modeling is the task of predicting the next word in a sentence given the words that precede it. Language models are widely used to improve performance in areas such as automatic speech recognition and machine translation. Because each word depends on the sequence of words before it, next-word prediction is a sequential data prediction problem. In this paper, we present two approaches to language modeling. First, n-gram models statistically estimate the probability of the next word given a sequence of previous words. These models are supported by the SAS language model functionality, called the language model action set, which is designed to efficiently train n-gram models on cloud platforms when the training data set contains a large number of documents. Second, we explore neural networks for building language models using the SAS deep learning functionality, called the deep learning action set, which enables us to build models based on Long Short-Term Memory (LSTM) networks. We choose LSTM-based models because LSTMs are known to handle the exploding and vanishing gradient problems better than plain recurrent neural networks. We conduct user studies that demonstrate the effectiveness of our language models.
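To illustrate the n-gram approach the abstract describes, here is a minimal bigram language model in Python with add-one smoothing and a perplexity computation. This is only an illustrative sketch of the general technique, not the SAS language model action set API; the function names, the toy corpus, and the smoothing choice are assumptions for the example.

```python
from collections import Counter
import math

def train_bigram(sentences):
    """Count unigrams and bigrams over a list of whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]   # sentence boundary markers
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def prob(unigrams, bigrams, prev, word):
    """P(word | prev) with add-one (Laplace) smoothing."""
    vocab_size = len(unigrams)
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

def perplexity(unigrams, bigrams, sentence):
    """Per-token perplexity of a sentence under the bigram model."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    log_p = sum(math.log(prob(unigrams, bigrams, p, w))
                for p, w in zip(tokens, tokens[1:]))
    return math.exp(-log_p / (len(tokens) - 1))

# Toy corpus (hypothetical example data)
corpus = ["the cat sat", "the dog sat", "the cat ran"]
u, b = train_bigram(corpus)
print(prob(u, b, "the", "cat"))        # "cat" follows "the" in 2 of 3 sentences
print(perplexity(u, b, "the cat sat")) # lower perplexity = better fit
```

Perplexity, listed among the keywords, is the standard intrinsic metric for comparing the n-gram and LSTM models: it is the exponentiated average negative log-probability the model assigns to each word.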

Authors who are presenting talks have a * after their name.
