All Times EDT

Abstract Details

Activity Number: 74 - Text Analysis in Machine Learning and Statistical Models
Type: Contributed
Date/Time: Monday, August 3, 2020, 10:00 AM to 2:00 PM EDT
Sponsor: Section on Statistics in Defense and National Security
Abstract #313617
Title: Uncovering Biases in Off-The-Shelf Natural Language Processing Tools
Author(s): Elizabeth Cary*, Lee Burke, Madelyn Dunning, Jill Brandenberger, Michael Henry, and Karl Pazdernik
Companies: Pacific Northwest National Laboratory (all authors)
Keywords: Natural Language Processing; Text Analysis; Bias; Arabic; Sentiment Analysis; Language Identification
Abstract:

The growing use of natural language processing (NLP) tools has placed a spotlight on the biases these systems may carry. Models trained or developed with particular language varieties in mind may underserve populations who use other varieties. Further, off-the-shelf (OTS) NLP tools are easily accessible and are sometimes deployed without awareness of these biases. This research highlights the errors that can occur when NLP predictions are built on pretrained models. In particular, we consider Arabic NLP tools, including OTS language identification systems and sentiment analysis systems trained using a variety of approaches and preprocessing methods. Results show the impact that dialect can have on performance and why careful consideration of how a model was trained is necessary.
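
One simple way to surface the kind of dialect effect described above is to report a tool's accuracy per dialect instead of as a single aggregate number. The sketch below is illustrative only and is not the authors' evaluation code: the function name `accuracy_by_dialect`, the dialect labels, and the toy records are all assumptions made up for the example.

```python
from collections import defaultdict


def accuracy_by_dialect(records):
    """Return {dialect: accuracy} for (dialect, gold_label, predicted_label) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for dialect, gold, predicted in records:
        total[dialect] += 1
        if gold == predicted:
            correct[dialect] += 1
    return {dialect: correct[dialect] / total[dialect] for dialect in total}


# Hypothetical toy data, invented purely for illustration (not results from this work).
# Each triple is (dialect, gold sentiment label, label predicted by some pretrained tool).
toy_records = [
    ("MSA", "pos", "pos"),
    ("MSA", "neg", "neg"),
    ("MSA", "pos", "pos"),
    ("MSA", "neg", "pos"),
    ("Egyptian", "pos", "neg"),
    ("Egyptian", "neg", "neg"),
    ("Egyptian", "pos", "neg"),
    ("Egyptian", "neg", "pos"),
]

per_dialect = accuracy_by_dialect(toy_records)
overall = sum(gold == pred for _, gold, pred in toy_records) / len(toy_records)

print("overall accuracy:", overall)
for dialect, acc in per_dialect.items():
    print(dialect, "accuracy:", acc)
```

In this made-up data the aggregate accuracy hides a large gap between the two groups, which is why a per-dialect breakdown is worth checking before deploying a pretrained tool.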


Authors who are presenting talks have a * after their name.

Back to the full JSM 2020 program