Activity Number: 74 - Text Analysis in Machine Learning and Statistical Models
Type: Contributed
Date/Time: Monday, August 3, 2020 : 10:00 AM to 2:00 PM
Sponsor: Section on Statistics in Defense and National Security
Abstract #313617
Title: Uncovering Biases in Off-The-Shelf Natural Language Processing Tools
Author(s): Elizabeth Cary* and Lee Burke and Madelyn Dunning and Jill Brandenberger and Michael Henry and Karl Pazdernik
Companies: Pacific Northwest National Laboratory and Pacific Northwest National Laboratory and Pacific Northwest National Laboratory and Pacific Northwest National Laboratory and Pacific Northwest National Laboratory and Pacific Northwest National Laboratory
Keywords: Natural Language Processing; Text Analysis; Bias; Arabic; Sentiment Analysis; Language Identification
Abstract:
The increased use of natural language processing (NLP) tools has placed a spotlight on potential biases present in these systems. Models trained or developed with particular language variations in mind may underserve populations with differing variations. Further, off-the-shelf (OTS) NLP tools are easily accessible and sometimes deployed without awareness of these biases. This research highlights the errors that can occur when building NLP predictions based on pretrained models. In particular, we consider Arabic NLP tools, including OTS language identification systems and sentiment analysis systems trained using a variety of approaches and preprocessing methods. Results show the impact dialect can have on performance and why it is necessary to consider carefully how a model has been trained.
Authors who are presenting talks have a * after their name.