
All Times EDT

Abstract Details

Activity Number: 132 - SLDS CSpeed 1
Type: Contributed
Date/Time: Monday, August 9, 2021 : 1:30 PM to 3:20 PM
Sponsor: Section on Statistical Learning and Data Science
Abstract #318251
Title: Impact of Tweets Pre-Processing Techniques on a Dictionary for Environment
Author(s): Camilla Salvatore* and Daniele Toninelli and Michela Cameletti and Stephan Schlosser
Companies: University of Bergamo and University of Bergamo and University of Bergamo and Georg-August-Universität Göttingen
Keywords: Social media; Twitter; Text classification; Text mining
Abstract:

The availability of unstructured big data, such as those produced by social media, has increased methodological interest in text analysis and in the associated pre-processing phases. Several recent works have studied the impact of different pre-processing treatments on text classification. However, this aspect has rarely been studied when the goal of the research is to define a topic-oriented dictionary that can be used to select messages about a given topic from a large set of unlabelled texts. This selection is a crucial phase: carefully filtering messages is key to starting and properly developing any type of textual analysis. In this paper, we aim to set up a dictionary regarding the environment. Starting from a verified list of Twitter Official Social Accounts, we evaluate whether and how different pre-processing treatments (and their combinations) can affect the final dictionary.
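The abstract does not specify which pre-processing treatments are compared or how the dictionary is built. As a minimal sketch only, assuming four hypothetical treatments (lowercasing, URL/mention removal, stopword removal, naive suffix stripping) and a dictionary defined as the top-k most frequent terms, the Python code below applies every combination of treatments and measures how much each resulting dictionary overlaps (Jaccard similarity) with the untreated baseline. The toy tweets, the stopword list, and all function names are illustrative assumptions, not the authors' data or method.

```python
import itertools
import re
from collections import Counter

# Hypothetical toy tweets standing in for messages from verified environment-related accounts.
TWEETS = [
    "Climate change is accelerating! Read more: https://example.org/report @UNEP",
    "Protecting forests and oceans protects the climate #ClimateAction",
    "New report on air pollution and emissions in European cities",
    "Recycling plastic reduces pollution of the oceans",
]

# A tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = {"is", "and", "the", "of", "on", "in", "more", "new"}


def lowercase(tokens):
    return [t.lower() for t in tokens]


def drop_urls_mentions(tokens):
    return [t for t in tokens if not t.startswith(("http", "@"))]


def drop_stopwords(tokens):
    return [t for t in tokens if t.lower() not in STOPWORDS]


def crude_stem(tokens):
    # Naive suffix stripping, a hypothetical stand-in for a real stemmer.
    out = []
    for t in tokens:
        for suffix in ("ing", "s"):
            if len(t) > len(suffix) + 2 and t.endswith(suffix):
                t = t[: -len(suffix)]
                break
        out.append(t)
    return out


# Treatments are always applied in this insertion order.
TREATMENTS = {
    "lower": lowercase,
    "no_urls": drop_urls_mentions,
    "no_stop": drop_stopwords,
    "stem": crude_stem,
}


def tokenize(text):
    # Keep URLs, @mentions, #hashtags and words as single tokens.
    return re.findall(r"https?://\S+|[@#]?\w+", text)


def build_dictionary(tweets, treatment_names, top_k=10):
    counts = Counter()
    for tweet in tweets:
        tokens = tokenize(tweet)
        for name in treatment_names:
            tokens = TREATMENTS[name](tokens)
        counts.update(tokens)
    # most_common orders equal counts by first-seen order.
    return {term for term, _ in counts.most_common(top_k)}


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Every subset of treatments (2^4 = 16 combinations, including "none").
combos = [
    combo
    for r in range(len(TREATMENTS) + 1)
    for combo in itertools.combinations(TREATMENTS, r)
]

baseline = build_dictionary(TWEETS, ())
for combo in combos:
    d = build_dictionary(TWEETS, combo)
    print(combo or ("none",), round(jaccard(baseline, d), 2))
```

A low Jaccard score flags a combination of treatments that changes the dictionary substantially relative to raw text; a real study would replace the toy treatments with established tokenizers, stopword lists and stemmers.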


Authors who are presenting talks have a * after their name.

Back to the full JSM 2021 program