![IconGems-Print](images/IconGems-Print.png)
132 – 132 - SLDS CSpeed 1
Impact of Tweets Pre-Processing Techniques on a Dictionary for Environment
Camilla Salvatore
University of Bergamo
The availability of unstructured big data, such as the ones produced by social media, highlights the increasing methodological interest on text analysis and on the linked pre-processing phases. Several works have recently studied the impact of different pre-processing treatments on text classification. This aspect has been rarely studied when the target of the research is the definition of a topic-oriented dictionary that could be used to select messages regarding a certain topic among a wide group of unlabeled texts. The latter is a crucial phase: carefully filtering messages is a key aspect to start and to properly develop any type of textual analysis. In this paper, we aim at setting up a dictionary regarding environment. Starting from a verified list of Twitter Official Social Accounts, we evaluate if and how different pre-processing treatments (and their combination) can affect the final dictionary.