Twitter as a Potential Source for Official Statistics in The Netherlands
Piet J H Daas
Statistics Netherlands
Joyce Neroni
Utrecht University
Marko Roos
Statistics Netherlands
Mark van de Ven
Erasmus University
An increasing number of people are active on social media, where they voluntarily share information, discuss topics of interest, and contact family and friends. Since the response to the questionnaires of Statistics Netherlands continues to decline, we investigated whether the information exchanged on social media could serve as a data source for official statistics. Because Twitter is used by a large number of people in the Netherlands and public messages can be collected relatively easily, we started by investigating the content of Twitter messages. We collected the messages in various ways, classified the topics discussed, and assessed the usability of the information from an official statistics point of view. User-oriented message collection was found to be the best approach for our purposes. The topics discussed in the 12 million messages collected were identified in two stages. First, the topics in all messages containing hashtags were determined and those messages were classified. Next, a random sample of the messages without hashtags was classified. The results revealed that a considerable share of the messages collected, around 50%, could be of interest.
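The two-stage approach described above can be illustrated with a minimal sketch. It is not the authors' pipeline. It only shows how a message collection might be split into hashtag and non-hashtag messages, how hashtag frequencies could hint at topics, and how a reproducible random sample of non-hashtag messages could be drawn for classification. The function names (`split_by_hashtag`, `top_hashtags`, `sample_for_coding`), the simplified hashtag pattern `#\w+`, and the toy messages are all hypothetical. The topic classification itself is not shown.

```python
import random
import re
from collections import Counter

# Simplified hashtag pattern (assumption): '#' followed by word characters.
# Twitter's actual hashtag rules are more detailed.
HASHTAG = re.compile(r"#(\w+)")


def split_by_hashtag(messages):
    """Stage split: messages with at least one hashtag vs. messages without."""
    with_tags, without_tags = [], []
    for msg in messages:
        if HASHTAG.search(msg):
            with_tags.append(msg)
        else:
            without_tags.append(msg)
    return with_tags, without_tags


def top_hashtags(messages, n=10):
    """Stage 1 helper: most frequent (lower-cased) hashtags as topic hints."""
    counts = Counter(tag.lower() for msg in messages for tag in HASHTAG.findall(msg))
    return counts.most_common(n)


def sample_for_coding(messages, k, seed=0):
    """Stage 2 helper: reproducible random sample of messages to classify."""
    rng = random.Random(seed)
    return rng.sample(messages, min(k, len(messages)))


if __name__ == "__main__":
    # Hypothetical toy messages, not data from the study.
    tweets = [
        "Stuck in traffic on the A12 again #file",
        "Lovely weather today",
        "New job starting Monday! #werk",
        "Watching the match tonight",
        "#file #A12 still not moving",
    ]
    tagged, untagged = split_by_hashtag(tweets)
    print(len(tagged), len(untagged))
    print(top_hashtags(tagged, 1))
    print(sample_for_coding(untagged, 1))
```

Sampling with a fixed seed keeps the selection of non-hashtag messages reproducible, which is useful when the sampled messages are subsequently classified by hand.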