A Statistical Analysis of a Time Series of Twitter Graphs
David Marchette
Naval Surface Warfare Center Dahlgren Division
In this paper I describe a set of Twitter data that we have been collecting for nearly two years. Using the Twitter streaming API, we collect all tweets geo-located within a set of rectangles covering the main landmasses of the world, as well as tweets containing certain key phrases. We collect "all" geo-located tweets in the sense that Twitter provides every tweet geo-located within a rectangle, provided the volume does not exceed a fixed limit. These tweets define a "mentions" digraph: each user id is a vertex, and there is an edge from s to t if a tweet from s mentions t, as in the tweet from @s: "@t u wanna go to lunch?". Computing these mentions digraphs on time intervals produces a time series of graphs. The graphs tend to have power-law degree distributions; I will describe them and discuss some thoughts on how one might model them. Using the graphs, I will also discuss methods for inferring node attributes, such as the geo-position of a user whose tweet is not geo-located, and for detecting spoofed geo-locations.
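As an illustration of the construction described above, the following is a minimal sketch of how a time series of mentions digraphs might be built from a list of tweets. It is not the pipeline used to collect or process the data: the tweet fields (`user`, `text`, `timestamp`), the function names, and the use of a regular expression to find mentions are all assumptions made for this example, and screen names stand in for user ids.

```python
import re
from collections import defaultdict

# Assumption: mentions are recovered from the tweet text with a simple pattern.
MENTION_RE = re.compile(r"@(\w+)")


def mentions_graphs(tweets, interval_seconds):
    """Group tweets into time intervals and build one edge set per interval.

    Each tweet is a dict with hypothetical keys "user", "text", and
    "timestamp" (seconds). An edge (s, t) means a tweet from s mentions t.
    """
    graphs = defaultdict(set)
    for tweet in tweets:
        bucket = int(tweet["timestamp"] // interval_seconds)
        for target in MENTION_RE.findall(tweet["text"]):
            graphs[bucket].add((tweet["user"], target))
    return dict(graphs)


def in_degrees(edges):
    """Count, for each vertex, how many distinct users mention it."""
    degrees = defaultdict(int)
    for _source, target in edges:
        degrees[target] += 1
    return dict(degrees)


# Toy data: three tweets, the last one falling in a later one-hour interval.
tweets = [
    {"user": "s", "text": "@t u wanna go to lunch?", "timestamp": 10},
    {"user": "t", "text": "@s sure, bring @u", "timestamp": 50},
    {"user": "u", "text": "@t see you there", "timestamp": 3700},
]

graphs = mentions_graphs(tweets, interval_seconds=3600)
for bucket in sorted(graphs):
    print(bucket, sorted(graphs[bucket]), in_degrees(graphs[bucket]))
```

Storing each interval as a set of directed edges collapses repeated mentions between the same pair of users; a weighted variant would keep a count per edge instead. The in-degree counts are the kind of quantity whose distribution could then be compared against a power law.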