Observational Comparison of Geo-tagged and Randomly-drawn Tweets
Twitter is a ubiquitous source of micro-blog social media data, giving the academic, industrial, and public sectors real-time access to actionable information. A particularly attractive property of some tweets is *geo-tagging*, where a user account has opted in to attaching its current location to each message. Unfortunately (from a researcher's perspective), only a fraction of Twitter accounts opt in, and these accounts are likely to differ systematically from the general population. This work is an exploratory study of these differences across the full range of Twitter content. It complements previous studies that focused on the English-language subset. Additionally, we compare methods for querying users by self-identified properties, and find that the constrained semantics of the "description" field yield cleaner, higher-volume results than more complex regular expressions.