Effects of Sampling on Twitter Trend Detection

LREC 2016 · Andrew Yates, Alek Kolcz, Nazli Goharian, Ophir Frieder

Much research has focused on detecting trends on Twitter, including health-related trends such as mentions of influenza-like illnesses (ILI) or their symptoms. The majority of this research has been conducted using Twitter's public feed, which includes only about 1% of all public tweets. It is unclear if, when, and how using Twitter's 1% feed has affected the evaluation of trend detection methods. In this work we use a larger feed to investigate the effects of sampling on Twitter trend detection. We focus on using health-related trends to estimate the prevalence of ILI based on tweets. We use ground truth obtained from the CDC and Google Flu Trends to explore how the prevalence estimates degrade when moving from a 100% to a 1% sample. We find that using the 1% sample is unlikely to substantially harm ILI estimates made at the national level, but can cause poor performance when estimates are made at the city level.
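
The intuition behind the national-versus-city contrast can be illustrated with a small simulation. The sketch below is not the authors' method or data: the ILI curve, the weekly tweet volumes, and the function names (`pearson`, `downsample`, `synthetic_ili_rate`, `run_demo`) are all hypothetical, chosen only to show how thinning a feed to 1% affects a large aggregate far less than a small one.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def downsample(counts, rate, rng):
    """Keep each tweet independently with probability `rate`."""
    return [sum(1 for _ in range(c) if rng.random() < rate) for c in counts]


def synthetic_ili_rate(weeks=30):
    """Hypothetical ILI curve (assumption): a baseline plus one seasonal peak."""
    return [0.02 + 0.03 * math.exp(-(((w - 15) / 5.0) ** 2)) for w in range(weeks)]


def run_demo(seed=0, sample_rate=0.01):
    """Return {level: (corr_full_feed, corr_sampled_feed)} against the synthetic truth."""
    rng = random.Random(seed)
    truth = synthetic_ili_rate()
    results = {}
    # Weekly tweet volumes are made-up numbers used only to contrast scales.
    for level, tweets_per_week in (("national", 2_000_000), ("city", 1_000)):
        full = [round(tweets_per_week * r) for r in truth]  # ILI tweets in the full feed
        sampled = downsample(full, sample_rate, rng)        # ILI tweets in the 1% feed
        results[level] = (pearson(full, truth), pearson(sampled, truth))
    return results


if __name__ == "__main__":
    for level, (full_r, sampled_r) in run_demo().items():
        print(f"{level}: full feed r={full_r:.3f}, 1% sample r={sampled_r:.3f}")
```

Under these assumptions, the sampled national series still contains hundreds of ILI tweets per week, so sampling noise is small relative to the seasonal signal. The sampled city series often contains zero or one ILI tweet per week, so its correlation with the underlying curve is dominated by noise.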
