Creating a Real-Time, Reproducible Event Dataset

2 Dec 2016  ·  John Beieler ·

The generation of political event data has remained much the same since the mid-1990s, both in terms of data acquisition and the process of coding text into data. Since the 1990s, however, there have been significant improvements in open-source natural language processing software and in the availability of digitized news content. This paper presents a new, next-generation event dataset, named Phoenix, that builds from these and other advances. This dataset includes improvements in the underlying news collection process and event coding software, along with the creation of a general processing pipeline necessary to produce daily-updated data. This paper provides a face validity checks by briefly examining the data for the conflict in Syria, and a comparison between Phoenix and the Integrated Crisis Early Warning System data.

PDF Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here