The generation of political event data has changed little since the
mid-1990s, in terms of both data acquisition and the process of coding text
into data. Since then, however, there have been significant improvements
in open-source natural language processing software and in the availability of
digitized news content...
This paper presents a next-generation event
dataset, named Phoenix, that builds on these and other advances. This dataset
includes improvements in the underlying news collection process and event
coding software, along with the creation of a general processing pipeline
necessary to produce daily-updated data. This paper also provides a face validity
check by briefly examining the data for the conflict in Syria, and compares
Phoenix with the Integrated Crisis Early Warning System (ICEWS) data.