14 dataset results for Event Extraction

GENIA

The GENIA corpus is the primary collection of biomedical literature compiled and annotated within the scope of the GENIA project. The corpus was created to support the development and evaluation of information extraction and text mining systems for the domain of molecular biology.

116 PAPERS • 6 BENCHMARKS

ACE 2005 (ACE 2005 Multilingual Training Corpus)

ACE 2005 Multilingual Training Corpus contains the complete set of English, Arabic and Chinese training data for the 2005 Automatic Content Extraction (ACE) technology evaluation. The corpus consists of data of various types annotated for entities, relations and events by the Linguistic Data Consortium (LDC) with support from the ACE Program and additional assistance from LDC.

62 PAPERS • 9 BENCHMARKS

SciREX

SciREX is a document-level information extraction (IE) dataset that covers multiple IE tasks, including salient entity identification and document-level N-ary relation identification, over scientific articles. The dataset was annotated by combining automatic and human annotations, leveraging existing scientific knowledge resources.

32 PAPERS • 2 BENCHMARKS

WikiEvents

WikiEvents is a document-level event extraction benchmark dataset which includes complete event and coreference annotation.

27 PAPERS • 2 BENCHMARKS

ChFinAnn

ChFinAnn pairs ten years (2008-2018) of ChFinAnn documents with human-summarized event knowledge bases, which are used to perform distant-supervision (DS) based event labeling. It covers five event types: Equity Freeze (EF), Equity Repurchase (ER), Equity Underweight (EU), Equity Overweight (EO) and Equity Pledge (EP). All five are major events that the regulator requires companies to disclose, and they may have a large impact on company value. To ensure labeling quality, the authors imposed constraints on matched document-record pairs.

19 PAPERS • 1 BENCHMARK

M2E2

M2E2 targets the extraction of events and their arguments from multimedia documents. The authors developed the first benchmark for this task and collected a dataset of 245 multimedia news articles with extensively annotated events and arguments.

14 PAPERS • NO BENCHMARKS YET

French TimeBank

French TimeBank is a French corpus annotated in ISO-TimeML.

6 PAPERS • 1 BENCHMARK

Phee

Phee is a dataset for pharmacovigilance comprising over 5,000 annotated events from medical case reports and biomedical literature. It is designed for biomedical event extraction tasks.

3 PAPERS • NO BENCHMARKS YET

Title2Event

Title2Event is a large-scale sentence-level dataset for benchmarking Open Event Extraction without restricting event types. Title2Event contains more than 42,000 news titles in 34 topics collected from Chinese web pages.

3 PAPERS • NO BENCHMARKS YET

Catalan TimeBank 1.0

Catalan TimeBank 1.0 was developed by researchers at Barcelona Media and consists of Catalan texts in the AnCora corpus annotated with temporal and event information according to the TimeML specification language.

2 PAPERS • 1 BENCHMARK

EDT

The EDT dataset is a benchmark for corporate event detection and text-based stock prediction (trading strategy).

2 PAPERS • NO BENCHMARKS YET

EventNarrative

EventNarrative is a knowledge-graph-to-text dataset built from publicly available open-world knowledge graphs. EventNarrative consists of approximately 230,000 graphs and their corresponding natural language text.

2 PAPERS • 1 BENCHMARK

IndiaPoliceEvents

IndiaPoliceEvents is a corpus of 21,391 sentences from 1,257 English-language Times of India articles about events in the state of Gujarat during March 2002. This dataset is used for automated event extraction.

2 PAPERS • NO BENCHMARKS YET

Personal Events in Dialogue Corpus

The Personal Events in Dialogue Corpus (PEDC) consists of transcripts of 14 episodes of the This American Life podcast that have been annotated for events. The corpus contains the dialogue excerpts from these episodes (listed in Table 1). Annotation is done at the token level: each token is labeled either as an event or as a non-event. For more information, download the corpus and consult the annotation guide for specifics on how we define an event, and the README for how the annotations are encoded. Further details on the corpus and its use are given in the paper Automatic extraction of personal events from dialogue.

1 PAPER • NO BENCHMARKS YET
