The DocRED Information Extraction (DocRED-IE) dataset extends the DocRED dataset for the Document-level Closed Information Extraction (DocIE) task. DocRED-IE is a multi-task dataset and allows for 5 subtasks: (i) Document-level Relation Extraction, (ii) Mention Detection, (iii) Entity Typing, (iv) Entity Disambiguation, (v) Coreference Resolution, as well as combinations thereof such as Named Entity Recognition (NER) or Entity Linking. The DocRED-IE dataset also allows for the end-to-end tasks of: (i) DocIE and (ii) Joint Entity and Relation Extraction. DocRED-IE comprises sentence-level and document-level facts, thereby describing short as well as long-range interactions within an entire document.
1 PAPER • 6 BENCHMARKS
CoreSearch is a dataset for Cross-Document Event Coreference Search. It consists of two separate passage collections: (1) a collection of passages containing manually annotated coreferring event mention, and (2) an annotated collection of destructor passages.
1 PAPER • NO BENCHMARKS YET
A Game Of Sorts is a collaborative image ranking task. Players are asked to rank a set of images based on a given sorting criterion. The game provides a framework for the evaluation of visually grounded language understanding and generation of referring expressions in multimodal dialogue settings.
2 PAPERS • NO BENCHMARKS YET
XWINO is a multilingual collection of Winograd Schemas in six languages that can be used for evaluation of cross-lingual commonsense reasoning capabilities.
3 PAPERS • 1 BENCHMARK
OntoGUM is an OntoNotes-like coreference dataset converted from GUM, an English corpus covering 12 genres using deterministic rules.
7 PAPERS • 1 BENCHMARK
<Task description: joint learning of coreference resolution and query rewrite>
SciCo is an expert-annotated dataset for hierarchical CDCR (cross-document coreference resolution) for concepts in scientific papers, with the goal of jointly inferring coreference clusters and hierarchy between them.
Current approaches to context-aware MT rely on a set of surface heuristics to translate pronouns, which break down when translations require real reasoning. We create a new template test set ContraCAT to assess the ability of Machine Translation to handle the specific steps necessary for successful pronoun translation.
0 PAPER • NO BENCHMARKS YET
AMALGUM is a machine annotated multilayer corpus following the same design and annotation layers as GUM, but substantially larger (around 4M tokens). The goal of this corpus is to close the gap between high quality, richly annotated, but small datasets, and the larger but shallowly annotated corpora that are often scraped from the Web.
5 PAPERS • NO BENCHMARKS YET
Maybe Ambiguous Pronoun is a dataset similar to GAP dataset, but without binary gender constraints.
14 PAPERS • NO BENCHMARKS YET
BiPaR is a manually annotated bilingual parallel novel-style machine reading comprehension (MRC) dataset, developed to support monolingual, multilingual and cross-lingual reading comprehension on novels. The biggest difference between BiPaR and existing reading comprehension datasets is that each triple (Passage, Question, Answer) in BiPaR is written in parallel in two languages. BiPaR is diverse in prefixes of questions, answer types and relationships between questions and passages. Answering the questions requires reading comprehension skills of coreference resolution, multi-sentence reasoning, and understanding of implicit causality.
4 PAPERS • NO BENCHMARKS YET
GAP is a gender-balanced dataset containing 8,908 coreference-labeled pairs of (ambiguous pronoun, antecedent name), sampled from Wikipedia and released by Google AI Language for the evaluation of coreference resolution in practical applications.
92 PAPERS • 1 BENCHMARK
The ECB+ corpus is an extension to the EventCorefBank (ECB, Bejan and Harabagiu, 2010). A newly added corpus component consists of 502 documents that belong to the 43 topics of the ECB but that describe different seminal events than those already captured in the ECB. All corpus texts were found through Google Search and were annotated with mentions of events and their times, locations, human and non-human participants as well as with within- and cross-document event and entity coreference information. The 2012 version of annotation of the ECB corpus (Lee et al., 2012) was used as a starting point for re-annotation of the ECB according to the ECB+ annotation guideline.
72 PAPERS • 2 BENCHMARKS
OntoNotes 5.0 is a large corpus comprising various genres of text (news, conversational telephone speech, weblogs, usenet newsgroups, broadcast, talk shows) in three languages (English, Chinese, and Arabic) with structural information (syntax and predicate argument structure) and shallow semantics (word sense linked to an ontology and coreference).
237 PAPERS • 11 BENCHMARKS
The Winograd Schema Challenge was introduced both as an alternative to the Turing Test and as a test of a system’s ability to do commonsense reasoning. A Winograd schema is a pair of sentences differing in one or two words with a highly ambiguous pronoun, resolved differently in the two sentences, that appears to require commonsense knowledge to be resolved correctly. The examples were designed to be easily solvable by humans but difficult for machines, in principle requiring a deep understanding of the content of the text and the situation it describes.
313 PAPERS • 1 BENCHMARK
The 'Deutsche Welle corpus for Information Extraction' (DWIE) is a multi-task dataset that combines four main Information Extraction (IE) annotation sub-tasks: (i) Named Entity Recognition (NER), (ii) Coreference Resolution, (iii) Relation Extraction (RE), and (iv) Entity Linking. DWIE is conceived as an entity-centric dataset that describes interactions and properties of conceptual entities on the level of the complete document.
16 PAPERS • 4 BENCHMARKS
English Web Treebank is a dataset containing 254,830 word-level tokens and 16,624 sentence-level tokens of webtext in 1174 files annotated for sentence- and word-level tokenization, part-of-speech, and syntactic structure. The data is roughly evenly divided across five genres: weblogs, newsgroups, email, reviews, and question-answers. The files were manually annotated following the sentence-level tokenization guidelines for web text and the word-level tokenization guidelines developed for English treebanks in the DARPA GALE project. Only text from the subject line and message body of posts, articles, messages and question-answers were collected and annotated.
42 PAPERS • NO BENCHMARKS YET
GUM is an open source multilayer English corpus of richly annotated texts from twelve text types. Annotations include:
8 PAPERS • 1 BENCHMARK
The Manually Annotated Sub-Corpus (MASC) consists of approximately 500,000 words of contemporary American English written and spoken data drawn from the Open American National Corpus (OANC).
MultiReQA is a cross-domain evaluation for retrieval question answering models. Retrieval question answering (ReQA) is the task of retrieving a sentence-level answer to a question from an open corpus. MultiReQA is a new multi-domain ReQA evaluation suite composed of eight retrieval QA tasks drawn from publicly available QA datasets from the MRQA shared task. MultiReQA contains the sentence boundary annotation from eight publicly available QA datasets including SearchQA, TriviaQA, HotpotQA, NaturalQuestions, SQuAD, BioASQ, RelationExtraction, and TextbookQA. Five of these datasets, including SearchQA, TriviaQA, HotpotQA, NaturalQuestions, SQuAD, contain both training and test data, and three, in cluding BioASQ, RelationExtraction, TextbookQA, contain only the test data.
ParCorFull is a parallel corpus annotated with full coreference chains that has been created to address an important problem that machine translation and other multilingual natural language processing (NLP) technologies face -- translation of coreference across languages. This corpus contains parallel texts for the language pair English-German, two major European languages. Despite being typologically very close, these languages still have systemic differences in the realisation of coreference, and thus pose problems for multilingual coreference resolution and machine translation. This parallel corpus covers the genres of planned speech (public lectures) and newswire. It is richly annotated for coreference in both languages, including annotation of both nominal coreference and reference to antecedents expressed as clauses, sentences and verb phrases.
11 PAPERS • NO BENCHMARKS YET
A large-scale collection of visually-grounded, task-oriented dialogues in English designed to investigate shared dialogue history accumulating during conversation.
10 PAPERS • NO BENCHMARKS YET
A large-scale English dataset for coreference resolution. The dataset is designed to embody the core challenges in coreference, such as entity representation, by alleviating the challenge of low overlap between training and test sets and enabling separated analysis of mention detection and mention clustering.
18 PAPERS • 1 BENCHMARK
WikiCoref is an English corpus annotated for anaphoric relations, where all documents are from the English version of Wikipedia.
27 PAPERS • 1 BENCHMARK