6 dataset results for Weakly-Supervised Named Entity Recognition

CoNLL 2003

CoNLL-2003 is a named entity recognition dataset released as part of the CoNLL-2003 shared task on language-independent named entity recognition. The data consists of eight files covering two languages, English and German. For each language there is a training file, a development file, a test file, and a large file of unannotated data.

639 PAPERS • 16 BENCHMARKS

OntoNotes 5.0

OntoNotes 5.0 is a large corpus comprising various genres of text (news, conversational telephone speech, weblogs, usenet newsgroups, broadcast, talk shows) in three languages (English, Chinese, and Arabic) with structural information (syntax and predicate argument structure) and shallow semantics (word sense linked to an ontology and coreference).

237 PAPERS • 11 BENCHMARKS

CoNLL

The CoNLL dataset is a widely used resource in natural language processing (NLP). “CoNLL” stands for Conference on Computational Natural Language Learning, and the dataset originates from the series of shared tasks organized at that conference.

177 PAPERS • 49 BENCHMARKS

BC5CDR (BioCreative V CDR corpus)

The BC5CDR corpus consists of 1500 PubMed articles with 4409 annotated chemicals, 5818 diseases, and 3116 chemical-disease interactions.

174 PAPERS • 6 BENCHMARKS

CoNLL++

CoNLL++ is a corrected version of the CoNLL-2003 NER dataset in which 5.38% of the test sentences have been corrected.

49 PAPERS • 3 BENCHMARKS

ShARe/CLEF 2014: Task 2 Disorders

3 PAPERS • 2 BENCHMARKS
