GENIA

Introduced by Jin-Dong Kim et al. in “GENIA corpus - a semantically annotated corpus for bio-textmining”

The GENIA corpus is the primary collection of biomedical literature compiled and annotated within the scope of the GENIA project. The corpus was created to support the development and evaluation of information extraction and text mining systems for the domain of molecular biology.

The corpus contains 1,999 Medline abstracts, selected using a PubMed query for the three MeSH terms “human”, “blood cells”, and “transcription factors”. The corpus has been annotated with various levels of linguistic and semantic information.
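The selection step above can be sketched as a PubMed search string. The snippet below is a minimal illustration, not the query the GENIA project actually ran: the exact query string is not given in the description, so the `[MeSH Terms]` field tag, the `AND` combination, and the helper names `build_mesh_query` and `build_esearch_url` are assumptions. It only builds a URL for the NCBI E-utilities ESearch endpoint and sends no request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities ESearch endpoint.
ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The three terms as quoted in the corpus description. How they were
# combined in the original query is an assumption made for illustration.
MESH_TERMS = ["human", "blood cells", "transcription factors"]


def build_mesh_query(terms):
    """Join the terms with AND, tagging each one as a MeSH heading."""
    return " AND ".join(f'"{term}"[MeSH Terms]' for term in terms)


def build_esearch_url(terms, retmax=20):
    """Return an ESearch URL for the query (hypothetical helper; no request is sent)."""
    params = {"db": "pubmed", "term": build_mesh_query(terms), "retmax": retmax}
    return f"{ESEARCH_URL}?{urlencode(params)}"


query = build_mesh_query(MESH_TERMS)
url = build_esearch_url(MESH_TERMS)
print(query)
print(url)
```

Running a query like this today should not be expected to reproduce the original 1,999 abstracts, since Medline has grown since the corpus was compiled.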

The primary categories of annotation in the GENIA corpus and the corresponding subcorpora are:

Part-of-Speech annotation
Constituency (phrase structure) syntactic annotation
Term annotation
Event annotation
Relation annotation
Coreference annotation

Source: http://www.geniaproject.org/genia-corpus

Benchmarks

Task	Dataset Variant	Best Model
Nested Named Entity Recognition	GENIA	PIQN
Named Entity Recognition (NER)	GENIA	DeepStruct multi-task w/ finetune
Dependency Parsing	GENIA - LAS	BiLSTM-CRF
Dependency Parsing	GENIA - UAS	BiLSTM-CRF
Event Extraction	GENIA	DeepEventMine
Event Extraction	GENIA 2013	DeepEventMine