14 dataset results for Semantic Role Labeling

FrameNet is a linguistic knowledge graph containing information about lexical and predicate argument semantics of the English language. FrameNet contains two distinct entity classes: frames and lexical units, where a frame is a meaning and a lexical unit is a single meaning for a word.

435 PAPERS • NO BENCHMARKS YET

OntoNotes 5.0

OntoNotes 5.0 is a large corpus comprising various genres of text (news, conversational telephone speech, weblogs, usenet newsgroups, broadcast, talk shows) in three languages (English, Chinese, and Arabic) with structural information (syntax and predicate argument structure) and shallow semantics (word sense linked to an ontology and coreference).

237 PAPERS • 11 BENCHMARKS

CoNLL

The CoNLL dataset is a widely used resource in the field of natural language processing (NLP). The term “CoNLL” stands for Conference on Natural Language Learning. It originates from a series of shared tasks organized at the Conferences of Natural Language Learning.

177 PAPERS • 49 BENCHMARKS

CoNLL-2012

The CoNLL-2012 shared task involved predicting coreference in English, Chinese, and Arabic, using the final version, v5.0, of the OntoNotes corpus. It was a follow-on to the English-only task organized in 2011.

87 PAPERS • 4 BENCHMARKS

NomBank

NomBank is an annotation project at New York University that is related to the PropBank project at the University of Colorado. The goal is to mark the sets of arguments that cooccur with nouns in the PropBank Corpus (the Wall Street Journal Corpus of the Penn Treebank), just as PropBank records such information for verbs. As a side effect of the annotation process, the authors are producing a number of other resources including various dictionaries, as well as PropBank style lexical entries called frame files. These resources help the user label the various arguments and adjuncts of the head nouns with roles (sets of argument labels for each sense of each noun). NYU and U of Colorado are making a coordinated effort to insure that, when possible, role definitions are consistent across parts of speech. For example, PropBank's frame file for the verb "decide" was used in the annotation of the noun "decision".

49 PAPERS • NO BENCHMARKS YET

QA-SRL

QA-SRL was proposed as an open schema for semantic roles, in which the relation between an argument and a predicate is expressed as a natural-language question containing the predicate (“Where was someone educated?”) whose answer is the argument (“Princeton”). The authors collected about 19,000 question-answer pairs from 3,200 sentences.

41 PAPERS • NO BENCHMARKS YET

CoNLL-2009

The task builds on the CoNLL-2008 task and extends it to multiple languages. The core of the task is to predict syntactic and semantic dependencies and their labeling. Data is provided for both statistical training and evaluation, which extract these labeled dependencies from manually annotated treebanks such as the Penn Treebank for English, the Prague Dependency Treebank for Czech and similar treebanks for Catalan, Chinese, German, Japanese and Spanish languages, enriched with semantic relations (such as those captured in the Prop/Nombank and similar resources). Great effort has been devoted to provide the participants with a common and relatively simple data representation for all the languages, similar to the last year's English data.

8 PAPERS • 3 BENCHMARKS

QA-SRL Bank 2.0

QA-SRL Bank 2.0 is a large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations. The corpus consists of over 250,000 question-answer pairs for over 64,000 sentences across 3 domains and was gathered with a new crowd-sourcing scheme that was shown to have high precision and good recall at modest cost.

6 PAPERS • NO BENCHMARKS YET

X-SRL

SRL is the task of extracting semantic predicate-argument structures from sentences. X-SRL is a multilingual parallel Semantic Role Labelling (SRL) corpus for English (EN), German (DE), French (FR) and Spanish (ES) that is based on English gold annotations and shares the same labelling scheme across languages.

4 PAPERS • NO BENCHMARKS YET

ExHVV

ExHVV is a novel dataset that offers natural language explanations of connotative roles for three types of entities -- heroes, villains, and victims, encompassing 4,680 entities present in 3K memes.

1 PAPER • NO BENCHMARKS YET

GePaDe

This dataset encompasses 265 speeches (over 200,000 tokens) from the German Bundestag, primarily from the 19th legislative term (2017-2021), given by 195 distinct speakers representing 6 political parties.

1 PAPER • 2 BENCHMARKS

Product Reviews 2017

The corpus contains review sentences mostly of products in electronics domain, annotated and segregated into 4 comparison categories. Each comparison sentence is annotated with names of the products (PROD1 and PROD2), the aspect (ASP) and the predicate (PRED). Dataset contains sentences after auto-labeling on SNAP dataset and manually labeled sentences from the following corpora:

1 PAPER • 1 BENCHMARK

PropBank-PT

The PropBankPT (Branco et al., 2012) is a set of sentences annotated with their constituency structure and semantic role tags, composed of 3,406 sentences and 44,598 tokens taken from the Wall Street Journal translated. For the creation of this PropBank we adopted a semi-automatic analysis with a double-blind annotation followed by adjudication. The resulting dataset contains three information levels: phrase constituency, grammatical functions, and phrase semantic roles. The main motivation behind the creation of this resource was to build a high quality data set with semantic information that could support the development of automatic semantic role labelers for Portuguese. The development of this resource started under the METANET4U project (at: http://metanet4u.eu/) whose main goal is to contribute to the establishment of a pan-European digital platform that makes available language resources and services, encompassing both datasets and software tools, for speech and language process

0 PAPER • NO BENCHMARKS YET

RuSRL

This dataset contains annotations of semantic frames and intra-frame syntax for 1500 Russian sentences. Each sentence is annotated with predicate-argument structures. Syntactic information is also provided for each frame.

0 PAPER • NO BENCHMARKS YET

Datasets

14 dataset results for Semantic Role Labeling