MultiTACRED Dataset | Papers With Code

MultiTACRED is a multilingual version of the large-scale
[TAC Relation Extraction Dataset](https://nlp.stanford.edu/projects/tacred). It covers 12 typologically diverse
languages from 9 language families. The
[Speech & Language Technology group of DFKI](https://www.dfki.de/slt) created it by machine-translating the instances
of the original TACRED dataset and automatically projecting their entity annotations. For details of the original
TACRED data collection and annotation process, see the [Stanford paper](https://aclanthology.org/D17-1004/).
Translations are syntactically validated by checking the correctness of the XML tag markup. Any translation with an
invalid tag structure, e.g. missing or invalid head or tail tag pairs, is discarded (on average, 2.3% of the instances).
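
The tag-structure check described above could look roughly like the following. This is only a minimal sketch, not the pipeline actually used to build the dataset: the tag names `H` and `T` and the function `has_valid_entity_markup` are hypothetical, and the rule "exactly one properly closed head pair and one properly closed tail pair" is an assumption about what counts as valid.

```python
import re

# Hypothetical tag names: the description only speaks of "head" and "tail" tag pairs.
HEAD_TAG = "H"
TAIL_TAG = "T"

# Matches simple opening or closing tags such as <H> or </T>.
_TAG_RE = re.compile(r"<(/?)(\w+)>")


def has_valid_entity_markup(text: str) -> bool:
    """Return True if `text` contains exactly one well-formed head tag pair
    and exactly one well-formed tail tag pair (assumed validity rule)."""
    open_tags = []  # stack of entity tags that are currently open
    closed_pairs = {HEAD_TAG: 0, TAIL_TAG: 0}
    for match in _TAG_RE.finditer(text):
        is_closing = match.group(1) == "/"
        name = match.group(2)
        if name not in closed_pairs:
            continue  # ignore markup that is not an entity tag
        if not is_closing:
            open_tags.append(name)
        elif not open_tags or open_tags.pop() != name:
            return False  # closing tag without a matching opening tag
        else:
            closed_pairs[name] += 1
    # No tag may be left open, and each entity must appear exactly once.
    return not open_tags and all(n == 1 for n in closed_pairs.values())


translations = [
    "<H>Bill Gates</H> gründete <T>Microsoft</T>.",
    "<H>Bill Gates gründete <T>Microsoft</T>.",  # head tag never closed
]
# Keep only translations whose markup passes the check.
kept = [t for t in translations if has_valid_entity_markup(t)]
```

In this sketch a translation that lost a closing tag during machine translation is simply dropped, which mirrors the discarding step described above. It does not verify that the tagged spans are non-empty or correctly aligned with the original entities.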

The languages covered are Arabic, Chinese, Finnish, French, German, Hindi, Hungarian, Japanese, Polish,
Russian, Spanish, and Turkish. The intended use is supervised relation classification, and the intended
audience is researchers.

Please see [our ACL paper](https://arxiv.org/abs/2305.04582) for full details.
