EUROPA is a dataset designed for training and evaluating multilingual keyphrase generation models in the legal domain. It consists of legal judgments from the Court of Justice of the European Union (EU) and includes instances in all 24 official EU languages.
Key Features:
- Multilingual: Covers all 24 official EU languages.
- Domain-Specific: Focuses on legal documents.
- Source: Derived from judgments of the Court of Justice of the European Union.
`lang` and `celex_id` values;

As explained in our paper, the dataset is split chronologically to assess the temporal generalization of models:
- training set: 1957 to 2010 (131 076 instances);
- validation set: 2011 to 2015 (63 373 instances);
- test set: 2016 to 2023 (90 508 instances).
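The chronological split can be illustrated with a minimal Python sketch that maps a judgment year to its split using the year ranges listed above. The helper names (`celex_year`, `split_for_year`, `split_for_celex`) are hypothetical and not part of the dataset's code. The sketch also assumes that a `celex_id` follows the usual CELEX layout (one sector character followed by a four-digit year) and that this year is the one used for splitting; neither assumption is confirmed by this card.

```python
import re

# Inclusive year ranges per split, taken from the description above.
SPLIT_YEARS = {
    "train": (1957, 2010),
    "validation": (2011, 2015),
    "test": (2016, 2023),
}

# Assumption: a CELEX identifier starts with one sector character
# followed by a four-digit year (e.g. "6" + "2019" + ...).
CELEX_YEAR = re.compile(r"^[0-9A-Z](\d{4})")


def celex_year(celex_id: str) -> int:
    """Extract the four-digit year from a CELEX-style identifier."""
    match = CELEX_YEAR.match(celex_id)
    if match is None:
        raise ValueError(f"unrecognized CELEX identifier: {celex_id!r}")
    return int(match.group(1))


def split_for_year(year: int) -> str:
    """Return the split whose year range contains `year`."""
    for name, (first, last) in SPLIT_YEARS.items():
        if first <= year <= last:
            return name
    raise ValueError(f"year {year} is outside the dataset's range")


def split_for_celex(celex_id: str) -> str:
    """Map a CELEX-style identifier to train, validation, or test."""
    return split_for_year(celex_year(celex_id))


if __name__ == "__main__":
    # Hypothetical identifier used only for illustration.
    print(split_for_celex("62019CJ0001"))
```

Because the boundaries are by year rather than random, a model evaluated on the test set only sees judgments newer than anything in its training data, which is what makes the split suitable for measuring temporal generalization.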
@article{salaun2024europa,
title={EUROPA: A Legal Multilingual Keyphrase Generation Dataset},
author={Sala{\"u}n, Olivier and Piedboeuf, Fr{\'e}d{\'e}ric and Le Berre, Guillaume and Hermelo, David Alfonso and Langlais, Philippe},
journal={arXiv preprint arXiv:2403.00252},
year={2024}
}