The PARSEME Shared Task on Automatic Identification of Verbal Multiword Expressions
Multiword expressions (MWEs) are known as a {``}pain in the neck{''} for NLP due to their idiosyncratic behaviour. While some categories of MWEs have been addressed by many studies, verbal MWEs (VMWEs), such as to take a decision, to break one{'}s heart or to turn off, have been rarely modelled. This is notably due to their syntactic variability, which hinders treating them as {``}words with spaces{''}. We describe an initiative meant to bring about substantial progress in understanding, modelling and processing VMWEs. It is a joint effort, carried out within a European research network, to elaborate universal terminologies and annotation guidelines for 18 languages. Its main outcome is a multilingual 5-million-word annotated corpus which underlies a shared task on automatic identification of VMWEs. This paper presents the corpus annotation methodology and outcome, the shared task organisation and the results of the participating systems.
PDF Abstract