Search Results for author: Tommaso Pasini

Found 17 papers, 4 papers with code

Reducing Disambiguation Biases in NMT by Leveraging Explicit Word Sense Information

no code implementations NAACL 2022 Niccolò Campolungo, Tommaso Pasini, Denis Emelin, Roberto Navigli

Recent studies have shed some light on a common pitfall of Neural Machine Translation (NMT) models, stemming from their struggle to disambiguate polysemous words without lapsing into their most frequently occurring senses in the training corpus. In this paper, we first provide a novel approach for automatically creating high-precision sense-annotated parallel corpora, and then put forward a specifically tailored fine-tuning strategy for exploiting these sense annotations during training without introducing any additional requirement at inference time. The use of explicit senses proved beneficial in reducing the disambiguation bias of a baseline NMT model while, at the same time, leading our system to attain higher BLEU scores than its vanilla counterpart across 3 language pairs.

Machine Translation NMT +1

Encoder-Decoder Framework for Interactive Free Verse Generation with Controllable High-Quality Rhyming

no code implementations8 May 2024 Tommaso Pasini, Alejo López-Ávila, Husam Quteineh, Gerasimos Lampouras, Jinhua Du, Yubing Wang, Ze Li, Yusen Sun

We propose a novel fine-tuning approach that prepends the rhyming word at the start of each lyric, which allows the critical rhyming decision to be made before the model commits to the content of the lyric (as in reverse language modeling), while maintaining compatibility with the word order of regular PLMs, since the lyric itself is still generated in left-to-right order.
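The prepend-the-rhyme idea can be illustrated as a simple preprocessing step; this is a hedged sketch only, and the function name and separator token are assumptions, not the paper's actual data format:

```python
# Minimal sketch of rhyme-prepending preprocessing (names and separator
# are illustrative assumptions, not the authors' implementation).

def prepend_rhyme(lyric: str, sep: str = " | ") -> str:
    """Prepend the lyric's final (rhyming) word to the lyric itself,
    so a left-to-right PLM commits to the rhyme before the content."""
    words = lyric.strip().split()
    rhyme_word = words[-1] if words else ""
    return rhyme_word + sep + lyric.strip()

# Each training example now starts with its rhyme word while the lyric
# is still generated left-to-right, e.g.:
examples = [prepend_rhyme(l) for l in [
    "the stars above ignite",
    "and guide me through the night",
]]
```

At inference time, the same format lets the decoder first emit (or be conditioned on) the desired rhyme word, then generate a lyric consistent with it.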

Decoder Language Modelling +1

FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing

1 code implementation ACL 2022 Ilias Chalkidis, Tommaso Pasini, Sheng Zhang, Letizia Tomada, Sebastian Felix Schwemer, Anders Søgaard

We present a benchmark suite of four datasets for evaluating the fairness of pre-trained language models and the techniques used to fine-tune them for downstream tasks.


ESC: Redesigning WSD with Extractive Sense Comprehension

1 code implementation NAACL 2021 Edoardo Barba, Tommaso Pasini, Roberto Navigli

By means of an extensive array of experiments, we show that ESC unleashes the full potential of our model, leading it to outdo all of its competitors and to set a new state of the art on the English WSD task.

Multi-Label Classification Sentence +1

Wikipedia Entities as Rendezvous across Languages: Grounding Multilingual Language Models by Predicting Wikipedia Hyperlinks

no code implementations NAACL 2021 Iacer Calixto, Alessandro Raganato, Tommaso Pasini

Adding further languages leads to improvements in most tasks up to a certain point, but overall we found it non-trivial to scale improvements in model transferability by training on ever-increasing numbers of Wikipedia languages.

Knowledge Graphs

Sense-Annotated Corpora for Word Sense Disambiguation in Multiple Languages and Domains

no code implementations LREC 2020 Bianca Scarlini, Tommaso Pasini, Roberto Navigli

This limits the reach of deep-learning approaches, which today underpin virtually every NLP task and are hungry for data.

Word Sense Disambiguation

Just "OneSeC" for Producing Multilingual Sense-Annotated Data

no code implementations ACL 2019 Bianca Scarlini, Tommaso Pasini, Roberto Navigli

The well-known problem of knowledge acquisition is one of the biggest issues in Word Sense Disambiguation (WSD), where annotated data are still scarce in English and almost absent in other languages.

Word Sense Disambiguation

Huge Automatically Extracted Training Sets for Multilingual Word Sense Disambiguation

no code implementations12 May 2018 Tommaso Pasini, Francesco Maria Elia, Roberto Navigli

We release to the community six large-scale sense-annotated datasets in multiple languages to pave the way for supervised multilingual Word Sense Disambiguation.

Word Sense Disambiguation

A Short Survey on Sense-Annotated Corpora

no code implementations LREC 2020 Tommaso Pasini, Jose Camacho-Collados

Large sense-annotated datasets are increasingly necessary for training deep supervised systems in Word Sense Disambiguation.

Word Sense Disambiguation
