no code implementations • CLASP 2022 • Claudio Greco, Alberto Testoni, Raffaella Bernardi, Stella Frank
Pre-trained Vision and Language Transformers achieve high performance on downstream tasks due to their ability to transfer representational knowledge accumulated during pretraining on substantial amounts of data.
no code implementations • EMNLP (SpLU) 2020 • Alberto Testoni, Claudio Greco, Tobias Bianchi, Mauricio Mazuecos, Agata Marcante, Luciana Benotti, Raffaella Bernardi
By analyzing LXMERT errors and its attention mechanisms, we find that our classification helps to gain a better understanding of the skills required to answer different spatial questions.
1 code implementation • COLING 2022 • Michael Hanna, Federico Pedeni, Alessandro Suglia, Alberto Testoni, Raffaella Bernardi
This paves the way for a systematic way of evaluating embodied AI agents that understand grounded actions.
1 code implementation • ACL (splurobonlp) 2021 • Tianai Dong, Alberto Testoni, Luciana Benotti, Raffaella Bernardi
We call the question that restricts the context the trigger, and the spatial question that requires the trigger question to be answered first the zoomer.
1 code implementation • EMNLP 2021 • Alberto Testoni, Raffaella Bernardi
Inspired by the cognitive literature on information search and cross-situational word learning, we design Confirm-it, a model based on a beam search re-ranking algorithm that guides an effective goal-oriented strategy by asking questions that confirm the model's conjecture about the referent.
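The confirmation-driven re-ranking idea can be illustrated with a minimal sketch. This is not the paper's code: the function names, the scoring interface, and the toy candidates are all hypothetical; the only assumption carried over from the abstract is that beam-search candidate questions are re-ordered by how strongly they would confirm the model's current conjecture about the referent.

```python
# Illustrative sketch of confirmation-based beam re-ranking (hypothetical API).
# candidates: (question, log_prob) pairs produced by beam search.
# conjecture_score: question -> float; higher means the question more directly
# confirms the model's current conjecture about the referent.

def rerank_beam(candidates, conjecture_score):
    """Order candidate questions by confirmation score, breaking ties
    with the original beam-search log-probability."""
    return sorted(
        candidates,
        key=lambda c: (conjecture_score(c[0]), c[1]),
        reverse=True,
    )

# Toy example: the model conjectures the referent is the red object.
beam = [("is it red?", -1.2), ("is it an animal?", -0.8), ("is it big?", -1.5)]
scores = {"is it red?": 0.9, "is it an animal?": 0.3, "is it big?": 0.1}
ranked = rerank_beam(beam, lambda q: scores[q])
print(ranked[0][0])  # the most confirming question is asked first
```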
no code implementations • ACL 2021 • Alberto Testoni, Raffaella Bernardi
We also analyse where hallucinations tend to occur more often through the dialogue: hallucinations are less frequent in earlier turns, cause a cascade hallucination effect, and are often preceded by negative answers, which have been shown to be harder to ground.
no code implementations • 20 Mar 2021 • Alberto Testoni, Raffaella Bernardi
Despite important progress, conversational systems often generate dialogues that sound unnatural to humans.
1 code implementation • EACL 2021 • Alberto Testoni, Raffaella Bernardi
When training a model on referential dialogue guessing games, the best model is usually chosen based on its task success.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Sandro Pezzelle, Claudio Greco, Greta Gandolfi, Eleonora Gualdoni, Raffaella Bernardi
This paper introduces BD2BB, a novel language and vision benchmark that requires multimodal models to combine complementary information from the two modalities.
no code implementations • WS 2020 • Mauricio Mazuecos, Alberto Testoni, Raffaella Bernardi, Luciana Benotti
Regarding our first metric, we find that successful dialogues do not have a higher percentage of effective questions for most models.
no code implementations • WS 2020 • Mauricio Mazuecos, Alberto Testoni, Raffaella Bernardi, Luciana Benotti
Task success is the standard metric used to evaluate these systems.
no code implementations • ACL 2019 • Claudio Greco, Barbara Plank, Raquel Fernández, Raffaella Bernardi
We study the issue of catastrophic forgetting in the context of neural multimodal approaches to Visual Question Answering (VQA).
no code implementations • WS 2019 • Alberto Testoni, Sandro Pezzelle, Raffaella Bernardi
Inspired by the literature on multisensory integration, we develop a computational model to ground quantifiers in perception.
no code implementations • WS 2019 • Ravi Shekhar, Ece Takmaz, Raquel Fernández, Raffaella Bernardi
The multimodal models used in the emerging field at the intersection of computational linguistics and computer vision implement the bottom-up processing of the 'Hub and Spoke' architecture proposed in cognitive science to represent how the brain processes and combines multi-sensory inputs.
3 code implementations • NAACL 2019 • Ravi Shekhar, Aashish Venkatesh, Tim Baumgärtner, Elia Bruni, Barbara Plank, Raffaella Bernardi, Raquel Fernández
We compare our approach to an alternative system which extends the baseline with reinforcement learning.
1 code implementation • COLING 2018 • Hoa Trong Vu, Claudio Greco, Aliia Erofeeva, Somayeh Jafaritazehjan, Guido Linders, Marc Tanti, Alberto Testoni, Raffaella Bernardi, Albert Gatt
Capturing semantic relations between sentences, such as entailment, is a long-standing challenge for computational semantics.
Ranked #2 on Natural Language Inference on V-SNLI
1 code implementation • COLING 2018 • Ravi Shekhar, Tim Baumgärtner, Aashish Venkatesh, Elia Bruni, Raffaella Bernardi, Raquel Fernández
We make initial steps towards this general goal by augmenting a task-oriented visual dialogue model with a decision-making component that decides whether to ask a follow-up question to identify a target referent in an image, or to stop the conversation to make a guess.
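The ask-or-guess decision described above can be sketched as a simple confidence-threshold policy. Everything here is an illustrative assumption (function names, the threshold value, the probability interface), not the paper's actual decision-making component: the only idea taken from the abstract is choosing between asking another question and stopping to make a guess.

```python
# Hypothetical sketch of an ask-or-guess policy for a task-oriented
# visual dialogue agent: stop the conversation and guess once the
# guesser's confidence over candidate referents passes a threshold.

def decide(referent_probs, threshold=0.8):
    """referent_probs: dict mapping candidate referents to probabilities.
    Returns ("guess", referent) when confident enough, else ("ask", None)."""
    best = max(referent_probs, key=referent_probs.get)
    if referent_probs[best] >= threshold:
        return ("guess", best)
    return ("ask", None)

print(decide({"dog": 0.9, "cat": 0.1}))  # confident: stop and guess
print(decide({"dog": 0.5, "cat": 0.5}))  # uncertain: ask a follow-up question
```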
1 code implementation • NAACL 2018 • Sandro Pezzelle, Ionut-Teodor Sorodoc, Raffaella Bernardi
The present work investigates whether different quantification mechanisms (set comparison, vague quantification, and proportional estimation) can be jointly learned from visual scenes by a multi-task computational model.
no code implementations • ACL 2017 • Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi
In this paper, we aim to understand whether current language and vision (LaVi) models truly grasp the interaction between the two modalities.
no code implementations • 10 Apr 2017 • Ionut Sorodoc, Sandro Pezzelle, Aurélie Herbelot, Mariella Dimiccoli, Raffaella Bernardi
We however argue that precisely identifying the composition of the sets is not only beyond current state-of-the-art models but perhaps even detrimental to a task that is most efficiently performed by refining the approximate numerosity estimator of the system.
no code implementations • EACL 2017 • Sandro Pezzelle, Marco Marelli, Raffaella Bernardi
People can refer to quantities in a visual scene by using either exact cardinals (e.g. one, two, three) or natural language quantifiers (e.g. few, most, all).
2 code implementations • ACL 2016 • Denis Paperno, Germán Kruszewski, Angeliki Lazaridou, Quan Ngoc Pham, Raffaella Bernardi, Sandro Pezzelle, Marco Baroni, Gemma Boleda, Raquel Fernández
We introduce LAMBADA, a dataset to evaluate the capabilities of computational models for text understanding by means of a word prediction task.
no code implementations • 15 Jan 2016 • Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank
Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.
no code implementations • 10 Jun 2015 • Angeliki Lazaridou, Dat Tien Nguyen, Raffaella Bernardi, Marco Baroni
We introduce language-driven image generation, the task of generating an image visualizing the semantic contents of a word embedding, e.g., given the word embedding of grasshopper, we generate a natural image of a grasshopper.
no code implementations • LREC 2014 • Marco Marelli, Stefano Menini, Marco Baroni, Luisa Bentivogli, Raffaella Bernardi, Roberto Zamparelli
Shared and internationally recognized benchmarks are fundamental for the development of any computational system.