Search Results for author: Raffaella Bernardi

Found 40 papers, 9 papers with code

A Small but Informed and Diverse Model: The Case of the Multimodal GuessWhat!? Guessing Game

no code implementations CLASP 2022 Claudio Greco, Alberto Testoni, Raffaella Bernardi, Stella Frank

Pre-trained Vision and Language Transformers achieve high performance on downstream tasks due to their ability to transfer representational knowledge accumulated during pretraining on substantial amounts of data.

They Are Not All Alike: Answering Different Spatial Questions Requires Different Grounding Strategies

no code implementations EMNLP (SpLU) 2020 Alberto Testoni, Claudio Greco, Tobias Bianchi, Mauricio Mazuecos, Agata Marcante, Luciana Benotti, Raffaella Bernardi

By analyzing LXMERT errors and its attention mechanisms, we find that our classification helps to gain a better understanding of the skills required to answer different spatial questions.

Visually Grounded Follow-up Questions: a Dataset of Spatial Questions Which Require Dialogue History

1 code implementation ACL (splurobonlp) 2021 Tianai Dong, Alberto Testoni, Luciana Benotti, Raffaella Bernardi

We call the question that restricts the context the "trigger", and the spatial question that requires the trigger question to be answered the "zoomer".

Looking for Confirmations: An Effective and Human-Like Visual Dialogue Strategy

1 code implementation EMNLP 2021 Alberto Testoni, Raffaella Bernardi

Inspired by the cognitive literature on information search and cross-situational word learning, we design Confirm-it, a model based on a beam search re-ranking algorithm that guides an effective goal-oriented strategy by asking questions that confirm the model's conjecture about the referent.

Re-Ranking
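The abstract above describes re-ranking beam candidates so that questions confirming the model's current conjecture are preferred. A minimal illustrative sketch of that idea (not the authors' code; the scoring scheme, bonus value, and attribute-matching heuristic are all hypothetical stand-ins):

```python
# Hypothetical sketch of confirmation-driven beam re-ranking: candidate
# questions from a beam search are re-scored so that questions which
# confirm the model's current conjecture about the referent rank higher.

def rerank_beam(candidates, conjecture, bonus=1.0):
    """Re-rank (question, log_prob) pairs, boosting questions that
    mention an attribute of the conjectured referent."""
    def score(item):
        question, log_prob = item
        confirms = any(attr in question for attr in conjecture)
        return log_prob + (bonus if confirms else 0.0)
    return sorted(candidates, key=score, reverse=True)

beam = [("is it red?", -1.2), ("is it a person?", -0.9), ("is it round?", -1.5)]
conjecture = {"red", "round"}  # attributes of the current best guess
reranked = rerank_beam(beam, conjecture)
# questions confirming the conjecture now outrank the higher-probability one
```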

"I've Seen Things You People Wouldn't Believe": Hallucinating Entities in GuessWhat?!

no code implementations ACL 2021 Alberto Testoni, Raffaella Bernardi

We also analyse where hallucinations tend to occur more often through the dialogue: hallucinations are less frequent in earlier turns, cause a cascade hallucination effect, and are often preceded by negative answers, which have been shown to be harder to ground.

Hallucination, Image Captioning +1

Overprotective Training Environments Fall Short at Testing Time: Let Models Contribute to Their Own Training

no code implementations 20 Mar 2021 Alberto Testoni, Raffaella Bernardi

Despite important progress, conversational systems often generate dialogues that sound unnatural to humans.

The Interplay of Task Success and Dialogue Quality: An in-depth Evaluation in Task-Oriented Visual Dialogues

1 code implementation EACL 2021 Alberto Testoni, Raffaella Bernardi

When training a model on referential dialogue guessing games, the best model is usually chosen based on its task success.

On the role of effective and referring questions in GuessWhat?!

no code implementations WS 2020 Mauricio Mazuecos, Alberto Testoni, Raffaella Bernardi, Luciana Benotti

Regarding our first metric, we find that successful dialogues do not have a higher percentage of effective questions for most models.

Quantifiers in a Multimodal World: Hallucinating Vision with Language and Sound

no code implementations WS 2019 Alberto Testoni, Sandro Pezzelle, Raffaella Bernardi

Inspired by the literature on multisensory integration, we develop a computational model to ground quantifiers in perception.

Evaluating the Representational Hub of Language and Vision Models

no code implementations WS 2019 Ravi Shekhar, Ece Takmaz, Raquel Fernández, Raffaella Bernardi

The multimodal models used in the emerging field at the intersection of computational linguistics and computer vision implement the bottom-up processing of the `Hub and Spoke' architecture proposed in cognitive science to represent how the brain processes and combines multi-sensory inputs.

Question Answering, Visual Question Answering

Ask No More: Deciding when to guess in referential visual dialogue

1 code implementation COLING 2018 Ravi Shekhar, Tim Baumgartner, Aashish Venkatesh, Elia Bruni, Raffaella Bernardi, Raquel Fernández

We make initial steps towards this general goal by augmenting a task-oriented visual dialogue model with a decision-making component that decides whether to ask a follow-up question to identify a target referent in an image, or to stop the conversation to make a guess.

Decision Making, Visual Dialog
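The entry above describes a decision-making component that, after each turn, either asks a follow-up question or stops to guess. A toy sketch of such a stopping rule (not the paper's implementation; the confidence threshold, turn budget, and belief representation are hypothetical):

```python
# Hypothetical ask-vs-guess decision rule: guess once the belief over
# candidate referents is confident enough or the turn budget is spent,
# otherwise ask another question.

def decide(belief, threshold=0.8, max_turns=8, turn=0):
    """Return 'guess' if the most probable candidate exceeds the
    confidence threshold or the turn budget is exhausted, else 'ask'."""
    if max(belief.values()) >= threshold or turn >= max_turns:
        return "guess"
    return "ask"

belief = {"dog": 0.55, "cat": 0.30, "bike": 0.15}
action_early = decide(belief, turn=2)  # still uncertain, keep asking
action_late = decide(belief, turn=8)   # budget spent, commit to a guess
```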

Comparatives, Quantifiers, Proportions: A Multi-Task Model for the Learning of Quantities from Vision

1 code implementation NAACL 2018 Sandro Pezzelle, Ionut-Teodor Sorodoc, Raffaella Bernardi

The present work investigates whether different quantification mechanisms (set comparison, vague quantification, and proportional estimation) can be jointly learned from visual scenes by a multi-task computational model.
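The three quantification mechanisms named above can be sketched as task-specific heads over one shared scene representation. This is a toy stand-in (not the paper's neural model): the scene encoding and all three heads below are hand-written heuristics used only to illustrate the multi-task structure.

```python
# Hypothetical multi-task layout: a shared representation of a visual
# scene feeds three heads for set comparison, vague quantification,
# and proportional estimation.

def encode_scene(targets, distractors):
    """Toy shared representation: counts and the target ratio."""
    total = targets + distractors
    return {"targets": targets, "total": total,
            "ratio": targets / total if total else 0.0}

def set_comparison(rep):
    """More, same, or fewer targets than distractors."""
    diff = 2 * rep["targets"] - rep["total"]
    return "more" if diff > 0 else "fewer" if diff < 0 else "same"

def vague_quantifier(rep):
    """Map the target ratio onto a coarse quantifier."""
    r = rep["ratio"]
    return "none" if r == 0 else "few" if r < 0.4 else "most" if r < 1 else "all"

def proportion(rep):
    """Proportion of targets, rounded to the nearest 25%."""
    return round(rep["ratio"] * 4) / 4

rep = encode_scene(targets=3, distractors=7)
```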

FOIL it! Find One mismatch between Image and Language caption

no code implementations ACL 2017 Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi

In this paper, we aim to understand whether current language and vision (LaVi) models truly grasp the interaction between the two modalities.

Pay Attention to Those Sets! Learning Quantification from Images

no code implementations 10 Apr 2017 Ionut Sorodoc, Sandro Pezzelle, Aurélie Herbelot, Mariella Dimiccoli, Raffaella Bernardi

We however argue that precisely identifying the composition of the sets is not only beyond current state-of-the-art models but perhaps even detrimental to a task that is most efficiently performed by refining the approximate numerosity estimator of the system.

Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision

no code implementations EACL 2017 Sandro Pezzelle, Marco Marelli, Raffaella Bernardi

People can refer to quantities in a visual scene by using either exact cardinals (e.g. one, two, three) or natural language quantifiers (e.g. few, most, all).

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

no code implementations 15 Jan 2016 Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank

Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.

Retrieval

Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image Generation

no code implementations 10 Jun 2015 Angeliki Lazaridou, Dat Tien Nguyen, Raffaella Bernardi, Marco Baroni

We introduce language-driven image generation, the task of generating an image visualizing the semantic contents of a word embedding, e.g., given the word embedding of grasshopper, we generate a natural image of a grasshopper.

Image Generation, Word Embeddings
