Search Results for author: Raffaella Bernardi

Found 40 papers, 9 papers with code

A Small but Informed and Diverse Model: The Case of the Multimodal GuessWhat!? Guessing Game

no code implementations CLASP 2022 Claudio Greco, Alberto Testoni, Raffaella Bernardi, Stella Frank

Pre-trained Vision and Language Transformers achieve high performance on downstream tasks due to their ability to transfer representational knowledge accumulated during pretraining on substantial amounts of data.

They Are Not All Alike: Answering Different Spatial Questions Requires Different Grounding Strategies

no code implementations EMNLP (SpLU) 2020 Alberto Testoni, Claudio Greco, Tobias Bianchi, Mauricio Mazuecos, Agata Marcante, Luciana Benotti, Raffaella Bernardi

By analyzing LXMERT errors and its attention mechanisms, we find that our classification helps to gain a better understanding of the skills required to answer different spatial questions.

Visually Grounded Follow-up Questions: a Dataset of Spatial Questions Which Require Dialogue History

1 code implementation ACL (splurobonlp) 2021 Tianai Dong, Alberto Testoni, Luciana Benotti, Raffaella Bernardi

We call the question that restricts the context the "trigger", and the spatial question that requires the trigger question to be answered the "zoomer".

Looking for Confirmations: An Effective and Human-Like Visual Dialogue Strategy

1 code implementation EMNLP 2021 Alberto Testoni, Raffaella Bernardi

Inspired by the cognitive literature on information search and cross-situational word learning, we design Confirm-it, a model based on a beam search re-ranking algorithm that guides an effective goal-oriented strategy by asking questions that confirm the model's conjecture about the referent.

Re-Ranking
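The abstract above describes re-ranking beam candidates so that questions confirming the model's current conjecture are preferred. A minimal illustrative sketch of that idea (not the authors' code; the scoring scheme, bonus value, and attribute-matching heuristic are all hypothetical stand-ins):

```python
# Hypothetical sketch of confirmation-driven beam re-ranking: candidate
# questions from a beam search are re-scored so that questions which
# confirm the model's current conjecture about the referent rank higher.

def rerank_beam(candidates, conjecture, bonus=1.0):
    """Re-rank (question, log_prob) pairs, boosting questions that
    mention an attribute of the conjectured referent."""
    def score(item):
        question, log_prob = item
        confirms = any(attr in question for attr in conjecture)
        return log_prob + (bonus if confirms else 0.0)
    return sorted(candidates, key=score, reverse=True)

beam = [("is it red?", -1.2), ("is it a person?", -0.9), ("is it round?", -1.5)]
conjecture = {"red", "round"}  # attributes of the current best guess
reranked = rerank_beam(beam, conjecture)
# questions confirming the conjecture now outrank the higher-probability one
```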

"I've Seen Things You People Wouldn't Believe": Hallucinating Entities in GuessWhat?!

no code implementations ACL 2021 Alberto Testoni, Raffaella Bernardi

We also analyse where hallucinations tend to occur more often through the dialogue: hallucinations are less frequent in earlier turns, cause a cascade hallucination effect, and are often preceded by negative answers, which have been shown to be harder to ground.

Hallucination, Image Captioning +1

Overprotective Training Environments Fall Short at Testing Time: Let Models Contribute to Their Own Training

no code implementations 20 Mar 2021 Alberto Testoni, Raffaella Bernardi

Despite important progress, conversational systems often generate dialogues that sound unnatural to humans.

The Interplay of Task Success and Dialogue Quality: An in-depth Evaluation in Task-Oriented Visual Dialogues

1 code implementation EACL 2021 Alberto Testoni, Raffaella Bernardi

When training a model on referential dialogue guessing games, the best model is usually chosen based on its task success.

On the role of effective and referring questions in GuessWhat?!

no code implementations WS 2020 Mauricio Mazuecos, Alberto Testoni, Raffaella Bernardi, Luciana Benotti

Regarding our first metric, we find that successful dialogues do not have a higher percentage of effective questions for most models.

Quantifiers in a Multimodal World: Hallucinating Vision with Language and Sound

no code implementations WS 2019 Alberto Testoni, Sandro Pezzelle, Raffaella Bernardi

Inspired by the literature on multisensory integration, we develop a computational model to ground quantifiers in perception.

Evaluating the Representational Hub of Language and Vision Models

no code implementations WS 2019 Ravi Shekhar, Ece Takmaz, Raquel Fernández, Raffaella Bernardi

The multimodal models used in the emerging field at the intersection of computational linguistics and computer vision implement the bottom-up processing of the `Hub and Spoke' architecture proposed in cognitive science to represent how the brain processes and combines multi-sensory inputs.

Question Answering, Visual Question Answering

Ask No More: Deciding when to guess in referential visual dialogue

1 code implementation COLING 2018 Ravi Shekhar, Tim Baumgartner, Aashish Venkatesh, Elia Bruni, Raffaella Bernardi, Raquel Fernández

We make initial steps towards this general goal by augmenting a task-oriented visual dialogue model with a decision-making component that decides whether to ask a follow-up question to identify a target referent in an image, or to stop the conversation to make a guess.

Decision Making, Visual Dialog
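The entry above describes a decision-making component that, after each turn, either asks a follow-up question or stops to guess. A toy sketch of such a stopping rule (not the paper's implementation; the confidence threshold, turn budget, and belief representation are hypothetical):

```python
# Hypothetical ask-vs-guess decision rule: guess once the belief over
# candidate referents is confident enough or the turn budget is spent,
# otherwise ask another question.

def decide(belief, threshold=0.8, max_turns=8, turn=0):
    """Return 'guess' if the most probable candidate exceeds the
    confidence threshold or the turn budget is exhausted, else 'ask'."""
    if max(belief.values()) >= threshold or turn >= max_turns:
        return "guess"
    return "ask"

belief = {"dog": 0.55, "cat": 0.30, "bike": 0.15}
action_early = decide(belief, turn=2)  # still uncertain, keep asking
action_late = decide(belief, turn=8)   # budget spent, commit to a guess
```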

Comparatives, Quantifiers, Proportions: A Multi-Task Model for the Learning of Quantities from Vision

1 code implementation NAACL 2018 Sandro Pezzelle, Ionut-Teodor Sorodoc, Raffaella Bernardi

The present work investigates whether different quantification mechanisms (set comparison, vague quantification, and proportional estimation) can be jointly learned from visual scenes by a multi-task computational model.
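The three quantification mechanisms named above can be sketched as task-specific heads over one shared scene representation. This is a toy stand-in (not the paper's neural model): the scene encoding and all three heads below are hand-written heuristics used only to illustrate the multi-task structure.

```python
# Hypothetical multi-task layout: a shared representation of a visual
# scene feeds three heads for set comparison, vague quantification,
# and proportional estimation.

def encode_scene(targets, distractors):
    """Toy shared representation: counts and the target ratio."""
    total = targets + distractors
    return {"targets": targets, "total": total,
            "ratio": targets / total if total else 0.0}

def set_comparison(rep):
    """More, same, or fewer targets than distractors."""
    diff = 2 * rep["targets"] - rep["total"]
    return "more" if diff > 0 else "fewer" if diff < 0 else "same"

def vague_quantifier(rep):
    """Map the target ratio onto a coarse quantifier."""
    r = rep["ratio"]
    return "none" if r == 0 else "few" if r < 0.4 else "most" if r < 1 else "all"

def proportion(rep):
    """Proportion of targets, rounded to the nearest 25%."""
    return round(rep["ratio"] * 4) / 4

rep = encode_scene(targets=3, distractors=7)
```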

FOIL it! Find One mismatch between Image and Language caption

no code implementations ACL 2017 Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi

In this paper, we aim to understand whether current language and vision (LaVi) models truly grasp the interaction between the two modalities.

Pay Attention to Those Sets! Learning Quantification from Images

no code implementations 10 Apr 2017 Ionut Sorodoc, Sandro Pezzelle, Aurélie Herbelot, Mariella Dimiccoli, Raffaella Bernardi

We however argue that precisely identifying the composition of the sets is not only beyond current state-of-the-art models but perhaps even detrimental to a task that is most efficiently performed by refining the approximate numerosity estimator of the system.

Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision

no code implementations EACL 2017 Sandro Pezzelle, Marco Marelli, Raffaella Bernardi

People can refer to quantities in a visual scene by using either exact cardinals (e.g. one, two, three) or natural language quantifiers (e.g. few, most, all).

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

no code implementations 15 Jan 2016 Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank

Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.

Retrieval

Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image Generation

no code implementations 10 Jun 2015 Angeliki Lazaridou, Dat Tien Nguyen, Raffaella Bernardi, Marco Baroni

We introduce language-driven image generation, the task of generating an image visualizing the semantic contents of a word embedding, e.g., given the word embedding of grasshopper, we generate a natural image of a grasshopper.

Image Generation, Word Embeddings
