Search Results for author: Raffaella Bernardi

Found 48 papers, 15 papers with code

A Small but Informed and Diverse Model: The Case of the Multimodal GuessWhat!? Guessing Game

no code implementations CLASP 2022 Claudio Greco, Alberto Testoni, Raffaella Bernardi, Stella Frank

Pre-trained Vision and Language Transformers achieve high performance on downstream tasks due to their ability to transfer representational knowledge accumulated during pretraining on substantial amounts of data.

Diversity

Visually Grounded Follow-up Questions: a Dataset of Spatial Questions Which Require Dialogue History

1 code implementation ACL (splurobonlp) 2021 Tianai Dong, Alberto Testoni, Luciana Benotti, Raffaella Bernardi

We call the question that restricts the context the "trigger", and the spatial question that requires the trigger question to be answered the "zoomer".

They Are Not All Alike: Answering Different Spatial Questions Requires Different Grounding Strategies

no code implementations EMNLP (SpLU) 2020 Alberto Testoni, Claudio Greco, Tobias Bianchi, Mauricio Mazuecos, Agata Marcante, Luciana Benotti, Raffaella Bernardi

By analyzing LXMERT errors and its attention mechanisms, we find that our classification helps to gain a better understanding of the skills required to answer different spatial questions.

A MIND for Reasoning: Meta-learning for In-context Deduction

1 code implementation 20 May 2025 Leonardo Bertolazzi, Manuel Vargas Guzmán, Raffaella Bernardi, Maciej Malicki, Jakub Szymanik

The goal of MIND is to enable models to generalize more effectively to unseen knowledge bases and to systematically apply inference rules.

Meta-Learning

All-in-one: Understanding and Generation in Multimodal Reasoning with the MAIA Benchmark

no code implementations 24 Feb 2025 Davide Testa, Giovanni Bonetta, Raffaella Bernardi, Alessandro Bondielli, Alessandro Lenci, Alessio Miaschi, Lucia Passaro, Bernardo Magnini

We introduce MAIA (Multimodal AI Assessment), a native-Italian benchmark designed for fine-grained investigation of the reasoning abilities of visual language models on videos.

Multimodal Reasoning +2

Triangulating LLM Progress through Benchmarks, Games, and Cognitive Tests

no code implementations 20 Feb 2025 Filippo Momentè, Alessandro Suglia, Mario Giulianelli, Ambra Ferrari, Alexander Koller, Oliver Lemon, David Schlangen, Raquel Fernández, Raffaella Bernardi

We examine three evaluation paradigms: large question-answering benchmarks (e.g., MMLU and BBH), interactive games (e.g., Signalling Games or Taboo), and cognitive tests (e.g., for working memory or theory of mind).

Logical Reasoning MMLU +1

The Validation Gap: A Mechanistic Analysis of How Language Models Compute Arithmetic but Fail to Validate It

1 code implementation 17 Feb 2025 Leonardo Bertolazzi, Philipp Mondorf, Barbara Plank, Raffaella Bernardi

The ability of large language models (LLMs) to validate their output and identify potential errors is crucial for ensuring robustness and reliability.

Looking for Confirmations: An Effective and Human-Like Visual Dialogue Strategy

1 code implementation EMNLP 2021 Alberto Testoni, Raffaella Bernardi

Inspired by the cognitive literature on information search and cross-situational word learning, we design Confirm-it, a model based on a beam search re-ranking algorithm that guides an effective goal-oriented strategy by asking questions that confirm the model's conjecture about the referent.

Re-Ranking
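
A minimal sketch of such a confirmation-driven re-ranking step (the function name and scoring interface are hypothetical illustrations, not the paper's implementation):

```python
# Hypothetical sketch of a Confirm-it-style step: candidate questions
# from beam search are reordered by how well they would confirm the
# model's current conjecture about the target referent.

def rerank_beam(candidates, confirmation_score):
    """candidates: list of (question, log_prob) pairs from beam search.
    confirmation_score: callable scoring how strongly a question probes
    the current conjecture (assumed interface)."""
    return sorted(
        candidates,
        key=lambda cand: (confirmation_score(cand[0]), cand[1]),
        reverse=True,
    )

# Toy usage: with "person" as the conjectured referent, a question about
# the conjecture outranks a higher-probability but less targeted one.
beam = [("is it red?", -0.9), ("is it a person?", -1.2)]
ranked = rerank_beam(beam, lambda q: 1.0 if "person" in q else 0.0)
print(ranked[0][0])  # "is it a person?"
```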

"I've Seen Things You People Wouldn't Believe": Hallucinating Entities in GuessWhat?!

no code implementations ACL 2021 Alberto Testoni, Raffaella Bernardi

We also analyse where hallucinations tend to occur more often through the dialogue: hallucinations are less frequent in earlier turns, cause a cascade hallucination effect, and are often preceded by negative answers, which have been shown to be harder to ground.

Hallucination Image Captioning +2

The Interplay of Task Success and Dialogue Quality: An in-depth Evaluation in Task-Oriented Visual Dialogues

1 code implementation EACL 2021 Alberto Testoni, Raffaella Bernardi

When training a model on referential dialogue guessing games, the best model is usually chosen based on its task success.

Overprotective Training Environments Fall Short at Testing Time: Let Models Contribute to Their Own Training

no code implementations 20 Mar 2021 Alberto Testoni, Raffaella Bernardi

Despite important progress, conversational systems often generate dialogues that sound unnatural to humans.

On the role of effective and referring questions in GuessWhat?!

no code implementations WS 2020 Mauricio Mazuecos, Alberto Testoni, Raffaella Bernardi, Luciana Benotti

Regarding our first metric, we find that successful dialogues do not have a higher percentage of effective questions for most models.

Quantifiers in a Multimodal World: Hallucinating Vision with Language and Sound

no code implementations WS 2019 Alberto Testoni, Sandro Pezzelle, Raffaella Bernardi

Inspired by the literature on multisensory integration, we develop a computational model to ground quantifiers in perception.

Evaluating the Representational Hub of Language and Vision Models

no code implementations WS 2019 Ravi Shekhar, Ece Takmaz, Raquel Fernández, Raffaella Bernardi

The multimodal models used in the emerging field at the intersection of computational linguistics and computer vision implement the bottom-up processing of the 'Hub and Spoke' architecture proposed in cognitive science to represent how the brain processes and combines multi-sensory inputs.

Diagnostic Question Answering +1

Ask No More: Deciding when to guess in referential visual dialogue

1 code implementation COLING 2018 Ravi Shekhar, Tim Baumgartner, Aashish Venkatesh, Elia Bruni, Raffaella Bernardi, Raquel Fernández

We make initial steps towards this general goal by augmenting a task-oriented visual dialogue model with a decision-making component that decides whether to ask a follow-up question to identify a target referent in an image, or to stop the conversation to make a guess.

Decision Making Visual Dialog
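
A minimal sketch of such an ask-or-guess decision rule (names, the threshold, and the turn cap are illustrative assumptions, not the paper's trained component):

```python
# Hypothetical sketch of a decision-making component: given the guesser's
# belief over candidate referents, either continue asking questions or
# stop the conversation and make a guess.

def decide(belief, turn, confidence_threshold=0.8, max_turns=8):
    """belief: dict mapping candidate referents to probabilities."""
    best, p_best = max(belief.items(), key=lambda kv: kv[1])
    if p_best >= confidence_threshold or turn >= max_turns:
        return "guess", best
    return "ask", None

# Toy usage: the model keeps asking until its belief is confident enough.
print(decide({"dog": 0.55, "cat": 0.45}, turn=3))  # ('ask', None)
print(decide({"dog": 0.90, "cat": 0.10}, turn=3))  # ('guess', 'dog')
```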

Comparatives, Quantifiers, Proportions: A Multi-Task Model for the Learning of Quantities from Vision

1 code implementation NAACL 2018 Sandro Pezzelle, Ionut-Teodor Sorodoc, Raffaella Bernardi

The present work investigates whether different quantification mechanisms (set comparison, vague quantification, and proportional estimation) can be jointly learned from visual scenes by a multi-task computational model.
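
A minimal PyTorch sketch of such a shared-encoder, multi-head setup (layer sizes and class counts are assumptions for illustration, not the authors' exact architecture):

```python
# Hypothetical multi-task model: one shared encoder over visual features
# feeds three task heads (set comparison, vague quantification,
# proportional estimation), so the tasks are learned jointly.
import torch
import torch.nn as nn

class MultiTaskQuantifier(nn.Module):
    def __init__(self, feat_dim=2048, hidden=512):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(feat_dim, hidden), nn.ReLU())
        self.comparison = nn.Linear(hidden, 3)   # fewer / same / more
        self.quantifier = nn.Linear(hidden, 9)   # e.g., none ... all
        self.proportion = nn.Linear(hidden, 17)  # binned proportions

    def forward(self, feats):
        h = self.encoder(feats)                  # shared representation
        return self.comparison(h), self.quantifier(h), self.proportion(h)

model = MultiTaskQuantifier()
outs = model(torch.randn(4, 2048))               # batch of 4 scenes
print([o.shape for o in outs])
```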

FOIL it! Find One mismatch between Image and Language caption

no code implementations ACL 2017 Ravi Shekhar, Sandro Pezzelle, Yauhen Klimovich, Aurelie Herbelot, Moin Nabi, Enver Sangineto, Raffaella Bernardi

In this paper, we aim to understand whether current language and vision (LaVi) models truly grasp the interaction between the two modalities.

Pay Attention to Those Sets! Learning Quantification from Images

no code implementations 10 Apr 2017 Ionut Sorodoc, Sandro Pezzelle, Aurélie Herbelot, Mariella Dimiccoli, Raffaella Bernardi

We however argue that precisely identifying the composition of the sets is not only beyond current state-of-the-art models but perhaps even detrimental to a task that is most efficiently performed by refining the approximate numerosity estimator of the system.

Be Precise or Fuzzy: Learning the Meaning of Cardinals and Quantifiers from Vision

no code implementations EACL 2017 Sandro Pezzelle, Marco Marelli, Raffaella Bernardi

People can refer to quantities in a visual scene by using either exact cardinals (e.g., one, two, three) or natural language quantifiers (e.g., few, most, all).

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

no code implementations 15 Jan 2016 Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank

Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.

Retrieval

Unveiling the Dreams of Word Embeddings: Towards Language-Driven Image Generation

no code implementations 10 Jun 2015 Angeliki Lazaridou, Dat Tien Nguyen, Raffaella Bernardi, Marco Baroni

We introduce language-driven image generation, the task of generating an image visualizing the semantic contents of a word embedding, e.g., given the word embedding of grasshopper, we generate a natural image of a grasshopper.

Image Generation Word Embeddings
