Search Results for author: Desmond Elliott

Found 54 papers, 32 papers with code

Sequential Compositional Generalization in Multimodal Models

no code implementations 18 Apr 2024 Semih Yagcioglu, Osman Batur İnce, Aykut Erdem, Erkut Erdem, Desmond Elliott, Deniz Yuret

The rise of large-scale multimodal models has paved the way for groundbreaking advances in generative modeling and reasoning, unlocking transformative applications across a variety of complex tasks.

Text Rendering Strategies for Pixel Language Models

no code implementations 1 Nov 2023 Jonas F. Lotz, Elizabeth Salesky, Phillip Rust, Desmond Elliott

Pixel-based language models process text rendered as images, which allows them to handle any script, making them a promising approach to open vocabulary language modelling.

Language Modelling Sentence
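
The entry above describes the core idea behind pixel-based language models: text is rendered as an image and encoded patch by patch instead of being tokenised. Below is a minimal, hypothetical sketch of that rendering step using PIL and NumPy; the patch size, strip width, and default bitmap font are illustrative assumptions, not the renderer used in these papers.

```python
# Sketch: render a string to a grayscale strip and cut it into fixed-size
# patches that a vision-style encoder could consume. Font, patch size, and
# strip width are illustrative assumptions.
from PIL import Image, ImageDraw
import numpy as np

def render_to_patches(text, patch_size=16, width=512):
    img = Image.new("L", (width, patch_size), color=255)   # white strip
    ImageDraw.Draw(img).text((0, 2), text, fill=0)          # default bitmap font
    pixels = np.asarray(img, dtype=np.float32) / 255.0
    # split the strip into square patches, left to right
    patches = [pixels[:, i:i + patch_size] for i in range(0, width, patch_size)]
    return np.stack(patches)                                # (num_patches, 16, 16)

patches = render_to_patches("Language modelling with pixels handles any script.")
print(patches.shape)  # (32, 16, 16)
```

In practice a proper font stack is needed to cover non-Latin scripts, which is exactly the property these models exploit.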

Evaluating Bias and Fairness in Gender-Neutral Pretrained Vision-and-Language Models

1 code implementation 26 Oct 2023 Laura Cabello, Emanuele Bugliarello, Stephanie Brandl, Desmond Elliott

We quantify bias amplification in pretraining and after fine-tuning on three families of vision-and-language models.

Fairness Retrieval

PHD: Pixel-Based Language Modeling of Historical Documents

1 code implementation 22 Oct 2023 Nadav Borenstein, Phillip Rust, Desmond Elliott, Isabelle Augenstein

We then pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period.

Language Modelling Optical Character Recognition (OCR)

LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting

1 code implementation 31 May 2023 Rita Ramos, Bruno Martins, Desmond Elliott

Multilingual image captioning has recently been tackled by training with large-scale machine translated data, which is an expensive, noisy, and time-consuming process.

Image Captioning Language Modelling +1
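
As the title indicates, LMCap prompts a language model with captions retrieved for the image rather than training on machine-translated data. The sketch below shows only the prompt-assembly step under assumed wording; `build_caption_prompt` and its template are hypothetical, and the retrieval and generation steps are omitted.

```python
# Hypothetical sketch of retrieval-augmented prompting for captioning: format
# captions retrieved for an image into a prompt that a multilingual language
# model completes in the target language. The template is an assumption, not
# the paper's exact prompt.
def build_caption_prompt(retrieved_captions, target_language="German"):
    lines = ["Similar images were described as follows:"]
    lines += [f"- {c}" for c in retrieved_captions]
    lines.append(f"A {target_language} caption for this image:")
    return "\n".join(lines)

prompt = build_caption_prompt(
    ["a dog runs across a grassy field", "a brown dog playing outside"],
    target_language="German",
)
print(prompt)  # fed to a multilingual LM to generate the caption
```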

The Role of Data Curation in Image Captioning

1 code implementation 5 May 2023 Wenyan Li, Jonas F. Lotz, Chen Qiu, Desmond Elliott

Image captioning models are typically trained by treating all samples equally, neglecting to account for mismatched or otherwise difficult data points.

Few-Shot Learning Image Captioning +2

Retrieval-augmented Image Captioning

1 code implementation 16 Feb 2023 Rita Ramos, Desmond Elliott, Bruno Martins

The encoder in our model jointly processes the image and retrieved captions using a pretrained V&L BERT, while the decoder attends to the multimodal encoder representations, benefiting from the extra textual evidence from the retrieved captions.

Image Captioning Retrieval +1
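
To illustrate the architecture sketched in the snippet above, here is a minimal stand-in built from standard PyTorch modules: random tensors play the role of the pretrained V&L BERT encoder outputs over the image and the retrieved captions, and a transformer decoder cross-attends to the concatenated memory. Dimensions, layer counts, and the plain concatenation are assumptions for illustration, not the paper's exact configuration.

```python
# Minimal sketch of "decoder attends to multimodal encoder representations"
# using PyTorch building blocks; random tensors stand in for V&L BERT outputs.
import torch
import torch.nn as nn

d_model = 768
image_repr = torch.randn(1, 36, d_model)       # e.g. 36 image region features
retrieved_repr = torch.randn(1, 60, d_model)   # encoded retrieved captions
memory = torch.cat([image_repr, retrieved_repr], dim=1)  # joint encoder output

decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d_model, nhead=12, batch_first=True),
    num_layers=2,
)
caption_so_far = torch.randn(1, 5, d_model)    # embeddings of generated prefix
out = decoder(tgt=caption_so_far, memory=memory)
print(out.shape)  # torch.Size([1, 5, 768]) -> project to vocab for next token
```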

An Exploration of Hierarchical Attention Transformers for Efficient Long Document Classification

no code implementations 11 Oct 2022 Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott

Non-hierarchical sparse attention Transformer-based models, such as Longformer and Big Bird, are popular approaches to working with long documents.

Document Classification

SmallCap: Lightweight Image Captioning Prompted with Retrieval Augmentation

1 code implementation CVPR 2023 Rita Ramos, Bruno Martins, Desmond Elliott, Yova Kementchedjhieva

Recent advances in image captioning have focused on scaling the data and model size, substantially increasing the cost of pre-training and finetuning.

Image Captioning Retrieval

Language Modelling with Pixels

1 code implementation 14 Jul 2022 Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott

We pretrain the 86M parameter PIXEL model on the same English data as BERT and evaluate on syntactic and semantic tasks in typologically diverse languages, including various non-Latin scripts.

Language Modelling Named Entity Recognition (NER)

Revisiting Transformer-based Models for Long Document Classification

1 code implementation 14 Apr 2022 Xiang Dai, Ilias Chalkidis, Sune Darkner, Desmond Elliott

The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs).

Document Classification text-classification

IGLUE: A Benchmark for Transfer Learning across Modalities, Tasks, and Languages

3 code implementations 27 Jan 2022 Emanuele Bugliarello, Fangyu Liu, Jonas Pfeiffer, Siva Reddy, Desmond Elliott, Edoardo Maria Ponti, Ivan Vulić

Our benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups.

Cross-Modal Retrieval Few-Shot Learning +5

Visually Grounded Reasoning across Languages and Cultures

3 code implementations EMNLP 2021 Fangyu Liu, Emanuele Bugliarello, Edoardo Maria Ponti, Siva Reddy, Nigel Collier, Desmond Elliott

The design of widespread vision-and-language datasets and pre-trained encoders directly adopts, or draws inspiration from, the concepts and images of ImageNet.

Visual Reasoning Zero-Shot Learning

MDAPT: Multilingual Domain Adaptive Pretraining in a Single Model

1 code implementation Findings (EMNLP) 2021 Rasmus Kær Jørgensen, Mareike Hartmann, Xiang Dai, Desmond Elliott

Domain adaptive pretraining, i.e., the continued unsupervised pretraining of a language model on domain-specific text, improves the modelling of text for downstream tasks within the domain.

Language Modelling named-entity-recognition +4
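
As a rough illustration of domain adaptive pretraining as defined above, the hedged sketch below continues masked-language-model pretraining of a multilingual checkpoint on an in-domain text file using Hugging Face Transformers. The model name, the `domain_corpus.txt` path, and all hyperparameters are placeholder assumptions rather than the paper's setup.

```python
# Sketch: continued MLM pretraining on domain-specific text (placeholder
# model, data path, and hyperparameters).
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased")

# assume domain_corpus.txt holds one domain-specific sentence per line
dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="mdapt-checkpoint", num_train_epochs=1),
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()  # continued pretraining on in-domain text
```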

Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers

4 code implementations EMNLP 2021 Stella Frank, Emanuele Bugliarello, Desmond Elliott

Models that have learned to construct cross-modal representations using both modalities are expected to perform worse when inputs are missing from a modality.

Language Modelling
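
One way to probe the expectation stated above is a cross-modal input ablation: score the same text with and without its paired visual input and look at the gap. The sketch below is a generic illustration of that idea with an assumed `score_fn` interface and a toy stand-in model; the paper's actual ablation protocol is more involved.

```python
# Sketch: compare a model's loss with paired image features vs. ablated
# (zeroed) features. `score_fn` is an assumed interface returning a loss.
import numpy as np

def ablation_gap(score_fn, text_batch, image_feats):
    paired_loss = score_fn(text_batch, image_feats)
    ablated_loss = score_fn(text_batch, np.zeros_like(image_feats))
    return ablated_loss - paired_loss   # larger gap -> stronger reliance on vision

# toy stand-in for a multimodal model's loss
demo_score = lambda text, feats: 1.0 + float(np.abs(feats).mean() < 1e-6)
print(ablation_gap(demo_score, ["a dog on a beach"], np.random.rand(1, 36, 2048)))
```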

The Role of Syntactic Planning in Compositional Image Captioning

1 code implementation EACL 2021 Emanuele Bugliarello, Desmond Elliott

Image captioning has focused on generalizing to images drawn from the same distribution as the training set, and not to the more challenging problem of generalizing to different distributions of images.

Image Captioning

Multimodal Pretraining Unmasked: A Meta-Analysis and a Unified Framework of Vision-and-Language BERTs

3 code implementations 30 Nov 2020 Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, Desmond Elliott

Large-scale pretraining and task-specific fine-tuning is now the standard methodology for many tasks in computer vision and natural language processing.

Multimodal Speech Recognition with Unstructured Audio Masking

no code implementations EMNLP (nlpbt) 2020 Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott

Our experiments on the Flickr8K Audio Captions Corpus show that multimodal ASR can generalize to recover different types of masked words in this unstructured masking setting.

8k Automatic Speech Recognition +2

Fine-Grained Grounding for Multimodal Speech Recognition

1 code implementation Findings (EMNLP) 2020 Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott

In experiments on the Flickr8K Audio Captions Corpus, we find that our model improves over approaches that use global visual features, that the proposals enable the model to recover entities and other related words, such as adjectives, and that improvements are due to the model's ability to localize the correct proposals.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

On Forgetting to Cite Older Papers: An Analysis of the ACL Anthology

1 code implementation ACL 2020 Marcel Bollmann, Desmond Elliott

The field of natural language processing is experiencing a period of unprecedented growth, and with it a surge of published papers.

CompGuessWhat?!: A Multi-task Evaluation Framework for Grounded Language Learning

no code implementations ACL 2020 Alessandro Suglia, Ioannis Konstas, Andrea Vanzo, Emanuele Bastianelli, Desmond Elliott, Stella Frank, Oliver Lemon

To remedy this, we present GROLLA, an evaluation framework for Grounded Language Learning with Attributes, with three sub-tasks: 1) Goal-oriented evaluation; 2) Object attribute prediction evaluation; and 3) Zero-shot evaluation.

Attribute Grounded language learning

The Sensitivity of Language Models and Humans to Winograd Schema Perturbations

2 code implementations ACL 2020 Mostafa Abdou, Vinit Ravishankar, Maria Barrett, Yonatan Belinkov, Desmond Elliott, Anders Søgaard

Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of common sense reasoning ability.

Common Sense Reasoning

Multimodal Machine Translation through Visuals and Speech

no code implementations 28 Nov 2019 Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann

Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data.

Image Captioning Multimodal Machine Translation +4

Bootstrapping Disjoint Datasets for Multilingual Multimodal Representation Learning

no code implementations 9 Nov 2019 Ákos Kádár, Grzegorz Chrupała, Afra Alishahi, Desmond Elliott

However, we do find that using an external machine translation model to generate the synthetic data sets results in better performance.

Machine Translation Representation Learning +4

Adversarial Removal of Demographic Attributes Revisited

no code implementations IJCNLP 2019 Maria Barrett, Yova Kementchedjhieva, Yanai Elazar, Desmond Elliott, Anders Søgaard

Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on.

Understanding the Effect of Textual Adversaries in Multimodal Machine Translation

no code implementations WS 2019 Koel Dutta Chowdhury, Desmond Elliott

It is assumed that multimodal machine translation systems are better than text-only systems at translating phrases that have a direct correspondence in the image.

Multimodal Machine Translation Sentence +1

Compositional Generalization in Image Captioning

1 code implementation CoNLL 2019 Mitja Nikolaus, Mostafa Abdou, Matthew Lamm, Rahul Aralikatte, Desmond Elliott

Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts.

Caption Generation Image Captioning +1

Cross-lingual Visual Verb Sense Disambiguation

1 code implementation NAACL 2019 Spandana Gella, Desmond Elliott, Frank Keller

We extend this line of work to the more challenging task of cross-lingual verb sense disambiguation, introducing the MultiSense dataset of 9,504 images annotated with English, German, and Spanish verbs.

Machine Translation Translation

Talking about other people: an endless range of possibilities

1 code implementation WS 2018 Emiel van Miltenburg, Desmond Elliott, Piek Vossen

This taxonomy serves as a reference point to think about how other people should be described, and can be used to classify and compute statistics about labels applied to people.

Text Generation

How2: A Large-scale Dataset for Multimodal Language Understanding

2 code implementations 1 Nov 2018 Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze

In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Adversarial Evaluation of Multimodal Machine Translation

no code implementations EMNLP 2018 Desmond Elliott

The promise of combining language and vision in multimodal machine translation is that systems will produce better translations by leveraging the image data.

Multimodal Machine Translation text similarity +1

Findings of the Third Shared Task on Multimodal Machine Translation

1 code implementation WS 2018 Loïc Barrault, Fethi Bougares, Lucia Specia, Chiraag Lala, Desmond Elliott, Stella Frank

In this task, a source sentence in English is supplemented by an image, and participating systems are required to generate a translation of the sentence into German, French or Czech.

Multimodal Machine Translation Sentence +1

Measuring the Diversity of Automatic Image Descriptions

1 code implementation COLING 2018 Emiel van Miltenburg, Desmond Elliott, Piek Vossen

Automatic image description systems typically produce generic sentences that only make use of a small subset of the vocabulary available to them.

Text Generation
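
To make the notion of "a small subset of the vocabulary" concrete, the snippet below computes two generic diversity measures over generated descriptions: a type-token ratio and coverage of a reference vocabulary. These are illustrative measures only, not the specific metric suite reported in the paper.

```python
# Sketch: simple diversity statistics over generated image descriptions.
def diversity_stats(generated_captions, reference_vocab):
    tokens = [t.lower() for c in generated_captions for t in c.split()]
    types = set(tokens)
    return {
        "type_token_ratio": len(types) / max(len(tokens), 1),
        "vocab_coverage": len(types & reference_vocab) / max(len(reference_vocab), 1),
    }

vocab = {"a", "dog", "cat", "runs", "sleeps", "beach", "sofa", "on", "the"}
captions = ["a dog runs on the beach", "a dog on the beach"]
print(diversity_stats(captions, vocab))
```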

Cross-linguistic differences and similarities in image descriptions

1 code implementation WS 2017 Emiel van Miltenburg, Desmond Elliott, Piek Vossen

Automatic image description systems are commonly trained and evaluated on large image description datasets.

Specificity

Imagination improves Multimodal Translation

no code implementations IJCNLP 2017 Desmond Elliott, Ákos Kádár

We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations.

Translation
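
A rough sketch of the decomposition described above: a shared source-sentence encoder feeds both a translation head and an "imagination" head that predicts the paired image's feature vector, so the grounding sub-task can be trained alongside translation. The module below uses assumed dimensions, a mean-pooled GRU encoder, and omits the training losses; it illustrates the idea rather than reproducing the paper's model.

```python
# Sketch: shared encoder with a translation head and an image-prediction head.
import torch
import torch.nn as nn

class SharedEncoderMMT(nn.Module):
    def __init__(self, vocab=10000, d=256, img_dim=2048):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)
        self.encoder = nn.GRU(d, d, batch_first=True)
        self.translate_head = nn.Linear(d, vocab)   # per-step target-word logits
        self.imagine_head = nn.Linear(d, img_dim)   # predicts image features

    def forward(self, src_tokens):
        states, _ = self.encoder(self.embed(src_tokens))
        sent = states.mean(dim=1)                   # pooled sentence vector
        return self.translate_head(states), self.imagine_head(sent)

model = SharedEncoderMMT()
logits, imagined = model(torch.randint(0, 10000, (2, 7)))
print(logits.shape, imagined.shape)  # translation logits + predicted image vector
```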

Room for improvement in automatic image description: an error analysis

1 code implementation13 Apr 2017 Emiel van Miltenburg, Desmond Elliott

In recent years we have seen rapid and significant progress in automatic image description, but what are the open problems in this area?

Pragmatic factors in image description: the case of negations

1 code implementation WS 2016 Emiel van Miltenburg, Roser Morante, Desmond Elliott

We provide a qualitative analysis of the descriptions containing negations (no, not, n't, nobody, etc.) in the Flickr30K corpus, and a categorization of negation uses.

Negation
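
A tiny illustrative helper in the spirit of the snippet above: flag which descriptions contain a negation cue. The cue list expands the examples given there with a few assumed additions; the paper's categorization of negation uses is manual and far finer-grained.

```python
# Sketch: detect negation cues in image descriptions (cue list partly assumed).
import re

NEGATION_CUES = re.compile(r"\b(no|not|nobody|never|none|nothing)\b|n't", re.IGNORECASE)

def has_negation(description):
    return bool(NEGATION_CUES.search(description))

print(has_negation("There is no dog in this picture."))   # True
print(has_negation("A man isn't wearing a shirt."))       # True
print(has_negation("A man rides a bicycle."))             # False
```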

1 Million Captioned Dutch Newspaper Images

no code implementations LREC 2016 Desmond Elliott, Martijn Kleppe

Images naturally appear alongside text in a wide variety of media, such as books, magazines, newspapers, and in online articles.

Data-to-Text Generation Image Captioning +3

A Corpus of Images and Text in Online News

no code implementations LREC 2016 Laura Hollink, Adriatik Bedjeti, Martin van Harmelen, Desmond Elliott

The corpus consists of JSON-LD files with the following data about each article: the original URL of the article on the news publisher's website, the date of publication, the headline of the article, the URL of the image displayed with the article (if any), and the caption of that image.
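
For concreteness, here is a hypothetical example of one JSON-LD article record with the fields listed above. The property names and schema.org context are assumptions; only the set of fields (article URL, publication date, headline, image URL, image caption) comes from the corpus description.

```python
# Sketch: a single article record with the fields named in the description.
import json

record = {
    "@context": "http://schema.org",          # assumed context
    "@type": "NewsArticle",
    "url": "https://example-publisher.com/news/2015/some-article",
    "datePublished": "2015-06-01",
    "headline": "Example headline of the news article",
    "image": {
        "@type": "ImageObject",
        "contentUrl": "https://example-publisher.com/images/12345.jpg",
        "caption": "Example caption of the image shown with the article",
    },
}
print(json.dumps(record, indent=2))
```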

Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures

no code implementations 15 Jan 2016 Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank

Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.

Retrieval

Multilingual Image Description with Neural Sequence Models

1 code implementation 15 Oct 2015 Desmond Elliott, Stella Frank, Eva Hasler

In this paper we present an approach to multi-language image description bringing together insights from neural machine translation and neural image description.

Image Captioning Translation
