1 code implementation • ACL (RepL4NLP) 2021 • Iuliia Parfenova, Desmond Elliott, Raquel Fernández, Sandro Pezzelle
We investigate the representations learned by vision and language models in tasks that require relational reasoning.
no code implementations • 18 Oct 2024 • Constanza Fierro, Negar Foroutan, Desmond Elliott, Anders Søgaard
Large Language Models (LLMs) store and retrieve vast amounts of factual knowledge acquired during pre-training.
no code implementations • 16 Oct 2024 • Niels Horn, Desmond Elliott
We study how features emerge, disappear, and persist across models fine-tuned on different domains of text.
1 code implementation • 30 Sep 2024 • Vincent Beliveau, Helene Kaas, Martin Prener, Claes N. Ladefoged, Desmond Elliott, Gitte M. Knudsen, Lars H. Pinborg, Melanie Ganz
We evaluated a set of NLP models, including BERT-like transformers, few-shot learning with sentence transformers (SetFit), and prompted large language models (LLMs), using three datasets of radiology reports on magnetic resonance images of epilepsy patients in Danish, a low-resource language.
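The SetFit-style few-shot baseline mentioned here can be approximated with off-the-shelf components: embed the reports with a multilingual sentence transformer and fit a lightweight classifier on a handful of labelled examples. A minimal sketch under those assumptions; the checkpoint, labels, and Danish snippets are illustrative, and the real SetFit method additionally fine-tunes the encoder with a contrastive objective.

```python
# Few-shot text classification over sentence embeddings (SetFit-style baseline).
# Assumes: pip install sentence-transformers scikit-learn
from sentence_transformers import SentenceTransformer
from sklearn.linear_model import LogisticRegression

# Hypothetical few-shot training data: (report text, label) pairs.
train_texts = [
    "MR af cerebrum uden fokale forandringer.",
    "Hippocampus-atrofi på venstre side.",
]
train_labels = [0, 1]  # e.g. 0 = normal, 1 = epilepsy-relevant finding

# A multilingual encoder so Danish text is covered (checkpoint name is an assumption).
encoder = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

X_train = encoder.encode(train_texts)
clf = LogisticRegression(max_iter=1000).fit(X_train, train_labels)

# Classify a new, unseen report.
X_new = encoder.encode(["Normale forhold, ingen epileptogene læsioner."])
print(clf.predict(X_new))
```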
1 code implementation • 3 Sep 2024 • Ingo Ziegler, Abdullatif Köksal, Desmond Elliott, Hinrich Schütze
Building high-quality datasets for specialized tasks is a time-consuming and resource-intensive process that often requires specialized domain knowledge.
1 code implementation • 26 Jun 2024 • Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, André F. T. Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, Alberto Testoni
There is an increasing trend towards evaluating NLP models with LLMs instead of human judgments, raising questions about the validity of these evaluations, as well as their reproducibility in the case of proprietary models.
1 code implementation • 16 Jun 2024 • Wenyan Li, Xinyu Zhang, Jiaang Li, Qiwei Peng, Raphael Tang, Li Zhou, Weijia Zhang, Guimin Hu, Yifei Yuan, Anders Søgaard, Daniel Hershcovich, Desmond Elliott
Food is a rich and varied dimension of cultural heritage, crucial to both individuals and social groups.
1 code implementation • 4 Jun 2024 • Wenyan Li, Jiaang Li, Rita Ramos, Raphael Tang, Desmond Elliott
Recent advances in retrieval-augmented models for image captioning highlight the benefit of retrieving related captions for efficient, lightweight models with strong domain-transfer capabilities.
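The retrieval step behind these models can be sketched as a nearest-neighbour search over a datastore of caption embeddings that share a space with the image embedding (e.g. from a CLIP-style encoder). A minimal sketch under that assumption; the array names and shapes are illustrative, not the paper's actual pipeline.

```python
# Retrieve the k most similar captions from a datastore for a query image.
# Assumes precomputed, L2-normalised embeddings in a shared image-text space;
# names and shapes are illustrative.
import numpy as np

def retrieve_captions(image_emb, caption_embs, captions, k=4):
    """image_emb: (d,), caption_embs: (N, d), captions: list of N strings."""
    sims = caption_embs @ image_emb      # cosine similarity (vectors are normalised)
    top = np.argsort(-sims)[:k]          # indices of the k nearest captions
    return [captions[i] for i in top]

# The retrieved captions are then placed in the context of a small captioning
# model, which conditions on them while generating the new caption.
```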
no code implementations • 18 Apr 2024 • Semih Yagcioglu, Osman Batur İnce, Aykut Erdem, Erkut Erdem, Desmond Elliott, Deniz Yuret
The rise of large-scale multimodal models has paved the way for groundbreaking advances in generative modeling and reasoning, unlocking transformative applications in a variety of complex tasks.
no code implementations • 1 Nov 2023 • Jonas F. Lotz, Elizabeth Salesky, Phillip Rust, Desmond Elliott
Pixel-based language models process text rendered as images, which allows them to handle any script, making them a promising approach to open vocabulary language modelling.
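The core idea is easy to prototype: render the text to a small grayscale image and slice it into fixed-size patches that play the role of tokens. A minimal sketch with Pillow and NumPy; the font, canvas size, and patch size are illustrative and not the actual PIXEL renderer settings.

```python
# Render a text string to a grayscale image and split it into fixed-size patches,
# roughly the input format a pixel-based language model consumes.
# Assumes: pip install pillow numpy
import numpy as np
from PIL import Image, ImageDraw, ImageFont

def render_to_patches(text, height=16, width=512, patch=16):
    img = Image.new("L", (width, height), color=255)   # white canvas
    ImageDraw.Draw(img).text((0, 0), text, fill=0, font=ImageFont.load_default())
    arr = np.asarray(img)                              # shape: (height, width)
    # Cut the rendered line into square patches, analogous to ViT-style tokens.
    return [arr[:, i:i + patch] for i in range(0, width, patch)]

patches = render_to_patches("Pixel-based models read any script.")
print(len(patches), patches[0].shape)   # 32 patches of shape (16, 16)
```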
1 code implementation • 26 Oct 2023 • Laura Cabello, Emanuele Bugliarello, Stephanie Brandl, Desmond Elliott
We quantify bias amplification in pretraining and after fine-tuning on three families of vision-and-language models.
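A common way to quantify bias amplification is the co-occurrence score of Zhao et al. (2017): measure how strongly a concept skews towards a group in the model's predictions compared with the training data. A minimal sketch of that classic metric, not necessarily the exact measure used in this paper.

```python
# Co-occurrence-based bias amplification (in the style of Zhao et al., 2017).
# counts[concept][group] = number of times the concept co-occurs with the group.
def bias(counts, concept, group):
    total = sum(counts[concept].values())
    return counts[concept][group] / total if total else 0.0

def bias_amplification(train_counts, pred_counts, group):
    """Mean increase in bias towards `group`, over concepts skewed to it in training."""
    deltas = [
        bias(pred_counts, c, group) - bias(train_counts, c, group)
        for c in train_counts
        if bias(train_counts, c, group) > 1.0 / len(train_counts[c])  # skewed concepts only
    ]
    return sum(deltas) / len(deltas) if deltas else 0.0

# Example with hypothetical counts:
train = {"cooking": {"woman": 80, "man": 20}}
pred  = {"cooking": {"woman": 90, "man": 10}}
print(bias_amplification(train, pred, "woman"))   # 0.1 -> bias amplified
```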
1 code implementation • 22 Oct 2023 • Nadav Borenstein, Phillip Rust, Desmond Elliott, Isabelle Augenstein
We then pre-train our model, PHD, on a combination of synthetic scans and real historical newspapers from the 1700-1900 period.
1 code implementation • 31 May 2023 • Rita Ramos, Bruno Martins, Desmond Elliott
Multilingual image captioning has recently been tackled by training with large-scale machine translated data, which is an expensive, noisy, and time-consuming process.
1 code implementation • 5 May 2023 • Wenyan Li, Jonas F. Lotz, Chen Qiu, Desmond Elliott
Image captioning models are typically trained by treating all samples equally, neglecting to account for mismatched or otherwise difficult data points.
1 code implementation • 16 Feb 2023 • Rita Ramos, Desmond Elliott, Bruno Martins
The encoder in our model jointly processes the image and retrieved captions using a pretrained V&L BERT, while the decoder attends to the multimodal encoder representations, benefiting from the extra textual evidence from the retrieved captions.
1 code implementation • 24 Oct 2022 • Chen Qiu, Dan Oneata, Emanuele Bugliarello, Stella Frank, Desmond Elliott
We call this framework TD-MML: Translated Data for Multilingual Multimodal Learning, and it can be applied to any multimodal dataset and model.
Zero-Shot Cross-Lingual Image-to-Text Retrieval
Zero-Shot Cross-Lingual Text-to-Image Retrieval
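On the data side, this recipe amounts to machine-translating the English text of an existing multimodal dataset into each target language before training; quality filtering of the translations is omitted here. A minimal sketch using a MarianMT checkpoint through the transformers pipeline; the model name and the toy dataset are assumptions.

```python
# Translate the English captions of a multimodal dataset into a target language
# so a multilingual model can be trained on the same images with translated text.
# Assumes: pip install transformers sentencepiece; checkpoint name is illustrative.
from transformers import pipeline

translator = pipeline("translation", model="Helsinki-NLP/opus-mt-en-de")

dataset = [  # hypothetical (image_path, English caption) pairs
    ("img/0001.jpg", "A dog runs across a snowy field."),
    ("img/0002.jpg", "Two people are riding bicycles by the river."),
]

translated = [
    (image, translator(caption)[0]["translation_text"])
    for image, caption in dataset
]
print(translated[0])
```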
no code implementations • 11 Oct 2022 • Ilias Chalkidis, Xiang Dai, Manos Fergadiotis, Prodromos Malakasiotis, Desmond Elliott
Non-hierarchical sparse-attention Transformer-based models, such as Longformer and Big Bird, are popular approaches for working with long documents.
1 code implementation • CVPR 2023 • Rita Ramos, Bruno Martins, Desmond Elliott, Yova Kementchedjhieva
Recent advances in image captioning have focused on scaling the data and model size, substantially increasing the cost of pre-training and finetuning.
1 code implementation • 14 Jul 2022 • Phillip Rust, Jonas F. Lotz, Emanuele Bugliarello, Elizabeth Salesky, Miryam de Lhoneux, Desmond Elliott
We pretrain the 86M parameter PIXEL model on the same English data as BERT and evaluate on syntactic and semantic tasks in typologically diverse languages, including various non-Latin scripts.
Ranked #1 on Named Entity Recognition (NER) on MasakhaNER
1 code implementation • 14 Apr 2022 • Xiang Dai, Ilias Chalkidis, Sune Darkner, Desmond Elliott
The recent literature in text classification is biased towards short text sequences (e.g., sentences or paragraphs).
3 code implementations • 27 Jan 2022 • Emanuele Bugliarello, Fangyu Liu, Jonas Pfeiffer, Siva Reddy, Desmond Elliott, Edoardo Maria Ponti, Ivan Vulić
Our benchmark enables the evaluation of multilingual multimodal models for transfer learning, not only in a zero-shot setting, but also in newly defined few-shot learning setups.
3 code implementations • EMNLP 2021 • Fangyu Liu, Emanuele Bugliarello, Edoardo Maria Ponti, Siva Reddy, Nigel Collier, Desmond Elliott
The design of widespread vision-and-language datasets and pre-trained encoders directly adopts, or draws inspiration from, the concepts and images of ImageNet.
Ranked #1 on Zero-Shot Cross-Lingual Transfer on MaRVL
1 code implementation • Findings (EMNLP) 2021 • Rasmus Kær Jørgensen, Mareike Hartmann, Xiang Dai, Desmond Elliott
Domain-adaptive pretraining, i.e., the continued unsupervised pretraining of a language model on domain-specific text, improves the modelling of text for downstream tasks within the domain.
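In practice, domain-adaptive pretraining is simply additional masked-language-model training on in-domain text before any task-specific fine-tuning. A minimal sketch with Hugging Face transformers; the checkpoint, the toy in-domain sentences, and the hyperparameters are illustrative stand-ins for the paper's actual setup.

```python
# Continued masked-language-model pretraining on domain-specific text.
# Assumes: pip install transformers torch
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

checkpoint = "bert-base-multilingual-cased"   # illustrative starting point
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

domain_texts = [  # in-domain sentences (e.g. legal or financial text)
    "The parties agree to the terms set out in Schedule 2.",
    "Liability is limited to the fees paid in the preceding twelve months.",
]
train_dataset = [tokenizer(t, truncation=True, max_length=128) for t in domain_texts]

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="dapt-checkpoint", num_train_epochs=1,
                           per_device_train_batch_size=8),
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15),
)
trainer.train()  # the adapted model is then fine-tuned on the downstream task
```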
4 code implementations • EMNLP 2021 • Stella Frank, Emanuele Bugliarello, Desmond Elliott
Models that have learned to construct cross-modal representations using both modalities are expected to perform worse when inputs are missing from a modality.
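The evaluation idea here is cross-modal input ablation: remove one modality at test time (e.g. zero out the visual features) and measure how much performance drops. A model-agnostic sketch; `model.score` and the feature shapes are hypothetical placeholders rather than the paper's actual evaluation interface.

```python
# Cross-modal input ablation: compare a model's score with and without one modality.
# `model.score(text_inputs, image_feats)` is a hypothetical interface returning a
# task metric (higher = better); zeroing the image features simulates a missing modality.
import numpy as np

def ablation_gap(model, text_inputs, image_feats):
    full = model.score(text_inputs, image_feats)
    ablated = model.score(text_inputs, np.zeros_like(image_feats))  # drop visual input
    return full - ablated  # large gap => the model genuinely relies on the image

# A model that truly builds cross-modal representations should show a clear gap;
# a text-only shortcut solution scores almost the same with the image removed.
```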
1 code implementation • EACL 2021 • Emanuele Bugliarello, Desmond Elliott
Image captioning has focused on generalizing to images drawn from the same distribution as the training set, and not to the more challenging problem of generalizing to different distributions of images.
3 code implementations • 30 Nov 2020 • Emanuele Bugliarello, Ryan Cotterell, Naoaki Okazaki, Desmond Elliott
Large-scale pretraining and task-specific fine-tuning is now the standard methodology for many tasks in computer vision and natural language processing.
no code implementations • EMNLP (nlpbt) 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott
Our experiments on the Flickr 8K Audio Captions Corpus show that multimodal ASR can generalize to recover different types of masked words in this unstructured masking setting.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Bertrand Higy, Desmond Elliott, Grzegorz Chrupała
Visually-grounded models of spoken language understanding extract semantic information directly from speech, without relying on transcriptions.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Tejas Srinivasan, Ramon Sanabria, Florian Metze, Desmond Elliott
In experiments on the Flickr8K Audio Captions Corpus, we find that our model improves over approaches that use global visual features, that the proposals enable the model to recover entities and other related words, such as adjectives, and that improvements are due to the model's ability to localize the correct proposals.
Automatic Speech Recognition (ASR)
1 code implementation • ACL 2020 • Marcel Bollmann, Desmond Elliott
The field of natural language processing is experiencing a period of unprecedented growth, and with it a surge of published papers.
no code implementations • ACL 2020 • Alessandro Suglia, Ioannis Konstas, Andrea Vanzo, Emanuele Bastianelli, Desmond Elliott, Stella Frank, Oliver Lemon
To remedy this, we present GROLLA, an evaluation framework for Grounded Language Learning with Attributes, with three sub-tasks: 1) Goal-oriented evaluation; 2) Object attribute prediction evaluation; and 3) Zero-shot evaluation.
2 code implementations • ACL 2020 • Mostafa Abdou, Vinit Ravishankar, Maria Barrett, Yonatan Belinkov, Desmond Elliott, Anders Søgaard
Large-scale pretrained language models are the major driving force behind recent improvements in performance on the Winograd Schema Challenge, a widely employed test of common sense reasoning ability.
no code implementations • 28 Nov 2019 • Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann
Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data.
Ranked #4 on Multimodal Machine Translation on Multi30K
no code implementations • 9 Nov 2019 • Ákos Kádár, Grzegorz Chrupała, Afra Alishahi, Desmond Elliott
However, we do find that using an external machine translation model to generate the synthetic data sets results in better performance.
no code implementations • WS 2019 • Koel Dutta Chowdhury, Desmond Elliott
It is assumed that multimodal machine translation systems are better than text-only systems at translating phrases that have a direct correspondence in the image.
no code implementations • IJCNLP 2019 • Maria Barrett, Yova Kementchedjhieva, Yanai Elazar, Desmond Elliott, Anders Søgaard
Elazar and Goldberg (2018) showed that protected attributes can be extracted from the representations of a debiased neural network for mention detection at above-chance levels, by evaluating a diagnostic classifier on a held-out subsample of the data it was trained on.
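The diagnostic-classifier setup can be reproduced with a linear probe: train a classifier to predict the protected attribute from frozen representations and compare its held-out accuracy against a chance baseline. A minimal sketch with scikit-learn, assuming the representations and attribute labels are precomputed arrays.

```python
# Probe frozen representations for a protected attribute and compare to chance.
# Assumes `reps` (n_samples, dim) and `attr` (n_samples,) are precomputed arrays.
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def probe(reps, attr):
    X_tr, X_te, y_tr, y_te = train_test_split(reps, attr, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    chance = DummyClassifier(strategy="most_frequent").fit(X_tr, y_tr)
    return (accuracy_score(y_te, clf.predict(X_te)),
            accuracy_score(y_te, chance.predict(X_te)))

# probe accuracy > chance accuracy => the attribute is still linearly decodable
# from the "debiased" representations, i.e. the above-chance extraction described above.
```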
1 code implementation • CONLL 2019 • Mitja Nikolaus, Mostafa Abdou, Matthew Lamm, Rahul Aralikatte, Desmond Elliott
Image captioning models are usually evaluated on their ability to describe a held-out set of images, not on their ability to generalize to unseen concepts.
1 code implementation • NAACL 2019 • Spandana Gella, Desmond Elliott, Frank Keller
We extend this line of work to the more challenging task of cross-lingual verb sense disambiguation, introducing the MultiSense dataset of 9,504 images annotated with English, German, and Spanish verbs.
2 code implementations • 1 Nov 2018 • Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze
In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations.
Automatic Speech Recognition (ASR)
1 code implementation • WS 2018 • Emiel van Miltenburg, Desmond Elliott, Piek Vossen
This taxonomy serves as a reference point to think about how other people should be described, and can be used to classify and compute statistics about labels applied to people.
1 code implementation • WS 2018 • Loïc Barrault, Fethi Bougares, Lucia Specia, Chiraag Lala, Desmond Elliott, Stella Frank
In this task, a source sentence in English is supplemented by an image, and participating systems are required to generate a translation of the sentence into German, French, or Czech.
no code implementations • EMNLP 2018 • Desmond Elliott
The promise of combining language and vision in multimodal machine translation is that systems will produce better translations by leveraging the image data.
1 code implementation • CONLL 2018 • Ákos Kádár, Desmond Elliott, Marc-Alexandre Côté, Grzegorz Chrupała, Afra Alishahi
Recent work has shown how to learn better visual-semantic embeddings by leveraging image descriptions in more than one language.
1 code implementation • COLING 2018 • Emiel van Miltenburg, Desmond Elliott, Piek Vossen
Automatic image description systems typically produce generic sentences that only make use of a small subset of the vocabulary available to them.
no code implementations • WS 2017 • Desmond Elliott, Stella Frank, Loïc Barrault, Fethi Bougares, Lucia Specia
The multilingual image description task was changed such that at test time, only the image is given.
1 code implementation • WS 2017 • Emiel van Miltenburg, Desmond Elliott, Piek Vossen
Automatic image description systems are commonly trained and evaluated on large image description datasets.
no code implementations • IJCNLP 2017 • Desmond Elliott, Ákos Kádár
We decompose multimodal translation into two sub-tasks: learning to translate and learning visually grounded representations.
1 code implementation • 13 Apr 2017 • Emiel van Miltenburg, Desmond Elliott
In recent years we have seen rapid and significant progress in automatic image description, but what are the open problems in this area?
1 code implementation • WS 2016 • Emiel van Miltenburg, Roser Morante, Desmond Elliott
We provide a qualitative analysis of the descriptions containing negations (no, not, n't, nobody, etc.) in the Flickr30K corpus, and a categorization of negation uses.
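Collecting such descriptions starts with a surface-level filter for negation cues. A minimal sketch; the cue list extends the examples quoted above and is not the paper's full inventory, and categorising how each negation is used still requires manual analysis.

```python
# Flag captions that contain common negation cues (surface-level filter only).
import re

NEGATION = re.compile(r"\b(no|not|never|nobody|nothing|none|without)\b|n't", re.IGNORECASE)

captions = [
    "A man with no shirt is climbing a rock.",
    "Two dogs play in the park.",
    "The child isn't wearing shoes.",
]
negated = [c for c in captions if NEGATION.search(c)]
print(negated)   # the first and third captions are flagged
```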
2 code implementations • WS 2016 • Desmond Elliott, Stella Frank, Khalil Sima'an, Lucia Specia
We introduce the Multi30K dataset to stimulate multilingual multimodal research.
no code implementations • LREC 2016 • Laura Hollink, Adriatik Bedjeti, Martin van Harmelen, Desmond Elliott
The corpus consists of JSON-LD files with the following data about each article: the original URL of the article on the news publisher's website, the date of publication, the headline of the article, the URL of the image displayed with the article (if any), and the caption of that image.
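Given that field inventory, reading a record is straightforward. A minimal loading sketch; the concrete key names and file name are assumptions for illustration, since the JSON-LD property names depend on the vocabulary (e.g. schema.org) that the corpus actually uses.

```python
# Load one JSON-LD article record and pull out the fields described above.
# The key names below are illustrative; a real record uses the vocabulary terms
# defined in its @context.
import json

with open("article_0001.jsonld", encoding="utf-8") as f:
    record = json.load(f)

article = {
    "url":       record.get("url"),            # original URL on the publisher's site
    "published": record.get("datePublished"),  # date of publication
    "headline":  record.get("headline"),
    "image_url": record.get("image"),          # may be absent if no image was shown
    "caption":   record.get("caption"),        # caption of that image, if any
}
print(article)
```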
no code implementations • LREC 2016 • Desmond Elliott, Martijn Kleppe
Images naturally appear alongside text in a wide variety of media, such as books, magazines, newspapers, and in online articles.
no code implementations • 15 Jan 2016 • Raffaella Bernardi, Ruket Cakici, Desmond Elliott, Aykut Erdem, Erkut Erdem, Nazli Ikizler-Cinbis, Frank Keller, Adrian Muscat, Barbara Plank
Automatic description generation from natural images is a challenging problem that has recently received a large amount of interest from the computer vision and natural language processing communities.
1 code implementation • 15 Oct 2015 • Desmond Elliott, Stella Frank, Eva Hasler
In this paper we present an approach to multi-language image description bringing together insights from neural machine translation and neural image description.