no code implementations • EAMT 2022 • Anabela Barreiro, José GC de Souza, Albert Gatt, Mehul Bhatt, Elena Lloret, Aykut Erdem, Dimitra Gkatzia, Helena Moniz, Irene Russo, Fabio Kepler, Iacer Calixto, Marcin Paprzycki, François Portet, Isabelle Augenstein, Mirela Alhasani
This paper presents the Multitask, Multilingual, Multimodal Language Generation COST Action – Multi3Generation (CA18231), an interdisciplinary network of research groups working on different aspects of language generation.
no code implementations • 3 Jul 2023 • Giovanni Cinà, Daniel Fernandez-Llaneza, Nishant Mishra, Tabea E. Röber, Sandro Pezzelle, Iacer Calixto, Rob Goedhart, Ş. İlker Birbil
Feature attribution methods have become a staple method to disentangle the complex behavior of black box models.
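To make the term concrete, here is a minimal gradient-times-input sketch, one common feature attribution method; the model and inputs below are toy stand-ins, not the specific methods analysed in the paper.

```python
import torch

# Minimal gradient-x-input attribution: per-feature scores for a
# differentiable model. `model` and `x` are hypothetical stand-ins.
def grad_x_input(model: torch.nn.Module, x: torch.Tensor) -> torch.Tensor:
    x = x.clone().requires_grad_(True)
    score = model(x).sum()      # scalar output to differentiate
    score.backward()
    return x.grad * x           # one attribution score per input feature

model = torch.nn.Linear(4, 1)
attributions = grad_x_input(model, torch.randn(2, 4))
print(attributions.shape)       # (2, 4)
```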
1 code implementation • 28 Mar 2023 • Auke Elfrink, Iacopo Vagliano, Ameen Abu-Hanna, Iacer Calixto
We investigate different natural language processing (NLP) approaches based on contextualised word representations for the early prediction of lung cancer from the free-text medical notes of Dutch primary care physicians.
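A hedged sketch of such a pipeline: contextualised note embeddings from a Dutch BERT model feeding a simple downstream classifier. The model name, example notes, and labels are illustrative assumptions, not the paper's exact setup.

```python
import torch
from transformers import AutoModel, AutoTokenizer
from sklearn.linear_model import LogisticRegression

# Assumed checkpoint: BERTje, a Dutch BERT; any Dutch encoder would do.
tok = AutoTokenizer.from_pretrained("GroNLP/bert-base-dutch-cased")
enc = AutoModel.from_pretrained("GroNLP/bert-base-dutch-cased")

notes = ["patient klaagt over aanhoudende hoest", "routinecontrole, geen klachten"]
labels = [1, 0]  # toy outcome labels (e.g., later lung cancer diagnosis)

with torch.no_grad():
    batch = tok(notes, padding=True, truncation=True, return_tensors="pt")
    # Use the [CLS] token representation as a fixed-size note embedding.
    feats = enc(**batch).last_hidden_state[:, 0].numpy()

clf = LogisticRegression().fit(feats, labels)
print(clf.predict(feats))
```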
1 code implementation • 8 Nov 2022 • İlker Kesen, Aykut Erdem, Erkut Erdem, Iacer Calixto
In the second stage, we integrate visual supervision into our system using visual imagery: two sets of images generated by a text-to-image model, taking the terms and their descriptions as input.
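A sketch of that imagery-generation step, assuming an off-the-shelf diffusion model via the diffusers library; the checkpoint and the example term are illustrative, not necessarily what the authors used.

```python
import torch
from diffusers import StableDiffusionPipeline

# Render one image for the term itself and one for its description.
# The checkpoint below is an assumption, not the paper's exact model.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

term = "golden years"                              # toy example term
description = "the later years of a person's life"
term_image = pipe(term).images[0]
description_image = pipe(description).images[0]
term_image.save("term.png")
description_image.save("description.png")
```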
1 code implementation • 27 Jun 2022 • Ningyuan Huang, Yash R. Deshpande, Yibo Liu, Houda Alberts, Kyunghyun Cho, Clara Vania, Iacer Calixto
We use the recently released VisualSem KG as our external knowledge repository, which covers a subset of Wikipedia and WordNet entities, and compare a mix of tuple-based and graph-based algorithms to learn entity and relation representations that are grounded in the KG's multimodal information.
Multilingual Named Entity Recognition • named-entity-recognition • +2
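As an example of the tuple-based family mentioned above, here is a minimal TransE-style scorer with a margin loss; all dimensions and ids are toy values, and this sketches the general technique rather than the paper's implementation.

```python
import torch
import torch.nn as nn

# TransE: a triple (head, relation, tail) is plausible when h + r ≈ t.
class TransE(nn.Module):
    def __init__(self, n_entities: int, n_relations: int, dim: int = 64):
        super().__init__()
        self.ent = nn.Embedding(n_entities, dim)
        self.rel = nn.Embedding(n_relations, dim)

    def score(self, h, r, t):
        # Lower distance = more plausible triple: ||h + r - t||_1
        return (self.ent(h) + self.rel(r) - self.ent(t)).norm(p=1, dim=-1)

model = TransE(n_entities=1000, n_relations=20)
h, r, t = torch.tensor([0]), torch.tensor([3]), torch.tensor([42])
corrupt_t = torch.tensor([7])  # negative sample with a corrupted tail
loss = torch.relu(1.0 + model.score(h, r, t) - model.score(h, r, corrupt_t)).mean()
loss.backward()
```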
1 code implementation • ACL 2022 • Letitia Parcalabescu, Michele Cafagna, Lilitta Muradjan, Anette Frank, Iacer Calixto, Albert Gatt
We propose VALSE (Vision And Language Structured Evaluation), a novel benchmark designed for testing general-purpose pretrained vision and language (V&L) models for their visio-linguistic grounding capabilities on specific linguistic phenomena.
Ranked #1 on image-sentence alignment on VALSE
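The core evaluation idea is pairwise: a model should prefer the correct caption over a minimally edited "foil". A sketch using CLIP purely as an illustrative scorer; the image path and captions are made up for the example.

```python
from PIL import Image
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("example.jpg")                    # hypothetical test image
caption = "Two dogs are running on the beach."
foil = "Two cats are running on the beach."          # minimal noun swap

inputs = processor(text=[caption, foil], images=image,
                   return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image[0]     # similarity to each text
print("prefers caption" if logits[0] > logits[1] else "fooled by foil")
```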
1 code implementation • 6 Dec 2021 • Ruan Chaves Rodrigues, Marcelo Akira Inuzuka, Juliana Resplande Sant'Anna Gomes, Acquila Santos Rocha, Iacer Calixto, Hugo Alexandre Dantas do Nascimento
Hashtag segmentation, also known as hashtag decomposition, is a common step in preprocessing pipelines for social media datasets.
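Since this is a preprocessing step, a self-contained sketch may help: dictionary-based segmentation via dynamic programming, with a toy word list standing in for a real lexicon or language-model scorer.

```python
# Minimal dictionary-based hashtag segmenter. The word list and the
# fewest-words criterion are toy assumptions; real systems rank candidate
# segmentations with language models or learned scorers.
WORDS = {"no", "more", "lockdowns", "lock", "downs"}

def segment(hashtag: str) -> list[str] | None:
    text = hashtag.lstrip("#").lower()
    best: list[list[str] | None] = [None] * (len(text) + 1)
    best[0] = []
    for i in range(1, len(text) + 1):
        for j in range(i):
            if best[j] is not None and text[j:i] in WORDS:
                cand = best[j] + [text[j:i]]
                if best[i] is None or len(cand) < len(best[i]):
                    best[i] = cand  # prefer fewer, longer words
    return best[len(text)]

print(segment("#nomorelockdowns"))  # ['no', 'more', 'lockdowns']
```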
no code implementations • NAACL 2021 • Iacer Calixto, Alessandro Raganato, Tommaso Pasini
Adding extra languages leads to improvements in most tasks up to a certain point, but overall we found it non-trivial to scale improvements in model transferability by training on an ever-increasing number of Wikipedia languages.
no code implementations • ACL (mmsr, IWCS) 2021 • Letitia Parcalabescu, Albert Gatt, Anette Frank, Iacer Calixto
We investigate the reasoning ability of pretrained vision and language (V&L) models in two tasks that require multimodal integration: (1) discriminating a correct image-sentence pair from an incorrect one, and (2) counting entities in an image.
no code implementations • 3 Dec 2020 • Elham J. Barezi, Iacer Calixto, Kyunghyun Cho, Pascale Fung
These tasks are hard because the label space is usually (i) very large, e.g., thousands or millions of labels, (ii) very sparse, i.e., very few labels apply to each input document, and (iii) highly correlated, meaning that the existence of one label changes the likelihood of predicting all other labels.
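A toy sketch of the standard baseline that these properties strain: one independent sigmoid per label. The independence assumption is exactly what point (iii) argues against; sizes below are illustrative.

```python
import torch
import torch.nn as nn

# One sigmoid output per label; each label is predicted independently.
n_features, n_labels = 300, 10_000   # real label spaces can reach millions
model = nn.Linear(n_features, n_labels)
criterion = nn.BCEWithLogitsLoss()

x = torch.randn(8, n_features)       # a batch of document features
y = torch.zeros(8, n_labels)
y[:, :3] = 1.0                       # sparse: only a few positives per doc

loss = criterion(model(x), y)
loss.backward()
print(f"loss: {loss.item():.3f}")
```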
1 code implementation • AACL 2020 • Victor Milewski, Marie-Francine Moens, Iacer Calixto
Overall, we find no significant difference between models that use scene graph features and models that only use object detection features across different captioning metrics, which suggests that existing scene graph generation models are still too noisy to be useful in image captioning.
1 code implementation • 20 Aug 2020 • Houda Alberts, Iacer Calixto
In this paper, we describe and publicly release an image dataset along with pretrained models designed to (semi-)automatically filter out undesirable images from very large image collections, possibly obtained from the web.
1 code implementation • EMNLP (MRL) 2021 • Houda Alberts, Teresa Huang, Yash Deshpande, Yibo Liu, Kyunghyun Cho, Clara Vania, Iacer Calixto
We also release a neural multi-modal retrieval model that can use images or sentences as inputs and retrieves entities in the KG.
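A sketch of the retrieval step, assuming a shared embedding space: embed the query (a sentence or an image), then rank all KG entities by cosine similarity. Encoders, dimensions, and the entity count are placeholders.

```python
import torch
import torch.nn.functional as F

dim, n_entities = 512, 90_000               # illustrative sizes
entity_matrix = F.normalize(torch.randn(n_entities, dim), dim=-1)

def retrieve(query_embedding: torch.Tensor, k: int = 5) -> torch.Tensor:
    q = F.normalize(query_embedding, dim=-1)
    scores = entity_matrix @ q               # cosine similarity to all entities
    return scores.topk(k).indices            # ids of the k best KG entities

query = torch.randn(dim)                     # e.g., from a text or image encoder
print(retrieve(query))
```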
no code implementations • WS 2020 • Diksha Meghwal, Katharina Kann, Iacer Calixto, Stanislaw Jastrzebski
Pretrained language models have obtained impressive results for a large set of natural language understanding tasks.
no code implementations • AACL 2020 • Jason Phang, Iacer Calixto, Phu Mon Htut, Yada Pruksachatkun, Haokun Liu, Clara Vania, Katharina Kann, Samuel R. Bowman
Intermediate-task training (fine-tuning a pretrained model on an intermediate task before fine-tuning again on the target task) often improves model performance substantially on language understanding tasks in monolingual English settings.
Ranked #20 on Zero-Shot Cross-Lingual Transfer on XTREME
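A skeleton of the two-stage recipe; the fine_tune() helper is a hypothetical placeholder, and XLM-R plus the task names mirror the kind of setup the paper studies but are assumptions here.

```python
from transformers import AutoModelForSequenceClassification

def fine_tune(model, dataset):
    ...  # standard supervised fine-tuning loop (placeholder)

model = AutoModelForSequenceClassification.from_pretrained(
    "xlm-roberta-large", num_labels=3
)
fine_tune(model, dataset="anli")         # stage 1: English intermediate task
fine_tune(model, dataset="target_task")  # stage 2: target task; evaluate
                                         # zero-shot on non-English languages
```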
1 code implementation • ACL 2019 • Iacer Calixto, Miguel Rios, Wilker Aziz
In this work, we propose to model the interaction between visual and textual features for multi-modal neural machine translation (MMT) through a latent variable model.
Ranked #9 on Multimodal Machine Translation on Multi30K
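A hedged sketch of the training objective for such a conditional latent variable model, in notation that is illustrative rather than the paper's exact formulation (x: source sentence, y: translation, v: image features, z: latent variable):

```latex
\log p(y, v \mid x) \;\ge\;
  \mathbb{E}_{q(z \mid x, y, v)}\!\left[ \log p(y \mid x, z) + \log p(v \mid z) \right]
  \;-\; \mathrm{KL}\!\left( q(z \mid x, y, v) \,\|\, p(z \mid x) \right)
```

Because the prior p(z | x) conditions only on the source text, an objective of this shape needs the image at training time but not at test time.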
no code implementations • RANLP 2017 • Iacer Calixto, Qun Liu
We propose a novel discriminative ranking model that learns embeddings from multilingual and multi-modal data, meaning that our model can take advantage of images and descriptions in multiple languages to improve embedding quality.
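A minimal sketch of a margin-based ranking objective of this kind: a matching image-description pair should outscore mismatched pairs by a margin. Random embeddings stand in for the model's encoders.

```python
import torch
import torch.nn.functional as F

margin = 0.2
img = F.normalize(torch.randn(4, 128), dim=-1)   # image embeddings (toy)
txt = F.normalize(torch.randn(4, 128), dim=-1)   # description embeddings (toy)

scores = img @ txt.t()                           # all pairwise similarities
pos = scores.diag()                              # matching pairs
# Penalise any description that outranks an image's own description.
loss = F.relu(margin + scores - pos.unsqueeze(1)).fill_diagonal_(0).mean()
print(loss)
```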
no code implementations • EMNLP 2017 • Iacer Calixto, Qun Liu
We introduce multi-modal, attention-based neural machine translation (NMT) models which incorporate visual features into different parts of both the encoder and the decoder.
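One simple insertion point, sketched below: project a pooled CNN image vector and use it to initialise the decoder state. Sizes are illustrative, and this shows just one of the several integration points the paper explores.

```python
import torch
import torch.nn as nn

img_feat = torch.randn(1, 2048)           # pooled global CNN features (toy)
proj = nn.Linear(2048, 512)               # map image space -> RNN state space
decoder = nn.GRU(input_size=256, hidden_size=512, batch_first=True)

h0 = torch.tanh(proj(img_feat)).unsqueeze(0)  # (num_layers, batch, hidden)
prev_words = torch.randn(1, 5, 256)           # embedded target prefix (toy)
out, h = decoder(prev_words, h0)
print(out.shape)                               # (1, 5, 512)
```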
no code implementations • WS 2017 • Thiago Castro Ferreira, Iacer Calixto, Sander Wubben, Emiel Krahmer
In this paper, we study AMR-to-text generation, framing it as a translation task and comparing two different MT approaches (Phrase-based and Neural MT).
no code implementations • WS 2017 • Iacer Calixto, Daniel Stein, Evgeny Matusov, Sheila Castilho, Andy Way
Nonetheless, human evaluators ranked translations from a multi-modal NMT model as better than those of a text-only NMT model over 88% of the time, which suggests that images do help NMT in this use-case.
no code implementations • EACL 2017 • Iacer Calixto, Daniel Stein, Evgeny Matusov, Pintu Lohar, Sheila Castilho, Andy Way
We evaluate our models quantitatively using BLEU and TER and find that (i) additional synthetic data has a general positive impact on text-only and multi-modal NMT models, and that (ii) using a multi-modal NMT model for re-ranking n-best lists improves TER significantly across different n-best list sizes.
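The re-ranking step itself is compact; here is a sketch with a placeholder nmt_score standing in for the multi-modal NMT model's hypothesis log-probability.

```python
# Re-rank an n-best list: score each hypothesis with the (multi-modal)
# NMT model and keep the best. `nmt_score` is a hypothetical scorer.
def rerank(source: str, image, nbest: list[str], nmt_score) -> str:
    return max(nbest, key=lambda hyp: nmt_score(source, image, hyp))

# Toy usage with a dummy stand-in scorer.
best = rerank("ein Hund läuft", None,
              ["a dog runs", "a dog is running", "one dog walks"],
              nmt_score=lambda s, i, h: -len(h))
print(best)
```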
no code implementations • ACL 2017 • Iacer Calixto, Qun Liu, Nick Campbell
We introduce a Multi-modal Neural Machine Translation model in which a doubly-attentive decoder naturally incorporates spatial visual features obtained using pre-trained convolutional neural networks, bridging the gap between image description and translation.
Ranked #11 on Multimodal Machine Translation on Multi30K
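A minimal sketch of the doubly-attentive idea: one attention over source-word states and one over spatial image features, fused to drive the next decoder step. Multi-head attention stands in for the paper's RNN-style attention; dimensions are illustrative.

```python
import torch
import torch.nn as nn

attn_txt = nn.MultiheadAttention(embed_dim=512, num_heads=1, batch_first=True)
attn_img = nn.MultiheadAttention(embed_dim=512, num_heads=1, batch_first=True)

src_states = torch.randn(1, 20, 512)   # encoder states for 20 source words
img_feats = torch.randn(1, 49, 512)    # 7x7 spatial CNN features, projected
dec_state = torch.randn(1, 1, 512)     # current decoder state as the query

txt_ctx, _ = attn_txt(dec_state, src_states, src_states)
img_ctx, _ = attn_img(dec_state, img_feats, img_feats)
fused = torch.cat([dec_state, txt_ctx, img_ctx], dim=-1)  # feeds next state
print(fused.shape)                      # (1, 1, 1536)
```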
no code implementations • 3 Feb 2017 • Iacer Calixto, Qun Liu, Nick Campbell
We propose a novel discriminative model that learns embeddings from multilingual and multi-modal data, meaning that our model can take advantage of images and descriptions in multiple languages to improve embedding quality.
no code implementations • 23 Jan 2017 • Iacer Calixto, Qun Liu, Nick Campbell
We introduce multi-modal, attention-based neural machine translation (NMT) models which incorporate visual features into different parts of both the encoder and the decoder.
Ranked #10 on Multimodal Machine Translation on Multi30K
no code implementations • LREC 2016 • Debasis Ganguly, Iacer Calixto, Gareth Jones
Motivated by the adage that a “picture is worth a thousand words”, it can be reasoned that automatically enriching the textual content of a document with relevant images can increase its readability.