1 code implementation • EMNLP (ACL) 2021 • Fernando Alva-Manchego, Abiola Obamuyide, Amit Gajbhiye, Frédéric Blain, Marina Fomicheva, Lucia Specia
We introduce deepQuest-py, a framework for training and evaluation of large and light-weight models for Quality Estimation (QE).
no code implementations • MTSummit 2021 • Fernando Alva-Manchego, Lucia Specia, Sara Szoc, Tom Vanallemeersch, Heidi Depraetere
In this scenario, a Quality Estimation (QE) tool can be used to score MT outputs, and a threshold on the QE scores can be applied to decide whether an MT output can be used as-is or requires human post-editing.
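The thresholding decision described in this entry can be sketched as below; note that the 0-1 score scale and the 0.7 threshold are illustrative assumptions, not values taken from the paper.

```python
def route_mt_output(qe_score: float, threshold: float = 0.7) -> str:
    """Decide whether an MT output can be used as-is or needs post-editing.

    `qe_score` is assumed to lie on a 0-1 scale (higher = better quality);
    both the scale and the default threshold are hypothetical choices.
    """
    return "use as-is" if qe_score >= threshold else "post-edit"
```

In practice the threshold would be tuned on held-out data to balance post-editing cost against the risk of publishing low-quality output.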
no code implementations • EMNLP (IWSLT) 2019 • Jan Niehues, Rolando Cattoni, Sebastian Stüker, Matteo Negri, Marco Turchi, Thanh-Le Ha, Elizabeth Salesky, Ramon Sanabria, Loic Barrault, Lucia Specia, Marcello Federico
The IWSLT 2019 evaluation campaign featured three tasks: speech translation of (i) TED talks and (ii) How2 instructional videos from English into German and Portuguese, and (iii) text translation of TED talks from English into Czech.
1 code implementation • CL (ACL) 2021 • Fernando Alva-Manchego, Carolina Scarton, Lucia Specia
Second, we conduct the first meta-evaluation of automatic metrics in Text Simplification, using our new data set (and other existing data) to analyze the variation of the correlation between metrics’ scores and human judgments across three dimensions: the perceived simplicity level, the system type, and the set of references used for computation.
no code implementations • NAACL (CLPsych) 2022 • Falwah Alhamed, Julia Ive, Lucia Specia
The second is predicting the degree of suicide risk as a user-level classification task.
no code implementations • WMT (EMNLP) 2021 • Genze Jiang, Zhenhao Li, Lucia Specia
This paper presents Imperial College London’s submissions to the WMT21 Quality Estimation (QE) Shared Task 3: Critical Error Detection.
no code implementations • WMT (EMNLP) 2021 • Lucia Specia, Frédéric Blain, Marina Fomicheva, Chrysoula Zerva, Zhenhao Li, Vishrav Chaudhary, André F. T. Martins
We report the results of the WMT 2021 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels.
no code implementations • EAMT 2020 • Frederic Blain, Nikolaos Aletras, Lucia Specia
However, QE models are often trained on noisy approximations of quality annotations derived from the proportion of post-edited words in translated sentences instead of direct human annotations of translation errors.
no code implementations • ACL (RepL4NLP) 2021 • Abiola Obamuyide, Marina Fomicheva, Lucia Specia
To address these challenges, we propose a Bayesian meta-learning approach for adapting QE models to the needs and preferences of each user with limited supervision.
no code implementations • LREC 2022 • Nishtha Jain, Declan Groves, Lucia Specia, Maja Popović
This work explores a light-weight method to generate gender variants for a given text using pre-trained language models as the resource, without any task-specific labelled data.
no code implementations • LREC 2022 • Júlia Sato, Helena Caseli, Lucia Specia
The good BLEU and METEOR scores obtained for this new language pair, relative to the original English-German VTLM, demonstrate the model's suitability for other languages.
no code implementations • EMNLP 2021 • Yishu Miao, Phil Blunsom, Lucia Specia
We propose a generative framework for simultaneous machine translation.
no code implementations • EAMT 2022 • Khetam Al Sharou, Lucia Specia
We also study the impact of the source text on the generation of critical errors in the translation and, based on this, propose a set of recommendations on aspects of MT that need further scrutiny, especially for user-generated content, to avoid generating such errors and hence improve online communication.
1 code implementation • ACL 2022 • Hanna Behnke, Marina Fomicheva, Lucia Specia
Machine Translation Quality Estimation (QE) aims to build predictive models to assess the quality of machine-generated translations in the absence of reference translations.
no code implementations • RANLP 2021 • Khetam Al Sharou, Zhenhao Li, Lucia Specia
In this paper, we propose a definition and taxonomy of various types of non-standard textual content – generally referred to as “noise” – in Natural Language Processing (NLP).
no code implementations • MMTLRL (RANLP) 2021 • Lucia Specia
Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible.
no code implementations • WMT (EMNLP) 2020 • Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Vishrav Chaudhary, Mark Fishel, Francisco Guzmán, Lucia Specia
We explore (a) a black-box approach to QE based on pre-trained representations; and (b) glass-box approaches that leverage various indicators that can be extracted from the neural MT systems.
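A minimal sketch of the glass-box idea, assuming access to the MT system's per-token output probabilities; the indicators explored in the paper are richer than this single one.

```python
import math

def mean_token_logprob(token_probs):
    """A simple glass-box QE indicator: the mean log-probability the MT
    system assigned to its own output tokens. Lower values suggest the
    system was less confident in its translation."""
    return sum(math.log(p) for p in token_probs) / len(token_probs)

# Hypothetical per-token probabilities for two MT outputs
confident_output = [0.9, 0.85, 0.95]
uncertain_output = [0.3, 0.2, 0.5]
assert mean_token_logprob(confident_output) > mean_token_logprob(uncertain_output)
```

Because such indicators come for free from the decoder, they require no supervision, which is the appeal of glass-box approaches.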
no code implementations • WMT (EMNLP) 2020 • Lucia Specia, Frédéric Blain, Marina Fomicheva, Erick Fonseca, Vishrav Chaudhary, Francisco Guzmán, André F. T. Martins
We report the results of the WMT20 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word, sentence and document levels.
no code implementations • WMT (EMNLP) 2020 • Lucia Specia, Zhenhao Li, Juan Pino, Vishrav Chaudhary, Francisco Guzmán, Graham Neubig, Nadir Durrani, Yonatan Belinkov, Philipp Koehn, Hassan Sajjad, Paul Michel, Xian Li
We report the findings of the second edition of the shared task on improving robustness in Machine Translation (MT).
no code implementations • 28 Jun 2024 • Zhenhao Li, Marek Rei, Lucia Specia
Pretrained language models have significantly advanced performance across various natural language processing tasks.
no code implementations • 23 Jan 2024 • Haoyan Luo, Lucia Specia
This survey underscores the imperative for increased explainability in LLMs, delving into both the research on explainability and the various methodologies and tasks that utilize an understanding of these models.
no code implementations • 17 Nov 2022 • Joël Tang, Marina Fomicheva, Lucia Specia
We present a case study focusing on model understanding and regularisation to reduce hallucinations in NMT.
no code implementations • 19 Oct 2022 • Joshua Cesare Placidi, Yishu Miao, Zixu Wang, Lucia Specia
Scene Text Recognition (STR) models have achieved high performance in recent years on benchmark datasets where text images are presented with minimal noise.
no code implementations • 10 Oct 2022 • Zixu Wang, Yujie Zhong, Yishu Miao, Lin Ma, Lucia Specia
However, even in paired video-text segments, only a subset of the frames are semantically relevant to the corresponding text, with the remainder representing noise; the ratio of noisy frames is higher for longer videos.
1 code implementation • 24 Jun 2022 • Atijit Anuchitanukul, Lucia Specia
We present Burst2Vec, our multi-task learning approach to predict emotion, age, and origin (i.e., native country/language) from vocal bursts.
1 code implementation • 29 Apr 2022 • Alexander Gaskell, Yishu Miao, Lucia Specia, Francesca Toni
We propose a novel, generative adversarial framework for probing and improving these models' reasoning capabilities.
no code implementations • 23 Jan 2022 • Veneta Haralampieva, Ozan Caglayan, Lucia Specia
A particular use for such multimodal systems is the task of simultaneous machine translation, where visual context has been shown to complement the partial information provided by the source sentence, especially in the early phases of translation.
no code implementations • 24 Nov 2021 • Atijit Anuchitanukul, Julia Ive, Lucia Specia
We then propose to bring these findings into computational detection models by introducing and evaluating (a) neural architectures for contextual toxicity detection that are aware of the conversational structure, and (b) data augmentation strategies that can help model contextual toxicity detection.
Ranked #1 on Toxic Comment Classification on CAD
no code implementations • NAACL 2022 • Nihir Vedd, Zixu Wang, Marek Rei, Yishu Miao, Lucia Specia
In traditional Visual Question Generation (VQG), most images have multiple concepts (e.g., objects and categories) for which a question could be generated, but models are trained to mimic an arbitrary choice of concept as given in their training data.
1 code implementation • WMT (EMNLP) 2021 • Diptesh Kanojia, Marina Fomicheva, Tharindu Ranasinghe, Frédéric Blain, Constantin Orăsan, Lucia Specia
However, this ability is yet to be tested in the current evaluation practices, where QE systems are assessed only in terms of their correlation with human judgements.
no code implementations • EMNLP 2021 • Shuo Sun, Ahmed El-Kishky, Vishrav Chaudhary, James Cross, Francisco Guzmán, Lucia Specia
Sentence-level Quality estimation (QE) of machine translation is traditionally formulated as a regression task, and the performance of QE models is typically measured by Pearson correlation with human labels.
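Pearson correlation, the evaluation measure mentioned here, can be computed from scratch as follows; the score values are toy examples for illustration only.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between model predictions and human labels."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy QE predictions vs. human quality labels (illustrative values)
preds = [0.2, 0.5, 0.7, 0.9]
human = [0.1, 0.4, 0.8, 0.95]
r = pearson(preds, human)  # close to 1.0: predictions track the labels
```

Pearson correlation is invariant to linear rescaling of the predictions, which is one reason the paper argues it can obscure practically important differences between QE systems.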
no code implementations • EMNLP (CINLP) 2021 • Antigoni-Maria Founta, Lucia Specia
The societal issue of digital hostility has previously attracted a lot of attention.
no code implementations • Findings (ACL) 2022 • Marina Fomicheva, Lucia Specia, Nikolaos Aletras
Recent Quality Estimation (QE) models based on multilingual pre-trained representations have achieved very competitive results when predicting the overall quality of translated sentences.
no code implementations • ACL 2021 • Abiola Obamuyide, Marina Fomicheva, Lucia Specia
Most current quality estimation (QE) models for machine translation are trained and evaluated in a static setting where training and test data are assumed to be from a fixed distribution.
1 code implementation • Findings (ACL) 2021 • Amit Gajbhiye, Marina Fomicheva, Fernando Alva-Manchego, Frédéric Blain, Abiola Obamuyide, Nikolaos Aletras, Lucia Specia
Quality Estimation (QE) is the task of automatically predicting Machine Translation quality in the absence of reference translations, making it applicable in real-time settings, such as translating online social media conversations.
1 code implementation • ACL 2021 • Faidon Mitzalis, Ozan Caglayan, Pranava Madhyastha, Lucia Specia
We present BERTGEN, a novel generative, decoder-only model which extends BERT by fusing multimodal and multilingual pretrained models VL-BERT and M-BERT, respectively.
no code implementations • NAACL 2021 • Yurun Song, Junchen Zhao, Lucia Specia
Machine translation (MT) is currently evaluated in one of two ways: in a monolingual fashion, by comparing the system output to one or more human reference translations, or in a trained crosslingual fashion, by building a supervised model to predict quality scores from human-labeled data.
no code implementations • 11 May 2021 • Zixu Wang, Yishu Miao, Lucia Specia
Experiments on Visual Question Answering as downstream task demonstrate the effectiveness of the proposed generative model, which is able to improve strong UpDn-based models to achieve state-of-the-art performance.
no code implementations • EMNLP (CINLP) 2021 • Panagiotis Fytas, Georgios Rizos, Lucia Specia
Despite peer-reviewing being an essential component of academia since the 1600s, it has repeatedly received criticisms for lack of transparency and consistency.
1 code implementation • NAACL 2021 • Vilém Zouhar, Michal Novák, Matúš Žilinec, Ondřej Bojar, Mateo Obregón, Robin L. Hill, Frédéric Blain, Marina Fomicheva, Lucia Specia, Lisa Yankovskaya
Translating text into a language unknown to the text's author, dubbed outbound translation, is a modern need for which the user experience has significant room for improvement, beyond the basic machine translation facility.
1 code implementation • Findings (EMNLP) 2021 • Zhenhao Li, Marek Rei, Lucia Specia
Neural Machine Translation models are sensitive to noise in the input texts, such as misspelled words and ungrammatical constructions.
1 code implementation • LREC 2022 • Josiah Wang, Pranava Madhyastha, Josiel Figueiredo, Chiraag Lala, Lucia Specia
The dataset will benefit research on visual grounding of words especially in the context of free-form sentences, and can be obtained from https://doi.org/10.5281/zenodo.5034604 under a Creative Commons licence.
Ranked #1 on Multimodal Text Prediction on MultiSubs
1 code implementation • EACL 2021 • Julia Ive, Andy Mingren Li, Yishu Miao, Ozan Caglayan, Pranava Madhyastha, Lucia Specia
This paper addresses the problem of simultaneous machine translation (SiMT) by exploring two main concepts: (a) adaptive policies to learn a good trade-off between high translation quality and low latency; and (b) visual information to support this process by providing additional (visual) contextual information which may be available before the textual input is produced.
1 code implementation • EACL 2021 • Julia Ive, Zixu Wang, Marina Fomicheva, Lucia Specia
Reinforcement Learning (RL) is a powerful framework to address the discrepancy between loss functions used during training and the final evaluation metrics to be used at test time.
no code implementations • EACL 2021 • Yi-Lin Tuan, Ahmed El-Kishky, Adithya Renduchintala, Vishrav Chaudhary, Francisco Guzmán, Lucia Specia
Quality estimation aims to measure the quality of translated content without access to a reference translation.
1 code implementation • EACL 2021 • Ozan Caglayan, Menekse Kuyu, Mustafa Sercan Amac, Pranava Madhyastha, Erkut Erdem, Aykut Erdem, Lucia Specia
Pre-trained language models have been shown to improve performance in many natural language tasks substantially.
no code implementations • 16 Jan 2021 • Zixu Wang, Yishu Miao, Lucia Specia
Current work on Visual Question Answering (VQA) explore deterministic approaches conditioned on various types of image and question features.
no code implementations • 13 Dec 2020 • Begum Citamak, Ozan Caglayan, Menekse Kuyu, Erkut Erdem, Aykut Erdem, Pranava Madhyastha, Lucia Specia
We hope that the MSVD-Turkish dataset and the results reported in this work will lead to better video captioning and multimodal machine translation models for Turkish and other morphology rich and agglutinative languages.
no code implementations • AACL 2020 • Shuo Sun, Marina Fomicheva, Frédéric Blain, Vishrav Chaudhary, Ahmed El-Kishky, Adithya Renduchintala, Francisco Guzmán, Lucia Specia
Predicting the quality of machine translation has traditionally been addressed with language-specific models, under the assumption that the quality label distribution or linguistic features exhibit traits that are not shared across languages.
1 code implementation • 19 Nov 2020 • Yujie Zhong, Linhai Xie, Sen Wang, Lucia Specia, Yishu Miao
In this paper, we teach machines to understand visuals and natural language by learning the mapping between sentences and noisy video snippets without explicit annotations.
no code implementations • COLING 2020 • Ozan Caglayan, Pranava Madhyastha, Lucia Specia
Automatic evaluation of language generation systems is a well-studied problem in Natural Language Processing.
1 code implementation • EMNLP 2020 • Piyawat Lertvittayakumjorn, Lucia Specia, Francesca Toni
Since obtaining a perfect training dataset (i.e., a dataset which is considerably large, unbiased, and well-representative of unseen cases) is hardly possible, many real-world text classifiers are trained on the available, yet imperfect, datasets.
1 code implementation • LREC 2022 • Marina Fomicheva, Shuo Sun, Erick Fonseca, Chrysoula Zerva, Frédéric Blain, Vishrav Chaudhary, Francisco Guzmán, Nina Lopatina, Lucia Specia, André F. T. Martins
We present MLQE-PE, a new dataset for Machine Translation (MT) Quality Estimation (QE) and Automatic Post-Editing (APE).
1 code implementation • EMNLP 2020 • Ozan Caglayan, Julia Ive, Veneta Haralampieva, Pranava Madhyastha, Loïc Barrault, Lucia Specia
Simultaneous machine translation (SiMT) aims to translate a continuous input text stream into another language with the lowest latency and highest quality possible.
1 code implementation • WS 2020 • Zhenhao Li, Marina Fomicheva, Lucia Specia
This paper describes our submission to the 2020 Duolingo Shared Task on Simultaneous Translation And Paraphrase for Language Education (STAPLE).
no code implementations • ACL 2020 • Shuo Sun, Francisco Guzmán, Lucia Specia
Recent advances in pre-trained multilingual language models lead to state-of-the-art results on the task of quality estimation (QE) for machine translation.
no code implementations • ACL 2020 • Shu Okabe, Frédéric Blain, Lucia Specia
We propose approaches to Quality Estimation (QE) for Machine Translation that explore both text and visual modalities for Multimodal QE.
no code implementations • ACL 2020 • Marina Fomicheva, Lucia Specia, Francisco Guzmán
Reliably evaluating Machine Translation (MT) through automated metrics is a long-standing problem.
3 code implementations • 21 May 2020 • Marina Fomicheva, Shuo Sun, Lisa Yankovskaya, Frédéric Blain, Francisco Guzmán, Mark Fishel, Nikolaos Aletras, Vishrav Chaudhary, Lucia Specia
Quality Estimation (QE) is an important component in making Machine Translation (MT) useful in real-world applications, as it is aimed to inform the user on the quality of the MT output at test time.
1 code implementation • ACL 2020 • Fernando Alva-Manchego, Louis Martin, Antoine Bordes, Carolina Scarton, Benoît Sagot, Lucia Specia
Furthermore, we motivate the need for developing better methods for automatic evaluation using ASSET, since we show that current popular metrics may not be suitable when multiple simplification transformations are performed.
no code implementations • LREC 2020 • Julia Ive, Lucia Specia, Sara Szoc, Tom Vanallemeersch, Joachim Van den Bogaert, Eduardo Farah, Christine Maroti, Artur Ventura, Maxim Khalilov
We introduce a machine translation dataset for three pairs of languages in the legal domain with post-edited high-quality neural machine translation and independent human references.
no code implementations • CL 2020 • Fernando Alva-Manchego, Carolina Scarton, Lucia Specia
Sentence Simplification (SS) aims to modify a sentence in order to make it easier to read and understand.
no code implementations • 28 Nov 2019 • Umut Sulubacak, Ozan Caglayan, Stig-Arne Grönroos, Aku Rouhe, Desmond Elliott, Lucia Specia, Jörg Tiedemann
Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data.
Ranked #4 on Multimodal Machine Translation on Multi30K
1 code implementation • WS 2019 • Zhenhao Li, Lucia Specia
Neural Machine Translation (NMT) models have been proved strong when translating clean texts, but they are very sensitive to noise in the input.
1 code implementation • IJCNLP 2019 • Julia Ive, Pranava Madhyastha, Lucia Specia
Most text-to-text generation tasks, for example text summarisation and text simplification, require copying words from the input to the output.
no code implementations • EMNLP (IWSLT) 2019 • Zixiu Wu, Ozan Caglayan, Julia Ive, Josiah Wang, Lucia Specia
Upon conducting extensive experiments, we found that (i) the explored visual integration schemes often harm the translation performance for the transformer and additive deliberation, but considerably improve the cascade deliberation; (ii) the transformer and cascade deliberation integrate the visual modality better than the additive deliberation, as shown by the incongruence analysis.
no code implementations • 16 Oct 2019 • Ozan Caglayan, Zixiu Wu, Pranava Madhyastha, Josiah Wang, Lucia Specia
This paper describes the Imperial College London team's submission to the 2019 VATEX video captioning challenge, where we first explore two sequence-to-sequence models, namely a recurrent (GRU) model and a transformer model, which generate captions from the I3D action features.
1 code implementation • EMNLP (IWSLT) 2019 • Carolina Scarton, Mikel L. Forcada, Miquel Esplà-Gomis, Lucia Specia
To that end, we report experiments on a dataset with newly-collected post-editing indicators and show their usefulness when estimating post-editing effort.
no code implementations • 7 Oct 2019 • Zhenhao Li, Lucia Specia
Neural Machine Translation (NMT) models have been proved strong when translating clean texts, but they are very sensitive to noise in the input.
no code implementations • CL 2019 • Marina Fomicheva, Lucia Specia
Much work has been dedicated to the improvement of evaluation metrics to achieve a higher correlation with human judgments.
1 code implementation • ICCV 2019 • Josiah Wang, Lucia Specia
Localizing phrases in images is an important part of image understanding and can be useful in many applications that require mappings between textual and visual information.
1 code implementation • IJCNLP 2019 • Fernando Alva-Manchego, Louis Martin, Carolina Scarton, Lucia Specia
We introduce EASSE, a Python package aiming to facilitate and standardise automatic evaluation and comparison of Sentence Simplification (SS) systems.
no code implementations • 5 Aug 2019 • Zixiu Wu, Julia Ive, Josiah Wang, Pranava Madhyastha, Lucia Specia
The question we ask ourselves is whether visual features can support the translation process. In particular, given that this dataset is extracted from videos, we focus on the translation of actions, which we believe are poorly captured in the static image-text datasets currently used for multimodal translation.
no code implementations • WS 2019 • Zhenhao Li, Lucia Specia
This paper describes our submission to the WMT 2019 Chinese-English (zh-en) news translation shared task.
no code implementations • WS 2019 • Julian Chow, Lucia Specia, Pranava Madhyastha
We propose WMDO, a metric based on distance between distributions in the semantic vector space.
no code implementations • WS 2019 • Fernando Alva-Manchego, Carolina Scarton, Lucia Specia
Current approaches to Text Simplification focus on simplifying sentences individually.
no code implementations • ACL 2019 • Pranava Madhyastha, Josiah Wang, Lucia Specia
It estimates the faithfulness of a generated caption with respect to the content of the actual image, based on the semantic similarity between labels of objects depicted in images and words in the description.
no code implementations • WS 2019 • Zixu Wang, Julia Ive, Sumithra Velupillai, Lucia Specia
A major obstacle to the development of Natural Language Processing (NLP) methods in the biomedical domain is data accessibility.
1 code implementation • ACL 2019 • Julia Ive, Pranava Madhyastha, Lucia Specia
Previous work on multimodal machine translation has shown that visual information is only needed in very specific cases, for example in the presence of ambiguous words where the textual context is not sufficient.
Ranked #3 on Multimodal Machine Translation on Multi30K (Meteor (EN-FR) metric)
no code implementations • WS 2019 • Chiraag Lala, Pranava Madhyastha, Lucia Specia
Recent work on visually grounded language learning has focused on broader applications of grounded representations, such as visual question answering and multimodal machine translation.
no code implementations • NAACL 2019 • Ozan Caglayan, Pranava Madhyastha, Lucia Specia, Loïc Barrault
Current work on multimodal machine translation (MMT) has suggested that the visual modality is either unnecessary or only marginally beneficial.
1 code implementation • WS 2018 • Pranava Swaroop Madhyastha, Josiah Wang, Lucia Specia
We hypothesize that end-to-end neural image captioning systems work seemingly well because they exploit and learn 'distributional similarity' in a multimodal feature space, by mapping a test image to similar training images in this space and generating a caption from the same space.
2 code implementations • 1 Nov 2018 • Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze
In this paper, we introduce How2, a multimodal collection of instructional videos with English subtitles and crowdsourced Portuguese translations.
1 code implementation • 7 Oct 2018 • Karin Sim Smith, Lucia Specia
In an attempt to improve overall translation quality, there has been an increasing focus on integrating more linguistic elements into Machine Translation (MT).
no code implementations • EMNLP 2018 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Matt Post, Lucia Specia, Marco Turchi, Karin Verspoor
1 code implementation • WS 2018 • Loïc Barrault, Fethi Bougares, Lucia Specia, Chiraag Lala, Desmond Elliott, Stella Frank
In this task a source sentence in English is supplemented by an image and participating systems are required to generate a translation for such a sentence into German, French or Czech.
no code implementations • WS 2018 • Julia Ive, Carolina Scarton, Frédéric Blain, Lucia Specia
In this paper we present the University of Sheffield submissions for the WMT18 Quality Estimation shared task.
no code implementations • WS 2018 • Chiraag Lala, Pranava Swaroop Madhyastha, Carolina Scarton, Lucia Specia
For task 1b, we explore three approaches: (i) re-ranking based on cross-lingual word sense disambiguation (as for task 1), (ii) re-ranking based on consensus of NMT n-best lists from German-Czech, French-Czech and English-Czech systems, and (iii) data augmentation by generating English source data through machine translation from French to English and from German to English followed by hypothesis selection using a multimodal-reranker.
no code implementations • WS 2018 • Lucia Specia, Frédéric Blain, Varvara Logacheva, Ramón Astudillo, André F. T. Martins
We report the results of the WMT18 shared task on Quality Estimation, i.e., the task of predicting the quality of the output of machine translation systems at various granularity levels: word, phrase, sentence and document.
no code implementations • 11 Sep 2018 • Pranava Madhyastha, Josiah Wang, Lucia Specia
We hypothesize that end-to-end neural image captioning systems work seemingly well because they exploit and learn `distributional similarity' in a multimodal feature space by mapping a test image to similar training images in this space and generating a caption from the same space.
no code implementations • WS 2018 • Mikel L. Forcada, Carolina Scarton, Lucia Specia, Barry Haddow, Alexandra Birch
A popular application of machine translation (MT) is gisting: MT is consumed as is to make sense of text in a foreign language.
2 code implementations • COLING 2018 • Julia Ive, Frédéric Blain, Lucia Specia
Our approach is significantly faster and yields performance improvements for a range of document-level quality estimation tasks.
no code implementations • ACL 2018 • Carolina Scarton, Lucia Specia
Text simplification (TS) is a monolingual text-to-text transformation task where an original (complex) text is transformed into a target (simpler) text.
no code implementations • NAACL 2018 • David Steele, Lucia Specia
Machine Translation systems are usually evaluated and compared using automated evaluation metrics such as BLEU and METEOR to score the generated translations against human translations.
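The core of reference-based metrics such as BLEU is clipped n-gram precision. A deliberately simplified, unigram-only sketch is shown below; real BLEU adds higher-order n-grams, multiple references, and a brevity penalty.

```python
from collections import Counter

def clipped_unigram_precision(hypothesis: str, reference: str) -> float:
    """Clipped unigram precision: the fraction of hypothesis words that also
    appear in the reference, with each word counted at most as often as it
    occurs in the reference (clipping penalises spurious repetition)."""
    hyp, ref = Counter(hypothesis.split()), Counter(reference.split())
    overlap = sum(min(count, ref[word]) for word, count in hyp.items())
    return overlap / max(1, sum(hyp.values()))
```

For example, the degenerate hypothesis "the the the" against reference "the cat" scores only 1/3, because the repeated "the" is clipped to its single occurrence in the reference.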
1 code implementation • NAACL 2018 • Pranava Madhyastha, Josiah Wang, Lucia Specia
We address the task of detecting foiled image captions, i.e., identifying whether a caption contains a word that has been deliberately replaced by a semantically similar word, thus rendering it inaccurate with respect to the image being described.
no code implementations • WS 2018 • Seid Muhie Yimam, Chris Biemann, Shervin Malmasi, Gustavo H. Paetzold, Lucia Specia, Sanja Štajner, Anaïs Tack, Marcos Zampieri
We report the findings of the second Complex Word Identification (CWI) shared task organized as part of the BEA workshop co-located with NAACL-HLT'2018.
no code implementations • NAACL 2018 • Josiah Wang, Pranava Madhyastha, Lucia Specia
The use of explicit object detectors as an intermediate step to image captioning - which used to constitute an essential stage in early work - is often bypassed in the currently dominant end-to-end approaches, where the language model is conditioned directly on a mid-level image embedding.
1 code implementation • ICLR 2018 • Pranava Madhyastha, Josiah Wang, Lucia Specia
We hypothesize that end-to-end neural image captioning systems work seemingly well because they exploit and learn ‘distributional similarity’ in a multimodal feature space, by mapping a test image to similar training images in this space and generating a caption from the same space.
no code implementations • IJCNLP 2017 • Carolina Scarton, Alessio Palmero Aprosio, Sara Tonelli, Tamara Martín Wanton, Lucia Specia
Our implementation includes a set of general-purpose simplification rules, as well as a sentence selection module (to select sentences to be simplified) and a confidence model (to select only promising simplifications).
no code implementations • IJCNLP 2017 • Gustavo Paetzold, Fernando Alva-Manchego, Lucia Specia
We introduce MASSAlign: a Python library for the alignment and annotation of monolingual comparable documents.
1 code implementation • IJCNLP 2017 • Fernando Alva-Manchego, Joachim Bingel, Gustavo Paetzold, Carolina Scarton, Lucia Specia
Current research in text simplification has been hampered by two central problems: (i) the small amount of high-quality parallel simplification data available, and (ii) the lack of explicit annotations of simplification operations, such as deletions or substitutions, on existing data.
Ranked #8 on Text Simplification on PWKP / WikiSmall (SARI metric)
no code implementations • IJCNLP 2017 • Gustavo Paetzold, Lucia Specia
There is no question that our research community has been, and still is, producing an immense number of interesting strategies, models and tools for a wide array of problems and challenges in diverse areas of knowledge.
no code implementations • WS 2017 • Desmond Elliott, Stella Frank, Loïc Barrault, Fethi Bougares, Lucia Specia
The multilingual image description task was changed such that at test time, only the image is given.
no code implementations • WS 2017 • Marcos Zampieri, Shervin Malmasi, Gustavo Paetzold, Lucia Specia
This paper revisits the problem of complex word identification (CWI) following up the SemEval CWI shared task.
no code implementations • WS 2017 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Shu-Jian Huang, Matthias Huck, Philipp Koehn, Qun Liu, Varvara Logacheva, Christof Monz, Matteo Negri, Matt Post, Raphael Rubino, Lucia Specia, Marco Turchi
no code implementations • WS 2017 • Jan-Thorsten Peter, Hermann Ney, Ondřej Bojar, Ngoc-Quan Pham, Jan Niehues, Alex Waibel, Franck Burlot, François Yvon, Mārcis Pinnis, Valters Šics, Jasmijn Bastings, Miguel Rios, Wilker Aziz, Philip Williams, Frédéric Blain, Lucia Specia
no code implementations • SEMEVAL 2017 • Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia
Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
3 code implementations • 31 Jul 2017 • Daniel Cer, Mona Diab, Eneko Agirre, Iñigo Lopez-Gazpio, Lucia Specia
Semantic Textual Similarity (STS) measures the meaning similarity of sentences.
no code implementations • EACL 2017 • Gustavo Paetzold, Lucia Specia
We present a new Lexical Simplification approach that exploits Neural Networks to learn substitutions from the Newsela corpus - a large set of professionally produced simplifications.
no code implementations • 13 Dec 2016 • Gustavo Henrique Paetzold, Lucia Specia
Parallel corpora have driven great progress in the field of Text Simplification.
no code implementations • COLING 2016 • Gustavo Paetzold, Lucia Specia
We introduce Anita: a flexible and intelligent Text Adaptation tool for web content that provides Text Simplification and Text Enhancement modules.
no code implementations • COLING 2016 • Carolina Scarton, Gustavo Paetzold, Lucia Specia
The goal of QE is to estimate the quality of the output of language applications without the need for human references.
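As a sketch of what reference-free estimation looks like, the snippet below computes a few surface statistics of a source sentence and its MT output, in the spirit of classic baseline QE feature sets. The specific features chosen here are simplifying assumptions for illustration, not the feature set of any system in these papers.

```python
# Minimal reference-free QE features: surface statistics of the source
# and the MT output only -- no human reference translation is consulted.
# Feature choice is an illustrative assumption, not a published system.

def qe_features(source: str, mt_output: str) -> dict:
    src_toks = source.split()
    mt_toks = mt_output.split()
    return {
        "src_len": len(src_toks),
        "mt_len": len(mt_toks),
        # source/target length ratio is a classic adequacy proxy
        "len_ratio": len(mt_toks) / max(len(src_toks), 1),
        # average token length loosely tracks lexical complexity
        "mt_avg_tok_len": sum(map(len, mt_toks)) / max(len(mt_toks), 1),
    }

feats = qe_features("the cat sat on the mat",
                    "le chat était assis sur le tapis")
print(feats["src_len"], feats["mt_len"])  # 6 7
```

In a real QE pipeline, features like these feed a regressor or classifier trained against human quality judgments or post-editing effort.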
no code implementations • COLING 2016 • Gustavo Paetzold, Lucia Specia
We report three user studies in which the Lexical Simplification needs of non-native English speakers are investigated.
no code implementations • COLING 2016 • Gustavo Paetzold, Lucia Specia
Exploring language usage through frequency analysis in large corpora is a defining feature in most recent work in corpus and computational linguistics.
no code implementations • EACL 2017 • Ella Rabinovich, Shachar Mirkin, Raj Nath Patel, Lucia Specia, Shuly Wintner
The language that we produce reflects our personality, and various personal and demographic characteristics can be detected in natural language texts.
no code implementations • WS 2016 • Ondřej Bojar, Rajen Chatterjee, Christian Federmann, Yvette Graham, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Philipp Koehn, Varvara Logacheva, Christof Monz, Matteo Negri, Aurélie Névéol, Mariana Neves, Martin Popel, Matt Post, Raphael Rubino, Carolina Scarton, Lucia Specia, Marco Turchi, Karin Verspoor, Marcos Zampieri
no code implementations • WS 2016 • Jan-Thorsten Peter, Tamer Alkhouli, Hermann Ney, Matthias Huck, Fabienne Braune, Alexander Fraser, Aleš Tamchyna, Ondřej Bojar, Barry Haddow, Rico Sennrich, Frédéric Blain, Lucia Specia, Jan Niehues, Alex Waibel, Alexandre Allauzen, Lauriane Aufrant, Franck Burlot, Elena Knyazeva, Thomas Lavergne, François Yvon, Mārcis Pinnis, Stella Frank
Ranked #12 on Machine Translation on WMT2016 English-Romanian
no code implementations • WS 2016 • Ondřej Bojar, Christian Buck, Rajen Chatterjee, Christian Federmann, Liane Guillou, Barry Haddow, Matthias Huck, Antonio Jimeno Yepes, Aurélie Névéol, Mariana Neves, Pavel Pecina, Martin Popel, Philipp Koehn, Christof Monz, Matteo Negri, Matt Post, Lucia Specia, Karin Verspoor, Jörg Tiedemann, Marco Turchi
no code implementations • CONLL 2016 • Daniel Beck, Lucia Specia, Trevor Cohn
Machine Translation Quality Estimation is a notoriously difficult task, which lessens its usefulness in real-world translation environments.
1 code implementation • WS 2016 • Desmond Elliott, Stella Frank, Khalil Sima'an, Lucia Specia
We introduce the Multi30K dataset to stimulate multilingual multimodal research.
1 code implementation • LREC 2016 • Karin Sim Smith, Wilker Aziz, Lucia Specia
We describe COHERE, our coherence toolkit which incorporates various complementary models for capturing and measuring different aspects of text coherence.
no code implementations • LREC 2016 • Gustavo Paetzold, Lucia Specia
Lexical Simplification is the task of replacing complex words in a text with simpler alternatives.
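The task can be illustrated with a toy simplifier that replaces words falling below a corpus-frequency threshold with a simpler synonym from a small substitution table. Both the table and the frequency counts below are hypothetical stand-ins; the papers listed here learn such substitutions from data (e.g., the Newsela corpus) rather than hard-coding them.

```python
# Toy lexical simplifier: low-frequency words are swapped for a simpler
# synonym. The synonym table and unigram counts are illustrative
# assumptions, not resources from the papers above.

SIMPLE_SYNONYMS = {
    "utilize": "use",
    "commence": "start",
    "terminate": "end",
}

# Hypothetical unigram counts standing in for corpus frequencies.
WORD_FREQ = {"use": 9000, "utilize": 40, "start": 8000,
             "commence": 25, "end": 8500, "terminate": 30}

def simplify(sentence: str, freq_threshold: int = 100) -> str:
    """Replace rare words that have a known simpler alternative."""
    out = []
    for word in sentence.split():
        if WORD_FREQ.get(word, 0) < freq_threshold and word in SIMPLE_SYNONYMS:
            out.append(SIMPLE_SYNONYMS[word])
        else:
            out.append(word)
    return " ".join(out)

print(simplify("we will commence and then terminate the session"))
# we will start and then end the session
```

Frequency is only a rough complexity proxy; learned approaches additionally model context so that a substitution preserves the sentence's meaning and grammaticality.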
1 code implementation • LREC 2016 • Varvara Logacheva, Chris Hokamp, Lucia Specia
The tool has a set of state-of-the-art features for QE, and new features can easily be added.
1 code implementation • LREC 2016 • Carolina Scarton, Lucia Specia
Effectively assessing the output of Natural Language Processing tasks is a challenge for research in the area.
no code implementations • LREC 2016 • Frédéric Blain, Varvara Logacheva, Lucia Specia
This paper presents our work towards a novel approach for Quality Estimation (QE) of machine translation based on sequences of adjacent words, the so-called phrases.