Search Results for author: André F. T. Martins

Found 86 papers, 59 papers with code

QUARTZ: Quality-Aware Machine Translation

no code implementations EAMT 2022 José G.C. de Souza, Ricardo Rei, Ana C. Farinha, Helena Moniz, André F. T. Martins

This paper presents QUARTZ, QUality-AwaRe machine Translation, a project led by Unbabel that aims to develop machine translation systems that are more robust and produce fewer critical errors.

Machine Translation Translation

Findings of the WMT 2021 Shared Task on Quality Estimation

no code implementations WMT (EMNLP) 2021 Lucia Specia, Frédéric Blain, Marina Fomicheva, Chrysoula Zerva, Zhenhao Li, Vishrav Chaudhary, André F. T. Martins

We report the results of the WMT 2021 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word and sentence levels.

Machine Translation Sentence +1

Findings of the WMT 2020 Shared Task on Quality Estimation

no code implementations WMT (EMNLP) 2020 Lucia Specia, Frédéric Blain, Marina Fomicheva, Erick Fonseca, Vishrav Chaudhary, Francisco Guzmán, André F. T. Martins

We report the results of the WMT20 shared task on Quality Estimation, where the challenge is to predict the quality of the output of neural machine translation systems at the word, sentence and document levels.

Machine Translation Sentence +1

Project MAIA: Multilingual AI Agent Assistant

no code implementations EAMT 2020 André F. T. Martins, Joao Graca, Paulo Dimas, Helena Moniz, Graham Neubig

This paper presents the Multilingual Artificial Intelligence Agent Assistant (MAIA), a project led by Unbabel with the collaboration of CMU, INESC-ID and IT Lisbon.

BIG-bench Machine Learning Translation

DeepSPIN: Deep Structured Prediction for Natural Language Processing

no code implementations EAMT 2022 André F. T. Martins

DeepSPIN is a research project funded by the European Research Council (ERC) whose goal is to develop new neural structured prediction methods, models, and algorithms for improving the quality, interpretability, and data-efficiency of natural language processing (NLP) systems, with special emphasis on machine translation and quality estimation applications.

Machine Translation Structured Prediction +1

Is Context Helpful for Chat Translation Evaluation?

no code implementations13 Mar 2024 Sweta Agrawal, Amin Farajian, Patrick Fernandes, Ricardo Rei, André F. T. Martins

Our findings show that augmenting neural learned metrics with contextual information helps improve correlation with human judgments in the reference-free scenario and when evaluating translations in out-of-English settings.

Language Modelling Large Language Model +2

Did Translation Models Get More Robust Without Anyone Even Noticing?

no code implementations6 Mar 2024 Ben Peters, André F. T. Martins

Neural machine translation (MT) models achieve strong results across a variety of settings, but it is widely believed that they are highly sensitive to "noisy" inputs, such as spelling errors, abbreviations, and other formatting issues.

Machine Translation Translation

Tower: An Open Multilingual Large Language Model for Translation-Related Tasks

1 code implementation27 Feb 2024 Duarte M. Alves, José Pombal, Nuno M. Guerreiro, Pedro H. Martins, João Alves, Amin Farajian, Ben Peters, Ricardo Rei, Patrick Fernandes, Sweta Agrawal, Pierre Colombo, José G. C. de Souza, André F. T. Martins

While general-purpose large language models (LLMs) demonstrate proficiency on multiple tasks within the domain of translation, approaches based on open LLMs are competitive only when specializing on a single task.

Language Modelling Large Language Model +1

CroissantLLM: A Truly Bilingual French-English Language Model

1 code implementation1 Feb 2024 Manuel Faysse, Patrick Fernandes, Nuno M. Guerreiro, António Loison, Duarte M. Alves, Caio Corro, Nicolas Boizard, João Alves, Ricardo Rei, Pedro H. Martins, Antoni Bigata Casademunt, François Yvon, André F. T. Martins, Gautier Viaud, Céline Hudelot, Pierre Colombo

We introduce CroissantLLM, a 1.3B language model pretrained on a set of 3T English and French tokens, to bring to the research and industrial community a high-performance, fully open-sourced bilingual model that runs swiftly on consumer-grade local hardware.

Language Modelling Large Language Model

Non-Exchangeable Conformal Language Generation with Nearest Neighbors

1 code implementation1 Feb 2024 Dennis Ulmer, Chrysoula Zerva, André F. T. Martins

Conformal prediction is an attractive framework to provide predictions imbued with statistical guarantees; however, its application to text generation is challenging since any i.i.d.

Conformal Prediction Language Modelling +2

Aligning Neural Machine Translation Models: Human Feedback in Training and Inference

no code implementations15 Nov 2023 Miguel Moura Ramos, Patrick Fernandes, António Farinhas, André F. T. Martins

A core ingredient in RLHF's success in aligning and improving large language models (LLMs) is its reward model, trained using human feedback on model outputs.

Language Modelling Machine Translation +1

An Empirical Study of Translation Hypothesis Ensembling with Large Language Models

1 code implementation17 Oct 2023 António Farinhas, José G. C. de Souza, André F. T. Martins

Large language models (LLMs) are becoming a one-fits-many solution, but they sometimes hallucinate or produce unreliable output.

Machine Translation Translation
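
One way to ensemble translation hypotheses, in the spirit of this line of work, is minimum Bayes risk (MBR) selection: pick the candidate with the highest expected utility when the remaining candidates act as pseudo-references. Below is a minimal sketch with a toy word-overlap utility standing in for a learned metric; it illustrates the general idea only, not the paper's exact setup.

```python
def utility(hyp, ref):
    """Toy symmetric word-overlap utility; a learned metric such as COMET
    would normally play this role."""
    a, b = set(hyp.split()), set(ref.split())
    return len(a & b) / max(len(a | b), 1)

def mbr_select(candidates):
    """Minimum Bayes risk selection: choose the hypothesis with the highest
    average utility when every other candidate acts as a pseudo-reference."""
    def expected_utility(hyp):
        others = [c for c in candidates if c is not hyp]
        return sum(utility(hyp, ref) for ref in others) / len(others)
    return max(candidates, key=expected_utility)

candidates = [
    "the meeting was postponed until friday",
    "the meeting was postponed to friday",
    "the meeting is delayed until friday",
    "the purple elephant cancelled everything",   # an outlier / hallucination
]
print(mbr_select(candidates))
```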

xCOMET: Transparent Machine Translation Evaluation through Fine-grained Error Detection

1 code implementation16 Oct 2023 Nuno M. Guerreiro, Ricardo Rei, Daan van Stigt, Luisa Coheur, Pierre Colombo, André F. T. Martins

Widely used learned metrics for machine translation evaluation, such as COMET and BLEURT, estimate the quality of a translation hypothesis by providing a single sentence-level score.

Machine Translation Sentence +1
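
For readers who want to try such a metric, the sketch below assumes the unbabel-comet Python package and a checkpoint identifier of the form "Unbabel/XCOMET-XL"; the exact package version, checkpoint name, and output fields should be checked against the official release.

```python
# pip install unbabel-comet   (assumed package name; see the xCOMET release notes)
from comet import download_model, load_from_checkpoint

# "Unbabel/XCOMET-XL" is assumed to be the released checkpoint identifier.
model_path = download_model("Unbabel/XCOMET-XL")
model = load_from_checkpoint(model_path)

data = [{
    "src": "O gato sentou-se no tapete.",
    "mt":  "The cat sat on the mat.",
    "ref": "The cat sat on the mat.",
}]
output = model.predict(data, batch_size=8, gpus=0)
print(output.system_score)   # corpus-level score
print(output.scores)         # per-segment scores
```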

Non-Exchangeable Conformal Risk Control

1 code implementation2 Oct 2023 António Farinhas, Chrysoula Zerva, Dennis Ulmer, André F. T. Martins

Split conformal prediction has recently sparked great interest due to its ability to provide formally guaranteed uncertainty sets or intervals for predictions made by black-box neural models, ensuring a predefined probability of containing the actual ground truth.

Conformal Prediction Time Series
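
As background for the non-exchangeable extension studied here, the following is a minimal numpy sketch of standard split conformal prediction for regression intervals on synthetic data; it assumes exchangeability and is not the method proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D regression data and a deliberately imperfect "model".
x = rng.uniform(-3, 3, size=2000)
y = np.sin(x) + 0.3 * rng.normal(size=x.size)
predict = np.sin  # stand-in for a trained black-box predictor

# Split: calibration set vs. test set.
x_cal, y_cal = x[:1000], y[:1000]
x_test, y_test = x[1000:], y[1000:]

# Nonconformity scores on the calibration set (absolute residuals).
scores = np.abs(y_cal - predict(x_cal))

# Conformal quantile for target coverage 1 - alpha.
alpha = 0.1
n = scores.size
q_level = np.ceil((n + 1) * (1 - alpha)) / n
q_hat = np.quantile(scores, q_level, method="higher")

# Prediction intervals and empirical coverage on the test set.
lo = predict(x_test) - q_hat
hi = predict(x_test) + q_hat
coverage = np.mean((y_test >= lo) & (y_test <= hi))
print(f"target coverage {1 - alpha:.2f}, empirical coverage {coverage:.3f}")
```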

Scaling up COMETKIWI: Unbabel-IST 2023 Submission for the Quality Estimation Shared Task

1 code implementation21 Sep 2023 Ricardo Rei, Nuno M. Guerreiro, José Pombal, Daan van Stigt, Marcos Treviso, Luisa Coheur, José G. C. de Souza, André F. T. Martins

Our team participated in all tasks: sentence- and word-level quality prediction (task 1) and fine-grained error span detection (task 2).

Sentence Task 2

Conformalizing Machine Translation Evaluation

no code implementations9 Jun 2023 Chrysoula Zerva, André F. T. Martins

Several uncertainty estimation methods have been recently proposed for machine translation evaluation.

Conformal Prediction Machine Translation +1

BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation

1 code implementation30 May 2023 Taisiya Glushkova, Chrysoula Zerva, André F. T. Martins

Although neural-based machine translation evaluation metrics, such as COMET or BLEURT, have achieved strong correlations with human judgements, they are sometimes unreliable in detecting certain phenomena that can be considered as critical errors, such as deviations in entities and numbers.

Machine Translation Sentence +1
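
The simplest instance of a lexical-neural combination is a convex mix of sentence-level BLEU with a neural metric score. The sketch below uses the sacrebleu package for BLEU and a placeholder value for the neural score; the paper's actual combination strategies are richer than this.

```python
# pip install sacrebleu
import sacrebleu

def combined_score(hypothesis, reference, neural_score, weight=0.5):
    """Toy lexical+neural combination: a convex mix of sentence BLEU
    (rescaled to 0-1) and a neural metric score assumed to lie in 0-1."""
    bleu = sacrebleu.sentence_bleu(hypothesis, [reference]).score / 100.0
    return weight * bleu + (1 - weight) * neural_score

hyp = "The contract was signed in 2019."
ref = "The agreement was signed in 2019."
# neural_score is a placeholder for, e.g., a COMET prediction.
print(combined_score(hyp, ref, neural_score=0.85))
```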

CREST: A Joint Framework for Rationalization and Counterfactual Text Generation

1 code implementation26 May 2023 Marcos Treviso, Alexis Ross, Nuno M. Guerreiro, André F. T. Martins

Selective rationales and counterfactual examples have emerged as two effective, complementary classes of interpretability methods for analyzing and training NLP models.

counterfactual Data Augmentation +2

mPLM-Sim: Better Cross-Lingual Similarity and Transfer in Multilingual Pretrained Language Models

1 code implementation23 May 2023 Peiqin Lin, Chengzhi Hu, Zheyu Zhang, André F. T. Martins, Hinrich Schütze

Recent multilingual pretrained language models (mPLMs) have been shown to encode strong language-specific signals, which are not explicitly provided during pretraining.

Open-Ended Question Answering Zero-Shot Cross-Lingual Transfer

The Inside Story: Towards Better Understanding of Machine Translation Neural Evaluation Metrics

1 code implementation19 May 2023 Ricardo Rei, Nuno M. Guerreiro, Marcos Treviso, Luisa Coheur, Alon Lavie, André F. T. Martins

Neural metrics for machine translation evaluation, such as COMET, exhibit significant improvements in their correlation with human judgments, as compared to traditional metrics based on lexical overlap, such as BLEU.

Decision Making Machine Translation +2

Hallucinations in Large Multilingual Translation Models

1 code implementation28 Mar 2023 Nuno M. Guerreiro, Duarte Alves, Jonas Waldendorf, Barry Haddow, Alexandra Birch, Pierre Colombo, André F. T. Martins

Large-scale multilingual machine translation systems have demonstrated remarkable ability to translate directly between numerous languages, making them increasingly appealing for real-world applications.

Language Modelling Large Language Model +2

Discrete Latent Structure in Neural Networks

no code implementations18 Jan 2023 Vlad Niculae, Caio F. Corro, Nikita Nangia, Tsvetomila Mihaylova, André F. T. Martins

Many types of data from fields including natural language processing, computer vision, and bioinformatics are well represented by discrete, compositional structures such as trees, sequences, or matchings.

Python Code Generation by Asking Clarification Questions

1 code implementation19 Dec 2022 Haau-Sing Li, Mohsen Mesgar, André F. T. Martins, Iryna Gurevych

We hypothesize that the under-specification of a natural language description can be resolved by asking clarification questions.

Code Generation Language Modelling

Improving abstractive summarization with energy-based re-ranking

1 code implementation27 Oct 2022 Diogo Pernes, Afonso Mendes, André F. T. Martins

Current abstractive summarization systems present important weaknesses which prevent their deployment in real-world applications, such as the omission of relevant information and the generation of factual inconsistencies (also known as hallucinations).

Abstractive Text Summarization Re-Ranking

Looking for a Needle in a Haystack: A Comprehensive Study of Hallucinations in Neural Machine Translation

2 code implementations10 Aug 2022 Nuno M. Guerreiro, Elena Voita, André F. T. Martins

Although the problem of hallucinations in neural machine translation (NMT) has received some attention, research on this highly pathological phenomenon lacks solid ground.

Machine Translation NMT

Chunk-based Nearest Neighbor Machine Translation

1 code implementation24 May 2022 Pedro Henrique Martins, Zita Marinho, André F. T. Martins

Semi-parametric models, which augment generation with retrieval, have led to impressive results in language modeling and machine translation, due to their ability to retrieve fine-grained information from a datastore of examples.

Domain Adaptation Language Modelling +3
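
As background, the token-level kNN-MT interpolation that such semi-parametric models build on (Khandelwal et al., 2021) can be sketched in a few lines of numpy: retrieve nearest neighbors from a datastore of (decoder state, target token) pairs, turn distances into a distribution, and interpolate with the base model. The chunk-based retrieval proposed in the paper is not shown; all data below is synthetic.

```python
import numpy as np

def knn_mt_distribution(query, keys, values, vocab_size, k=4, temperature=10.0):
    """Build a kNN next-token distribution from a datastore of
    (hidden-state key, target-token value) pairs."""
    d2 = np.sum((keys - query) ** 2, axis=1)          # squared L2 distances
    nn = np.argsort(d2)[:k]                           # k nearest entries
    weights = np.exp(-d2[nn] / temperature)
    weights /= weights.sum()
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, values[nn], weights)             # aggregate mass per token
    return p_knn

rng = np.random.default_rng(0)
vocab_size, dim = 16, 8
keys = rng.normal(size=(100, dim))                    # datastore keys
values = rng.integers(0, vocab_size, size=100)        # datastore target tokens
query = keys[3] + 0.01 * rng.normal(size=dim)         # current decoder state

p_model = np.full(vocab_size, 1.0 / vocab_size)       # stand-in MT distribution
p_knn = knn_mt_distribution(query, keys, values, vocab_size)

lam = 0.5                                             # interpolation weight
p_final = (1 - lam) * p_model + lam * p_knn
print("mass on retrieved token:", p_final[values[3]])
```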

Quality-Aware Decoding for Neural Machine Translation

1 code implementation NAACL 2022 Patrick Fernandes, António Farinhas, Ricardo Rei, José G. C. de Souza, Perez Ogayo, Graham Neubig, André F. T. Martins

Despite the progress in machine translation quality estimation and evaluation in recent years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers around finding the most probable translation according to the model (MAP decoding), approximated with beam search.

Machine Translation NMT +1
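
One simple form of quality-aware decoding is N-best reranking with a reference-free quality estimation model. The sketch below uses a hypothetical quality_score function as a stand-in for a learned QE metric (e.g., COMET-QE); the candidates and scorer are toy examples, not the paper's setup.

```python
def quality_score(source, hypothesis):
    """Hypothetical stand-in for a reference-free QE model such as COMET-QE.
    Here: a toy heuristic penalizing large length mismatch."""
    return -abs(len(hypothesis.split()) - len(source.split()))

def rerank(source, nbest):
    """Quality-aware reranking: instead of keeping the most probable (MAP)
    hypothesis, pick the candidate the QE model scores highest."""
    return max(nbest, key=lambda hyp: quality_score(source, hyp))

source = "o gato sentou-se no tapete"
nbest = [
    "the cat sat on the mat",           # candidate from beam search
    "the cat sat down on the mat mat",  # more probable under a toy model, but worse
    "cat mat",
]
print(rerank(source, nbest))
```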

Efficient Machine Translation Domain Adaptation

1 code implementation SpaNLP (ACL) 2022 Pedro Henrique Martins, Zita Marinho, André F. T. Martins

On the other hand, semi-parametric models have been shown to successfully perform domain adaptation by retrieving examples from an in-domain datastore (Khandelwal et al., 2021).

Domain Adaptation Language Modelling +3

Learning to Scaffold: Optimizing Model Explanations for Teaching

1 code implementation22 Apr 2022 Patrick Fernandes, Marcos Treviso, Danish Pruthi, André F. T. Martins, Graham Neubig

In this work, leveraging meta-learning techniques, we extend this idea to improve the quality of the explanations themselves, specifically by optimizing explanations such that student models more effectively learn to simulate the original model.

Meta-Learning

Disentangling Uncertainty in Machine Translation Evaluation

1 code implementation13 Apr 2022 Chrysoula Zerva, Taisiya Glushkova, Ricardo Rei, André F. T. Martins

Trainable evaluation metrics for machine translation (MT) exhibit strong correlation with human judgements, but they are often hard to interpret and might produce unreliable scores under noisy or out-of-domain data.

Machine Translation Translation +1

Differentiable Causal Discovery Under Latent Interventions

1 code implementation4 Mar 2022 Gonçalo R. A. Faria, André F. T. Martins, Mário A. T. Figueiredo

Recent work has shown promising results in causal discovery by leveraging interventional data with gradient-based methods, even when the intervened variables are unknown.

Causal Discovery Variational Inference

Modeling Structure with Undirected Neural Networks

1 code implementation8 Feb 2022 Tsvetomila Mihaylova, Vlad Niculae, André F. T. Martins

In this paper, we combine the representational strengths of factor graphs and of neural networks, proposing undirected neural networks (UNNs): a flexible framework for specifying computations that can be performed in any order.

Dependency Parsing Image Classification

Predicting Attention Sparsity in Transformers

no code implementations spnlp (ACL) 2022 Marcos Treviso, António Góis, Patrick Fernandes, Erick Fonseca, André F. T. Martins

Transformers' quadratic complexity with respect to the input sequence length has motivated a body of work on efficient sparse approximations to softmax.

Language Modelling Machine Translation +3
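
For reference, the dense scaled dot-product attention whose quadratic cost motivates this work can be written in a few lines of numpy; the n-by-n score matrix is the bottleneck. The paper's learned sparsity predictor is not shown here.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Dense scaled dot-product attention. The score matrix has shape
    (n, n), so time and memory grow quadratically with sequence length n."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)     # (n, n) -- the quadratic bottleneck
    weights = softmax(scores, axis=-1)
    return weights @ V

rng = np.random.default_rng(0)
n, d = 512, 64                        # sequence length, head dimension
Q = rng.normal(size=(n, d))
K = rng.normal(size=(n, d))
V = rng.normal(size=(n, d))
print(attention(Q, K, V).shape)       # (512, 64)
```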

When Does Translation Require Context? A Data-driven, Multilingual Exploration

no code implementations15 Sep 2021 Patrick Fernandes, Kayo Yin, Emmy Liu, André F. T. Martins, Graham Neubig

Although proper handling of discourse significantly contributes to the quality of machine translation (MT), these improvements are not adequately measured in common translation quality metrics.

Machine Translation Translation

SPECTRA: Sparse Structured Text Rationalization

2 code implementations EMNLP 2021 Nuno Miguel Guerreiro, André F. T. Martins

Selective rationalization aims to produce decisions along with rationales (e.g., text highlights or word alignments between two sentences).

Natural Language Inference

$\infty$-former: Infinite Memory Transformer

1 code implementation1 Sep 2021 Pedro Henrique Martins, Zita Marinho, André F. T. Martins

Transformers are unable to model long-term memories effectively, since the amount of computation they need to perform grows with the context length.

Dialogue Generation Language Modelling

Sparse Communication via Mixed Distributions

1 code implementation ICLR 2022 António Farinhas, Wilker Aziz, Vlad Niculae, André F. T. Martins

Neural networks and other machine learning models compute continuous representations, while humans communicate mostly through discrete symbols.

Sparse Continuous Distributions and Fenchel-Young Losses

1 code implementation4 Aug 2021 André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae

In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $\alpha$-entmax, and fusedmax) has led to distributions with varying support.

Audio Classification Question Answering +1
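
For the finite-domain case mentioned above, sparsemax is simply the Euclidean projection of the logits onto the probability simplex, which can yield exact zeros. A minimal numpy sketch follows; the continuous-domain distributions developed in the paper are not covered by it.

```python
import numpy as np

def sparsemax(z):
    """Euclidean projection of logits z onto the probability simplex.

    Unlike softmax, the result can assign exactly zero probability
    to low-scoring entries."""
    z = np.asarray(z, dtype=float)
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    # Largest k such that 1 + k * z_sorted[k-1] > cumsum[k-1].
    support = k[1 + k * z_sorted > cumsum]
    k_star = support[-1]
    tau = (cumsum[k_star - 1] - 1) / k_star
    return np.maximum(z - tau, 0.0)

logits = np.array([2.0, 1.2, 0.1, -1.0])
print(sparsemax(logits))                        # sparse: trailing entries are exactly 0
print(np.exp(logits) / np.exp(logits).sum())    # softmax: dense
```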

Measuring and Increasing Context Usage in Context-Aware Machine Translation

1 code implementation ACL 2021 Patrick Fernandes, Kayo Yin, Graham Neubig, André F. T. Martins

Recent work in neural machine translation has demonstrated both the necessity and feasibility of using inter-sentential context -- context from sentences other than those currently being translated.

Document Level Machine Translation Machine Translation +1

Reconciling the Discrete-Continuous Divide: Towards a Mathematical Theory of Sparse Communication

no code implementations1 Apr 2021 André F. T. Martins

Neural networks and other machine learning models compute continuous representations, while humans communicate with discrete symbols.

Smoothing and Shrinking the Sparse Seq2Seq Search Space

1 code implementation NAACL 2021 Ben Peters, André F. T. Martins

Current sequence-to-sequence models are trained to minimize cross-entropy and use softmax to compute the locally normalized probabilities over target sequences.

Machine Translation Morphological Inflection +1

Understanding the Mechanics of SPIGOT: Surrogate Gradients for Latent Structure Learning

1 code implementation EMNLP 2020 Tsvetomila Mihaylova, Vlad Niculae, André F. T. Martins

Latent structure models are a powerful tool for modeling language data: they can mitigate the error propagation and annotation bottleneck in pipeline systems, while simultaneously uncovering linguistic insights about the data.

Efficient Marginalization of Discrete and Structured Latent Variables via Sparsity

1 code implementation NeurIPS 2020 Gonçalo M. Correia, Vlad Niculae, Wilker Aziz, André F. T. Martins

In this paper, we propose a new training strategy which replaces these estimators by an exact yet efficient marginalization.

Sparse and Continuous Attention Mechanisms

2 code implementations NeurIPS 2020 André F. T. Martins, António Farinhas, Marcos Treviso, Vlad Niculae, Pedro M. Q. Aguiar, Mário A. T. Figueiredo

Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation).

Machine Translation Question Answering +4

Sparse Text Generation

1 code implementation EMNLP 2020 Pedro Henrique Martins, Zita Marinho, André F. T. Martins

Current state-of-the-art text generators build on powerful language models such as GPT-2, achieving impressive performance.

Dialogue Generation Language Modelling +1

LP-SparseMAP: Differentiable Relaxed Optimization for Sparse Structured Prediction

1 code implementation ICML 2020 Vlad Niculae, André F. T. Martins

Structured prediction requires manipulating a large number of combinatorial structures, e.g., dependency trees or alignments, either as latent or output variables.

Structured Prediction

Adaptively Sparse Transformers

3 code implementations IJCNLP 2019 Gonçalo M. Correia, Vlad Niculae, André F. T. Martins

Findings of the quantitative and qualitative analysis of our approach include that heads in different layers learn different sparsity preferences and tend to be more diverse in their attention distributions than softmax Transformers.

Machine Translation Translation

Notes on Latent Structure Models and SPIGOT

no code implementations24 Jul 2019 André F. T. Martins, Vlad Niculae

These notes aim to shed light on the recently proposed structured projected intermediate gradient optimization technique (SPIGOT, Peng et al., 2018).

Translator2Vec: Understanding and Representing Human Post-Editors

1 code implementation24 Jul 2019 António Góis, André F. T. Martins

The combination of machines and humans for translation is effective, with many studies showing productivity gains when humans post-edit machine-translated output instead of translating from scratch.

Translation

Joint Learning of Named Entity Recognition and Entity Linking

no code implementations ACL 2019 Pedro Henrique Martins, Zita Marinho, André F. T. Martins

Named entity recognition (NER) and entity linking (EL) are two fundamentally related tasks, since in order to perform EL, entity mentions first have to be detected.

Entity Linking Multi-Task Learning +3

Scheduled Sampling for Transformers

3 code implementations ACL 2019 Tsvetomila Mihaylova, André F. T. Martins

In the Transformer model, unlike the RNN, the generation of a new word attends to the full sentence generated so far, not only to the last word, and it is not straightforward to apply the scheduled sampling technique.

Sentence
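
The core of scheduled sampling is to feed the decoder its own predictions instead of gold tokens with a probability that grows during training. The sketch below shows only that token-mixing step, under an assumed inverse-sigmoid decay of teacher forcing; the paper's two-pass Transformer variant is not reproduced.

```python
import numpy as np

rng = np.random.default_rng(0)

def mix_decoder_inputs(gold_ids, model_ids, step, k=1000.0):
    """Scheduled sampling: replace each gold token with the model's own
    prediction with some probability, using an inverse-sigmoid decay of
    teacher forcing (Bengio et al., 2015)."""
    teacher_forcing_prob = k / (k + np.exp(step / k))
    use_gold = rng.random(gold_ids.shape) < teacher_forcing_prob
    return np.where(use_gold, gold_ids, model_ids)

gold_ids  = np.array([12, 7, 99, 4, 3])    # reference target tokens
model_ids = np.array([12, 8, 99, 5, 3])    # tokens predicted in a first pass

for step in (0, 5000, 20000):
    print(step, mix_decoder_inputs(gold_ids, model_ids, step))
```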

A Simple and Effective Approach to Automatic Post-Editing with Transfer Learning

1 code implementation14 Jun 2019 Gonçalo M. Correia, André F. T. Martins

Automatic post-editing (APE) seeks to automatically refine the output of a black-box machine translation (MT) system through human post-edits.

Automatic Post-Editing Transfer Learning +1

Unbabel's Submission to the WMT2019 APE Shared Task: BERT-based Encoder-Decoder for Automatic Post-Editing

no code implementations WS 2019 António V. Lopes, M. Amin Farajian, Gonçalo M. Correia, Jonay Trenous, André F. T. Martins

Analogously to dual-encoder architectures, we develop a BERT-based encoder-decoder (BED) model in which a single pretrained BERT encoder receives both the source src and machine translation tgt strings.

Automatic Post-Editing NMT +1

Selective Attention for Context-aware Neural Machine Translation

1 code implementation NAACL 2019 Sameen Maruf, André F. T. Martins, Gholamreza Haffari

Despite the progress made in sentence-level NMT, current systems still fall short at achieving fluent, good quality translation for a full document.

Machine Translation NMT +2

Learning with Fenchel-Young Losses

3 code implementations8 Jan 2019 Mathieu Blondel, André F. T. Martins, Vlad Niculae

Over the past decades, numerous loss functions have been proposed for a variety of supervised learning tasks, including regression, classification, ranking, and more generally structured prediction.

Structured Prediction
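
A Fenchel-Young loss is defined as L_Ω(θ; y) = Ω*(θ) + Ω(y) − ⟨θ, y⟩ for a regularizer Ω. The sketch below instantiates Ω as the negative Shannon entropy on the simplex, in which case the loss reduces to the familiar cross-entropy for one-hot targets; this is a known special case used purely for illustration.

```python
import numpy as np
from scipy.special import logsumexp

def fy_loss_shannon(theta, y):
    """Fenchel-Young loss for Omega = negative Shannon entropy on the simplex.

    L(theta; y) = Omega*(theta) + Omega(y) - <theta, y>,
    where Omega*(theta) = logsumexp(theta) and Omega(y) = sum_i y_i log y_i."""
    omega_conj = logsumexp(theta)
    with np.errstate(divide="ignore", invalid="ignore"):
        omega_y = np.sum(np.where(y > 0, y * np.log(y), 0.0))
    return omega_conj + omega_y - np.dot(theta, y)

theta = np.array([1.5, -0.3, 0.2])
y = np.array([1.0, 0.0, 0.0])             # one-hot target

# For a one-hot y, the loss equals the usual cross-entropy -log softmax(theta)[0].
ce = -np.log(np.exp(theta)[0] / np.exp(theta).sum())
print(fy_loss_shannon(theta, y), ce)       # the two values agree
```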

Towards Dynamic Computation Graphs via Sparse Latent Structure

1 code implementation EMNLP 2018 Vlad Niculae, André F. T. Martins, Claire Cardie

Deep NLP models benefit from underlying structures in the data (e.g., parse trees), typically extracted using off-the-shelf parsers.

graph construction

Contextual Neural Model for Translating Bilingual Multi-Speaker Conversations

1 code implementation WS 2018 Sameen Maruf, André F. T. Martins, Gholamreza Haffari

In this work, we propose the task of translating Bilingual Multi-Speaker Conversations, and explore neural architectures which exploit both source and target-side conversation histories for this task.

Document Translation Machine Translation +1

Learning Classifiers with Fenchel-Young Losses: Generalized Entropies, Margins, and Algorithms

2 code implementations24 May 2018 Mathieu Blondel, André F. T. Martins, Vlad Niculae

This paper studies Fenchel-Young losses, a generic way to construct convex loss functions from a regularization function.

Marian: Fast Neural Machine Translation in C++

2 code implementations ACL 2018 Marcin Junczys-Dowmunt, Roman Grundkiewicz, Tomasz Dwojak, Hieu Hoang, Kenneth Heafield, Tom Neckermann, Frank Seide, Ulrich Germann, Alham Fikri Aji, Nikolay Bogoychev, André F. T. Martins, Alexandra Birch

We present Marian, an efficient and self-contained Neural Machine Translation framework with an integrated automatic differentiation engine based on dynamic computation graphs.

Machine Translation Translation
