Search Results for author: Marcos Treviso

Found 19 papers, 12 papers with code

LaTIM: Measuring Latent Token-to-Token Interactions in Mamba Models

no code implementations 21 Feb 2025 Hugo Pitorro, Marcos Treviso

State space models (SSMs), such as Mamba, have emerged as an efficient alternative to transformers for long-context sequence modeling.

Machine Translation Mamba +2
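The recurrence that LaTIM analyzes can be sketched in a few lines. Below is a minimal NumPy sketch of a time-invariant diagonal state-space model; all dimensions and parameter choices are illustrative, and real Mamba layers use input-dependent, discretized parameters with a fused selective-scan kernel.

```python
# Minimal diagonal linear SSM: h_t = A * h_{t-1} + B x_t, y_t = C h_t.
# Toy, time-invariant sketch -- not the actual Mamba implementation.
import numpy as np

def ssm_scan(x, A, B, C):
    """Unroll the linear recurrence over a length-T sequence."""
    h = np.zeros(A.shape[0])
    ys = []
    for x_t in x:
        h = A * h + B @ x_t      # elementwise decay plus input write
        ys.append(C @ h)         # linear readout of the hidden state
    return np.stack(ys)

rng = np.random.default_rng(0)
T, d_in, d_state, d_out = 6, 4, 8, 4
A = np.exp(-rng.uniform(0.1, 1.0, d_state))   # stable decays in (0, 1)
B = 0.1 * rng.normal(size=(d_state, d_in))
C = 0.1 * rng.normal(size=(d_out, d_state))
y = ssm_scan(rng.normal(size=(T, d_in)), A, B, C)
print(y.shape)  # (6, 4)
```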

AdaSplash: Adaptive Sparse Flash Attention

1 code implementation 17 Feb 2025 Nuno Gonçalves, Marcos Treviso, André F. T. Martins

The computational cost of softmax-based attention in transformers limits their applicability to long-context tasks.

Language Modeling Language Modelling +2
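AdaSplash builds its sparse attention on the $\alpha$-entmax family, which, unlike softmax, assigns exactly zero weight to irrelevant tokens. A plain reference implementation of sparsemax (the $\alpha = 2$ case) shows where the sparsity comes from; this is an O(n^2) sketch, not the paper's hardware-aware kernel.

```python
# Reference sparsemax (Martins & Astudillo, 2016): project scores onto the
# simplex. Unlike softmax, the output can contain exact zeros.
import torch

def sparsemax(z, dim=-1):
    z_sorted, _ = torch.sort(z, dim=dim, descending=True)
    k = torch.arange(1, z.size(dim) + 1, device=z.device, dtype=z.dtype)
    shape = [1] * z.dim()
    shape[dim] = -1
    k = k.view(shape)
    cumsum = z_sorted.cumsum(dim)
    support = 1 + k * z_sorted > cumsum           # entries kept in the support
    k_max = support.sum(dim=dim, keepdim=True).to(z.dtype)
    z_kept = torch.where(support, z_sorted, torch.zeros_like(z_sorted))
    tau = (z_kept.sum(dim, keepdim=True) - 1) / k_max
    return torch.clamp(z - tau, min=0)

q = torch.randn(2, 5, 16)                         # (batch, seq, head_dim)
k_ = torch.randn(2, 5, 16)
v = torch.randn(2, 5, 16)
scores = q @ k_.transpose(-2, -1) / 16 ** 0.5
attn = sparsemax(scores, dim=-1)                  # rows sum to 1, many zeros
out = attn @ v
print((attn == 0).float().mean())                 # fraction of pruned entries
```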

How Effective are State Space Models for Machine Translation?

1 code implementation 7 Jul 2024 Hugo Pitorro, Pavlo Vasylenko, Marcos Treviso, André F. T. Martins

Transformers are the current architecture of choice for NLP, but their attention layers do not scale well to long contexts.

Machine Translation Mamba +3

xTower: A Multilingual LLM for Explaining and Correcting Translation Errors

no code implementations 27 Jun 2024 Marcos Treviso, Nuno M. Guerreiro, Sweta Agrawal, Ricardo Rei, José Pombal, Tania Vaz, Helena Wu, Beatriz Silva, Daan van Stigt, André F. T. Martins

While machine translation (MT) systems are achieving increasingly strong performance on benchmarks, they often produce translations with errors and anomalies.

Error Understanding Language Modeling +4

CREST: A Joint Framework for Rationalization and Counterfactual Text Generation

1 code implementation 26 May 2023 Marcos Treviso, Alexis Ross, Nuno M. Guerreiro, André F. T. Martins

Selective rationales and counterfactual examples have emerged as two effective, complementary classes of interpretability methods for analyzing and training NLP models.

counterfactual Data Augmentation +2
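As an illustration of the selective-rationale half of the framework, the sketch below keeps only the highest-scoring tokens and masks the rest. The scorer, mask token, and threshold are all hypothetical stand-ins; CREST itself pairs a learned rationalizer with an editor that rewrites the masked positions to produce counterfactuals.

```python
# Illustrative select-then-predict rationale step (hypothetical scorer;
# not CREST's actual architecture).
import torch

def extract_rationale(token_scores, tokens, k=2):
    """Keep the k highest-scoring tokens as the rationale; mask the rest."""
    topk = torch.topk(token_scores, k).indices
    mask = torch.zeros_like(token_scores, dtype=torch.bool)
    mask[topk] = True
    return [tok if keep else "[MASK]" for tok, keep in zip(tokens, mask)]

tokens = ["the", "movie", "was", "truly", "wonderful"]
scores = torch.tensor([0.1, 0.3, 0.1, 0.8, 0.9])   # e.g., from a rationalizer
print(extract_rationale(scores, tokens))
# ['[MASK]', '[MASK]', '[MASK]', 'truly', 'wonderful']
```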

The Inside Story: Towards Better Understanding of Machine Translation Neural Evaluation Metrics

1 code implementation 19 May 2023 Ricardo Rei, Nuno M. Guerreiro, Marcos Treviso, Luisa Coheur, Alon Lavie, André F. T. Martins

Neural metrics for machine translation evaluation, such as COMET, exhibit significant improvements in their correlation with human judgments, as compared to traditional metrics based on lexical overlap, such as BLEU.

Decision Making Machine Translation +2
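A hedged usage sketch of the two metric families being compared, assuming the sacrebleu and unbabel-comet packages are installed (the model identifier follows the COMET README and may change between releases):

```python
# Lexical overlap (BLEU) vs. a neural metric (COMET) on a toy example.
import sacrebleu
from comet import download_model, load_from_checkpoint

src = "Der Film war wunderbar."
mt = "The film was wonderful."
ref = "The movie was wonderful."

bleu = sacrebleu.corpus_bleu([mt], [[ref]])
print(f"BLEU: {bleu.score:.1f}")          # penalizes 'film' vs. 'movie'

model = load_from_checkpoint(download_model("Unbabel/wmt22-comet-da"))
out = model.predict([{"src": src, "mt": mt, "ref": ref}], batch_size=1, gpus=0)
print(f"COMET: {out.system_score:.3f}")   # robust to the synonym swap
```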

Learning to Scaffold: Optimizing Model Explanations for Teaching

1 code implementation 22 Apr 2022 Patrick Fernandes, Marcos Treviso, Danish Pruthi, André F. T. Martins, Graham Neubig

In this work, leveraging meta-learning techniques, we extend this idea to improve the quality of the explanations themselves, specifically by optimizing explanations such that student models more effectively learn to simulate the original model.

Meta-Learning model
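The core signal can be sketched with toy linear models: a student receives the explanation-weighted input and is trained to match the teacher's prediction, and because the loss is differentiable through the explainer, the explanations themselves can be optimized. All module shapes here are illustrative; the paper's actual setup uses meta-learning over student updates.

```python
# Toy 'simulability' objective: the student should reproduce the teacher's
# prediction from the explanation-weighted input.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d, n_cls = 8, 3
teacher = torch.nn.Linear(d, n_cls)
student = torch.nn.Linear(d, n_cls)
explainer = torch.nn.Linear(d, d)                # produces feature saliencies

x = torch.randn(16, d)
with torch.no_grad():
    teacher_probs = teacher(x).softmax(-1)

saliency = explainer(x).sigmoid()                # per-feature relevance in [0, 1]
student_logits = student(x * saliency)           # student sees only salient parts
loss = F.kl_div(student_logits.log_softmax(-1), teacher_probs,
                reduction="batchmean")           # how well the student simulates
loss.backward()                                  # gradients also reach the explainer
print(float(loss))
```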

Predicting Attention Sparsity in Transformers

no code implementations spnlp (ACL) 2022 Marcos Treviso, António Góis, Patrick Fernandes, Erick Fonseca, André F. T. Martins

Transformers' quadratic complexity with respect to the input sequence length has motivated a body of work on efficient sparse approximations to softmax.

Decoder Language Modeling +5
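One way to make that concrete: predict a sparse support before computing attention by bucketing queries and keys, then attend only within buckets. The random sign-projection below is an LSH-style stand-in for the learned predictors studied in the paper.

```python
# Toy sparsity prediction: bucket q/k by random sign projections and compute
# attention only within matching buckets (illustrative, not the paper's method).
import torch

torch.manual_seed(0)
n, d, n_bits = 8, 16, 2
q, k = torch.randn(n, d), torch.randn(n, d)
proj = torch.randn(d, n_bits)                        # shared random projection

def bucket(x):
    bits = (x @ proj > 0).long()                     # sign pattern per vector
    return (bits * torch.tensor([1, 2])).sum(-1)     # 2-bit bucket id

allowed = bucket(q)[:, None] == bucket(k)[None, :]   # predicted support
allowed |= torch.eye(n, dtype=torch.bool)            # always keep own position
scores = (q @ k.T / d ** 0.5).masked_fill(~allowed, float("-inf"))
attn = scores.softmax(-1)
print(allowed.float().mean())                        # fraction of entries computed
```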

Sparse Continuous Distributions and Fenchel-Young Losses

1 code implementation 4 Aug 2021 André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae

In contrast, for finite domains, recent work on sparse alternatives to softmax (e.g., sparsemax, $\alpha$-entmax, and fusedmax) has led to distributions with varying support.

Audio Classification Question Answering +1
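Numerically, the continuous case works the same way as the finite one: subtract a threshold and truncate at zero. The sketch below computes continuous sparsemax of a quadratic score, which yields a truncated parabola with bounded support where continuous softmax would give a Gaussian (the grid and iteration count are arbitrary choices).

```python
# Continuous sparsemax of a quadratic score f(t): density [f(t) - tau]_+
# with tau found by bisection so the density integrates to 1.
import numpy as np

t = np.linspace(-5, 5, 10001)
dt = t[1] - t[0]
f = -0.5 * t ** 2                   # quadratic score (mu = 0, sigma = 1)

def mass(tau):
    return np.clip(f - tau, 0, None).sum() * dt

lo, hi = f.min(), f.max()
for _ in range(60):                 # bisection: mass(tau) is decreasing in tau
    mid = 0.5 * (lo + hi)
    if mass(mid) > 1:
        lo = mid
    else:
        hi = mid
tau = 0.5 * (lo + hi)

p = np.clip(f - tau, 0, None)       # density is zero outside a bounded interval
support = t[p > 0]
print(f"support: [{support[0]:.2f}, {support[-1]:.2f}], mass = {mass(tau):.4f}")
```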

Sparse and Continuous Attention Mechanisms

2 code implementations NeurIPS 2020 André F. T. Martins, António Farinhas, Marcos Treviso, Vlad Niculae, Pedro M. Q. Aguiar, Mário A. T. Figueiredo

Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation).

Machine Translation Question Answering +4

Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks

3 code implementations WS 2017 Nathan Hartmann, Erick Fonseca, Christopher Shulby, Marcos Treviso, Jessica Rodrigues, Sandra Aluisio

Word embeddings have been found to provide meaningful representations for words in an efficient way; therefore, they have become common in Natural Language Processing systems.

POS POS Tagging +4
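A hedged sketch of the word-analogy evaluation using gensim; the vector file name is a placeholder for the word2vec-format embeddings released with the paper.

```python
# Word-analogy query: rei - homem + mulher ~ rainha (king - man + woman ~ queen).
from gensim.models import KeyedVectors

# Hypothetical file name; substitute the released word2vec-format vectors.
wv = KeyedVectors.load_word2vec_format("skip_s300.txt")
print(wv.most_similar(positive=["rei", "mulher"], negative=["homem"], topn=3))
```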
