Search Results for author: Marcos Treviso

Found 12 papers, 7 papers with code

Learning to Scaffold: Optimizing Model Explanations for Teaching

1 code implementation22 Apr 2022 Patrick Fernandes, Marcos Treviso, Danish Pruthi, André F. T. Martins, Graham Neubig

In this work, leveraging meta-learning techniques, we extend this idea to improve the quality of the explanations themselves, specifically by optimizing explanations such that student models more effectively learn to simulate the original model.

Meta-Learning

Predicting Attention Sparsity in Transformers

no code implementations spnlp (ACL) 2022 Marcos Treviso, António Góis, Patrick Fernandes, Erick Fonseca, André F. T. Martins

Transformers' quadratic complexity with respect to the input sequence length has motivated a body of work on efficient sparse approximations to softmax.

Language Modelling Machine Translation +3

Sparse Continuous Distributions and Fenchel-Young Losses

1 code implementation4 Aug 2021 André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae

In contrast, for finite domains, recent work on sparse alternatives to softmax (e. g., sparsemax, $\alpha$-entmax, and fusedmax), has led to distributions with varying support.

Audio Classification Question Answering +1

Sparse and Continuous Attention Mechanisms

2 code implementations NeurIPS 2020 André F. T. Martins, António Farinhas, Marcos Treviso, Vlad Niculae, Pedro M. Q. Aguiar, Mário A. T. Figueiredo

Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e. g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation).

Machine Translation Question Answering +4

Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks

3 code implementations WS 2017 Nathan Hartmann, Erick Fonseca, Christopher Shulby, Marcos Treviso, Jessica Rodrigues, Sandra Aluisio

Word embeddings have been found to provide meaningful representations for words in an efficient way; therefore, they have become common in Natural Language Processing sys- tems.

POS Semantic Similarity +2

Cannot find the paper you are looking for? You can Submit a new open access paper.