Search Results for author: Marcos Treviso

Found 9 papers, 4 papers with code

Predicting Attention Sparsity in Transformers

no code implementations · 24 Sep 2021 · Marcos Treviso, António Góis, Patrick Fernandes, Erick Fonseca, André F. T. Martins

A bottleneck in transformer architectures is their quadratic complexity with respect to the input sequence length, which has motivated a body of work on efficient sparse approximations to softmax.

Language Modelling · Machine Translation +2
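The sparse softmax alternatives referred to in the abstract above (e.g. sparsemax) can be computed in closed form. As a rough illustration only, not code from the paper, here is a minimal NumPy sketch of the sparsemax transformation:

    import numpy as np

    def sparsemax(z):
        """Euclidean projection of a score vector z onto the probability simplex.
        Unlike softmax, the output can contain exact zeros (sparse attention weights)."""
        z = np.asarray(z, dtype=float)
        z_sorted = np.sort(z)[::-1]          # scores in descending order
        cumsum = np.cumsum(z_sorted)
        k = np.arange(1, len(z) + 1)
        # support size: largest k with 1 + k * z_(k) > sum of the top-k scores
        support = 1 + k * z_sorted > cumsum
        k_max = k[support][-1]
        tau = (cumsum[support][-1] - 1.0) / k_max   # threshold shared by the support
        return np.maximum(z - tau, 0.0)

For instance, sparsemax([2.0, 1.0, -1.0]) returns [1.0, 0.0, 0.0], assigning exactly zero probability to the last two entries, whereas softmax would keep every entry strictly positive.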

Sparse Continuous Distributions and Fenchel-Young Losses

1 code implementation · 4 Aug 2021 · André F. T. Martins, Marcos Treviso, António Farinhas, Pedro M. Q. Aguiar, Mário A. T. Figueiredo, Mathieu Blondel, Vlad Niculae

When $\Omega$ is a Tsallis negentropy with parameter $\alpha$, we obtain "deformed exponential families," which include $\alpha$-entmax and sparsemax ($\alpha = 2$) as particular cases.

Audio Classification · Question Answering +1
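For reference, the objects named in the snippet can be written compactly; the display below is a standard summary of the definitions, not text quoted from the paper. With $\triangle$ the probability simplex, the Tsallis $\alpha$-negentropy $\Omega_\alpha$ and the induced $\alpha$-entmax mapping are

\[
\Omega_\alpha(p) =
\begin{cases}
\dfrac{1}{\alpha(\alpha-1)} \sum_j \left( p_j^{\alpha} - p_j \right), & \alpha \neq 1,\\[1ex]
\sum_j p_j \log p_j, & \alpha = 1,
\end{cases}
\qquad
\alpha\text{-entmax}(z) = \operatorname*{argmax}_{p \in \triangle} \; p^\top z - \Omega_\alpha(p),
\]

so $\alpha = 1$ recovers softmax and $\alpha = 2$ recovers sparsemax, matching the particular cases listed above.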

Sparse and Continuous Attention Mechanisms

2 code implementations · NeurIPS 2020 · André F. T. Martins, António Farinhas, Marcos Treviso, Vlad Niculae, Pedro M. Q. Aguiar, Mário A. T. Figueiredo

Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e.g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation).

Machine Translation · Question Answering +3
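The link between the categorical distribution and softmax mentioned above is the standard exponential-family construction; the recap below uses textbook notation rather than the paper's. Writing a density as $p_\theta(y) = h(y)\exp\!\left(\theta^\top \phi(y) - A(\theta)\right)$ with a one-hot sufficient statistic $\phi$ gives

\[
p_\theta(y = k) = \frac{\exp(\theta_k)}{\sum_j \exp(\theta_j)} = \operatorname{softmax}(\theta)_k,
\qquad
A(\theta) = \log \sum_j \exp(\theta_j).
\]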

Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks

3 code implementations · WS 2017 · Nathan Hartmann, Erick Fonseca, Christopher Shulby, Marcos Treviso, Jessica Rodrigues, Sandra Aluisio

Word embeddings have been found to provide meaningful representations for words in an efficient way; therefore, they have become common in Natural Language Processing systems.

POS · Semantic Similarity +2
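The word-analogy evaluation named in the title can be reproduced with off-the-shelf tooling; the sketch below is illustrative only, using gensim and a hypothetical embeddings file name rather than anything distributed with the paper.

    from gensim.models import KeyedVectors

    # Hypothetical file name; assumes Portuguese embeddings in word2vec text format.
    kv = KeyedVectors.load_word2vec_format("portuguese_skipgram_300d.txt")

    # A word-analogy probe: "rei" (king) - "homem" (man) + "mulher" (woman)
    # should rank "rainha" (queen) among the nearest neighbours.
    print(kv.most_similar(positive=["rei", "mulher"], negative=["homem"], topn=3))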
