The Monte Carlo Transformer: a stochastic self-attention model for sequence prediction

15 Jul 2020  ·  Alice Martin, Charles Ollion, Florian Strub, Sylvain Le Corff, Olivier Pietquin ·

This paper introduces the Sequential Monte Carlo Transformer, an original approach that naturally captures the observations distribution in a transformer architecture. The keys, queries, values and attention vectors of the network are considered as the unobserved stochastic states of its hidden structure. This generative model is such that at each time step the received observation is a random function of its past states in a given attention window. In this general state-space setting, we use Sequential Monte Carlo methods to approximate the posterior distributions of the states given the observations, and to estimate the gradient of the log-likelihood. We hence propose a generative model giving a predictive distribution, instead of a single-point estimate.

PDF Abstract
No code implementations yet. Submit your code now



  Add Datasets introduced or used in this paper

Results from the Paper

  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.