Despite the progress in machine translation quality estimation and evaluation in the last years, decoding in neural machine translation (NMT) is mostly oblivious to this and centers around finding the most probable translation according to the model (MAP decoding), approximated with beam search.
In contrast, for finite domains, recent work on sparse alternatives to softmax (e. g., sparsemax, $\alpha$-entmax, and fusedmax), has led to distributions with varying support.
Visual attention mechanisms are a key component of neural network models for computer vision.
Exponential families are widely used in machine learning; they include many distributions in continuous and discrete domains (e. g., Gaussian, Dirichlet, Poisson, and categorical distributions via the softmax transformation).
Ranked #28 on Visual Question Answering on VQA v2 test-dev