From Softmax to Sparsemax: A Sparse Model of Attention and Multi-Label Classification

5 Feb 2016 · André F. T. Martins · Ramón Fernandez Astudillo

We propose sparsemax, a new activation function similar to the traditional softmax, but able to output sparse probabilities. After deriving its properties, we show how its Jacobian can be efficiently computed, enabling its use in a network trained with backpropagation...
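
To make the idea of "sparse probabilities" concrete, here is a minimal pure-Python sketch. It assumes the usual formulation of sparsemax as the Euclidean projection of the score vector onto the probability simplex, computed by sorting and thresholding; the excerpt above does not spell this out, so treat it as an illustration and not as the authors' reference implementation. The function names `sparsemax` and `sparsemax_jvp` are hypothetical.

```python
from typing import List


def sparsemax(z: List[float]) -> List[float]:
    """Project the scores z onto the probability simplex (assumed definition)."""
    z_sorted = sorted(z, reverse=True)
    cumsum = 0.0
    tau = 0.0
    for k, z_k in enumerate(z_sorted, start=1):
        cumsum += z_k
        # The k-th largest score stays in the support while 1 + k * z_k > cumsum.
        if 1.0 + k * z_k > cumsum:
            tau = (cumsum - 1.0) / k  # threshold for a support of size k
        else:
            break  # the condition cannot hold again for larger k
    # Scores at or below the threshold are clipped to exactly zero.
    return [max(z_i - tau, 0.0) for z_i in z]


def sparsemax_jvp(z: List[float], v: List[float]) -> List[float]:
    """Jacobian-vector product, assuming J = diag(s) - s s^T / |S|,
    where s is the 0/1 indicator of the support S of sparsemax(z)."""
    p = sparsemax(z)
    support = [p_i > 0.0 for p_i in p]
    size = sum(support)  # at least 1, since the output sums to 1
    # Mean of v over the support; only these entries are needed.
    v_hat = sum(v_i for v_i, s in zip(v, support) if s) / size
    return [(v_i - v_hat) if s else 0.0 for v_i, s in zip(v, support)]


if __name__ == "__main__":
    scores = [1.0, 0.8, 0.1]
    print(sparsemax(scores))
    print(sparsemax_jvp(scores, [1.0, 0.0, 0.0]))
```

For the scores `[1.0, 0.8, 0.1]` the output is approximately `[0.6, 0.4, 0.0]`, with the last entry exactly zero, whereas softmax would give every entry a strictly positive probability. Under the assumed Jacobian form, the product only touches the coordinates in the support, which is one way to read the abstract's claim that it can be computed efficiently.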

