Be Careful What You Backpropagate: A Case For Linear Output Activations & Gradient Boosting

13 Jul 2017 · Anders Oland, Aayush Bansal, Roger B. Dannenberg, Bhiksha Raj

In this work, we show that saturating output activation functions, such as the softmax, impede learning on a number of standard classification tasks. Moreover, we present results showing that the utility of softmax does not stem from the normalization, as some have speculated...
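To make the contrast concrete, below is a minimal PyTorch sketch (not the authors' code) comparing the two output configurations the abstract refers to: a conventional softmax/cross-entropy head versus a linear output head regressed onto one-hot targets, so the backpropagated output error is not squashed by a saturating nonlinearity. The backbone, dimensions, and loss pairing are illustrative assumptions; the paper's proposed exponential gradient boosting is not reproduced here.

```python
# Minimal sketch (assumed setup, not the authors' implementation):
# (a) softmax via cross-entropy vs. (b) a linear output head trained with MSE
# on one-hot targets. Backbone and sizes are placeholder assumptions.
import torch
import torch.nn as nn

torch.manual_seed(0)

num_classes = 10
backbone = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 256), nn.ReLU())

# (a) Conventional head: logits passed through softmax inside CrossEntropyLoss.
softmax_head = nn.Linear(256, num_classes)
ce_loss = nn.CrossEntropyLoss()  # applies log-softmax internally

# (b) Linear output activation: raw outputs regressed onto one-hot targets,
#     so the output-layer error gradient is not scaled by a softmax Jacobian.
linear_head = nn.Linear(256, num_classes)
mse_loss = nn.MSELoss()

x = torch.randn(32, 1, 28, 28)            # dummy batch of MNIST-sized inputs
y = torch.randint(0, num_classes, (32,))  # integer class labels

features = backbone(x)

# Softmax / cross-entropy path
loss_softmax = ce_loss(softmax_head(features), y)

# Linear-output / MSE path
one_hot = nn.functional.one_hot(y, num_classes).float()
loss_linear = mse_loss(linear_head(features), one_hot)

loss_softmax.backward(retain_graph=True)
loss_linear.backward()
print(loss_softmax.item(), loss_linear.item())
```

With the linear head, the gradient reaching the output layer is simply proportional to (prediction − target), which is the kind of unsaturated error signal the abstract argues for; how that compares empirically to softmax is the subject of the paper itself.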


Code

No code implementations yet.

