Output Functions

Hierarchical Softmax

Hierarchical Softmax is an alternative to softmax that is faster to evaluate: computing the probability of a word takes $O\left(\log{n}\right)$ time, compared to $O\left(n\right)$ for softmax, where $n$ is the vocabulary size. It uses a multi-layer binary tree whose leaves are the words; the probability of a word is the product of the probabilities on each edge along the path from the root to that word's leaf. See the accompanying figure for an example of where the product calculation would occur for the word "I'm".
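The path-product calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the four-word vocabulary, the tree layout, the node names, and all parameter values are made up, and each left/right decision is assumed to be modelled by a sigmoid of the dot product between an inner node's vector and the hidden state, which is one common formulation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical balanced tree over a toy 4-word vocabulary.
# Each word maps to its root-to-leaf path as (inner_node, direction) pairs,
# where direction is +1 for the left branch and -1 for the right branch.
PATHS = {
    "I'm":   [("root", +1), ("left", +1)],
    "happy": [("root", +1), ("left", -1)],
    "to":    [("root", -1), ("right", +1)],
    "help":  [("root", -1), ("right", -1)],
}

# Made-up parameter vectors, one per inner node (assumption: these would be learned).
NODE_VECTORS = {
    "root":  [0.5, -0.2],
    "left":  [-0.3, 0.8],
    "right": [0.1, 0.4],
}


def word_probability(word, hidden):
    """Multiply the branch probabilities along the path from the root to the word."""
    prob = 1.0
    for node, direction in PATHS[word]:
        # sigmoid(s) is the probability of going left, sigmoid(-s) = 1 - sigmoid(s) of going right.
        prob *= sigmoid(direction * dot(NODE_VECTORS[node], hidden))
    return prob


hidden = [1.0, 2.0]  # made-up hidden state
probs = {word: word_probability(word, hidden) for word in PATHS}
total = sum(probs.values())
print(probs, total)
```

Each word touches only the two inner nodes on its path (the tree depth, $\log_2 4$), rather than all four output units as in a full softmax. Because the two branch probabilities at every inner node sum to one, the leaf probabilities sum to one without an explicit normalisation step.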

(Introduced by Morin and Bengio)

Image Credit: Steven Schmatz



