Adaptive Softmax

Introduced by Grave et al. in Efficient softmax approximation for GPUs

Adaptive Softmax is a technique for speeding up the computation of probability distributions over words. It is inspired by the class-based hierarchical softmax, with the difference that the word classes are built to minimize computation time. It achieves its efficiency by explicitly taking into account the cost of matrix multiplication on parallel systems such as GPUs, and by combining this with a few important observations, namely that a shortlist of frequent words can be kept in the root node and that the capacity devoted to rare words can be reduced.
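
The structure described above can be illustrated with a small sketch. This is a toy, standard-library-only illustration and not the authors' implementation: the class name `AdaptiveSoftmaxSketch`, the `cutoffs` and `div` parameters, the random weights, and the rule of shrinking each tail cluster's dimension by a factor of `div` are all assumptions made for the example. Words are assumed to be indexed by decreasing frequency, so low ids fall in the shortlist.

```python
import math
import random


def log_softmax(scores):
    # Numerically stable log-softmax over a list of floats.
    m = max(scores)
    lse = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - lse for s in scores]


def matvec(matrix, vec):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]


class AdaptiveSoftmaxSketch:
    """Toy adaptive softmax; word ids are assumed sorted by decreasing frequency."""

    def __init__(self, hidden_dim, cutoffs, div=4, seed=0):
        # cutoffs = [shortlist_size, ..., vocab_size], e.g. [4, 8, 12] (hypothetical values).
        rng = random.Random(seed)
        self.cutoffs = cutoffs
        self.shortlist = cutoffs[0]
        n_clusters = len(cutoffs) - 1
        # Root node: one score per shortlist word plus one score per tail cluster.
        self.head = rand_matrix(self.shortlist + n_clusters, hidden_dim, rng)
        self.tails = []
        for i in range(n_clusters):
            size = cutoffs[i + 1] - cutoffs[i]
            # Assumed capacity reduction: rarer clusters get a smaller projected dimension.
            small = max(1, hidden_dim // div ** (i + 1))
            proj = rand_matrix(small, hidden_dim, rng)
            out = rand_matrix(size, small, rng)
            self.tails.append((proj, out))

    def log_prob(self, hidden, word):
        head_lp = log_softmax(matvec(self.head, hidden))
        if word < self.shortlist:
            # Frequent word: only the small root softmax is needed.
            return head_lp[word]
        for i, (proj, out) in enumerate(self.tails):
            if word < self.cutoffs[i + 1]:
                # Rare word: log P(cluster) + log P(word | cluster).
                tail_lp = log_softmax(matvec(out, matvec(proj, hidden)))
                return head_lp[self.shortlist + i] + tail_lp[word - self.cutoffs[i]]
        raise ValueError("word id outside the vocabulary")


model = AdaptiveSoftmaxSketch(hidden_dim=8, cutoffs=[4, 8, 12])
hidden = [0.1 * i for i in range(8)]
total = sum(math.exp(model.log_prob(hidden, w)) for w in range(12))
```

In this sketch a shortlist word costs one softmax over 6 entries (4 words plus 2 cluster entries) instead of 12, and the probabilities over the whole vocabulary still sum to one because each cluster's mass is split among its own words.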

Source: Efficient softmax approximation for GPUs

Tasks

Task	Papers	Share
Language Modelling	40	34.48%
Machine Translation	7	6.03%
Translation	6	5.17%
Speech Recognition	5	4.31%
Sentence	4	3.45%
Paraphrase Identification	3	2.59%
Text Generation	3	2.59%
Automatic Speech Recognition (ASR)	3	2.59%
Reinforcement Learning (RL)	2	1.72%

Categories

Output Functions