Adaptive Softmax is a speedup technique for computing probability distributions over words. It is inspired by the class-based hierarchical softmax, where word classes are built to minimize computation time. Adaptive softmax achieves its efficiency by explicitly modeling the cost of matrix multiplication on parallel systems (GPUs), and by combining this with a few key observations: keeping a shortlist of frequent words directly in the root node, and reducing the capacity (projection dimension) allotted to clusters of rare words.
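The factorization above can be sketched with a minimal two-cluster example in plain Python. This is an illustrative sketch, not the paper's implementation: the head holds the `cutoff` most frequent words plus one extra logit for the tail cluster, and a rare word's probability is the product of the cluster probability and the within-cluster probability. All function and variable names here are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def adaptive_softmax_prob(head_logits, tail_logits, word_id, cutoff):
    """Probability of `word_id` under a two-cluster adaptive softmax.

    head_logits: scores for the `cutoff` frequent words, plus one extra
                 score (index `cutoff`) for the rare-word cluster.
    tail_logits: scores for the rare words (ids >= cutoff); in practice
                 these come from a lower-capacity (smaller) projection.
    """
    head_probs = softmax(head_logits)
    if word_id < cutoff:
        # Frequent word: one softmax over the small head, no tail work.
        return head_probs[word_id]
    # Rare word: P(tail cluster) * P(word | tail cluster).
    tail_probs = softmax(tail_logits)
    return head_probs[cutoff] * tail_probs[word_id - cutoff]
```

The speedup comes from the common case: for frequent words only the small head softmax is evaluated, and the tail projection is both smaller and touched rarely. PyTorch ships a batched version of this idea as `torch.nn.AdaptiveLogSoftmaxWithLoss`.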
Source: Efficient softmax approximation for GPUs
Task | Papers | Share
---|---|---
Language Modelling | 40 | 34.48%
Machine Translation | 7 | 6.03%
Translation | 6 | 5.17%
Speech Recognition | 5 | 4.31%
Sentence | 4 | 3.45%
Paraphrase Identification | 3 | 2.59%
Text Generation | 3 | 2.59%
Automatic Speech Recognition (ASR) | 3 | 2.59%
Reinforcement Learning (RL) | 2 | 1.72%