DeLighT is a transformer architecture that improves parameter efficiency in two ways: (1) within each Transformer block, it uses DExTra, a deep and light-weight transformation, which lets the block use single-headed attention and bottleneck FFN layers; and (2) across blocks, it uses block-wise scaling, which makes DeLighT blocks shallower and narrower near the input and deeper and wider near the output.
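Block-wise scaling can be illustrated with a small schedule calculation. The sketch below is not the authors' code: it assumes that both the number of DExTra layers (depth) and a width multiplier grow linearly with the block index, from `n_min`/`w_min` at the input-side block to `n_max`/`w_max` at the output-side block. The function name `blockwise_scaling` and all of its parameters are hypothetical and chosen for illustration.

```python
from typing import List, Tuple


def blockwise_scaling(
    num_blocks: int, n_min: int, n_max: int, w_min: float, w_max: float
) -> List[Tuple[int, float]]:
    """Return a (depth, width_multiplier) pair for each block, input side first.

    Hypothetical helper. Assumption: depth and width are linearly interpolated
    over the block index b, from (n_min, w_min) at b = 0 to
    (n_max, w_max) at b = num_blocks - 1.
    """
    if num_blocks < 1:
        raise ValueError("num_blocks must be at least 1")
    if num_blocks == 1:
        # A single block gets the largest configuration.
        return [(n_max, w_max)]
    schedule = []
    for b in range(num_blocks):
        t = b / (num_blocks - 1)  # 0.0 at the input block, 1.0 at the output block
        depth = round(n_min + (n_max - n_min) * t)  # DExTra layers in block b
        width = w_min + (w_max - w_min) * t  # width multiplier for block b
        schedule.append((depth, width))
    return schedule


if __name__ == "__main__":
    for b, (depth, width) in enumerate(blockwise_scaling(4, 4, 8, 1.0, 2.5)):
        print(f"block {b}: depth={depth}, width multiplier={width:.2f}")
```

For example, `blockwise_scaling(4, 4, 8, 1.0, 2.5)` assigns depths `[4, 5, 7, 8]` to the four blocks, so parameters are concentrated in the blocks nearest the output rather than spread uniformly as in a standard Transformer stack.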
Source: DeLighT: Deep and Light-weight Transformer
Task | Papers | Share |
---|---|---|
Language Modelling | 1 | 33.33% |
Machine Translation | 1 | 33.33% |
Translation | 1 | 33.33% |