A DeLighT Block is the building block of the DeLighT transformer architecture. It uses a DExTra transformation to reduce the dimensionality of the vectors fed into the attention layer, where a single-headed attention module is used. Because the DExTra transformation already learns wider representations of the input across its layers, the authors can replace multi-head attention with single-head attention. This is followed by a light-weight FFN that, rather than expanding the dimension (standard Transformer FFNs widen it to 4x the model dimension), imposes a bottleneck and squeezes the dimension. Again, this is possible because the DExTra transformation has already incorporated wider representations, so the FFN can squeeze instead at this layer.
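As a rough illustration of the structure described above, here is a minimal PyTorch sketch, not the authors' implementation: the class and parameter names (`SimplifiedDeLighTBlock`, `d_out`, `ffn_reduction`) are illustrative, and the DExTra stand-in omits the grouped linear transformations and input mixing used in the actual paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimplifiedDeLighTBlock(nn.Module):
    """Illustrative sketch: DExTra-style expand-reduce, single-head
    attention on the reduced dimension, and a bottleneck FFN."""

    def __init__(self, d_model=512, expansion=2, d_out=256, ffn_reduction=4):
        super().__init__()
        d_max = d_model * expansion
        # Simplified DExTra stand-in: expand the representation, then reduce
        # it to a narrower dimension (d_out < d_model) fed to attention.
        self.dextra = nn.Sequential(
            nn.Linear(d_model, d_max), nn.GELU(),
            nn.Linear(d_max, d_out), nn.GELU(),
        )
        # Single-head attention operating on the reduced d_out dimension.
        self.q_proj = nn.Linear(d_out, d_out)
        self.k_proj = nn.Linear(d_out, d_out)
        self.v_proj = nn.Linear(d_out, d_out)
        self.attn_out = nn.Linear(d_out, d_model)  # project back to model width
        self.attn_norm = nn.LayerNorm(d_model)
        # Light-weight FFN: bottleneck (d_model -> d_model / r -> d_model)
        # instead of the usual 4x expansion.
        d_ffn = d_model // ffn_reduction
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ffn), nn.GELU(),
            nn.Linear(d_ffn, d_model),
        )
        self.ffn_norm = nn.LayerNorm(d_model)

    def forward(self, x):              # x: (batch, seq_len, d_model)
        h = self.dextra(x)             # (batch, seq_len, d_out)
        q, k, v = self.q_proj(h), self.k_proj(h), self.v_proj(h)
        scores = q @ k.transpose(-2, -1) / (q.size(-1) ** 0.5)
        attn = F.softmax(scores, dim=-1) @ v
        x = self.attn_norm(x + self.attn_out(attn))  # residual + norm
        x = self.ffn_norm(x + self.ffn(x))           # residual + norm
        return x


if __name__ == "__main__":
    block = SimplifiedDeLighTBlock()
    y = block(torch.randn(2, 10, 512))
    print(y.shape)  # torch.Size([2, 10, 512])
```

The sketch keeps the shapes of the description: attention runs at the reduced width produced by the expand-reduce stage, and the FFN narrows rather than widens the model dimension.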
Source: DeLighT: Deep and Light-weight Transformer
| Task | Papers | Share |
|---|---|---|
| Language Modelling | 1 | 33.33% |
| Machine Translation | 1 | 33.33% |
| Translation | 1 | 33.33% |