TSDAE

Introduced by Wang et al. in TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning

TSDAE is an unsupervised sentence embedding method. During training, TSDAE encodes corrupted sentences into fixed-sized vectors and requires the decoder to reconstruct the original sentences from this sentence embedding. For good reconstruction quality, the semantics must be captured well in the sentence embedding from the encoder. Later, at inference, we only use the encoder for creating sentence embeddings.
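
To make "corrupted sentences" concrete, the following is a minimal sketch of token-deletion noise. It assumes whitespace tokenization and a default deletion ratio of 0.6 (the value the authors report as best on STS); the function name `delete_tokens` and the rule of keeping one token when everything would be deleted are illustrative assumptions, not taken from the paper.

```python
import random


def delete_tokens(sentence, del_ratio=0.6, rng=None):
    """Return a corrupted copy of `sentence` with tokens randomly deleted.

    Assumptions (not from the source): whitespace tokenization, and at
    least one token is kept so the encoder never sees an empty input.
    """
    if rng is None:
        rng = random.Random()
    tokens = sentence.split()
    if not tokens:
        return sentence
    # Each token is dropped independently with probability `del_ratio`.
    kept = [tok for tok in tokens if rng.random() >= del_ratio]
    if not kept:
        kept = [rng.choice(tokens)]
    return " ".join(kept)


original = "the quick brown fox jumps over the lazy dog"
corrupted = delete_tokens(original, del_ratio=0.6, rng=random.Random(0))
# Training pair: the encoder reads `corrupted`, the decoder must rebuild `original`.
print(corrupted, "->", original)
```

Passing a seeded `random.Random` keeps the corruption reproducible, which can be handy when debugging training pairs.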

The model architecture of TSDAE is a modified encoder-decoder Transformer in which the key and value of the cross-attention are confined to the sentence embedding. Formally, the modified cross-attention is:

$$ H^{(k)}=\operatorname{Attention}\left(H^{(k-1)},\left[s^{T}\right],\left[s^{T}\right]\right) $$

$$ \operatorname{Attention}(Q, K, V)=\operatorname{softmax}\left(\frac{Q K^{T}}{\sqrt{d}}\right) V $$

where $H^{(k)} \in \mathbb{R}^{t \times d}$ are the decoder hidden states over $t$ decoding steps at the $k$-th layer, $d$ is the size of the sentence embedding, $\left[s^{T}\right] \in \mathbb{R}^{1 \times d}$ is a one-row matrix containing the sentence embedding vector, and $Q$, $K$ and $V$ are the query, key and value, respectively. By exploring different configurations on the STS benchmark dataset, the authors find that the best combination is: (1) adopting deletion as the input noise with a deletion ratio of $0.6$, (2) using the output of the [CLS] token as the fixed-sized sentence representation, and (3) tying the encoder and decoder parameters during training.
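
The modified cross-attention can be sketched in plain Python as follows. This is a literal transcription of the two formulas above, under stated assumptions: it omits the learned query, key and value projections and the multiple heads that a real Transformer layer would apply, and the names `attention` and `tsdae_cross_attention` are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = len(K[0])
    out = []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        weights = softmax(scores)
        out.append(
            [sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))]
        )
    return out


def tsdae_cross_attention(H_prev, s):
    """H^(k) = Attention(H^(k-1), [s^T], [s^T]); key and value are the 1 x d embedding."""
    return attention(H_prev, [s], [s])


# t = 3 decoding steps, d = 4 (toy values chosen for illustration only).
H_prev = [[0.1, -0.2, 0.3, 0.0], [1.0, 0.5, -0.5, 2.0], [0.0, 0.0, 0.0, 0.0]]
s = [0.5, -1.0, 0.25, 2.0]
H = tsdae_cross_attention(H_prev, s)
print(len(H), len(H[0]))
```

Because there is only one key, the softmax weight is always 1, so in this projection-free form every decoding step receives the sentence embedding itself. That makes the bottleneck visible: the decoder has no access to the encoder's per-token states, only to the single vector $s$.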

Source: TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning

Tasks

Task	Papers	Share
Domain Adaptation	3	20.00%
Retrieval	2	13.33%
Unsupervised Domain Adaptation	2	13.33%
Denoising	1	6.67%
Information Retrieval	1	6.67%
Language Modelling	1	6.67%
Paraphrase Identification	1	6.67%
Semantic Textual Similarity	1	6.67%
Sentence	1	6.67%

Components

Component	Type
Scaled Dot-Product Attention	Attention Mechanisms

Categories

Sentence Embeddings