CERT: Contrastive Self-supervised Learning for Language Understanding

16 May 2020 · Hongchao Fang, Sicheng Wang, Meng Zhou, Jiayuan Ding, Pengtao Xie

Pretrained language models such as BERT and GPT have shown great effectiveness in language understanding. The auxiliary predictive tasks in existing pretraining approaches are mostly defined on tokens and thus may not capture sentence-level semantics very well...

Methods used in the Paper


METHOD                           TYPE
Softmax                          Output Functions
Linear Warmup With Linear Decay  Learning Rate Schedules
Layer Normalization              Normalization
Scaled Dot-Product Attention     Attention Mechanisms
WordPiece                        Subword Segmentation
Residual Connection              Skip Connections
BERT                             Language Models
Multi-Head Attention             Attention Modules
Weight Decay                     Regularization
Adam                             Stochastic Optimization
Dropout                          Regularization
GELU                             Activation Functions
Attention Dropout                Regularization
Dense Connections                Feedforward Networks