ParsBERT: Transformer-based Model for Persian Language Understanding

The surge of pre-trained language models has ushered in a new era in the field of Natural Language Processing (NLP) by allowing us to build powerful language models. Among these models, Transformer-based models such as BERT have become increasingly popular due to their state-of-the-art performance...
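
Because ParsBERT follows the standard BERT architecture, it can be used like any other pretrained BERT checkpoint. The snippet below is a minimal sketch using the Hugging Face transformers library; the model identifier HooshvareLab/bert-base-parsbert-uncased is an assumption for illustration and is not stated on this page.

# Minimal sketch: extracting contextual embeddings with ParsBERT.
# The Hub model id below is an assumption, not taken from this page.
from transformers import AutoTokenizer, AutoModel

model_name = "HooshvareLab/bert-base-parsbert-uncased"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

text = "زبان فارسی زیباست."                 # example Persian sentence
inputs = tokenizer(text, return_tensors="pt")  # WordPiece tokenization
outputs = model(**inputs)                      # forward pass through BERT
embeddings = outputs.last_hidden_state         # shape: (1, seq_len, hidden_size)
print(embeddings.shape)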


Datasets


Results from the Paper



Methods used in the Paper


METHOD                            TYPE
Weight Decay                      Regularization
Softmax                           Output Functions
Adam                              Stochastic Optimization
Multi-Head Attention              Attention Modules
Dropout                           Regularization
GELU                              Activation Functions
Attention Dropout                 Regularization
Linear Warmup With Linear Decay   Learning Rate Schedules
Dense Connections                 Feedforward Networks
Layer Normalization               Normalization
Scaled Dot-Product Attention      Attention Mechanisms
WordPiece                         Subword Segmentation
Residual Connection               Skip Connections
BERT                              Language Models
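
Most of the listed components are standard Transformer building blocks. As a point of reference, the sketch below implements scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, in plain NumPy; it illustrates the mechanism only and is not code from the paper.

# Scaled dot-product attention: Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
# Illustrative sketch; Q, K have shape (seq_len, d_k), V has shape (seq_len, d_v).
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability for softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the key dimension
    return weights @ V                              # weighted sum of value vectors

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 64))   # 5 query positions, head dimension 64
K = rng.normal(size=(5, 64))
V = rng.normal(size=(5, 64))
print(scaled_dot_product_attention(Q, K, V).shape)  # (5, 64)

In the full model, Multi-Head Attention runs several such attention operations in parallel on linearly projected Q, K, and V, and the remaining components in the table (residual connections, layer normalization, GELU feedforward layers, dropout) wrap around this mechanism in each Transformer layer.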