Charformer is a Transformer model that learns a subword tokenization end-to-end as part of the model. Specifically, it uses Gradient-Based Subword Tokenization (GBST), which automatically learns latent subword representations from characters in a data-driven fashion. The resulting soft subword sequence is then passed through Transformer layers.
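The GBST idea can be illustrated with a minimal sketch: for each character position, form candidate blocks of several sizes by pooling, score each candidate with a learned scorer, softmax over block sizes to mix the candidates into a soft subword representation, and then downsample before the Transformer stack. The sketch below is a simplified toy version in NumPy (the paper uses learned position-wise scoring networks and additional details); the function and parameter names are illustrative, not from the official code.

```python
import numpy as np

rng = np.random.default_rng(0)

def gbst(char_embs, block_sizes=(1, 2, 3), downsample_rate=2, w=None):
    """Toy GBST sketch: score candidate subword blocks per position,
    softmax over block sizes, mix, then downsample. Simplified; names
    and the linear scorer are illustrative assumptions."""
    L, d = char_embs.shape
    if w is None:
        w = rng.normal(scale=d ** -0.5, size=d)  # stand-in for a learned scorer
    block_reprs, scores = [], []
    for b in block_sizes:
        # Mean-pool non-overlapping blocks of size b, then upsample back to L
        pad = (-L) % b
        x = np.pad(char_embs, ((0, pad), (0, 0)))
        pooled = x.reshape(-1, b, d).mean(axis=1)   # (ceil(L/b), d)
        up = np.repeat(pooled, b, axis=0)[:L]       # (L, d)
        block_reprs.append(up)
        scores.append(up @ w)                       # one score per position
    S = np.stack(scores, axis=-1)                   # (L, num_block_sizes)
    P = np.exp(S - S.max(-1, keepdims=True))
    P /= P.sum(-1, keepdims=True)                   # softmax over block sizes
    R = np.stack(block_reprs, axis=-1)              # (L, d, num_block_sizes)
    mixed = (R * P[:, None, :]).sum(-1)             # soft subword sequence (L, d)
    # Downsample the soft subword sequence before the Transformer layers
    pad = (-L) % downsample_rate
    mixed = np.pad(mixed, ((0, pad), (0, 0)))
    return mixed.reshape(-1, downsample_rate, d).mean(axis=1)

chars = rng.normal(size=(10, 8))   # 10 characters, embedding dim 8
latent = gbst(chars)
print(latent.shape)                # (5, 8): sequence halved by downsampling
```

Because the block mixing uses a softmax rather than a hard segmentation, the whole tokenization step stays differentiable and can be trained jointly with the Transformer.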
Source: Charformer: Fast Character Transformers via Gradient-based Subword Tokenization
| Task | Papers | Share |
|---|---|---|
| Decoder | 2 | 15.38% |
| NMT | 2 | 15.38% |
| Denoising | 1 | 7.69% |
| Image Denoising | 1 | 7.69% |
| Translation | 1 | 7.69% |
| Toxic Comment Classification | 1 | 7.69% |
| Linguistic Acceptability | 1 | 7.69% |
| Natural Language Inference | 1 | 7.69% |
| Paraphrase Identification | 1 | 7.69% |
| Component | Type |
|---|---|
| Gradient-Based Subword Tokenization | Subword Segmentation |
| Transformer | Transformers |