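WordPiece is a subword segmentation algorithm used in natural language processing. The vocabulary is initialized with the individual characters of the language, and new units formed by combining existing symbols are then added to it iteratively. The process is:

1. Initialize the word-unit inventory with all the characters in the text.
2. Build a language model on the training data using the current inventory.
3. Generate a new word unit by combining two units from the current inventory, increasing the inventory size by one. Among all possible new units, choose the one that most increases the likelihood of the training data when added to the model.
4. Repeat from step 2 until a predefined limit on the number of word units is reached or the likelihood increase falls below a threshold.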
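The training process described above can be sketched in a few dozen lines of Python. This is a minimal sketch under stated assumptions, not the original implementation. Rather than retraining a language model at every step, it scores a candidate merge of units `a` and `b` by freq(ab) / (freq(a) · freq(b)), a common approximation of the likelihood gain (the one described, for example, in Hugging Face's tokenizer course material). The `##` continuation prefix for non-initial pieces follows BERT's convention. The function name `train_wordpiece` and the toy corpus are hypothetical, and words are assumed to be non-empty with positive counts.

```python
from collections import Counter


def train_wordpiece(word_freqs: dict[str, int], vocab_size: int) -> list[str]:
    """Simplified WordPiece-style vocabulary training (illustrative sketch).

    word_freqs maps each pre-tokenized, non-empty word to a positive corpus count.
    Non-initial characters carry the "##" continuation prefix used by BERT.
    """
    # Step 1: start from single characters.
    splits = {w: [w[0]] + ["##" + c for c in w[1:]] for w in word_freqs}
    vocab = sorted({tok for toks in splits.values() for tok in toks})

    while len(vocab) < vocab_size:
        # Count units and adjacent unit pairs, weighted by word frequency.
        unit_freq = Counter()
        pair_freq = Counter()
        for word, freq in word_freqs.items():
            toks = splits[word]
            for tok in toks:
                unit_freq[tok] += freq
            for pair in zip(toks, toks[1:]):
                pair_freq[pair] += freq
        if not pair_freq:
            break  # every word is already a single unit

        # Steps 2-3 (approximation): score each candidate merge by
        # freq(ab) / (freq(a) * freq(b)) instead of retraining a language model.
        a, b = max(
            pair_freq,
            key=lambda p: pair_freq[p] / (unit_freq[p[0]] * unit_freq[p[1]]),
        )
        merged = a + b.removeprefix("##")  # b is never word-initial, so it has "##"
        if merged not in vocab:
            vocab.append(merged)

        # Apply the chosen merge to every word's current segmentation.
        for word, toks in splits.items():
            out, i = [], 0
            while i < len(toks):
                if i + 1 < len(toks) and toks[i] == a and toks[i + 1] == b:
                    out.append(merged)
                    i += 2
                else:
                    out.append(toks[i])
                    i += 1
            splits[word] = out
    return vocab


# Toy corpus (hypothetical): one merge beyond the 5 initial character units.
vocab = train_wordpiece({"xy": 2, "ab": 3, "ac": 3}, vocab_size=6)
print(vocab)  # ['##b', '##c', '##y', 'a', 'x', 'xy']
```

In the toy run, a purely frequency-based criterion would merge `a` + `##b` first (count 3), whereas this score picks `xy`, because `x` and `##y` never occur apart. Step 4's likelihood-threshold stopping rule is omitted for brevity; the loop stops only at the size limit or when no pairs remain.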
[Figure: WordPiece as used in BERT]
Source: Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Task | Papers | Share of papers |
---|---|---|
Language Modelling | 94 | 11.84% |
Text Classification | 42 | 5.29% |
Sentiment Analysis | 41 | 5.16% |
Retrieval | 34 | 4.28% |
Question Answering | 28 | 3.53% |
Classification | 25 | 3.15% |
NER | 21 | 2.64% |
Large Language Model | 18 | 2.27% |
Named Entity Recognition (NER) | 15 | 1.89% |