About

Tokenization is the task of splitting a string into parts, called tokens.
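As a minimal illustration of the definition above, the sketch below splits a string into word and punctuation tokens with a regular expression. The function name `simple_tokenize` and the pattern are assumptions chosen for this example; real tokenizers handle contractions, Unicode, and language-specific rules far more carefully.

```python
import re

def simple_tokenize(text):
    """Split text into word tokens and single-character punctuation tokens."""
    # \w+ matches a run of letters, digits, or underscores;
    # [^\w\s] matches one character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)

sentence = "Don't split hairs, split strings!"
regex_tokens = simple_tokenize(sentence)
whitespace_tokens = sentence.split()

print(regex_tokens)
print(whitespace_tokens)
```

Note that the two schemes disagree on the same input: whitespace splitting keeps "hairs," as one token, while the regex version separates the comma and also breaks "Don't" into three pieces.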

Benchmarks


Datasets

Greatest papers with code

Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet

28 Jan 2021 lucidrains/vit-pytorch

To overcome such limitations, we propose a new Tokens-To-Token Vision Transformer (T2T-ViT), which incorporates 1) a layer-wise Tokens-to-Token (T2T) transformation to progressively structurize the image to tokens by recursively aggregating neighboring Tokens into one Token (Tokens-to-Token), such that local structure represented by surrounding tokens can be modeled and tokens length can be reduced; 2) an efficient backbone with a deep-narrow structure for vision transformer motivated by CNN architecture design after empirical study.

IMAGE CLASSIFICATION LANGUAGE MODELLING TOKENIZATION
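The idea of aggregating neighboring tokens into one token can be sketched in plain Python. This is a simplified illustration, not the paper's or the repository's implementation: it only concatenates the tokens inside overlapping windows of a token grid, and it omits the attention layers and padding a real model would likely use. The name `aggregate_tokens` and the `kernel` and `stride` values are assumptions made for the example.

```python
def aggregate_tokens(grid, kernel=3, stride=2):
    """Concatenate each kernel x kernel window of tokens into one longer token.

    `grid` is a 2D list of tokens; each token is a list of floats.
    Windows overlap whenever stride < kernel, so neighboring output
    tokens share some of their input tokens.
    """
    height, width = len(grid), len(grid[0])
    out = []
    for top in range(0, height - kernel + 1, stride):
        row = []
        for left in range(0, width - kernel + 1, stride):
            merged = []
            for dy in range(kernel):
                for dx in range(kernel):
                    merged.extend(grid[top + dy][left + dx])
            row.append(merged)
        out.append(row)
    return out

# A 5 x 5 grid of 2-dimensional tokens; each token stores its own (row, column).
tokens = [[[float(y), float(x)] for x in range(5)] for y in range(5)]
aggregated = aggregate_tokens(tokens, kernel=3, stride=2)

print(len(tokens) * len(tokens[0]), "tokens before")
print(len(aggregated) * len(aggregated[0]), "tokens after")
print(len(aggregated[0][0]), "values per aggregated token")
```

In this toy setting the token count drops from 25 to 4, while each new token carries the values of its 9 neighbors, which is the trade the abstract describes: fewer tokens, each encoding local structure.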

The RWTH Aachen University Supervised Machine Translation Systems for WMT 2018

WS 2018 awslabs/sockeye

In total we improve by 6.8% BLEU over our last year's submission and by 4.8% BLEU over the winning system of the 2017 German→English task.

MACHINE TRANSLATION TOKENIZATION

BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages

LREC 2018 bheinzerling/bpemb

We present BPEmb, a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE).

ENTITY TYPING TOKENIZATION WORD EMBEDDINGS
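For readers unfamiliar with Byte-Pair Encoding, the sketch below learns a few merge rules from a toy word-frequency table and applies them to an unseen word. It is a generic character-level BPE illustration, not BPEmb's own training pipeline or API; the function names, the toy corpus, and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` with the joined symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return merged

def learn_bpe(word_freqs, num_merges):
    """Learn up to `num_merges` merge rules from a {word: frequency} mapping."""
    # Start with every word split into single characters.
    vocab = {tuple(word): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair; ties go to the lexicographically larger pair
        # (an arbitrary choice that keeps the example deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        new_vocab = {}
        for symbols, freq in vocab.items():
            key = tuple(merge_pair(symbols, best))
            new_vocab[key] = new_vocab.get(key, 0) + freq
        vocab = new_vocab
        merges.append(best)
    return merges

def apply_bpe(word, merges):
    """Segment a word by applying the learned merges in order."""
    symbols = list(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return symbols

toy_corpus = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(toy_corpus, num_merges=3)
segmented = apply_bpe("lowest", merges)

print(merges)
print(segmented)
```

The word "lowest" never appears in the toy corpus, yet it is segmented into subword units learned from other words, which is what lets subword vocabularies cover unseen words.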

A Call for Clarity in Reporting BLEU Scores

WS 2018 mjpost/sacreBLEU

The field of machine translation faces an under-recognized problem because of inconsistency in the reporting of scores from its dominant metric.

MACHINE TRANSLATION TOKENIZATION
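One source of the inconsistency described above is that n-gram overlap depends on how the text is tokenized. The sketch below computes only the clipped n-gram precision that BLEU builds on, under two tokenization schemes, to show that the same hypothesis and reference yield different numbers. It is not sacreBLEU's implementation and it is not a full BLEU score (no brevity penalty, no geometric mean over n-gram orders); the function names and the example sentences are assumptions made for illustration.

```python
import re
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_precision(hyp_tokens, ref_tokens, n):
    """Fraction of hypothesis n-grams found in the reference, with clipped counts."""
    hyp_counts = ngrams(hyp_tokens, n)
    ref_counts = ngrams(ref_tokens, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total

def split_punctuation(text):
    """Tokenize into words and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", text)

hypothesis = "The cat sat on the mat."
reference = "The cat sat on the mat!"

# Scheme A: split on whitespace only, so "mat." and "mat!" are distinct tokens.
p1_whitespace = clipped_precision(hypothesis.split(), reference.split(), 1)
# Scheme B: separate punctuation, so "mat" matches and only "." vs "!" differ.
p1_punct = clipped_precision(split_punctuation(hypothesis), split_punctuation(reference), 1)

print(p1_whitespace, p1_punct)
```

Here the unigram precision is 5/6 with whitespace tokenization and 6/7 with punctuation splitting, so two papers scoring the same output with different tokenizers would report different values.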