Tokenization

26 papers with code · Natural Language Processing

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages

LREC 2018 bheinzerling/bpemb

We present BPEmb, a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE).

ENTITY TYPING · TOKENIZATION · WORD EMBEDDINGS
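
As a rough illustration of the Byte-Pair Encoding idea mentioned in the BPEmb summary, the following is a minimal sketch, not the BPEmb code. The function names (`merge_pair`, `learn_bpe`, `apply_bpe`) and the toy corpus are assumptions made for this example, and it omits details such as end-of-word markers that practical implementations usually include.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace each adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy version)."""
    # Each word starts as a sequence of single characters.
    vocab = Counter(tuple(word) for word in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pairs[pair] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties go to the pair seen first.
        best = max(pairs, key=pairs.get)
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def apply_bpe(word, merges):
    """Segment `word` by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus, chosen only for illustration.
merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
segments = apply_bpe("lowly", merges)
```

With this toy corpus, an unseen word such as "lowly" is split into a learned subword plus leftover characters, which is the property that lets subword vocabularies cover words not seen during training.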

NLP-Cube: End-to-End Raw Text Processing With Neural Networks

CONLL 2018 adobe/NLP-Cube

We introduce NLP-Cube: an end-to-end Natural Language Processing framework, evaluated in CoNLL's "Multilingual Parsing from Raw Text to Universal Dependencies 2018" Shared Task.

LEMMATIZATION · TOKENIZATION

A Call for Clarity in Reporting BLEU Scores

WS 2018 mjpost/sacreBLEU

The field of machine translation faces an under-recognized problem because of inconsistency in the reporting of scores from its dominant metric.

MACHINE TRANSLATION · TOKENIZATION
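
To illustrate why tokenization matters when n-gram overlap scores are reported, here is a small sketch that computes a clipped unigram precision (one ingredient of BLEU, not the full metric) under two tokenizations. The helper names, the regular expression, and the example sentences are assumptions for this illustration and are not taken from sacreBLEU.

```python
import re
from collections import Counter


def clipped_unigram_precision(hyp_tokens, ref_tokens):
    """Fraction of hypothesis tokens that also occur in the reference, with clipped counts."""
    if not hyp_tokens:
        return 0.0
    hyp_counts = Counter(hyp_tokens)
    ref_counts = Counter(ref_tokens)
    # A hypothesis token is credited at most as often as it appears in the reference.
    matched = sum(min(count, ref_counts[tok]) for tok, count in hyp_counts.items())
    return matched / len(hyp_tokens)


def whitespace_tokenize(text):
    """Split on whitespace only; punctuation stays attached to words."""
    return text.split()


def punct_tokenize(text):
    """Split words and punctuation marks into separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


# Hypothetical example pair.
hypothesis = "Hello, world!"
reference = "Hello world!"

p_whitespace = clipped_unigram_precision(
    whitespace_tokenize(hypothesis), whitespace_tokenize(reference)
)
p_punct = clipped_unigram_precision(
    punct_tokenize(hypothesis), punct_tokenize(reference)
)
```

The same hypothesis and reference yield 0.5 under whitespace splitting and 0.75 when punctuation is split off, so scores computed with different tokenizers are not directly comparable.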

Juman++: A Morphological Analysis Toolkit for Scriptio Continua

EMNLP 2018 ku-nlp/jumanpp

We present a three-part toolkit for developing morphological analyzers for languages without natural word boundaries.

ART ANALYSIS · LANGUAGE MODELLING · MORPHOLOGICAL ANALYSIS · PART-OF-SPEECH TAGGING · TOKENIZATION

Neural Sign Language Translation

CVPR 2018 neccam/nslt

Sign Language Recognition (SLR) seeks to recognize a sequence of continuous signs but neglects the underlying rich grammatical and linguistic structures of sign language that differ from spoken language.

GESTURE RECOGNITION · LANGUAGE MODELLING · MACHINE TRANSLATION · SIGN LANGUAGE RECOGNITION · SIGN LANGUAGE TRANSLATION · TOKENIZATION