Tokenization

23 papers with code · Natural Language Processing

State-of-the-art leaderboards

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

BPEmb: Tokenization-free Pre-trained Subword Embeddings in 275 Languages

LREC 2018 bheinzerling/bpemb

We present BPEmb, a collection of pre-trained subword unit embeddings in 275 languages, based on Byte-Pair Encoding (BPE).

ENTITY TYPING · TOKENIZATION · WORD EMBEDDINGS
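
For readers unfamiliar with Byte-Pair Encoding, the following is a minimal toy sketch of how BPE merges can be learned and applied. It is not the BPEmb authors' pipeline; the function names (`learn_bpe`, `merge_pair`, `segment`), the `</w>` end-of-word marker, and the tie-breaking rule are assumptions chosen for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy sketch)."""
    # Each word starts as a sequence of characters plus a hypothetical end marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Pick the most frequent pair; ties are broken lexicographically
        # (an arbitrary choice made here only to keep the sketch deterministic).
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Split a word into subword units by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
print(merges)
print(segment("lowly", merges))
```

Replaying merges in the order they were learned means an unseen word such as "lowly" still decomposes into known units, which is the property that lets subword embeddings cover words missing from a fixed vocabulary.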

Juman++: A Morphological Analysis Toolkit for Scriptio Continua

EMNLP 2018 ku-nlp/jumanpp

We present a three-part toolkit for developing morphological analyzers for languages without natural word boundaries.

ART ANALYSIS · LANGUAGE MODELLING · MORPHOLOGICAL ANALYSIS · PART-OF-SPEECH TAGGING · TOKENIZATION

A Call for Clarity in Reporting BLEU Scores

WS 2018 mjpost/sacreBLEU

The field of machine translation faces an under-recognized problem because of inconsistency in the reporting of scores from its dominant metric.

MACHINE TRANSLATION · TOKENIZATION
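
One way scores can diverge, in line with this entry's TOKENIZATION tag, is that the same hypothesis and reference produce different n-gram matches under different tokenizers. The sketch below computes only the clipped n-gram precision that BLEU is built on. It is not sacreBLEU's implementation, and both tokenizers and the example sentences are hypothetical.

```python
import re
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hyp_tokens, ref_tokens, n):
    """Clipped n-gram precision of a hypothesis against a single reference."""
    hyp = ngrams(hyp_tokens, n)
    ref = ngrams(ref_tokens, n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    # Each hypothesis n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return clipped / total


def whitespace_tokenize(text):
    """Hypothetical tokenizer A: split on whitespace only."""
    return text.split()


def punct_tokenize(text):
    """Hypothetical tokenizer B: also split punctuation into separate tokens."""
    return re.findall(r"\w+|[^\w\s]", text)


hypothesis = "the cat sat on the mat."
reference = "the cat sat on the mat ."

for name, tok in [("whitespace", whitespace_tokenize), ("punctuation", punct_tokenize)]:
    p1 = modified_precision(tok(hypothesis), tok(reference), 1)
    p2 = modified_precision(tok(hypothesis), tok(reference), 2)
    print(f"{name}: unigram={p1:.3f} bigram={p2:.3f}")
```

With whitespace splitting, "mat." never matches "mat", so both precisions fall below 1.0; with punctuation splitting the two strings tokenize identically. Scores computed under different tokenizers are therefore not directly comparable.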

Neural Sign Language Translation

CVPR 2018 neccam/nslt

Sign Language Recognition (SLR) seeks to recognize a sequence of continuous signs but neglects the underlying rich grammatical and linguistic structures of sign language that differ from spoken language.

GESTURE RECOGNITION · LANGUAGE MODELLING · MACHINE TRANSLATION · SIGN LANGUAGE RECOGNITION · SIGN LANGUAGE TRANSLATION · TOKENIZATION

Constructing Financial Sentimental Factors in Chinese Market Using Natural Language Processing

22 Sep 2018 Coldog2333/Financial-NLP

During the 2015 Chinese market crash in particular, the Pearson correlation coefficient between the adjusted sentimental factor and the SSE is 0.5844, which suggests that our model can provide solid guidance, especially in periods when the market is strongly influenced by public sentiment.

TOKENIZATION
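
The Pearson correlation coefficient cited in the entry above can be computed as in the following sketch. The function name `pearson` and the two short series are hypothetical; they are not the paper's data and do not reproduce its reported value.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least two points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily sentiment scores and index returns, for illustration only.
sentiment = [0.2, -0.1, 0.4, -0.3, 0.1]
returns = [0.5, -0.2, 0.3, -0.6, 0.0]
print(round(pearson(sentiment, returns), 4))
```

The coefficient ranges from -1 to 1, so a value near 0.58 indicates a moderate positive linear relationship between the two series.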