Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Sentence segmentation	UD2.5 test	Trankit	Macro-averaged F1	91.82	#1
Sentence segmentation	UD2.5 test	Stanza	Macro-averaged F1	88.58	#2
Dependency Parsing	UD2.5 test	Trankit	Macro-averaged F1	87.06	#1
Dependency Parsing	UD2.5 test	Stanza	Macro-averaged F1	83.06	#2
Part-Of-Speech Tagging	UD2.5 test	Trankit	Macro-averaged F1	95.65	#1
Part-Of-Speech Tagging	UD2.5 test	Stanza	Macro-averaged F1	94.21	#2
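
To make the size of the gap between the two toolkits explicit, the following minimal sketch computes the absolute F1 difference per task from the table above. The scores are copied from the table; the dictionary layout and the helper name `f1_gain` are illustrative assumptions, not part of any toolkit.

```python
# Macro-averaged F1 on the UD2.5 test set, copied from the results table.
scores = {
    "Sentence segmentation": {"Trankit": 91.82, "Stanza": 88.58},
    "Dependency Parsing": {"Trankit": 87.06, "Stanza": 83.06},
    "Part-Of-Speech Tagging": {"Trankit": 95.65, "Stanza": 94.21},
}


def f1_gain(task, model="Trankit", baseline="Stanza"):
    """Absolute F1 difference (in points) between two models on one task.

    Hypothetical helper for illustration; rounds to two decimals to
    match the precision reported in the table.
    """
    return round(scores[task][model] - scores[task][baseline], 2)


gains = {task: f1_gain(task) for task in scores}

for task, gain in gains.items():
    print(f"{task}: +{gain:.2f} F1 points")
```

On these numbers the differences are 3.24 points for sentence segmentation, 4.00 for dependency parsing, and 1.44 for part-of-speech tagging.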

Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing

EACL 2021 · Minh Van Nguyen, Viet Dac Lai, Amir Pouran Ben Veyseh, Thien Huu Nguyen

We introduce Trankit, a light-weight Transformer-based Toolkit for multilingual Natural Language Processing (NLP). It provides a trainable pipeline for fundamental NLP tasks over 100 languages, and 90 pretrained pipelines for 56 languages. Built on a state-of-the-art pretrained language model, Trankit significantly outperforms prior multilingual NLP pipelines on sentence segmentation, part-of-speech tagging, morphological feature tagging, and dependency parsing while maintaining competitive performance for tokenization, multi-word token expansion, and lemmatization over 90 Universal Dependencies treebanks. Despite the use of a large pretrained transformer, our toolkit is still efficient in memory usage and speed. This is achieved by our novel plug-and-play mechanism with Adapters, where a multilingual pretrained transformer is shared across pipelines for different languages. Our toolkit, along with pretrained models and code, is publicly available at: https://github.com/nlp-uoregon/trankit. A demo website for our toolkit is also available at: http://nlp.uoregon.edu/trankit. Finally, we have created a demo video for Trankit at: https://youtu.be/q0KGP3zGjGc.
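
The plug-and-play idea in the abstract, one shared multilingual transformer with small per-language adapters, can be illustrated with a toy sketch. This is not Trankit's implementation: the abstract does not specify the adapter architecture, so the code assumes a generic bottleneck adapter (down-projection, ReLU, up-projection, residual connection), and every class name, size, and the stand-in "encoder" are hypothetical.

```python
import math
import random

HIDDEN, BOTTLENECK = 8, 2  # toy sizes, chosen only for illustration


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class SharedEncoder:
    """Stand-in for the shared multilingual transformer (assumed frozen)."""

    def __init__(self, rng):
        self.weights = random_matrix(HIDDEN, HIDDEN, rng)

    def __call__(self, features):
        return [math.tanh(v) for v in matvec(self.weights, features)]


class Adapter:
    """Generic bottleneck adapter: down-project, ReLU, up-project, residual."""

    def __init__(self, rng):
        self.down = random_matrix(BOTTLENECK, HIDDEN, rng)
        self.up = random_matrix(HIDDEN, BOTTLENECK, rng)

    def __call__(self, hidden):
        bottleneck = [max(0.0, v) for v in matvec(self.down, hidden)]
        delta = matvec(self.up, bottleneck)
        return [h + d for h, d in zip(hidden, delta)]


class MultilingualPipeline:
    """One encoder instance shared by all languages; one adapter per language."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)
        self.encoder = SharedEncoder(self._rng)
        self.adapters = {}

    def add_language(self, lang):
        # Plugging in a language adds only adapter weights, not a new encoder.
        self.adapters[lang] = Adapter(self._rng)

    def encode(self, lang, features):
        return self.adapters[lang](self.encoder(features))


pipeline = MultilingualPipeline(seed=0)
for lang in ("english", "vietnamese"):
    pipeline.add_language(lang)

features = [0.1 * i for i in range(HIDDEN)]
for lang in pipeline.adapters:
    print(lang, [round(v, 3) for v in pipeline.encode(lang, features)])
```

In this sketch, each added language costs `2 * HIDDEN * BOTTLENECK` adapter weights while the `HIDDEN * HIDDEN` encoder weights are stored once, which is the kind of memory saving the abstract attributes to sharing the transformer across pipelines.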

Code

nlp-uoregon/trankit (official) · 705 stars

Tasks

Dependency Parsing

Language Modelling

Lemmatization

Morphological Tagging

Multilingual NLP

Named Entity Recognition (NER)

Part-Of-Speech Tagging

Sentence

Sentence segmentation

Sequential sentence segmentation

Datasets

CoNLL 2003

Results from the Paper

Ranked #1 on Sentence segmentation on UD2.5 test

Methods

Absolute Position Encodings • Adam • Adapter • BPE • Dense Connections • Dropout • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
