Trankit: A Light-Weight Transformer-based Toolkit for Multilingual Natural Language Processing

We introduce Trankit, a light-weight Transformer-based Toolkit for multilingual Natural Language Processing (NLP). It provides a trainable pipeline for fundamental NLP tasks over 100 languages, and 90 pretrained pipelines for 56 languages...


Results from the Paper


TASK                    DATASET     MODEL    METRIC NAME        METRIC VALUE  GLOBAL RANK
Tokenization            UD2.5 test  Stanza   Macro-averaged F1  99.26         #1
Tokenization            UD2.5 test  Trankit  Macro-averaged F1  99.23         #2
Sentence segmentation   UD2.5 test  Trankit  Macro-averaged F1  91.82         #1
Sentence segmentation   UD2.5 test  Stanza   Macro-averaged F1  88.58         #2
Part-Of-Speech Tagging  UD2.5 test  Trankit  Macro-averaged F1  95.65         #1
Part-Of-Speech Tagging  UD2.5 test  Stanza   Macro-averaged F1  94.21         #2
Dependency Parsing      UD2.5 test  Trankit  Macro-averaged F1  87.06         #1
Dependency Parsing      UD2.5 test  Stanza   Macro-averaged F1  83.06         #2
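
To make the comparison in the results above easier to read, the per-task gap between the two toolkits can be computed directly from the listed scores. The snippet below is only an illustrative sketch: the `scores` dictionary and the `f1_gaps` helper are names introduced here, not part of Trankit or Stanza, and the numbers are copied from the listing rather than recomputed.

```python
# Macro-averaged F1 on the UD2.5 test sets, copied from the results listed above.
# The dictionary layout and the helper name are assumptions made for this sketch.
scores = {
    "Tokenization": {"Trankit": 99.23, "Stanza": 99.26},
    "Sentence segmentation": {"Trankit": 91.82, "Stanza": 88.58},
    "Part-Of-Speech Tagging": {"Trankit": 95.65, "Stanza": 94.21},
    "Dependency Parsing": {"Trankit": 87.06, "Stanza": 83.06},
}


def f1_gaps(table):
    """Return Trankit minus Stanza F1 for each task, rounded to two decimals."""
    return {
        task: round(models["Trankit"] - models["Stanza"], 2)
        for task, models in table.items()
    }


gaps = f1_gaps(scores)
for task, gap in gaps.items():
    # A positive gap means Trankit scores higher than Stanza on that task.
    print(f"{task:<24} {gap:+.2f}")
```

On these numbers, Trankit trails Stanza by 0.03 points on tokenization and leads by 3.24 points on sentence segmentation, 1.44 on part-of-speech tagging, and 4.00 on dependency parsing.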

Methods used in the Paper