Chinese word segmentation is the task of splitting Chinese text (i.e. a sequence of Chinese characters) into words (Source: www.nlpprogress.com).
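To make the task concrete, here is a minimal sketch of one common way to frame segmentation: tagging each character as B (begin), M (middle), E (end) or S (single-character word). The source does not prescribe this tagging scheme. The function names `words_to_bmes` and `bmes_to_words` and the example sentence are illustrative assumptions.

```python
def words_to_bmes(words):
    """Map a list of words to one B/M/E/S tag per character."""
    tags = []
    for word in words:
        if not word:
            continue  # ignore empty strings rather than emit bogus tags
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars, tags):
    """Rebuild words from characters and their B/M/E/S tags."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # a word ends at this character
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


# Hypothetical example: "我喜欢自然语言处理" segmented into four words.
words = ["我", "喜欢", "自然语言", "处理"]
tags = words_to_bmes(words)
restored = bmes_to_words("".join(words), tags)
```

Under this framing, a segmenter only has to predict one tag per character, and the word sequence follows from the tags.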
Chinese word segmentation (CWS) is a fundamental step of Chinese natural language processing.
We introduce N-LTP, an open-source Python Chinese natural language processing toolkit supporting five basic tasks: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic dependency parsing.
Tasks: Chinese Word Segmentation, Dependency Parsing, Knowledge Distillation, Named Entity Recognition, Part-of-Speech Tagging, Semantic Dependency Parsing
Moreover, it is shown that reasonable performance can be obtained when ZEN is trained on a small corpus, which is important for applying pre-training techniques to scenarios with limited data.
Ranked #1 on Chinese Part-of-Speech Tagging on CTB5
Tasks: Chinese Named Entity Recognition, Chinese Word Segmentation, Document Classification, Natural Language Inference, Part-of-Speech Tagging, Sentence Pair Modeling, Sentiment Analysis
The kernel of fastHan is a joint many-task model based on a pruned BERT, which uses the first 8 layers in BERT.
Tasks: Chinese Word Segmentation, Dependency Parsing, Named Entity Recognition, Part-of-Speech Tagging
However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found.
Ranked #1 on Chinese Sentence Pair Classification on XNLI (Accuracy metric)
Tasks: Chinese Dependency Parsing, Chinese Named Entity Recognition, Chinese Part-of-Speech Tagging, Chinese Semantic Role Labeling, Chinese Sentence Pair Classification, Chinese Word Segmentation, Dependency Parsing, Document Classification, Image Classification, Language Modelling, Machine Translation, Multi-Task Learning, Part-of-Speech Tagging, Semantic Role Labeling, Semantic Textual Similarity, Sentence Classification, Sentiment Analysis
We present a simple yet elegant solution to train a single joint model on multi-criteria corpora for Chinese Word Segmentation (CWS).
However, existing methods for Chinese NER either do not exploit word boundary information from CWS or cannot filter the specific information of CWS.
Ranked #1 on Chinese Named Entity Recognition on SighanNER
Tasks: Chinese Named Entity Recognition, Chinese Word Segmentation, Transfer Learning
The performance of Chinese Word Segmentation (CWS) systems has gradually reached a plateau with the rapid development of deep neural networks, especially the successful use of large pre-trained models.
The first is that they rely heavily on manually designed bigram features, i.e., they are not good at capturing n-gram features automatically.
Tasks: Chinese Word Segmentation, Feature Engineering, Word Embeddings
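For readers unfamiliar with the hand-designed bigram features mentioned above, here is a minimal sketch of what such a feature template can look like for a single character position. The template, the padding symbol, and the name `char_ngram_features` are illustrative assumptions, not the feature set of any particular system.

```python
def char_ngram_features(sentence, index, pad="<PAD>"):
    """Return hand-designed unigram/bigram features for one character.

    Positions outside the sentence are replaced by a padding symbol.
    """
    def at(i):
        return sentence[i] if 0 <= i < len(sentence) else pad

    prev_c, cur_c, next_c = at(index - 1), at(index), at(index + 1)
    return {
        "unigram": cur_c,
        "prev_bigram": prev_c + cur_c,  # character to the left + current
        "next_bigram": cur_c + next_c,  # current + character to the right
    }


# Hypothetical usage: features for every character of a short string.
sentence = "自然语言"
features = [char_ngram_features(sentence, i) for i in range(len(sentence))]
```

Every such template has to be chosen and maintained by hand, which is the limitation the passage describes: longer n-grams are only captured if someone adds them explicitly.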
Contextual features always play an important role in Chinese word segmentation (CWS).