Chinese word segmentation is the task of splitting Chinese text (i.e. a sequence of Chinese characters) into words (Source: www.nlpprogress.com).
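To make the task concrete, here is a minimal sketch of forward maximum matching, a classical dictionary-based baseline. It is not one of the methods listed on this page. The function name `forward_max_match`, the `max_len` parameter and the toy vocabulary are assumptions made for illustration only.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy left-to-right segmentation against a word list (illustrative baseline)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a match is found.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback,
            # so the loop is guaranteed to advance.
            if size == 1 or candidate in vocab:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy vocabulary; a real system would use a large lexicon
# or a learned model instead.
toy_vocab = {"中文", "分词", "自然", "语言", "自然语言", "处理", "基础"}

sentence = "中文分词是自然语言处理的基础"
segmented = forward_max_match(sentence, toy_vocab)
print(" / ".join(segmented))
# 中文 / 分词 / 是 / 自然语言 / 处理 / 的 / 基础
```

Greedy matching of this kind fails on ambiguous strings and on words missing from the dictionary, which is one reason learned models are preferred in practice.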
Chinese word segmentation (CWS) is a fundamental step of Chinese natural language processing.
We introduce N-LTP, an open-source Python Chinese natural language processing toolkit supporting five basic tasks: Chinese word segmentation, part-of-speech tagging, named entity recognition, dependency parsing, and semantic dependency parsing.
Moreover, it is shown that reasonable performance can be obtained when ZEN is trained on a small corpus, which is important for applying pre-training techniques to scenarios with limited data.
Ranked #1 on Chinese Part-of-Speech Tagging on CTB5
However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found.
Ranked #1 on Chinese Sentence Pair Classification on XNLI (Accuracy metric)
CHINESE DEPENDENCY PARSING, CHINESE NAMED ENTITY RECOGNITION, CHINESE PART-OF-SPEECH TAGGING, CHINESE SEMANTIC ROLE LABELING, CHINESE SENTENCE PAIR CLASSIFICATION, CHINESE WORD SEGMENTATION, DEPENDENCY PARSING, DOCUMENT CLASSIFICATION, IMAGE CLASSIFICATION, LANGUAGE MODELLING, MACHINE TRANSLATION, MULTI-TASK LEARNING, PART-OF-SPEECH TAGGING, SEMANTIC ROLE LABELING, SEMANTIC TEXTUAL SIMILARITY, SENTENCE CLASSIFICATION, SENTIMENT ANALYSIS
We present a simple yet elegant solution to train a single joint model on multi-criteria corpora for Chinese Word Segmentation (CWS).
However, existing methods for Chinese NER either do not exploit word boundary information from CWS or cannot filter the specific information of CWS.
Ranked #1 on Chinese Named Entity Recognition on SighanNER
The performance of Chinese Word Segmentation (CWS) systems has gradually reached a plateau with the rapid development of deep neural networks, especially the successful use of large pre-trained models.