Chinese Word Segmentation

39 papers with code • 5 benchmarks • 2 datasets

Chinese word segmentation is the task of splitting Chinese text (i.e. a sequence of Chinese characters) into words (Source: www.nlpprogress.com).
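To make the task concrete, here is a minimal sketch of a classic dictionary-based baseline, forward maximum matching. It is not taken from any paper listed on this page; the function name `forward_max_match`, the `max_len` parameter, and the toy lexicon are assumptions made purely for illustration.

```python
def forward_max_match(text, lexicon, max_len=4):
    """Greedy left-to-right segmentation: at each position, take the
    longest substring found in the lexicon (falling back to one character)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                words.append(candidate)
                i += size
                break
    return words


# Toy lexicon (hypothetical); a real system would use a large dictionary
# or a learned model instead.
toy_lexicon = {"中文", "分词", "是", "基础", "任务"}
segmented = forward_max_match("中文分词是基础任务", toy_lexicon)
print(" / ".join(segmented))
```

Greedy matching of this kind fails on ambiguous strings and out-of-vocabulary words, which is one reason the papers below use learned models.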

Most implemented papers

ZEN: Pre-training Chinese Text Encoder Enhanced by N-gram Representations

sinovation/ZEN Findings of the Association for Computational Linguistics 2020

Moreover, it is shown that reasonable performance can be obtained when ZEN is trained on a small corpus, which is important for applying pre-training techniques to scenarios with limited data.

Simplifying Neural Machine Translation with Addition-Subtraction Twin-Gated Recurrent Networks

bzhangGo/zero EMNLP 2018

Experiments on WMT14 translation tasks demonstrate that ATR-based neural machine translation can yield competitive performance on English-German and English-French language pairs in terms of both translation quality and speed.

PKUSEG: A Toolkit for Multi-Domain Chinese Word Segmentation

lancopku/pkuseg-python 27 Jun 2019

Chinese word segmentation (CWS) is a fundamental step of Chinese natural language processing.

Segmental Recurrent Neural Networks

ykrmm/TREMBA 18 Nov 2015

Representations of the input segments (i.e., contiguous subsequences of the input) are computed by encoding their constituent tokens using bidirectional recurrent neural nets, and these "segment embeddings" are used to define compatibility scores with output labels.
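Segment-level scoring of this kind is typically paired with a dynamic program over all segmentations. The following is a rough sketch of that decoding step only, under loud assumptions: the hand-written `toy_score` function stands in for the neural segment scorer, output labels are omitted, and the names `best_segmentation`, `toy_score`, and `max_len` are hypothetical rather than taken from the paper or its code.

```python
def best_segmentation(tokens, score, max_len=4):
    """Return the highest-scoring segmentation of `tokens`, where the total
    score is the sum of score(segment) over contiguous segments."""
    n = len(tokens)
    best = [float("-inf")] * (n + 1)  # best[i]: best score for tokens[:i]
    back = [0] * (n + 1)              # back[i]: start of the last segment ending at i
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            total = best[start] + score(tokens[start:end])
            if total > best[end]:
                best[end] = total
                back[end] = start
    # Follow the back-pointers from the end to recover the segments.
    segments = []
    end = n
    while end > 0:
        start = back[end]
        segments.append(tokens[start:end])
        end = start
    segments.reverse()
    return segments, best[n]


def toy_score(segment):
    """Hypothetical stand-in for a learned segment score."""
    toy_lexicon = {"中文", "分词", "基础", "任务"}
    if segment in toy_lexicon:
        return float(len(segment))
    # Single characters are neutral; unknown multi-character spans are penalized.
    return 0.0 if len(segment) == 1 else -1.0


segments, total = best_segmentation("中文分词是基础任务", toy_score)
print(segments, total)
```

Because every span up to `max_len` is scored as a unit, the scorer can use features of the whole segment rather than of single positions.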

LSICC: A Large Scale Informal Chinese Corpus

JaniceZhao/Douban-Dushu-Dataset 26 Nov 2018

Deep learning-based natural language processing models have proven powerful, but they need large-scale datasets.

Glyce: Glyph-vectors for Chinese Character Representations

ShannonAI/glyce NeurIPS 2019

However, due to the lack of rich pictographic evidence in glyphs and the weak generalization ability of standard computer vision models on character data, an effective way to utilize the glyph information remains to be found.

Investigating Self-Attention Network for Chinese Word Segmentation

gump88/SAN-CWS 26 Jul 2019

Neural networks have become the dominant method for Chinese word segmentation.

Exploring Segment Representations for Neural Segmentation Models

ExpResults/segrep-for-nn-semicrf 19 Apr 2016

Many natural language processing (NLP) tasks can be generalized into a segmentation problem.

Neural Word Segmentation Learning for Chinese

jcyk/CWS ACL 2016

Most previous approaches to Chinese word segmentation formalize this problem as a character-based sequence labeling task where only contextual information within fixed sized local windows and simple interactions between adjacent tags can be captured.
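The character-based sequence labeling formulation mentioned here is commonly implemented with a B/M/E/S tag set (Begin, Middle, End, Single). The sketch below shows only the conversion between a segmented sentence and per-character tags; the tag set choice and the helper names `words_to_bmes` and `bmes_to_words` are illustrative assumptions, not details confirmed by the paper.

```python
def words_to_bmes(words):
    """Map a list of words to one tag per character:
    S for single-character words, otherwise B, M..., E."""
    tags = []
    for word in words:
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["M"] * (len(word) - 2) + ["E"])
    return tags


def bmes_to_words(chars, tags):
    """Rebuild words from characters and their tags; a word closes on E or S."""
    words, current = [], ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):
            words.append(current)
            current = ""
    if current:  # tolerate a tag sequence that ends mid-word
        words.append(current)
    return words


gold = ["中文", "分词", "是", "基础", "任务"]
tags = words_to_bmes(gold)
print(list(zip("".join(gold), tags)))
print(bmes_to_words("".join(gold), tags))
```

A tagger trained on such labels predicts one tag per character, so segmentation quality depends on how much context each tagging decision can see, which is the limitation the abstract points to.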