Search Results for author: Erhong Yang

Found 24 papers, 7 papers with code

汉语学习者依存句法树库构建(Construction of a Treebank of Learner Chinese)

no code implementations CCL 2020 Jialu Shi, Xinyu Luo, Liner Yang, Dan Xiao, Zhengsheng Hu, Yijun Wang, Jiaxin Yuan, Yu Jingsi, Erhong Yang

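A dependency treebank of learner Chinese provides dependency syntactic analyses of text written by non-native speakers. It can support second-language teaching and research, and it is also important for related work such as syntactic parsing and grammatical error correction for second-language text. However, few dependency treebanks of learner Chinese exist, and the existing ones still have annotation problems. To address this, we refine the dependency annotation guidelines, build an online annotation platform, and carry out dependency annotation of learner Chinese. This paper focuses on data selection and the annotation workflow, analyzes the quality of the annotation results, and explores how second-language errors affect annotation quality and syntactic parsing.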

Overview of NLPTEA-2020 Shared Task for Chinese Grammatical Error Diagnosis

no code implementations AACL (NLP-TEA) 2020 Gaoqi Rao, Erhong Yang, Baolin Zhang

This paper presents the NLPTEA 2020 shared task for Chinese Grammatical Error Diagnosis (CGED), which seeks to identify grammatical error types, their range of occurrence, and recommended corrections within sentences written by learners of Chinese as a foreign language.

Position

汉语增强依存句法自动转换研究(Transformation of Enhanced Dependencies in Chinese)

no code implementations CCL 2022 Jingsi Yu, Shi Jialu, Liner Yang, Dan Xiao, Erhong Yang

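Automatic syntactic parsing is a core task in natural language processing. Because each node in a dependency tree may have only one incoming arc, many relations between content words in basic dependencies cannot be expressed directly with dependency arcs and labels. In addition, the dependency relations in existing dependency schemes can be further refined and improved so that coherent semantic relations can be extracted from them. To address this, we develop an enhanced dependency specification for Chinese on the basis of the Stanford basic dependency specification. Its main contributions are the enhancement of prepositions and conjunctions, the propagation of conjuncts, sentence-pattern transformation, and the enhancement of special sentence patterns. We also provide a Python-based converter for Chinese enhanced dependencies and a web-based demo that parses sentences from basic dependency trees into dependency graphs according to our specification. Finally, we explore practical applications of enhanced dependencies, using collocation extraction and information extraction as examples.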

面向汉语作为第二语言学习的个性化语法纠错(Personalizing Grammatical Error Correction for Chinese as a Second Language)

no code implementations CCL 2020 Shengsheng Zhang, Guina Pang, Liner Yang, Chencheng Wang, Yongping Du, Erhong Yang, Yaping Huang

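Grammatical error correction aims to automatically detect and correct grammatical errors in text, such as word-order and spelling errors, using natural language processing techniques. Many current approaches for Chinese achieve good results but tend to ignore learners' individual characteristics, such as second-language proficiency level and native-language background. This paper therefore proposes personalized grammatical error correction for learners of Chinese as a second language, correcting the errors made by learners with different characteristics separately, and constructs datasets of Chinese learners from different domains for experiments. The results show that performance improves markedly once the grammatical error correction model is adapted to each learner domain.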

Grammatical Error Correction

基于BERT与柱搜索的中文释义生成(Chinese Definition Modeling Based on BERT and Beam Search)

no code implementations CCL 2020 Qinan Fan, Cunliang Kong, Liner Yang, Erhong Yang

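Definition generation is the task of generating a definition for a target word. Previous work on Chinese definition generation did not consider the context of the target word. This paper is the first to use the target word's context in Chinese definition generation, and it proposes a definition generation model based on BERT and beam search. We construct a Chinese CWN dataset that includes context for our experiments and, in addition to BLEU, use semantic similarity as an additional automatic evaluation metric. The experimental results show that our model achieves significant improvements on both the Chinese CWN dataset and the English Oxford dataset, and human evaluation results are consistent with the automatic evaluation. Finally, we present an in-depth analysis of generated examples.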

句式结构树库的自动构建研究(Automatic Construction of Sentence Pattern Structure Treebank)

no code implementations CCL 2022 Chenhui Xie, Zhengsheng Hu, Liner Yang, Tianxin Liao, Erhong Yang

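A sentence pattern structure treebank is a syntactic resource built on the theory of sentence-based grammar. It is important for Chinese language teaching and for research such as automatic sentence pattern structure parsing. Existing sentence pattern structure treebank data comes mainly from the textbook domain, and annotated data from other domains is scarce, so how to efficiently expand a high-quality syntactic treebank is a question worth studying. Manually annotating a treebank is time-consuming and labor-intensive, and its quality is hard to guarantee. We therefore attempt a rule-based approach that converts the Penn Chinese Treebank (CTB) into a sentence pattern structure treebank, thereby enlarging the existing sentence pattern structure treebank. Experimental results show that the proposed method based on treebank conversion rules is effective.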

Sentence

中美学者学术英语写作中词汇难度特征比较研究——以计算语言学领域论文为例(A Comparative Study of the Features of Lexical Sophistication in Academic English Writing by Chinese and American)

no code implementations CCL 2021 Yonghui Xie, Yang Liu, Erhong Yang, Liner Yang

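Academic English writing plays an increasingly prominent role in international academic exchange, yet it is difficult for non-native speakers of English. This paper therefore compares the lexical sophistication features of academic English writing by Chinese and American scholars in computational linguistics. We build a full-text corpus of 1,132 papers by Chinese and American scholars and compute the values of 484 lexical sophistication features over the corpus. Dimensionality reduction through feature selection and factor analysis yields five well-performing dimensions. Finally, we compute dimension scores for the papers of Chinese and American scholars to compare them, and find that, relative to papers by Chinese scholars, papers by American scholars use more common lexical units, more stable bigrams, more stable trigrams, more complex function words, and more closely associated word classes. The main reasons are that the external resources used to compute the feature values are closer to the American scholars' papers, and that Chinese scholars have not fully mastered the conventions of academic writing in this field. Chinese scholars can therefore make full use of resources built by native English speakers to produce more idiomatic and fluent academic English papers.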

Cross-domain Chinese Sentence Pattern Parsing

no code implementations 26 Feb 2024 Jingsi Yu, Cunliang Kong, Liner Yang, Meishan Zhang, Lin Zhu, Yujie Wang, Haozhe Lin, Maosong Sun, Erhong Yang

Sentence Pattern Structure (SPS) parsing is a syntactic analysis method primarily employed in language teaching. Existing SPS parsers rely heavily on textbook corpora for training, lacking cross-domain capability. To overcome this constraint, this paper proposes an innovative approach leveraging large language models (LLMs) within a self-training framework.

Sentence

OMGEval: An Open Multilingual Generative Evaluation Benchmark for Large Language Models

1 code implementation 21 Feb 2024 Meng Xu, Shuo Wang, Liner Yang, Haoyu Wang, Zhenghao Liu, Cunliang Kong, Yun Chen, Yang Liu, Maosong Sun, Erhong Yang

We evaluate several representative multilingual LLMs on the proposed OMGEval, which we believe will provide a valuable reference for the community to further understand and improve the multilingual capability of LLMs.

General Knowledge Logical Reasoning

MCTS: A Multi-Reference Chinese Text Simplification Dataset

1 code implementation 5 Jun 2023 Ruining Chong, Luming Lu, Liner Yang, Jinran Nie, Zhenghao Liu, Shuo Wang, Shuhan Zhou, Yaoxin Li, Erhong Yang

We hope to build a basic understanding of Chinese text simplification through the foundational work and provide references for future research.

Machine Translation Text Simplification

Lexical Complexity Controlled Sentence Generation

no code implementations 26 Nov 2022 Jinran Nie, Liner Yang, Yun Chen, Cunliang Kong, Junhui Zhu, Erhong Yang

Compared with potential solutions, our approach fuses the representations of the word complexity levels into the model to get better control of lexical complexity.

Sentence Text Generation

LitMind Dictionary: An Open-Source Online Dictionary

1 code implementation 23 Apr 2022 Cunliang Kong, Xuezhi Fang, Liner Yang, Yun Chen, Erhong Yang

Since traditional dictionaries present word senses as discrete items in predefined inventories, they fall short of flexibility, which is required in providing specific meanings of words in particular contexts.

BLCU-ICALL at SemEval-2022 Task 1: Cross-Attention Multitasking Framework for Definition Modeling

1 code implementation SemEval (NAACL) 2022 Cunliang Kong, Yujie Wang, Ruining Chong, Liner Yang, Hengyuan Zhang, Erhong Yang, Yaping Huang

This paper describes the BLCU-ICALL system used in the SemEval-2022 Task 1 Comparing Dictionaries and Word Embeddings, the Definition Modeling subtrack, achieving 1st on Italian, 2nd on Spanish and Russian, and 3rd on English and French.

Language Modelling Word Embeddings

Multitasking Framework for Unsupervised Simple Definition Generation

2 code implementations ACL 2022 Cunliang Kong, Yun Chen, Hengyuan Zhang, Liner Yang, Erhong Yang

We demonstrate that the framework can generate relevant, simple definitions for the target words through automatic and manual evaluations on English and Chinese datasets.

YACLC: A Chinese Learner Corpus with Multidimensional Annotation

1 code implementation 30 Dec 2021 Yingying Wang, Cunliang Kong, Liner Yang, Yijun Wang, Xiaorong Lu, Renfen Hu, Shan He, Zhenghao Liu, Yun Chen, Erhong Yang, Maosong Sun

This resource is of great relevance for second language acquisition research, foreign-language teaching, and automatic grammatical error correction.

Grammatical Error Correction Language Acquisition +1

Few-Shot Domain Adaptation for Grammatical Error Correction via Meta-Learning

no code implementations 29 Jan 2021 Shengsheng Zhang, Yaping Huang, Yun Chen, Liner Yang, Chencheng Wang, Erhong Yang

We exploit a set of data-rich source domains to learn the initialization of model parameters that facilitates fast adaptation on new resource-poor target domains.

Domain Adaptation Grammatical Error Correction +2

Controllable Data Synthesis Method for Grammatical Error Correction

no code implementations 29 Sep 2019 Liner Yang, Chencheng Wang, Yun Chen, Yongping Du, Erhong Yang

We propose two data synthesis methods which can control the error rate and the ratio of error types on synthetic data.

Grammatical Error Correction

Incorporating Sememes into Chinese Definition Modeling

1 code implementation 16 May 2019 Liner Yang, Cunliang Kong, Yun Chen, Yang Liu, Qinan Fan, Erhong Yang

To accomplish this task, we construct the Chinese Definition Modeling Corpus (CDM), which contains triples of word, sememes and the corresponding definition.
