Search Results for author: Hongzheng Li

Found 7 papers, 0 papers with code

面向司法领域的高质量开源藏汉平行语料库构建(A High-quality Open Source Tibetan-Chinese Parallel Corpus Construction of Judicial Domain)

no code implementations CCL 2020 Jiu Sha, Luqin Zhou, Chong Feng, Hongzheng Li, Tianfu Zhang, Hui Hui

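Tibetan-Chinese machine translation in the judicial domain faces a severe data sparsity problem. This paper approaches the problem from two aspects. First, compared with the general domain, Tibetan in the judicial domain requires more rigorous logical expression and more specialized terminology. However, existing Tibetan resources lack corresponding corpora in the judicial domain, and specialized terms and syntactic structures are scarce. Second, the special lexical expressions and specific syntactic structures of Tibetan make it difficult for general-purpose corpus construction methods to build a Tibetan-Chinese parallel corpus. To this end, this paper proposes a lightweight method for constructing Tibetan-Chinese parallel corpora for the judicial domain. First, we obtain a medium-sized glossary of Tibetan-Chinese judicial terminology through manual annotation and use it as a prior knowledge base, to avoid the logical-expression problems and missing domain terminology that arise when text crosses domain boundaries. Second, we collect real-world corpus data, such as court judgment documents, from the official websites of local courts across the country. We look for Tibetan instance data first and Chinese second, so that special lexical expressions and sentence structures are not lost by constructing Tibetan sentences afterwards. Based on these principles, we collect Tibetan and Chinese text to build a high-quality Tibetan-Chinese parallel corpus. The specific methods include: crawling to obtain the corpus, rule-based passage segmentation and alignment checking, sentence boundary identification, and automatic corpus cleaning. Finally, we build a Tibetan-Chinese judicial-domain corpus at the 160,000 scale, and verify its high quality and robustness through multiple translation models and cross experiments. In addition, the corpus will be open-sourced so that researchers can use it in their work.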

RAAMove: A Corpus for Analyzing Moves in Research Article Abstracts

no code implementations 23 Mar 2024 Hongzheng Li, Ruojin Wang, Ge Shi, Xing Lv, Lei Lei, Chong Feng, Fang Liu, JinKun Lin, Yangguang Mei, Lingnan Xu

In this paper, we introduce RAAMove, a comprehensive multi-domain corpus dedicated to the annotation of move structures in RA abstracts.

Distributed Graph Neural Network Training: A Survey

no code implementations 1 Nov 2022 Yingxia Shao, Hongzheng Li, Xizhi Gu, Hongbo Yin, Yawen Li, Xupeng Miao, Wentao Zhang, Bin Cui, Lei Chen

In recent years, many efforts have been made on distributed GNN training, and an array of training algorithms and systems have been proposed.

Distributed Computing

Evaluating Low-Resource Machine Translation between Chinese and Vietnamese with Back-Translation

no code implementations 4 Mar 2020 Hongzheng Li, He-Yan Huang

Back translation (BT) has been widely used and has become one of the standard techniques for data augmentation in Neural Machine Translation (NMT). BT has proven helpful for improving translation performance, especially in low-resource scenarios.

Data Augmentation, Machine Translation +2

Translating Implicit Discourse Connectives Based on Cross-lingual Annotation and Alignment

no code implementations WS 2017 Hongzheng Li, Philippe Langlais, Yaohong Jin

Implicit discourse connectives and relations are distributed more widely in Chinese texts; when translating into English, such connectives are usually translated explicitly.

Implicit Relations, Machine Translation +1
