no code implementations • CCL 2020 • Jiu Sha, Luqin Zhou, Chong Feng, Hongzheng Li, Tianfu Zhang, Hui Hui
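Tibetan-Chinese machine translation for the judicial domain faces severe data sparsity. This paper approaches the problem from two angles. First, compared with the general domain, judicial Tibetan calls for more rigorous logical expression and more specialized terminology, yet existing Tibetan resources lack judicial-domain corpora, specialized terms, and the corresponding syntactic structures. Second, the distinctive lexical expressions and syntactic structures of Tibetan make it hard to build a Tibetan-Chinese parallel corpus with general-purpose corpus construction methods. We therefore propose a lightweight method for constructing a judicial-domain Tibetan-Chinese parallel corpus. First, we manually annotate a medium-sized Tibetan-Chinese glossary of judicial terms as a prior knowledge base, to avoid the problems of logical expression and missing domain terminology that arise when data crosses domain boundaries. Second, we collect real case data, such as court judgments, from the official websites of local courts across the country. We give priority to Tibetan-original documents over Chinese ones, so that Tibetan-specific lexical expressions and sentence structures are not lost when Tibetan sentences are constructed later. Following these principles, we collect Tibetan-Chinese data and build a high-quality parallel corpus; the concrete steps are web crawling, rule-based section alignment detection, sentence boundary detection, and automatic corpus cleaning. In the end, we build a Tibetan-Chinese judicial-domain corpus on the scale of 160,000, and verify its high quality and robustness with several translation models and cross-experiments. In addition, the corpus will be open-sourced for use by other researchers.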
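The abstract names sentence boundary detection and automatic cleaning as pipeline steps without giving the rules. As a rough, hypothetical sketch only (not the authors' code), sentence splitting can key on the Tibetan shad (U+0F0D) and double shad (U+0F0E) and on Chinese full-width end punctuation, after which aligned sentences can be filtered. The delimiter sets, the 1:1 pairing rule, the length-ratio bounds, and the names `split_sentences` and `clean_pairs` are all assumptions.

```python
import re

# Assumed sentence delimiters (not specified in the abstract):
# Tibetan shad U+0F0D and double shad U+0F0E; Chinese full-width end punctuation.
TIBETAN_END = re.compile(r"(?<=[\u0F0D\u0F0E])\s*")
CHINESE_END = re.compile(r"(?<=[。！？；])")


def split_sentences(text, pattern):
    """Sentence boundary detection: split after each delimiter, drop empty pieces."""
    return [s.strip() for s in pattern.split(text) if s.strip()]


def clean_pairs(bo_sents, zh_sents, min_ratio=0.5, max_ratio=8.0):
    """Pair sentences 1:1 and apply simple cleaning filters.

    Hypothetical rules: skip a section whose sentence counts differ, drop pairs
    whose Tibetan/Chinese character-length ratio is implausible, drop duplicates.
    """
    if len(bo_sents) != len(zh_sents):
        return []
    seen, pairs = set(), []
    for bo, zh in zip(bo_sents, zh_sents):
        ratio = len(bo) / max(len(zh), 1)
        if not (min_ratio <= ratio <= max_ratio) or (bo, zh) in seen:
            continue
        seen.add((bo, zh))
        pairs.append((bo, zh))
    return pairs


# Placeholder strings, not real judicial text.
bo = split_sentences("ཀ་ཁ། ག་ང།", TIBETAN_END)
zh = split_sentences("甲乙。丙丁。", CHINESE_END)
print(clean_pairs(bo, zh))
```

A real pipeline would replace the strict 1:1 pairing with a proper sentence aligner; the count check here only stands in for the "rule-based section alignment detection" step.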
no code implementations • 20 Oct 2024 • Zhenyu Lin, Hongzheng Li, Yingxia Shao, Guanhua Ye, Yawen Li, Quanqing Xu
Existing research on efficient data augmentation methods and ideal pretext tasks for graph contrastive learning remains limited, resulting in suboptimal node representations in the unsupervised setting.
1 code implementation • 7 Jun 2024 • Xizhi Gu, Hongzheng Li, Shihong Gao, Xinyan Zhang, Lei Chen, Yingxia Shao
To address this memory problem, a popular solution is mini-batch GNN training.
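To illustrate why mini-batch training eases memory pressure, here is a minimal sketch of GraphSAGE-style layer-wise neighbour sampling with mean aggregation over scalar node features. This is one common form of mini-batch GNN training and is assumed here for illustration; the snippet does not say which sampler or model the paper uses, and `sample_neighbors` and `minibatch_forward` are hypothetical names.

```python
import random
from statistics import mean


def sample_neighbors(adj, nodes, fanout, rng):
    """Keep at most `fanout` randomly chosen neighbours for each node."""
    out = {}
    for v in nodes:
        nbrs = adj.get(v, [])
        out[v] = rng.sample(nbrs, min(fanout, len(nbrs)))
    return out


def minibatch_forward(adj, feats, batch, fanouts, seed=0):
    """Mean-aggregation GNN forward pass restricted to a sampled subgraph of `batch`."""
    rng = random.Random(seed)
    # 1) Sample outward from the batch: one dict of sampled edges per layer.
    hops, frontier = [], set(batch)
    for k in fanouts:
        sampled = sample_neighbors(adj, sorted(frontier), k, rng)
        hops.append(sampled)
        frontier = frontier.union(*sampled.values())
    # 2) Aggregate inward: only nodes reached by sampling are ever touched,
    #    so memory grows with batch size and fan-out, not with the whole graph.
    h = {v: feats[v] for v in frontier}
    for sampled in reversed(hops):
        h = {v: mean([h[v]] + [h[u] for u in nbrs]) for v, nbrs in sampled.items()}
    return {v: h[v] for v in batch}


adj = {0: [1, 2], 1: [0], 2: [0]}
feats = {0: 0.0, 1: 3.0, 2: 6.0}
print(minibatch_forward(adj, feats, batch=[0], fanouts=[2, 2]))
```

Each node keeps its own previous representation alongside its sampled neighbours, a simplification of the usual learned self/neighbour weights.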
no code implementations • 23 Mar 2024 • Hongzheng Li, Ruojin Wang, Ge Shi, Xing Lv, Lei Lei, Chong Feng, Fang Liu, JinKun Lin, Yangguang Mei, Lingnan Xu
In this paper, we introduce RAAMove, a comprehensive multi-domain corpus dedicated to the annotation of move structures in RA abstracts.
no code implementations • 1 Nov 2022 • Yingxia Shao, Hongzheng Li, Xizhi Gu, Hongbo Yin, Yawen Li, Xupeng Miao, Wentao Zhang, Bin Cui, Lei Chen
In recent years, many efforts have been made on distributed GNN training, and an array of training algorithms and systems have been proposed.
no code implementations • 4 Mar 2020 • Hongzheng Li, He-Yan Huang
Back translation (BT) has been widely used and has become one of the standard data augmentation techniques in Neural Machine Translation (NMT). BT has proven effective at improving translation performance, especially in low-resource scenarios.
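The general BT recipe can be sketched as follows: a reverse (target-to-source) model translates target-side monolingual text, and the resulting synthetic pairs are added to the genuine parallel data. This is a generic illustration, not the paper's method. The optional tag follows the separately published "tagged back-translation" variant, and `toy_reverse_model` is a hypothetical word-lookup stand-in for a trained NMT model.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def back_translate(mono_tgt: List[str], reverse_model: Callable[[str], str]) -> List[Pair]:
    """Translate target-side monolingual text back into the source language.

    The synthetic sentence becomes the source side; the genuine sentence stays
    on the target side, so the forward model is always trained to emit real text.
    """
    return [(reverse_model(t), t) for t in mono_tgt]


def build_training_set(parallel: List[Pair], mono_tgt: List[str],
                       reverse_model: Callable[[str], str],
                       tag: Optional[str] = None) -> List[Pair]:
    """Concatenate real parallel data with back-translated pairs.

    `tag` optionally marks synthetic sources (tagged back-translation).
    """
    synthetic = back_translate(mono_tgt, reverse_model)
    if tag is not None:
        synthetic = [(f"{tag} {src}", tgt) for src, tgt in synthetic]
    return parallel + synthetic


# Hypothetical stand-in for a trained target->source model: word-by-word lookup.
LEXICON = {"good": "gute", "night": "nacht"}


def toy_reverse_model(sentence: str) -> str:
    return " ".join(LEXICON.get(w, w) for w in sentence.split())


print(build_training_set([("guten morgen", "good morning")], ["good night"], toy_reverse_model))
```

Keeping the human-written sentence on the target side is the key design choice: noise from the reverse model only affects the inputs the forward model sees.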
no code implementations • WS 2017 • Hongzheng Li, Philippe Langlais, Yaohong Jin
Implicit discourse connectives and relations are more widely distributed in Chinese texts; when Chinese is translated into English, such connectives are usually rendered explicitly.