The Formulation of the Graded Chinese Character List of Ancient Books Based on a Large-scale Corpus

CCL 2021 Changwei Xu, Minxuan Feng, Bin Li, Yiguo Yuan

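The Graded Chinese Character List of Ancient Books is a graded character list built from a large-scale corpus of ancient texts to help learners read ancient documents. It fills a gap in research on character lists for ancient texts. It assigns the characters of ancient texts to levels according to each character's learning priority, and currently contains 105 first-level, 340 second-level, and 555 third-level characters. This paper describes the main principles and basic steps used to compile the list, and compares it with the traditional literacy primers known as "San Bai Qian" (the Three Character Classic, the Hundred Family Surnames, and the Thousand Character Classic) and with the List of Commonly Used Characters in Modern Chinese, confirming that its choice of characters is sound. The list helps learners master the characters most frequent in ancient texts first, improving their ability to read them and thereby supporting the inheritance and development of fine traditional Chinese culture.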
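To make the idea of a corpus-derived graded list concrete, here is a minimal sketch that ranks Han characters by corpus frequency and cuts the ranking into tiers of the reported sizes (105 + 340 + 555 = 1,000 characters). It assumes that raw frequency stands in for "learning priority"; the paper's actual criteria are not given here and may well differ. The helper names `grade_characters`, `coverage`, and `is_han` are hypothetical.

```python
from collections import Counter

# Tier sizes of the current list: 105 + 340 + 555 = 1,000 characters.
TIER_SIZES = (105, 340, 555)


def is_han(ch):
    # Only the basic CJK Unified Ideographs block; extension blocks are ignored.
    return "\u4e00" <= ch <= "\u9fff"


def grade_characters(texts, tier_sizes=TIER_SIZES):
    """Rank characters by frequency and split the ranking into tiers.

    Assumption: frequency alone approximates learning priority.
    """
    counts = Counter(ch for text in texts for ch in text if is_han(ch))
    # Descending frequency; ties broken by code point so the result is deterministic.
    ranked = sorted(counts, key=lambda ch: (-counts[ch], ch))
    tiers, start = [], 0
    for size in tier_sizes:
        tiers.append(ranked[start:start + size])
        start += size
    return tiers


def coverage(texts, chars):
    """Fraction of Han character tokens in `texts` that belong to `chars`."""
    chars = set(chars)
    total = sum(1 for t in texts for ch in t if is_han(ch))
    hit = sum(1 for t in texts for ch in t if ch in chars)
    return hit / total if total else 0.0
```

For example, `grade_characters(["子曰學而時習之", "學而不思則罔"], tier_sizes=(2, 3))` puts the two repeated characters 學 and 而 in the first tier; `coverage` then shows what share of the corpus a learner could read after mastering a given tier, which is one natural way to check whether tier boundaries are reasonable.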

Understanding Attention for Vision-and-Language Tasks

17 Aug 2022 Feiqi Cao, Soyeon Caren Han, Siqu Long, Changwei Xu, Josiah Poon

The attention mechanism has been used as an important component across Vision-and-Language (VL) tasks to bridge the semantic gap between visual and textual features.
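As a reminder of what such an attention layer computes, the sketch below implements plain scaled dot-product cross-attention in which text token features act as queries and image region features act as keys and values. It is a simplified illustration, not the configuration studied in the paper: learned projection matrices, multiple heads, and batching are omitted, and the function names are hypothetical.

```python
import math


def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def cross_attention(queries, keys, values):
    """Each text query attends over image regions.

    queries: list of d-dim text features; keys: list of d-dim region features;
    values: list of region features (any dim). Returns (outputs, weights).
    """
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this text token to every region, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of region values gives the attended visual context.
        out = [sum(w[j] * values[j][c] for j in range(len(values)))
               for c in range(len(values[0]))]
        outputs.append(out)
    return outputs, weights
```

The attention weights for each text token form a distribution over image regions, which is what analyses of attention in VL models typically inspect.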

Image Retrieval, Question Answering

Shuffle Augmentation of Features from Unlabeled Data for Unsupervised Domain Adaptation

28 Jan 2022 Changwei Xu, Jianfei Yang, Haoran Tang, Han Zou, Cheng Lu, Tianshuo Zhang

Unsupervised Domain Adaptation (UDA), a branch of transfer learning where labels for target samples are unavailable, has been widely researched and developed in recent years with the help of adversarially trained models.

Unsupervised Domain Adaptation

Integration of Automatic Sentence Segmentation and Lexical Analysis of Ancient Chinese based on BiLSTM-CRF Model

LREC 2020 Ning Cheng, Bin Li, Liming Xiao, Changwei Xu, Sijia Ge, Xingyue Hao, Minxuan Feng

The basic tasks of ancient Chinese information processing include automatic sentence segmentation, word segmentation, part-of-speech tagging and named entity recognition.
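One common way to integrate these tasks into a single character-level sequence-labelling problem, which a BiLSTM-CRF can then learn, is to give each character one joint tag. The scheme below is a hypothetical illustration, not necessarily the tag set used in the paper: a B/M/E/S word-position prefix, a POS label, and an `|EOS` marker when a sentence ends after the character. Named entities are assumed to be folded into POS labels (for example a person-name label `nr`).

```python
def encode(sentences):
    """sentences: list of sentences, each a list of (word, pos) pairs.

    Returns (chars, tags) with one joint tag per character (hypothetical scheme).
    """
    chars, tags = [], []
    for sent in sentences:
        for wi, (word, pos) in enumerate(sent):
            for ci, ch in enumerate(word):
                if len(word) == 1:
                    seg = "S"          # single-character word
                elif ci == 0:
                    seg = "B"          # word-initial
                elif ci == len(word) - 1:
                    seg = "E"          # word-final
                else:
                    seg = "M"          # word-internal
                last = wi == len(sent) - 1 and ci == len(word) - 1
                chars.append(ch)
                tags.append(f"{seg}-{pos}" + ("|EOS" if last else ""))
    return chars, tags


def decode(chars, tags):
    """Recover sentences, words, and POS labels from joint tags."""
    sentences, sent, word, pos = [], [], "", ""
    for ch, tag in zip(chars, tags):
        label, _, eos = tag.partition("|")
        seg, _, pos = label.partition("-")
        word += ch
        if seg in ("E", "S"):          # word boundary after this character
            sent.append((word, pos))
            word = ""
        if eos:                        # sentence boundary after this character
            sentences.append(sent)
            sent = []
    # Flush anything left by an unterminated word or sentence.
    if word:
        sent.append((word, pos))
    if sent:
        sentences.append(sent)
    return sentences
```

Because sentence breaks, word boundaries, and POS labels share one tag, a single model prediction yields all three analyses at once, and errors in one layer cannot be silently propagated from a separate upstream pipeline stage.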

Lexical Analysis, Named Entity Recognition

