1 code implementation • 20 Feb 2023 • Xiang Wei, Xingyu Cui, Ning Cheng, Xiaobin Wang, Xin Zhang, Shen Huang, Pengjun Xie, Jinan Xu, Yufeng Chen, Meishan Zhang, Yong Jiang, Wenjuan Han
Zero-shot information extraction (IE) aims to build IE systems from unannotated text.
1 code implementation • 27 Sep 2022 • Shen Huang, Yuchen Zhai, Xinwei Long, Yong Jiang, Xiaobin Wang, Yin Zhang, Pengjun Xie
Speech Entity Linking aims to recognize and disambiguate named entities in spoken language.
1 code implementation • 5 May 2023 • Zeqi Tan, Shen Huang, Zixia Jia, Jiong Cai, Yinghui Li, Weiming Lu, Yueting Zhuang, Kewei Tu, Pengjun Xie, Fei Huang, Yong Jiang
We also find that the limited context length makes the retrieved knowledge invisible to the model.
Multilingual Named Entity Recognition
1 code implementation • 21 Aug 2023 • Tianyu Yu, Chengyue Jiang, Chao Lou, Shen Huang, Xiaobin Wang, Wei Liu, Jiong Cai, Yangning Li, Yinghui Li, Kewei Tu, Hai-Tao Zheng, Ningyu Zhang, Pengjun Xie, Fei Huang, Yong Jiang
However, LLMs are sometimes too unconstrained for natural language understanding (NLU) tasks, which always have restricted input and output formats.
1 code implementation • 14 Aug 2023 • Yangning Li, Shirong Ma, Xiaobin Wang, Shen Huang, Chengyue Jiang, Hai-Tao Zheng, Pengjun Xie, Fei Huang, Yong Jiang
EcomInstruct scales up data size and task diversity by constructing atomic tasks from basic E-commerce data types, such as product information and user reviews.
1 code implementation • 1 Jul 2021 • Haibin Wu, Po-chun Hsu, Ji Gao, Shanshan Zhang, Shen Huang, Jian Kang, Zhiyong Wu, Helen Meng, Hung-Yi Lee
We also show that the neural vocoder adopted in the detection framework is dataset-independent.
2 code implementations • 17 Aug 2023 • Jiahao Zhang, Haiyang Zhang, Dongmei Zhang, Yong Liu, Shen Huang
This approach maintains multiple partial hypotheses of relevant passages at each step, expanding the search space and reducing the risk of missing relevant passages.
Ranked #1 on Question Answering on HotpotQA
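The snippet above describes keeping several partial hypotheses of relevant passages at each retrieval step. A minimal beam-search sketch of that idea follows; it is not the paper's implementation. The function names `beam_retrieve` and `overlap_score`, the `hops` and `beam_size` parameters, and the word-overlap scorer (standing in for a trained relevance model) are all hypothetical.

```python
from typing import Callable, List, Tuple

Chain = Tuple[int, ...]  # ordered indices of passages retrieved so far


def beam_retrieve(
    question: str,
    passages: List[str],
    score: Callable[[str, List[str], str], float],
    hops: int = 2,
    beam_size: int = 2,
) -> List[Tuple[Chain, float]]:
    """Hypothetical sketch: beam search over chains of passages for multi-hop retrieval."""
    beams: List[Tuple[Chain, float]] = [((), 0.0)]
    for _ in range(hops):
        candidates: List[Tuple[Chain, float]] = []
        for chain, total in beams:
            context = [passages[i] for i in chain]
            for idx, passage in enumerate(passages):
                if idx in chain:
                    continue  # each passage appears at most once per chain
                new_score = total + score(question, context, passage)
                candidates.append((chain + (idx,), new_score))
        # Keep the beam_size best partial chains rather than only the single best,
        # so a passage that looks weak at this hop can still be reached later.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:beam_size]
    return beams


def overlap_score(question: str, context: List[str], passage: str) -> float:
    """Toy scorer (assumption): word overlap with the question plus retrieved context."""
    query_words = set(" ".join([question] + context).lower().split())
    return float(len(query_words & set(passage.lower().split())))
```

With `beam_size=1` this reduces to greedy hop-by-hop retrieval; larger beams trade extra scoring calls for a wider search space.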
no code implementations • WS 2018 • Bojie Hu, Ambyer Han, Shen Huang
Our systems are neural machine translation systems trained with TenTrans, our in-house NMT system.
no code implementations • WS 2016 • Shen Huang, Houfeng Wang
Grammatical Error Diagnosis for Chinese has always been a challenge for both foreign learners and NLP researchers because of the variety of Chinese grammar and the flexibility of its expression.
no code implementations • IJCNLP 2017 • Shen Huang, Xu Sun, Houfeng Wang
Boundary features are widely used in traditional Chinese Word Segmentation (CWS) methods as they can utilize unlabeled data to help improve the Out-of-Vocabulary (OOV) word recognition performance.
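As an illustration of a boundary feature computed from unlabeled text, the sketch below implements accessor variety, a classic CWS statistic: a substring that appears with many different left and right neighbors is more likely to be a word. The snippet does not say which boundary features the paper uses, so this choice is an assumption. `accessor_variety`, `max_len`, and the `BOUNDARY` placeholder are hypothetical names, and counting a sentence boundary only once is a simplification of the usual definition.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set

BOUNDARY = "<s>"  # placeholder neighbor for sentence start/end (simplified: counted once)


def accessor_variety(corpus: Iterable[str], max_len: int = 4) -> Dict[str, int]:
    """Map each substring (up to max_len chars) to min(#distinct left, #distinct right) neighbors."""
    left: Dict[str, Set[str]] = defaultdict(set)
    right: Dict[str, Set[str]] = defaultdict(set)
    for sentence in corpus:
        n = len(sentence)
        for start in range(n):
            for end in range(start + 1, min(n, start + max_len) + 1):
                sub = sentence[start:end]
                left[sub].add(sentence[start - 1] if start > 0 else BOUNDARY)
                right[sub].add(sentence[end] if end < n else BOUNDARY)
    # A high score on both sides suggests the substring is a self-contained word.
    return {s: min(len(left[s]), len(right[s])) for s in left}
```

Scores like these can be discretized and fed to a supervised segmenter as extra features, which is one way unlabeled data can help with out-of-vocabulary words.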
no code implementations • 20 Feb 2019 • Weicheng Cai, Danwei Cai, Shen Huang, Ming Li
In this paper, we present an end-to-end language identification framework, the attention-based Convolutional Neural Network-Bidirectional Long Short-Term Memory (CNN-BLSTM).
no code implementations • 3 Jan 2020 • Pei Xu, Shan Huang, Hongzhen Wang, Hao Song, Shen Huang, Qi Ju
Chinese keyword spotting is a challenging task because Chinese text has no visible blanks delimiting words.
no code implementations • 13 Aug 2020 • Yiru Wang, Shen Huang, Gongfu Li, Qiang Deng, Dongliang Liao, Pengda Si, Yujiu Yang, Jin Xu
Automatic quality assessment of self-media online articles is a new and urgent problem of great value to online recommendation and search.
no code implementations • 17 Sep 2020 • Zhen Yang, Bojie Hu, Ambyera Han, Shen Huang, Qi Ju
Unlike traditional pre-training methods, which randomly mask fragments of the input sentence, the proposed CSP randomly replaces some words in the source sentence with their translations in the target language.
no code implementations • EMNLP 2020 • Zhen Yang, Bojie Hu, Ambyera Han, Shen Huang, Qi Ju
Unlike traditional pre-training methods, which randomly mask fragments of the input sentence, the proposed CSP randomly replaces some words in the source sentence with their translations in the target language.
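The code-switching replacement described in the CSP entries can be sketched as a data-corruption step. This is a minimal illustration under stated assumptions, not the authors' pipeline: the toy bilingual `lexicon`, the replacement `ratio`, and the function name `code_switch` are hypothetical, and how the replaced positions feed the training objective is left open.

```python
import random
from typing import Dict, List, Tuple


def code_switch(
    tokens: List[str],
    lexicon: Dict[str, str],
    ratio: float,
    rng: random.Random,
) -> Tuple[List[str], List[int]]:
    """Replace a random subset of translatable source tokens with target-language words.

    Returns the switched sentence and the replaced positions (hypothetical sketch).
    """
    # Only tokens with a dictionary entry can be switched.
    candidates = [i for i, tok in enumerate(tokens) if tok in lexicon]
    k = min(len(candidates), round(ratio * len(tokens)))
    chosen = sorted(rng.sample(candidates, k))
    switched = list(tokens)
    for i in chosen:
        switched[i] = lexicon[tokens[i]]  # swap in the translation instead of a [MASK] token
    return switched, chosen
```

Compared with masking, the encoder still sees a real word at each corrupted position, only in the other language.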
no code implementations • COLING 2020 • Chen Xu, Bojie Hu, Yufan Jiang, Kai Feng, Zeyang Wang, Shen Huang, Qi Ju, Tong Xiao, Jingbo Zhu
This eases training by highlighting easy samples that the current model has enough competence to learn.
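One common way to formalize "samples the current model has enough competence to learn" is competence-based curriculum learning (Platanios et al., 2019), sketched below. The snippet does not specify how this paper measures competence, so the square-root schedule, the initial competence `c0`, and the helper names `sqrt_competence` and `eligible_indices` are assumptions swapped in for illustration.

```python
import math
from typing import List, Sequence


def sqrt_competence(step: int, total_steps: int, c0: float = 0.01) -> float:
    """Square-root competence schedule: starts at c0 and reaches 1.0 at total_steps."""
    value = math.sqrt(step * (1 - c0 ** 2) / total_steps + c0 ** 2)
    return min(1.0, value)


def eligible_indices(difficulties: Sequence[float], competence: float) -> List[int]:
    """Indices of samples whose difficulty percentile is within the model's competence."""
    n = len(difficulties)
    order = sorted(range(n), key=lambda i: difficulties[i])  # easiest first
    # The r-th easiest sample (0-based) has empirical CDF value (r + 1) / n.
    return sorted(i for r, i in enumerate(order) if (r + 1) / n <= competence)
```

Early in training only the easiest fraction of the data is sampled; as competence grows toward 1.0 the whole training set becomes eligible.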
no code implementations • ACL 2021 • Chen Xu, Bojie Hu, Yanyang Li, Yuhao Zhang, Shen Huang, Qi Ju, Tong Xiao, Jingbo Zhu
To our knowledge, we are the first to develop an end-to-end ST system that achieves BLEU scores comparable to or better than its cascaded ST counterpart when large-scale ASR and MT data are available.
Automatic Speech Recognition (ASR)
no code implementations • 13 Dec 2021 • Guodong Ma, Pengfei Hu, Nurmemet Yolwas, Shen Huang, Hao Huang
To boost the performance of PMT, we propose PM-MMUT, which fuses a multi-modeling unit training (MMUT) architecture with PMT.
Automatic Speech Recognition (ASR)
no code implementations • 25 Dec 2023 • Shirong Ma, Shen Huang, Shulin Huang, Xiaobin Wang, Yangning Li, Hai-Tao Zheng, Pengjun Xie, Fei Huang, Yong Jiang
Experimental results demonstrate the effectiveness of continual pre-training for E-commerce LLMs and the efficacy of our data mixing strategy.