1 code implementation • 18 Apr 2024 • Dawei Zhu, Liang Wang, Nan Yang, YiFan Song, Wenhao Wu, Furu Wei, Sujian Li
This paper explores context window extension of existing embedding models, pushing the limit to 32k without requiring additional training.
no code implementations • 17 Apr 2024 • Dawei Zhu, Sony Trenous, Xiaoyu Shen, Dietrich Klakow, Bill Byrne, Eva Hasler
Recent research has shown that large language models (LLMs) can achieve remarkable translation performance through supervised fine-tuning (SFT) using only a small amount of parallel data.
1 code implementation • 4 Apr 2024 • Vagrant Gautam, Eileen Bingert, Dawei Zhu, Anne Lauscher, Dietrich Klakow
We find that while models can mostly faithfully reuse previously specified pronouns when no distractors are present, they perform significantly worse on she/her/her, singular they, and neopronouns.
1 code implementation • 31 Mar 2024 • Dawei Zhu, Wenhao Wu, YiFan Song, Fangwei Zhu, Ziqiang Cao, Sujian Li
Due to the scarcity of annotated data, data augmentation is commonly used for training coherence evaluation models.
no code implementations • 28 Feb 2024 • Jiebin Zhang, Eugene J. Yu, Qinyu Chen, Chenhao Xiong, Dawei Zhu, Han Qian, Mingbo Song, Xiaoguang Li, Qun Liu, Sujian Li
In today's fast-paced world, the growing demand to quickly generate comprehensive and accurate Wikipedia documents for emerging events is both crucial and challenging.
1 code implementation • 10 Oct 2023 • YiFan Song, Peiyi Wang, Weimin Xiong, Dawei Zhu, Tianyu Liu, Zhifang Sui, Sujian Li
Continual learning (CL) aims to constantly learn new knowledge over time while avoiding catastrophic forgetting on old tasks.
1 code implementation • 28 Sep 2023 • Zhiwei Fei, Xiaoyu Shen, Dawei Zhu, Fengzhe Zhou, Zhuo Han, Songyang Zhang, Kai Chen, Zongwen Shen, Jidong Ge
We hope this benchmark provides an in-depth understanding of LLMs' domain-specific capabilities and speeds up the development of LLMs in the legal domain.
1 code implementation • 19 Sep 2023 • Dawei Zhu, Nan Yang, Liang Wang, YiFan Song, Wenhao Wu, Furu Wei, Sujian Li
To decouple train length from target length for efficient context window extension, we propose Positional Skip-wisE (PoSE) training that smartly simulates long inputs using a fixed context window.
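The core idea of PoSE, as described above, is to train on fixed-length chunks whose position ids are manipulated to cover the much longer target window. A minimal sketch of that position-id construction is below; the function name, segment count, and sampling scheme are illustrative assumptions, not the paper's exact recipe.

```python
import random

def pose_position_ids(chunk_len: int, target_len: int, num_segments: int = 2):
    """Sketch of Positional Skip-wisE training: split a fixed-length training
    chunk into segments and offset each segment's position ids by a random
    skip, so the model observes relative positions spanning the full target
    window without ever processing a target-length input."""
    # Random segment boundaries within the fixed training chunk.
    cuts = sorted(random.sample(range(1, chunk_len), num_segments - 1))
    bounds = [0] + cuts + [chunk_len]
    budget = target_len - chunk_len  # extra positions available to skip over
    position_ids, offset = [], 0
    for i in range(num_segments):
        start, end = bounds[i], bounds[i + 1]
        if i > 0:
            # Insert a random skip before each segment after the first.
            skip = random.randint(0, budget)
            budget -= skip
            offset += skip
        position_ids.extend(range(start + offset, end + offset))
    return position_ids
```

With `chunk_len=2048` and `target_len=32768`, every training step still attends over only 2048 tokens, but the sampled position ids range anywhere up to 32767, which is what decouples train length from target length.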
no code implementations • 11 Jun 2023 • YiFan Song, Weimin Xiong, Dawei Zhu, Wenhao Wu, Han Qian, Mingbo Song, Hailiang Huang, Cheng Li, Ke Wang, Rong Yao, Ye Tian, Sujian Li
To address the practical challenges of tackling complex instructions, we propose RestGPT, which exploits the power of LLMs and adopts a coarse-to-fine online planning mechanism to enhance task decomposition and API selection.
1 code implementation • 29 May 2023 • Peiyi Wang, Lei LI, Liang Chen, Zefan Cai, Dawei Zhu, Binghuai Lin, Yunbo Cao, Qi Liu, Tianyu Liu, Zhifang Sui
In this paper, we uncover a systematic bias in the evaluation paradigm of adopting large language models (LLMs), e.g., GPT-4, as a referee to score and compare the quality of responses generated by candidate models.
1 code implementation • 27 May 2023 • Dawei Zhu, Xiaoyu Shen, Marius Mosbach, Andreas Stephan, Dietrich Klakow
In this paper, we revisit the setup of these approaches and find that the benefits brought by these approaches are significantly overestimated.
no code implementations • 12 May 2023 • YiFan Song, Peiyi Wang, Dawei Zhu, Tianyu Liu, Zhifang Sui, Sujian Li
Continual learning (CL) aims to constantly learn new knowledge over time while avoiding catastrophic forgetting on old tasks.
1 code implementation • 20 Mar 2023 • Hongbo Wang, Weimin Xiong, YiFan Song, Dawei Zhu, Yu Xia, Sujian Li
Joint entity and relation extraction (JERE) is one of the most important tasks in information extraction.
1 code implementation • COLING 2022 • Dawei Zhu, Qiusi Zhan, Zhejian Zhou, YiFan Song, Jiebin Zhang, Sujian Li
Unlike previous token-level or sentence-level counterparts, ConFiguRe aims to extract a figurative unit from discourse-level context and classify it into the correct figure type.
no code implementations • 3 Jun 2022 • Dawei Zhu, Michael A. Hedderich, Fangzhou Zhai, David Ifeoluwa Adelani, Dietrich Klakow
However, text classification in low-resource languages is still challenging due to the lack of annotated data.
1 code implementation • 15 May 2022 • Dawei Zhu, Xiaoyu Shen, Michael A. Hedderich, Dietrich Klakow
Training deep neural networks (DNNs) under weak supervision has attracted increasing research attention as it can significantly reduce the annotation cost.
1 code implementation • insights (ACL) 2022 • Dawei Zhu, Michael A. Hedderich, Fangzhou Zhai, David Ifeoluwa Adelani, Dietrich Klakow
Incorrect labels in training data occur when human annotators make mistakes or when the data is generated via weak or distant supervision.
1 code implementation • 13 Nov 2021 • Hanwen Xu, Jiayou Zhang, Zhirui Wang, Shizhuo Zhang, Megh Manoj Bhalerao, Yucong Liu, Dawei Zhu, Sheng Wang
As biomedical datasets expand, the same category may be labeled with different terms, making it tedious and onerous to curate these terms.
no code implementations • EACL 2021 • Ernie Chang, Xiaoyu Shen, Dawei Zhu, Vera Demberg, Hui Su
Our approach automatically augments the data available for training by (i) generating new text samples by replacing specific values with alternative ones from the same category, (ii) generating new text samples with GPT-2, and (iii) proposing an automatic method for pairing the new text samples with data samples.
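Augmentation step (i) above can be sketched as a simple value-swap over a structured data record and its reference text. All names below (`augment_by_value_swap`, the category pools, the sample record) are illustrative assumptions, not the paper's actual interface.

```python
import random

def augment_by_value_swap(text: str, data: dict, categories: dict):
    """Sketch of step (i): replace each attribute value in a text sample with
    an alternative value from the same category, yielding a new, consistent
    (data, text) training pair."""
    new_data, new_text = dict(data), text
    for attr, value in data.items():
        # Pool of same-category alternatives, excluding the current value.
        alternatives = [v for v in categories.get(attr, []) if v != value]
        if alternatives and value in new_text:
            replacement = random.choice(alternatives)
            new_text = new_text.replace(value, replacement)
            new_data[attr] = replacement
    return new_data, new_text
```

Because the data record and the text are rewritten together, the augmented pair stays aligned, which is what makes it usable as extra data-to-text training data.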
3 code implementations • 24 Jan 2021 • Michael A. Hedderich, Dawei Zhu, Dietrich Klakow
Distant and weak supervision make it possible to obtain large amounts of labeled training data quickly and cheaply, but these automatic annotations tend to contain many errors.
1 code implementation • EMNLP 2020 • Michael A. Hedderich, David Adelani, Dawei Zhu, Jesujoba Alabi, Udia Markus, Dietrich Klakow
Multilingual transformer models like mBERT and XLM-RoBERTa have achieved great improvements on many NLP tasks across a variety of languages.
no code implementations • 18 Mar 2020 • David Ifeoluwa Adelani, Michael A. Hedderich, Dawei Zhu, Esther van den Berg, Dietrich Klakow
Techniques such as distant and weak supervision can be used to create labeled data in a (semi-) automatic way.
no code implementations • 19 Dec 2019 • Yue Ma, Zengfeng Zeng, Dawei Zhu, Xuan Li, Yiying Yang, Xiaoyuan Yao, Kaijie Zhou, Jianping Shen
This paper describes our approach in DSTC 8 Track 4: Schema-Guided Dialogue State Tracking.
no code implementations • 16 Dec 2019 • Dawei Zhu, Aditya Mogadala, Dietrich Klakow
We propose the Two-sidEd Attentive conditional Generative Adversarial Network (TEA-cGAN) to generate semantically manipulated images while keeping other content, such as the background, intact.