no code implementations • 6 Feb 2025 • Zony Yu, Yuqiao Wen, Lili Mou
Knowledge distillation (KD) is a popular method of transferring knowledge from a large "teacher" model to a small "student" model.
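As background for this entry, the sketch below shows the standard soft-target distillation loss (Hinton-style KL between temperature-softened teacher and student distributions). It is a minimal illustration of KD in general, not the specific method proposed in the paper; the temperature T and mixing weight alpha are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Generic soft-target KD loss (illustrative, not the paper's exact method).

    Combines a KL term between temperature-softened teacher and student
    distributions with ordinary cross-entropy on the hard labels.
    """
    # KL(teacher || student) on softened distributions; the T^2 factor keeps
    # gradient magnitudes comparable across temperatures.
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    soft_teacher = F.softmax(teacher_logits / T, dim=-1)
    distill = F.kl_div(soft_student, soft_teacher, reduction="batchmean") * (T * T)

    # Standard cross-entropy against the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    return alpha * distill + (1.0 - alpha) * ce

# Toy usage: a batch of 4 examples with a 10-class output.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
kd_loss(student_logits, teacher_logits, labels).backward()
```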
no code implementations • 6 Feb 2025 • Yuqiao Wen, Yanshuai Cao, Lili Mou
Large language models have been growing in size, driven by their success across a wide range of applications.
1 code implementation • 29 Feb 2024 • Behzad Shayegh, Yuqiao Wen, Lili Mou
We address unsupervised discontinuous constituency parsing, where we observe a high variance in the performance of the only previous model in the literature.
no code implementations • 29 Feb 2024 • Yuqiao Wen, Behzad Shayegh, Chenyang Huang, Yanshuai Cao, Lili Mou
The ability of zero-shot translation emerges when we train a multilingual model on certain translation directions; the model can then translate directly in unseen directions.
1 code implementation • 27 Jul 2023 • Yuqiao Wen, Zichao Li, Wenyu Du, Lili Mou
Experiments across four datasets show that our methods outperform existing KD approaches, and that our symmetric distilling losses more effectively push the student to learn from the teacher distribution.
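One plausible reading of "symmetric distilling losses" is a symmetric divergence between the teacher and student distributions, e.g. averaging the forward and reverse KL terms. The sketch below is an assumption for illustration only, not the loss actually proposed in the paper.

```python
import torch
import torch.nn.functional as F

def symmetric_kd_loss(student_logits, teacher_logits, T=1.0):
    """Illustrative symmetric distillation loss: the mean of forward and
    reverse KL between teacher and student (an assumption, not the paper's
    exact formulation)."""
    log_p_s = F.log_softmax(student_logits / T, dim=-1)
    log_p_t = F.log_softmax(teacher_logits / T, dim=-1)
    p_s, p_t = log_p_s.exp(), log_p_t.exp()

    # Forward KL: KL(teacher || student); reverse KL: KL(student || teacher).
    kl_fwd = F.kl_div(log_p_s, p_t, reduction="batchmean")
    kl_rev = F.kl_div(log_p_t, p_s, reduction="batchmean")
    return 0.5 * (kl_fwd + kl_rev) * (T * T)
```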
2 code implementations • 29 Sep 2022 • Yuqiao Wen, Yongchang Hao, Yanshuai Cao, Lili Mou
Open-domain dialogue systems aim to interact with humans through natural language texts in an open-ended fashion.
1 code implementation • LREC 2022 • Yuqiao Wen, Guoqing Luo, Lili Mou
Open-domain dialogue systems aim to converse with humans through text, and dialogue research has heavily relied on benchmark datasets.