no code implementations • 8 May 2020 • Xing Wu, Yibing Liu, Xiangyang Zhou, Dianhai Yu
As an alternative, we propose a new method for BERT distillation, i.e., asking the teacher to generate smoothed word ids, rather than labels, for teaching the student model in knowledge distillation.
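The snippet does not say how the "smoothed word ids" are computed. Here is a minimal sketch, assuming three things. The teacher is a masked language model that outputs a vocabulary distribution at each position. That distribution is interpolated with the one-hot input ids using a hypothetical weight `lam`. The student reads the resulting soft ids as a probability-weighted average of its embedding rows. The helper names `smooth_word_ids` and `soft_embed` and the random toy logits are illustrative assumptions, not the authors' code.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    # Numerically stable softmax over the vocabulary axis
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def smooth_word_ids(token_ids: np.ndarray, teacher_logits: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Hypothetical smoothing: mix one-hot ids with the teacher's predicted distribution.

    token_ids:      (seq_len,) integer ids of the input sentence
    teacher_logits: (seq_len, vocab_size) assumed MLM teacher outputs
    lam:            assumed weight kept on the original token
    """
    vocab_size = teacher_logits.shape[-1]
    one_hot = np.eye(vocab_size)[token_ids]
    return lam * one_hot + (1.0 - lam) * softmax(teacher_logits)

def soft_embed(smoothed: np.ndarray, embedding_matrix: np.ndarray) -> np.ndarray:
    # The student consumes the expected embedding under each smoothed distribution
    return smoothed @ embedding_matrix

# Toy usage with a 5-word vocabulary and random "teacher" logits
rng = np.random.default_rng(0)
ids = np.array([2, 0, 3])
logits = rng.normal(size=(3, 5))
smoothed = smooth_word_ids(ids, logits, lam=0.7)
student_embeddings = rng.normal(size=(5, 4))
student_input = soft_embed(smoothed, student_embeddings)
```

Each row of `smoothed` is still a valid distribution. With `lam` above 0.5, the original token keeps the largest probability, while the teacher spreads the remaining mass over plausible alternatives.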
no code implementations • 10 Dec 2019 • Lu Li, Zhongheng He, Xiangyang Zhou, Dianhai Yu
Automatic dialogue evaluation plays a crucial role in open-domain dialogue research.
1 code implementation • WS 2019 • Hongyu Li, Xiyuan Zhang, Yibing Liu, Yiming Zhang, Quan Wang, Xiangyang Zhou, Jing Liu, Hua Wu, Haifeng Wang
In this paper, we introduce a simple system Baidu submitted for MRQA (Machine Reading for Question Answering) 2019 Shared Task that focused on generalization of machine reading comprehension (MRC) models.
no code implementations • ACL 2019 • Wenquan Wu, Zhen Guo, Xiangyang Zhou, Hua Wu, Xiyuan Zhang, Rongzhong Lian, Haifeng Wang
Konv enables a very challenging task as the model needs to both understand dialogue and plan over the given knowledge graph.
7 code implementations • 13 Jun 2019 • Wenquan Wu, Zhen Guo, Xiangyang Zhou, Hua Wu, Xiyuan Zhang, Rongzhong Lian, Haifeng Wang
DuConv enables a very challenging task as the model needs to both understand dialogue and plan over the given knowledge graph.
2 code implementations • ACL 2018 • Xiangyang Zhou, Lu Li, Daxiang Dong, Yi Liu, Ying Chen, Wayne Xin Zhao, Dianhai Yu, Hua Wu
Humans generate responses relying on semantic and functional dependencies, including coreference relations, among dialogue elements and their context.
Ranked #6 on Conversational Response Selection on RRS
no code implementations • EMNLP 2016 • Xiangyang Zhou, Daxiang Dong, Hua Wu, Shiqi Zhao, Dianhai Yu, Hao Tian, Xuan Liu, Rui Yan
no code implementations • 29 Oct 2014 • Xiangyang Zhou, Jiaxin Zhang, Brian Kulis
Despite strong performance for a number of clustering tasks, spectral graph cut algorithms still suffer from several limitations: first, they require the number of clusters to be known in advance, but this information is often unknown a priori; second, they tend to produce clusters with uniform sizes.
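The first limitation is easiest to see in a standard normalized spectral clustering pipeline, where `k` is a required input. That pipeline builds a symmetric normalized Laplacian, embeds the nodes with its `k` smallest eigenvectors, row-normalizes them, and runs k-means, in the style of Ng, Jordan and Weiss. The sketch below is that textbook baseline, not the method proposed in this paper. The toy affinity matrix and the helper names are hypothetical.

```python
import numpy as np

def spectral_embedding(W: np.ndarray, k: int) -> np.ndarray:
    # Symmetric normalized Laplacian: L_sym = I - D^{-1/2} W D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    L_sym = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, eigvecs = np.linalg.eigh(L_sym)  # eigenvalues in ascending order
    U = eigvecs[:, :k]                  # k smallest eigenvectors
    return U / np.linalg.norm(U, axis=1, keepdims=True)

def kmeans(X: np.ndarray, k: int, iters: int = 50) -> np.ndarray:
    # Deterministic farthest-point initialization
    centers = [X[0]]
    for _ in range(1, k):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_clustering(W: np.ndarray, k: int) -> np.ndarray:
    # k must be supplied up front, which is the limitation noted in the abstract
    return kmeans(spectral_embedding(W, k), k)

# Toy graph: two 3-node cliques joined by weak edges
W = np.full((6, 6), 0.01)
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
labels = spectral_clustering(W, k=2)
```

Passing a different `k` forces a different partition even though the graph's structure is unchanged. The k-means step also has no term that favors unequal cluster sizes, which relates to the second limitation.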