no code implementations • Findings (EMNLP) 2021 • Haonan Li, Yeyun Gong, Jian Jiao, Ruofei Zhang, Timothy Baldwin, Nan Duan
Pre-trained language models have led to substantial gains over a broad range of natural language processing (NLP) tasks, but have been shown to have limitations for natural language generation tasks with strict requirements on output quality, such as commonsense generation and ad keyword generation.
no code implementations • 23 May 2022 • Weizhen Qi, Yeyun Gong, Yelong Shen, Jian Jiao, Yu Yan, Houqiang Li, Ruofei Zhang, Weizhu Chen, Nan Duan
To further illustrate the commercial value of our approach, we conduct experiments on three generation tasks in real-world advertising applications.
1 code implementation • ICLR 2022 • Simiao Zuo, Xiaodong Liu, Jian Jiao, Young Jin Kim, Hany Hassan, Ruofei Zhang, Tuo Zhao, Jianfeng Gao
While most ongoing research focuses on improving SAMs by exploring methods of routing inputs to experts, our analysis reveals that such research might not lead to the solution we expect, i.e., the commonly-used routing methods based on gating mechanisms do not work better than randomly routing inputs to experts.
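To make the contrast concrete, here is a minimal sketch of the two routing strategies being compared in a sparsely activated layer: a learned gate versus uniform random assignment of tokens to experts. The class name, dimensions, and the one-expert-per-token simplification are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    """Toy sparsely activated layer: each token is sent to exactly one expert."""

    def __init__(self, d_model=64, n_experts=4, routing="gate"):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Linear(d_model, d_model) for _ in range(n_experts)]
        )
        self.gate = nn.Linear(d_model, n_experts)  # used only for gated routing
        self.routing = routing

    def forward(self, x):  # x: (tokens, d_model)
        if self.routing == "gate":
            # learned routing: pick the expert with the highest gate score
            idx = self.gate(x).argmax(dim=-1)
        else:
            # random routing: assign each token to a uniformly sampled expert
            idx = torch.randint(len(self.experts), (x.size(0),), device=x.device)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = idx == e
            if mask.any():
                out[mask] = expert(x[mask])
        return out
```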
1 code implementation • The Web Conference 2021 • Deepak Saini, Arnav Kumar Jain, Kushal Dave, Jian Jiao, Amit Singh, Ruofei Zhang and Manik Varma
An efficient end-to-end implementation of GalaXC is presented that could be trained on a dataset with 50M labels and 97M training documents in less than 100 hours on 4×V100 GPUs.
no code implementations • NAACL 2021 • Zhihao Fan, Yeyun Gong, Dayiheng Liu, Zhongyu Wei, Siyuan Wang, Jian Jiao, Nan Duan, Ruofei Zhang, Xuanjing Huang
We therefore introduce a new layer named dynamic mask attention network (DMAN), with a learnable mask matrix that models localness adaptively (a minimal sketch follows below).
Ranked #9 on Machine Translation on WMT2014 English-German
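The sketch below shows one way a learnable attention mask could be realised, assuming it takes the form of trainable logits over clipped relative distances added to the attention scores. This is an illustrative approximation of the idea, not the paper's exact DMAN formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DynamicMaskAttention(nn.Module):
    """Self-attention with a learnable soft mask over relative positions."""

    def __init__(self, d_model=64, max_rel_dist=16):
        super().__init__()
        self.qkv = nn.Linear(d_model, 3 * d_model)
        self.scale = d_model ** -0.5
        # one learnable logit per relative distance, added to attention scores,
        # so the model can learn how strongly to favour local neighbours
        self.rel_logit = nn.Parameter(torch.zeros(2 * max_rel_dist + 1))
        self.max_rel_dist = max_rel_dist

    def forward(self, x):  # x: (seq_len, d_model)
        n = x.size(0)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        scores = (q @ k.T) * self.scale  # (n, n) attention scores
        # relative distance of every query/key pair, clipped to the table size
        rel = torch.arange(n)[:, None] - torch.arange(n)[None, :]
        rel = rel.clamp(-self.max_rel_dist, self.max_rel_dist) + self.max_rel_dist
        scores = scores + self.rel_logit[rel]  # learnable "soft mask"
        return F.softmax(scores, dim=-1) @ v
```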
1 code implementation • 31 Dec 2020 • Weizhen Qi, Yeyun Gong, Jian Jiao, Yu Yan, Weizhu Chen, Dayiheng Liu, Kewen Tang, Houqiang Li, Jiusheng Chen, Ruofei Zhang, Ming Zhou, Nan Duan
In this paper, we propose BANG, a new pretraining model to Bridge the gap between Autoregressive (AR) and Non-autoregressive (NAR) Generation.
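For readers unfamiliar with the two decoding regimes BANG bridges, the toy functions below contrast them: autoregressive (AR) decoding conditions each token on the previously generated prefix, while non-autoregressive (NAR) decoding predicts all positions in one parallel pass. The stub modules and the prefix-averaging shortcut are hypothetical simplifications for illustration, not BANG itself.

```python
import torch
import torch.nn as nn

vocab, d = 100, 32
emb = nn.Embedding(vocab, d)   # shared token/position embedding (toy)
proj = nn.Linear(d, vocab)

def autoregressive_decode(h, steps=5, bos=0):
    """AR: each new token conditions on the already-generated prefix."""
    tokens = [bos]
    for _ in range(steps):
        prefix = emb(torch.tensor(tokens)).mean(dim=0)  # crude prefix summary
        tokens.append(int(proj(h + prefix).argmax()))
    return tokens[1:]

def non_autoregressive_decode(h, steps=5):
    """NAR: all positions predicted in parallel, with no left context."""
    pos = emb(torch.arange(steps))          # position-only decoder inputs
    return proj(h + pos).argmax(dim=-1).tolist()

h = torch.randn(d)  # stand-in for an encoder summary of the source
print(autoregressive_decode(h), non_autoregressive_decode(h))
```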
no code implementations • COLING 2020 • Zhihao Fan, Yeyun Gong, Zhongyu Wei, Siyuan Wang, Yameng Huang, Jian Jiao, Xuanjing Huang, Nan Duan, Ruofei Zhang
Commonsense generation aims at generating plausible everyday scenario descriptions based on a set of provided concepts.
2 code implementations • Findings (ACL) 2021 • Dayiheng Liu, Yu Yan, Yeyun Gong, Weizhen Qi, Hang Zhang, Jian Jiao, Weizhu Chen, Jie Fu, Linjun Shou, Ming Gong, Pengcheng Wang, Jiusheng Chen, Daxin Jiang, Jiancheng Lv, Ruofei Zhang, Winnie Wu, Ming Zhou, Nan Duan
Multi-task benchmarks such as GLUE and SuperGLUE have driven great progress of pretraining and transfer learning in Natural Language Processing (NLP).
Natural Language Processing • Natural Language Understanding • +2
no code implementations • 21 Oct 2020 • Weizhen Qi, Yeyun Gong, Yu Yan, Jian Jiao, Bo Shao, Ruofei Zhang, Houqiang Li, Nan Duan, Ming Zhou
We build a dataset from a real-world sponsored search engine and carry out experiments to analyze different generative retrieval models.
1 code implementation • EMNLP 2021 • Sanxing Chen, Xiaodong Liu, Jianfeng Gao, Jian Jiao, Ruofei Zhang, Yangfeng Ji
Our proposed model consists of two different Transformer blocks: the bottom block extracts features of each entity-relation pair in the local neighborhood of the source entity and the top block aggregates the relational information from outputs of the bottom block.
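A hedged sketch of the described two-block layout, assuming standard Transformer encoder layers: the bottom block encodes each (entity, relation) neighbour pair, and the top block aggregates the resulting pair features around the source entity. Embedding sizes, the pooling choices, and the mean readout are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class TwoBlockKGEncoder(nn.Module):
    """Bottom block encodes each (entity, relation) neighbour pair;
    top block aggregates the pair features for a source entity."""

    def __init__(self, n_entities, n_relations, d=64):
        super().__init__()
        self.ent = nn.Embedding(n_entities, d)
        self.rel = nn.Embedding(n_relations, d)
        bottom_layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.bottom = nn.TransformerEncoder(bottom_layer, num_layers=1)
        top_layer = nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True)
        self.top = nn.TransformerEncoder(top_layer, num_layers=1)

    def forward(self, neighbours):
        # neighbours: (n_pairs, 2) long tensor of (entity_id, relation_id) rows
        pairs = torch.stack(
            [self.ent(neighbours[:, 0]), self.rel(neighbours[:, 1])], dim=1
        )                                     # (n_pairs, 2, d)
        pair_feat = self.bottom(pairs)[:, 0]  # one feature per pair: (n_pairs, d)
        # top block contextualises all pair features jointly, then pools
        return self.top(pair_feat.unsqueeze(0)).squeeze(0).mean(dim=0)  # (d,)
```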
2 code implementations • 14 Feb 2020 • Wenhao Lu, Jian Jiao, Ruofei Zhang
Experimental results showed that inference time was significantly reduced, for the first time controlled to around 20 ms on CPUs, while most of the performance gain of the fine-tuned BERT-Base model was retained.
no code implementations • 27 Nov 2018 • Jian Jiao, Jichao Jiao, Yaokai Mo, Weilun Liu, Zhongliang Deng
This paper proposes a new framework to solve the problem of monocular visual odometry, called MagicVO.
no code implementations • 18 Feb 2018 • Ying Shan, Jian Jiao, Jie Zhu, JC Mao
Building on top of the powerful concept of semantic learning, this paper proposes a Recurrent Binary Embedding (RBE) model that learns compact representations for real-time retrieval.
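A minimal sketch of the general idea behind binary embeddings for real-time retrieval: sign binarisation with a straight-through gradient, then scoring over compact ±1 codes with dot products. This is a common stand-in technique shown for intuition, not the paper's recurrent refinement scheme.

```python
import torch

def binarize(x):
    """Sign binarisation with a straight-through gradient (illustrative
    stand-in for the paper's binary embedding step, not its exact method)."""
    b = torch.sign(x)
    return x + (b - x).detach()  # forward pass: ±1 codes; backward: identity

# compact ±1 codes make similarity a cheap dot product (equivalent, up to an
# affine transform, to a Hamming distance between the underlying bit codes)
query = binarize(torch.randn(64, requires_grad=True))
docs = binarize(torch.randn(1000, 64))
scores = docs @ query          # higher dot product = fewer differing bits
top10 = scores.topk(10).indices
```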
1 code implementation • KDD 2016 • Ying Shan, T. Ryan Hoens, Jian Jiao, Haijing Wang, Dong Yu, JC Mao
Manually crafted combinatorial features have been the “secret sauce” behind many successful models.