no code implementations • ACL 2022 • Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, Jie Tang
Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage at training.
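As a rough illustration of the idea (not the paper's P-Tuning v2 implementation), the sketch below trains only a small set of continuous prompt embeddings prepended to the input while the language model stays frozen; `TinyLM`, the prompt length, and all hyperparameters are placeholders.

```python
# Minimal sketch of continuous prompt tuning with a frozen language model.
# TinyLM stands in for a real pretrained model; only the soft prompt is trained.
import torch
import torch.nn as nn

class TinyLM(nn.Module):
    """Stand-in for a pretrained transformer LM (kept frozen)."""
    def __init__(self, vocab_size=1000, d_model=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, input_embeds):
        return self.head(self.encoder(input_embeds))

class SoftPromptModel(nn.Module):
    def __init__(self, lm, num_prompt_tokens=20, d_model=64):
        super().__init__()
        self.lm = lm
        for p in self.lm.parameters():        # freeze the language model
            p.requires_grad_(False)
        # The only trainable parameters: continuous prompt embeddings.
        self.soft_prompt = nn.Parameter(torch.randn(num_prompt_tokens, d_model) * 0.02)

    def forward(self, input_ids):
        tok = self.lm.embed(input_ids)                          # (B, T, D)
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return self.lm(torch.cat([prompt, tok], dim=1))         # prepend prompt

lm = TinyLM()
model = SoftPromptModel(lm)
optimizer = torch.optim.Adam([model.soft_prompt], lr=1e-3)      # tune the prompt only
logits = model(torch.randint(0, 1000, (2, 16)))
print(logits.shape)  # (2, 20 + 16, 1000)
```

In practice the frozen model is a pretrained transformer, and P-Tuning v2 additionally inserts prompts at every layer rather than only at the input, which is what keeps per-task storage small while matching fine-tuning performance.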
1 code implementation • 28 Dec 2022 • Zhihao Wang, Zongyu Lin, Peiqi Liu, Guidong Zheng, Junjie Wen, Xianxin Chen, Yujun Chen, Zhilin Yang
Label noise is ubiquitous in various machine learning scenarios such as self-labeling with model predictions and erroneous data annotation.
no code implementations • 15 Nov 2022 • Haike Xu, Zongyu Lin, Jing Zhou, Yanan Zheng, Zhilin Yang
In the finetuning setting, our approach also achieves new state-of-the-art results on a wide range of NLP tasks, with only 1/4 of the parameters of previous methods.
no code implementations • 9 Nov 2022 • Chonghua Liao, Yanan Zheng, Zhilin Yang
Natural language prompts have been shown to facilitate cross-task generalization for large language models.
1 code implementation • 8 Nov 2022 • Yanru Chen, Yanan Zheng, Zhilin Yang
Few-shot named entity recognition (NER) aims to generalize to unseen labels and/or domains with only a few labeled examples.
1 code implementation • 31 Oct 2022 • Hanwei Xu, Yujun Chen, Yulun Du, Nan Shao, Yanggang Wang, Haiyu Li, Zhilin Yang
Prompt-based techniques have demonstrated great potential for improving the few-shot generalization of pretrained language models.
no code implementations • 18 Jan 2022 • Hanwei Xu, Yujun Chen, Yulun Du, Nan Shao, Yanggang Wang, Haiyu Li, Zhilin Yang
We propose a multitask pretraining approach ZeroPrompt for zero-shot generalization, focusing on task scaling and zero-shot prompting.
1 code implementation • 7 Nov 2021 • Xingcheng Yao, Yanan Zheng, Xiaocong Yang, Zhilin Yang
Pretrained language models have become the standard approach for many NLP tasks due to strong performance, but they are very expensive to train.
2 code implementations • 14 Oct 2021 • Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, Jie Tang
Prompt tuning, which only tunes continuous prompts with a frozen language model, substantially reduces per-task storage and memory usage at training.
1 code implementation • ACL 2022 • Yanan Zheng, Jing Zhou, Yujie Qian, Ming Ding, Chonghua Liao, Jian Li, Ruslan Salakhutdinov, Jie Tang, Sebastian Ruder, Zhilin Yang
The few-shot natural language understanding (NLU) task has attracted much recent attention.
1 code implementation • ACL 2022 • Jing Zhou, Yanan Zheng, Jie Tang, Jian Li, Zhilin Yang
Most previous methods for text data augmentation are limited to simple tasks and weak baselines.
1 code implementation • 1 Jun 2021 • Yongfeng Huang, Yujun Chen, Yulun Du, Zhilin Yang
The task of rationalization aims to extract pieces of input text as rationales to justify neural network predictions on text classification tasks.
no code implementations • 27 May 2021 • Xu Cao, Zijie Chen, Bolin Lai, Yuxuan Wang, Yu Chen, Zhengqing Cao, Zhilin Yang, Nanyang Ye, Junbo Zhao, Xiao-Yun Zhou, Peng Qi
For the automation, we focus on the positioning step and propose a Dual-In-Dual-Out network based on two-step learning and two-task learning, which achieves fully automatic regression of the suitable puncture area and angle from near-infrared (NIR) images.
3 code implementations • 24 Mar 2021 • Jiaao He, Jiezhong Qiu, Aohan Zeng, Zhilin Yang, Jidong Zhai, Jie Tang
However, training a trillion-scale MoE model requires algorithm and system co-design for a well-tuned, high-performance distributed training system.
1 code implementation • 19 Mar 2021 • Xu Zou, Da Yin, Qingyang Zhong, Ming Ding, Hongxia Yang, Zhilin Yang, Jie Tang
To tackle this challenge, we propose an innovative method, inverse prompting, to better control text generation.
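A hedged sketch of the re-ranking intuition behind inverse prompting: candidate generations are scored not only by their likelihood given the prompt but also by how well the original prompt can be recovered from them. `toy_log_prob` below is a made-up stand-in for a real language-model scorer (it only measures word overlap), and the weighting is arbitrary.

```python
# Illustrative candidate re-ranking in the spirit of inverse prompting:
# each candidate is scored by the (inverse) likelihood of recovering the prompt.
from typing import List

def toy_log_prob(condition: str, target: str) -> float:
    """Hypothetical LM scorer standing in for log p(target | condition):
    penalizes target words missing from the condition (toy overlap proxy)."""
    cond = set(condition.lower().split())
    return sum(0.0 if w in cond else -1.0 for w in target.lower().split())

def rerank_with_inverse_prompt(prompt: str, candidates: List[str],
                               log_prob=toy_log_prob, weight: float = 1.0) -> str:
    scored = []
    for cand in candidates:
        forward = log_prob(prompt, cand)   # log p(candidate | prompt)
        inverse = log_prob(cand, prompt)   # log p(prompt | candidate): the inverse prompt score
        scored.append((forward + weight * inverse, cand))
    return max(scored, key=lambda t: t[0])[1]

print(rerank_with_inverse_prompt(
    "a poem about the moon",
    ["the moon hangs silver in a poem of night",
     "stock prices rose sharply today"]))
```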
2 code implementations • ACL 2022 • Zhengxiao Du, Yujie Qian, Xiao Liu, Ming Ding, Jiezhong Qiu, Zhilin Yang, Jie Tang
On a wide range of tasks across NLU and conditional and unconditional generation, GLM outperforms BERT, T5, and GPT given the same model sizes and data, and achieves the best performance from a single pretrained model with 1.25x the parameters of BERT-Large, demonstrating its generalizability to different downstream tasks.
Ranked #2 on Document Summarization on CNN / Daily Mail
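As a rough sketch of autoregressive blank infilling (not GLM's actual preprocessing code), the snippet below replaces sampled spans with mask tokens and appends the spans after the context so they can be predicted autoregressively; the special tokens and function name are illustrative.

```python
# Illustrative construction of a blank-infilling example: sampled spans are
# replaced with [MASK] in the corrupted text, and each span is appended after
# the context so it can be generated autoregressively.
from typing import List, Tuple

def make_blank_infilling_example(tokens: List[str],
                                 spans: List[Tuple[int, int]]):
    corrupted, targets = [], []
    prev_end = 0
    for start, end in sorted(spans):
        corrupted.extend(tokens[prev_end:start])
        corrupted.append("[MASK]")
        targets.append(tokens[start:end])
        prev_end = end
    corrupted.extend(tokens[prev_end:])
    # Part A: corrupted context (attended bidirectionally);
    # Part B: the masked spans, each prefixed with [START], generated left to right.
    part_b = []
    for span in targets:
        part_b.extend(["[START]"] + span)
    return corrupted, part_b

ctx, tgt = make_blank_infilling_example(
    "the quick brown fox jumps over the lazy dog".split(),
    spans=[(1, 3), (6, 8)],
)
print(ctx)  # ['the', '[MASK]', 'fox', 'jumps', 'over', '[MASK]', 'dog']
print(tgt)  # ['[START]', 'quick', 'brown', '[START]', 'the', 'lazy']
```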
5 code implementations • 18 Mar 2021 • Xiao Liu, Yanan Zheng, Zhengxiao Du, Ming Ding, Yujie Qian, Zhilin Yang, Jie Tang
On the SuperGLUE benchmark, GPTs achieve performance comparable to, and sometimes better than, similar-sized BERTs in supervised learning.
no code implementations • NeurIPS 2019 • Zhilin Yang, Thang Luong, Russ R. Salakhutdinov, Quoc V. Le
The softmax bottleneck has been shown to limit the expressiveness of neural language models.
24 code implementations • NeurIPS 2019 • Zhilin Yang, Zihang Dai, Yiming Yang, Jaime Carbonell, Ruslan Salakhutdinov, Quoc V. Le
With the capability of modeling bidirectional contexts, denoising-autoencoding-based pretraining like BERT achieves better performance than pretraining approaches based on autoregressive language modeling.
33 code implementations • ACL 2019 • Zihang Dai, Zhilin Yang, Yiming Yang, Jaime Carbonell, Quoc V. Le, Ruslan Salakhutdinov
Transformers have the potential to learn longer-term dependencies, but are limited by a fixed-length context in the setting of language modeling.
Ranked #3 on Language Modelling on One Billion Word
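A minimal sketch of the segment-level recurrence idea, assuming cached hidden states from the previous segment are simply prepended as extra keys and values; Transformer-XL's relative positional encodings and multi-layer memory handling are omitted, and the layer below is illustrative rather than the released implementation.

```python
# Minimal sketch of segment-level recurrence: hidden states from the previous
# segment are cached (detached) and prepended as extra keys/values, extending
# the effective context beyond a single segment.
import torch
import torch.nn as nn

class RecurrentAttentionLayer(nn.Module):
    def __init__(self, d_model=64, nhead=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)

    def forward(self, x, memory=None):
        # x: (B, T, D) current segment; memory: (B, M, D) cached previous segment
        kv = x if memory is None else torch.cat([memory, x], dim=1)
        out, _ = self.attn(query=x, key=kv, value=kv)
        new_memory = x.detach()   # cache without backpropagating across segments
        return out, new_memory

layer = RecurrentAttentionLayer()
segments = torch.randn(3, 2, 8, 64)   # 3 segments, batch 2, length 8, dim 64
memory = None
for seg in segments:
    out, memory = layer(seg, memory)
print(out.shape)  # torch.Size([2, 8, 64])
```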
no code implementations • NeurIPS 2018 • Zhilin Yang, Jake Zhao, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan R. Salakhutdinov, Yann LeCun
We also show that the learned graphs are generic enough to be transferred to different embeddings on which the graphs have not been trained (including GloVe embeddings, ELMo embeddings, and task-specific RNN hidden units), or embedding-free units such as image pixels.
3 code implementations • EMNLP 2018 • Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning
Existing question answering (QA) datasets fail to train QA systems to perform complex reasoning and provide explanations for answers.
Ranked #36 on Question Answering on HotpotQA
1 code implementation • EMNLP 2018 • Jiateng Xie, Zhilin Yang, Graham Neubig, Noah A. Smith, Jaime Carbonell
To improve robustness to word order differences, we propose to use self-attention, which allows for a degree of flexibility with respect to word order.
1 code implementation • 14 Jun 2018 • Zhilin Yang, Jake Zhao, Bhuwan Dhingra, Kaiming He, William W. Cohen, Ruslan Salakhutdinov, Yann LeCun
We also show that the learned graphs are generic enough to be transferred to different embeddings on which the graphs have not been trained (including GloVe embeddings, ELMo embeddings, and task-specific RNN hidden units), or embedding-free units such as image pixels.
no code implementations • NAACL 2018 • Bhuwan Dhingra, Qiao Jin, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
Many problems in NLP require aggregating information from multiple mentions of the same entity which may be far apart in the text.
Ranked #6 on Question Answering on WikiHop
no code implementations • ICLR 2018 • Zhilin Yang, Saizheng Zhang, Jack Urbanek, Will Feng, Alexander H. Miller, Arthur Szlam, Douwe Kiela, Jason Weston
Contrary to most natural language processing research, which makes use of static datasets, humans learn language interactively, grounded in an environment.
9 code implementations • ICLR 2018 • Zhilin Yang, Zihang Dai, Ruslan Salakhutdinov, William W. Cohen
We formulate language modeling as a matrix factorization problem, and show that the expressiveness of Softmax-based models (including the majority of neural language models) is limited by a Softmax bottleneck.
Ranked #10 on Language Modelling on Penn Treebank (Word Level)
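The mixture-of-softmaxes remedy proposed in this work can be sketched as follows: the next-token distribution is a weighted combination of several softmaxes, so the implied log-probability matrix is no longer rank-bounded by the hidden size. Dimensions and the number of components below are illustrative, not the paper's settings.

```python
# Minimal sketch of a Mixture-of-Softmaxes (MoS) output layer: the next-token
# distribution is a convex combination of K softmaxes, which makes the implied
# log-probability matrix high-rank, unlike a single softmax over h W^T.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixtureOfSoftmaxes(nn.Module):
    def __init__(self, d_model=64, vocab_size=1000, n_components=4):
        super().__init__()
        self.K = n_components
        self.prior = nn.Linear(d_model, n_components)            # mixture weights
        self.latent = nn.Linear(d_model, n_components * d_model)  # per-component context
        self.decoder = nn.Linear(d_model, vocab_size)

    def forward(self, h):                                         # h: (B, D)
        pi = F.softmax(self.prior(h), dim=-1)                     # (B, K)
        z = torch.tanh(self.latent(h)).view(-1, self.K, h.size(-1))  # (B, K, D)
        comps = F.softmax(self.decoder(z), dim=-1)                # (B, K, V)
        probs = (pi.unsqueeze(-1) * comps).sum(dim=1)             # (B, V)
        return torch.log(probs + 1e-8)                            # log-probabilities

mos = MixtureOfSoftmaxes()
log_p = mos(torch.randn(2, 64))
print(log_p.shape, log_p.exp().sum(dim=-1))  # (2, 1000), rows sum to ~1
```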
1 code implementation • NeurIPS 2017 • Zihang Dai, Zhilin Yang, Fan Yang, William W. Cohen, Ruslan Salakhutdinov
Semi-supervised learning methods based on generative adversarial networks (GANs) have obtained strong empirical results, but it is not clear (1) how the discriminator benefits from joint training with a generator, and (2) why good semi-supervised classification performance and a good generator cannot be obtained at the same time.
4 code implementations • 18 Mar 2017 • Zhilin Yang, Ruslan Salakhutdinov, William W. Cohen
Recent papers have shown that neural networks obtain state-of-the-art performance on several different sequence tagging tasks.
Ranked #10 on Part-Of-Speech Tagging on Penn Treebank
no code implementations • 7 Mar 2017 • Bhuwan Dhingra, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
We introduce a model that encodes such graphs as explicit memory in recurrent neural networks, and use it to model coreference relations in text.
Ranked #1 on Question Answering on CNN / Daily Mail
1 code implementation • NeurIPS 2017 • Fan Yang, Zhilin Yang, William W. Cohen
We propose a framework, Neural Logic Programming, that combines the parameter and structure learning of first-order logical rules in an end-to-end differentiable model.
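A rough sketch of the underlying mechanism, assuming TensorLog-style operators: each relation is an adjacency matrix over entities, a chain rule is a sequence of matrix-vector products, and soft attention over relations makes the rule structure differentiable. The matrices and attention weights below are toy values, not learned ones.

```python
# Illustrative sketch of differentiable rule application in the spirit of
# Neural LP: relations are adjacency matrices, a rule is a chain of
# matrix-vector products, and soft attention over relations at each step
# makes the rule structure learnable by gradient descent.
import numpy as np

n_entities, n_relations, rule_length = 5, 3, 2
rng = np.random.default_rng(0)

# Relation adjacency matrices M_r (random 0/1 edges for illustration).
relations = (rng.random((n_relations, n_entities, n_entities)) > 0.6).astype(float)

# Attention over relations at each step of the rule (fixed here; learned in the paper).
attention = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.1, 0.8]])          # (rule_length, n_relations)

def apply_soft_rule(query_entity: int) -> np.ndarray:
    """Score all entities as possible answers for the query entity."""
    v = np.zeros(n_entities)
    v[query_entity] = 1.0                        # one-hot query entity
    for step in range(rule_length):
        # Soft mixture of relation operators, then propagate along edges.
        M = np.tensordot(attention[step], relations, axes=1)   # (E, E)
        v = M.T @ v
    return v                                      # higher score = more likely answer

print(apply_soft_rule(0))
```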
no code implementations • 23 Feb 2017 • Yujie Qian, Jie Tang, Zhilin Yang, Binxuan Huang, Wei Wei, Kathleen M. Carley
In this paper, we formalize the problem of inferring location from social media into a semi-supervised factor graph model (SSFGM).
no code implementations • ACL 2017 • Zhilin Yang, Junjie Hu, Ruslan Salakhutdinov, William W. Cohen
In this framework, we train a generative model to generate questions based on the unlabeled text, and combine model-generated questions with human-generated questions for training question answering models.
1 code implementation • 6 Nov 2016 • Zhilin Yang, Bhuwan Dhingra, Ye Yuan, Junjie Hu, William W. Cohen, Ruslan Salakhutdinov
Previous work combines word-level and character-level representations using concatenation or scalar weighting, which is suboptimal for high-level tasks like reading comprehension.
Ranked #50 on Question Answering on SQuAD1.1 dev
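The alternative explored in this entry can be sketched as a fine-grained (vector-valued) gate that interpolates word-level and character-level representations elementwise, conditioned on token features; the feature dimensionality and module name below are illustrative.

```python
# Minimal sketch of fine-grained gating between word-level and character-level
# representations: a vector gate g, conditioned on token features, interpolates
# the two elementwise, instead of concatenation or a single scalar weight.
import torch
import torch.nn as nn

class FineGrainedGate(nn.Module):
    def __init__(self, d_repr=100, d_feature=20):
        super().__init__()
        self.gate = nn.Linear(d_feature, d_repr)

    def forward(self, word_repr, char_repr, token_features):
        # word_repr, char_repr: (B, T, d_repr); token_features: (B, T, d_feature)
        g = torch.sigmoid(self.gate(token_features))     # (B, T, d_repr)
        return g * word_repr + (1.0 - g) * char_repr     # elementwise mixture

gate = FineGrainedGate()
mixed = gate(torch.randn(2, 7, 100), torch.randn(2, 7, 100), torch.randn(2, 7, 20))
print(mixed.shape)  # torch.Size([2, 7, 100])
```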
4 code implementations • ACL 2017 • Bhuwan Dhingra, Hanxiao Liu, Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
In this paper we study the problem of answering cloze-style questions over documents.
Ranked #1 on Question Answering on Children's Book Test
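A minimal sketch of one multiplicative gated-attention hop between a document and a cloze query, in the spirit of this entry; the full model stacks several such hops with recurrent encoders, which are omitted here.

```python
# Minimal sketch of one gated-attention hop for cloze-style QA: each document
# token attends over the query tokens, and the attended query vector gates the
# document representation multiplicatively (elementwise product).
import torch
import torch.nn.functional as F

def gated_attention(doc, query):
    # doc: (B, Td, D), query: (B, Tq, D)
    scores = torch.bmm(doc, query.transpose(1, 2))       # (B, Td, Tq)
    alpha = F.softmax(scores, dim=-1)                    # attention over query tokens
    q_tilde = torch.bmm(alpha, query)                    # (B, Td, D)
    return doc * q_tilde                                 # multiplicative gating

doc, query = torch.randn(2, 50, 64), torch.randn(2, 10, 64)
print(gated_attention(doc, query).shape)  # torch.Size([2, 50, 64])
```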
no code implementations • NeurIPS 2016 • Zhilin Yang, Ye Yuan, Yuexin Wu, Ruslan Salakhutdinov, William W. Cohen
We propose a novel extension of the encoder-decoder framework, called a review network.
21 code implementations • 29 Mar 2016 • Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov
We present a semi-supervised learning framework based on graph embeddings.
Ranked #1 on Node Classification on USA Air-Traffic
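A hedged sketch of the joint objective behind graph-embedding-based semi-supervised learning: a supervised loss on the few labeled nodes plus an unsupervised loss that asks each node's embedding to predict its graph context. The random graph, sampling scheme, and loss weighting below are illustrative, not the paper's exact training procedure.

```python
# Illustrative joint objective for graph-based semi-supervised learning:
# supervised cross-entropy on the labeled nodes plus a graph-context loss
# that pulls connected nodes' embeddings together (with negative sampling).
import torch
import torch.nn as nn
import torch.nn.functional as F

n_nodes, d_feat, d_emb, n_classes = 100, 32, 16, 3
features = torch.randn(n_nodes, d_feat)
labels = torch.randint(0, n_classes, (10,))              # only 10 labeled nodes
labeled_idx = torch.arange(10)
edges = torch.randint(0, n_nodes, (500, 2))              # random demo graph

embed = nn.Embedding(n_nodes, d_emb)
classifier = nn.Linear(d_feat + d_emb, n_classes)
optimizer = torch.optim.Adam(list(embed.parameters()) + list(classifier.parameters()), lr=1e-2)

for step in range(100):
    # Supervised loss: predict labels from input features plus learned embeddings.
    inp = torch.cat([features[labeled_idx], embed(labeled_idx)], dim=-1)
    sup_loss = F.cross_entropy(classifier(inp), labels)

    # Unsupervised graph-context loss: connected nodes score high,
    # random pairs score low (negative sampling).
    u, v = edges[:, 0], edges[:, 1]
    neg = torch.randint(0, n_nodes, (edges.size(0),))
    pos_score = (embed(u) * embed(v)).sum(-1)
    neg_score = (embed(u) * embed(neg)).sum(-1)
    ctx_loss = -(F.logsigmoid(pos_score) + F.logsigmoid(-neg_score)).mean()

    optimizer.zero_grad()
    (sup_loss + 0.1 * ctx_loss).backward()
    optimizer.step()
```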
no code implementations • 20 Mar 2016 • Zhilin Yang, Ruslan Salakhutdinov, William Cohen
We present a deep hierarchical recurrent neural network for sequence tagging.
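A minimal sketch of a hierarchical recurrent tagger in the spirit of this entry, assuming a character-level GRU that builds word representations and a word-level bidirectional GRU that emits per-word tag scores; hyperparameters and the class name are placeholders.

```python
# Minimal sketch of a hierarchical recurrent tagger: a character-level GRU
# builds each word's representation, and a word-level bidirectional GRU over
# those representations produces per-word tag scores.
import torch
import torch.nn as nn

class HierarchicalTagger(nn.Module):
    def __init__(self, n_chars=100, d_char=25, d_word=50, n_tags=10):
        super().__init__()
        self.char_embed = nn.Embedding(n_chars, d_char)
        self.char_rnn = nn.GRU(d_char, d_word, batch_first=True)
        self.word_rnn = nn.GRU(d_word, d_word, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * d_word, n_tags)

    def forward(self, char_ids):
        # char_ids: (B, T_words, T_chars) character ids for each word
        B, Tw, Tc = char_ids.shape
        chars = self.char_embed(char_ids.view(B * Tw, Tc))      # (B*Tw, Tc, d_char)
        _, h = self.char_rnn(chars)                             # h: (1, B*Tw, d_word)
        word_reprs = h.squeeze(0).view(B, Tw, -1)               # (B, Tw, d_word)
        word_states, _ = self.word_rnn(word_reprs)              # (B, Tw, 2*d_word)
        return self.out(word_states)                            # per-word tag scores

tagger = HierarchicalTagger()
scores = tagger(torch.randint(0, 100, (2, 6, 8)))
print(scores.shape)  # torch.Size([2, 6, 10])
```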
no code implementations • 4 Aug 2015 • Zhilin Yang, Jie Tang, William Cohen
GenVector leverages large-scale unlabeled data with embeddings and represents data of two modalities, i.e., social network users and knowledge concepts, in a shared latent topic space.