1 code implementation • 22 May 2022 • Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, ZiYi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples, such as domain-specific captioning, question answering, and future event prediction.
1 code implementation • ACL 2022 • Shuohang Wang, Yichong Xu, Yuwei Fang, Yang Liu, Siqi Sun, Ruochen Xu, Chenguang Zhu, Michael Zeng
Surprisingly, we found that REtrieving from the traINing datA (REINA) alone can lead to significant gains on multiple NLG and NLU tasks.
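A minimal sketch of REINA-style input construction, assuming a BM25 retriever from the `rank_bm25` package; the toy corpus, the `augment` helper, and the `[SEP]` concatenation format are illustrative assumptions, not the authors' released pipeline:

```python
# Sketch: augment a query with its nearest training examples before generation.
# Assumes the rank_bm25 package; corpus, helper name, and [SEP] format are made up here.
from rank_bm25 import BM25Okapi

train_inputs = ["the cat sat on the mat", "dogs chase cats in the yard"]
train_labels = ["cat on mat", "dogs chase cats"]

bm25 = BM25Okapi([x.split() for x in train_inputs])

def augment(query: str, k: int = 2) -> str:
    """Concatenate the query with its top-k retrieved (input, label) training pairs."""
    scores = bm25.get_scores(query.split())
    top = sorted(range(len(scores)), key=lambda i: -scores[i])[:k]
    retrieved = " ".join(f"{train_inputs[i]} => {train_labels[i]}" for i in top)
    return f"{query} [SEP] {retrieved}"

print(augment("a cat sleeps on a mat"))
```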
no code implementations • 10 Feb 2022 • Yulong Chen, Yang Liu, Li Dong, Shuohang Wang, Chenguang Zhu, Michael Zeng, Yue Zhang
However, for prompt learning, there are still two salient gaps between NLP tasks and pretraining.
1 code implementation • CVPR 2022 • Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji, Shih-Fu Chang
Vision-language (V+L) pretraining models have achieved great success in supporting multimedia applications by understanding the alignments between images and text.
1 code implementation • 8 Dec 2021 • Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang
Based on this, we ask an even bolder question: can we have an all-MLP architecture for VL modeling, where both VL fusion and the vision encoder are replaced with MLPs?
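As a rough illustration of what an all-MLP fusion module could look like, here is a token-mixing/channel-mixing block over concatenated vision and text features; the block name, shapes, and structure are assumptions for illustration, not the paper's architecture:

```python
# Sketch: an all-MLP fusion block mixing concatenated vision and text tokens.
# Purely illustrative; not the released model.
import torch
import torch.nn as nn

class MLPFusionBlock(nn.Module):
    def __init__(self, num_tokens: int, dim: int):
        super().__init__()
        self.token_mlp = nn.Sequential(nn.Linear(num_tokens, num_tokens), nn.GELU())
        self.channel_mlp = nn.Sequential(nn.Linear(dim, dim), nn.GELU())

    def forward(self, tokens):                       # tokens: (batch, num_tokens, dim)
        tokens = tokens + self.token_mlp(tokens.transpose(1, 2)).transpose(1, 2)
        return tokens + self.channel_mlp(tokens)

vision = torch.randn(2, 49, 256)                     # e.g. image patch features
text = torch.randn(2, 16, 256)                       # e.g. word features
fused = MLPFusionBlock(num_tokens=49 + 16, dim=256)(torch.cat([vision, text], dim=1))
print(fused.shape)                                   # torch.Size([2, 65, 256])
```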
2 code implementations • 6 Dec 2021 • Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang
In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities.
Ranked #1 on Common Sense Reasoning on CommonsenseQA (using extra training data)
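A minimal sketch of external attention as cross-attention from the transformer's hidden states to separately encoded knowledge states; the module below is a generic illustration with assumed shapes, not the paper's exact mechanism:

```python
# Sketch: attend from input hidden states to external knowledge states and fuse residually.
import torch
import torch.nn as nn

class ExternalAttention(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)

    def forward(self, hidden, knowledge):
        # hidden: (batch, seq, dim); knowledge: (batch, mem, dim) from a knowledge encoder
        scores = self.q(hidden) @ self.k(knowledge).transpose(-2, -1) / hidden.size(-1) ** 0.5
        attn = torch.softmax(scores, dim=-1)
        return hidden + attn @ self.v(knowledge)      # residual fusion of external evidence

layer = ExternalAttention(dim=64)
print(layer(torch.randn(2, 10, 64), torch.randn(2, 5, 64)).shape)  # torch.Size([2, 10, 64])
```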
no code implementations • 4 Nov 2021 • Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li
In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
Ranked #1 on Adversarial Robustness on AdvGLUE
1 code implementation • CVPR 2022 • Zi-Yi Dou, Yichong Xu, Zhe Gan, JianFeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng
Vision-and-language (VL) pre-training has proven to be highly effective on various VL downstream tasks.
no code implementations • Findings (ACL) 2022 • Yuwei Fang, Shuohang Wang, Yichong Xu, Ruochen Xu, Siqi Sun, Chenguang Zhu, Michael Zeng
Then we utilize a diverse set of 4 English knowledge sources to provide more comprehensive coverage of knowledge in different formats.
no code implementations • Findings (ACL) 2022 • Wenhao Yu, Chenguang Zhu, Yuwei Fang, Donghan Yu, Shuohang Wang, Yichong Xu, Michael Zeng, Meng Jiang
In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word- and sentence-level alignment between the input text sequence and rare word definitions, to enhance the language modeling representation with dictionary knowledge.
no code implementations • ACL 2022 • Donghan Yu, Chenguang Zhu, Yuwei Fang, Wenhao Yu, Shuohang Wang, Yichong Xu, Xiang Ren, Yiming Yang, Michael Zeng
The recently proposed Fusion-in-Decoder (FiD), which is built on top of the pretrained generative model T5, achieves state-of-the-art performance in the reading module.
1 code implementation • Findings (EMNLP) 2021 • Qiyuan Zhang, Lei Wang, Sicheng Yu, Shuohang Wang, Yang Wang, Jing Jiang, Ee-Peng Lim
While diverse question answering (QA) datasets have been proposed and contributed significantly to the development of deep learning models for QA tasks, the existing datasets fall short in two aspects.
1 code implementation • Findings (EMNLP) 2021 • Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng
Data annotation is a time-consuming and labor-intensive process for many NLP tasks.
no code implementations • ACL 2021 • Aston Zhang, Alvin Chan, Yi Tay, Jie Fu, Shuohang Wang, Shuai Zhang, Huajie Shao, Shuochao Yao, Roy Ka-Wei Lee
Orthogonality constraints encourage matrices to be orthogonal for numerical stability.
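A minimal sketch of the form such constraints commonly take, a soft penalty ||WᵀW − I||²_F added to the task loss; the layer choice and suggested coefficient are assumptions, not the paper's exact setup:

```python
# Sketch: soft orthogonality regularizer; weight the returned term (e.g. by 1e-3)
# and add it to the task loss during training.
import torch

def orthogonality_penalty(weight: torch.Tensor) -> torch.Tensor:
    """Frobenius-norm distance of W^T W from the identity."""
    gram = weight.t() @ weight
    eye = torch.eye(gram.size(0), device=weight.device)
    return ((gram - eye) ** 2).sum()

w = torch.nn.Linear(32, 32).weight
print(orthogonality_penalty(w).item())
```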
no code implementations • 23 Apr 2021 • Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, Zicheng Liu
However, we can find "relaxed" winning tickets at 50%-70% sparsity that maintain 99% of the full accuracy.
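For context, the pruning step behind such sparsity numbers can be sketched as one-shot global magnitude pruning; the layer, sparsity level, and helper below are illustrative, not the paper's iterative procedure:

```python
# Sketch: keep only the largest-magnitude weights up to a target sparsity.
import torch

def magnitude_mask(weight: torch.Tensor, sparsity: float) -> torch.Tensor:
    """0/1 mask keeping the largest-magnitude (1 - sparsity) fraction of weights."""
    k = int(weight.numel() * sparsity)
    threshold = weight.abs().flatten().kthvalue(k).values
    return (weight.abs() > threshold).float()

w = torch.randn(256, 256)
mask = magnitude_mask(w, sparsity=0.6)   # keep ~40% of the weights
pruned = w * mask
print(round(1 - mask.mean().item(), 2))  # measured sparsity, ~0.6
```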
no code implementations • CVPR 2021 • Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu, Jingjing Liu
Vision-and-language pre-training has achieved impressive success in learning multimodal representations between vision and language.
1 code implementation • NeurIPS 2021 • Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Jingjing Liu, Zhangyang Wang
Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers for one network, its corresponding winning ticket could be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly as competitive as the latter's winning ticket directly found by IMP.
1 code implementation • NAACL 2021 • Siqi Sun, Yen-Chun Chen, Linjie Li, Shuohang Wang, Yuwei Fang, Jingjing Liu
Multimodal pre-training has propelled great advancement in vision-and-language research.
no code implementations • Findings (ACL) 2021 • Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu
Transformer has become ubiquitous in the deep learning field.
no code implementations • 1 Jan 2021 • Minhao Cheng, Zhe Gan, Yu Cheng, Shuohang Wang, Cho-Jui Hsieh, Jingjing Liu
By incorporating different feature maps after the masking, we can distill better features to help model generalization.
1 code implementation • ACL 2021 • Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu
Heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks.
1 code implementation • 12 Oct 2020 • Sicheng Yu, Yulei Niu, Shuohang Wang, Jing Jiang, Qianru Sun
We then apply two novel CVC inference methods (on trained models) to capture the effect of comprehensive reasoning as the final prediction.
1 code implementation • EMNLP 2020 • Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jing Jiang, Jingjing Liu
In this paper, we propose Cross-Thought, a novel approach to pre-training a sequence encoder, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering.
no code implementations • EMNLP 2020 • Yue Dong, Shuohang Wang, Zhe Gan, Yu Cheng, Jackie Chi Kit Cheung, Jingjing Liu
Pre-trained neural abstractive summarization systems have dominated extractive strategies on news summarization performance, at least in terms of ROUGE.
2 code implementations • ICLR 2021 • Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Ranked #1 on Natural Language Inference on ANLI test (using extra training data)
1 code implementation • EMNLP 2020 • Siqi Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu
Existing language model compression methods mostly use a simple L2 loss to distill knowledge in the intermediate representations of a large BERT model to a smaller one.
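The simple L2 objective referred to above can be sketched as a mean-squared error between student and teacher hidden states; the projection used to match widths and the tensor shapes are assumptions:

```python
# Sketch: plain L2 (MSE) distillation on intermediate representations.
import torch
import torch.nn.functional as F

student_hidden = torch.randn(8, 128, 312)   # a small student layer's output (assumed size)
teacher_hidden = torch.randn(8, 128, 768)   # the matching large teacher layer (assumed size)
proj = torch.nn.Linear(312, 768)            # projection so the widths agree

l2_distill_loss = F.mse_loss(proj(student_hidden), teacher_hidden)
print(l2_distill_loss.item())
```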
no code implementations • 13 Sep 2020 • Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu
Transformer has become ubiquitous in the deep learning field.
Ranked #1 on Question Answering on Quasart-T
1 code implementation • 10 Sep 2020 • Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
Ranked #15 on Zero-Shot Cross-Lingual Transfer on XTREME
no code implementations • 10 Sep 2020 • Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu, Chenguang Zhu
Although deep neural networks have achieved tremendous success for question answering (QA), they still suffer from heavy computational and energy costs for real product deployment.
no code implementations • 20 Jan 2020 • Shuohang Wang, Yunshi Lan, Yi Tay, Jing Jiang, Jingjing Liu
Transformer has been successfully applied to many natural language processing tasks.
3 code implementations • EMNLP 2020 • Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li
In particular, we propose a tree-based autoencoder to embed the discrete text data into a continuous representation space, upon which we optimize the adversarial perturbation.
no code implementations • NeurIPS 2019 • Yi Tay, Anh Tuan Luu, Aston Zhang, Shuohang Wang, Siu Cheung Hui
Attentional models are distinctly characterized by their ability to learn relative importance, i.e., assigning a different weight to input values.
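The "relative importance" weighting described above can be made concrete with a scaled dot-product attention sketch; shapes and values here are illustrative only:

```python
# Sketch: scaled dot-product attention; each input value receives a non-negative
# weight and the weights sum to 1.
import torch

def attention(query, keys, values):
    scores = query @ keys.transpose(-2, -1) / keys.size(-1) ** 0.5
    weights = torch.softmax(scores, dim=-1)
    return weights @ values, weights

q = torch.randn(1, 1, 16)
k = v = torch.randn(1, 6, 16)
out, w = attention(q, k, v)
print(w.squeeze())            # six weights over the six input values, summing to 1
```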
1 code implementation • EMNLP 2020 • Yuwei Fang, Siqi Sun, Zhe Gan, Rohit Pillai, Shuohang Wang, Jingjing Liu
In this paper, we present Hierarchical Graph Network (HGN) for multi-hop question answering.
Ranked #33 on Question Answering on HotpotQA
no code implementations • 28 Oct 2019 • Chenglei Si, Shuohang Wang, Min-Yen Kan, Jing Jiang
Based on our experiments on the 5 key MCRC datasets - RACE, MCTest, MCScript, MCScript2.0, and DREAM - we observe that 1) fine-tuned BERT mainly learns how keywords lead to correct predictions, rather than learning semantic understanding and reasoning; 2) BERT does not need correct syntactic information to solve the task; and 3) there exist artifacts in these datasets such that they can be solved even without the full context.
1 code implementation • ACL 2019 • Yi Tay, Aston Zhang, Luu Anh Tuan, Jinfeng Rao, Shuai Zhang, Shuohang Wang, Jie Fu, Siu Cheung Hui
Many state-of-the-art neural models for NLP are heavily parameterized and thus memory inefficient.
no code implementations • ACL 2019 • Yi Tay, Shuohang Wang, Luu Anh Tuan, Jie Fu, Minh C. Phan, Xingdi Yuan, Jinfeng Rao, Siu Cheung Hui, Aston Zhang
This paper tackles the problem of reading comprehension over long narratives where documents easily span over thousands of tokens.
no code implementations • NAACL 2019 • Shuohang Wang, Sheng Zhang, Yelong Shen, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Jing Jiang
Commonsense reasoning is fundamental to natural language understanding.
Ranked #3 on Natural Language Understanding on PDP60
1 code implementation • ACL 2018 • Shuohang Wang, Mo Yu, Shiyu Chang, Jing Jiang
Multi-choice reading comprehension is a challenging task, which involves the matching between a passage and a question-answer pair.
1 code implementation • ICLR 2018 • Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell
We propose two methods, namely, strength-based re-ranking and coverage-based re-ranking, to make use of the aggregated evidence from different passages to better determine the answer.
Ranked #1 on Open-Domain Question Answering on Quasar
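Strength-based re-ranking can be sketched as aggregating the scores of identical answer strings extracted from different passages; the candidates and scores below are invented for illustration, and the coverage-based variant is not shown:

```python
# Sketch: aggregate per-passage answer scores so repeated evidence strengthens an answer.
from collections import defaultdict

# (candidate answer, extraction score) pairs from different retrieved passages (made up)
passage_candidates = [
    ("paris", 0.62), ("lyon", 0.20), ("paris", 0.55), ("paris", 0.31), ("lyon", 0.44),
]

aggregated = defaultdict(float)
for answer, score in passage_candidates:
    aggregated[answer] += score

best = max(aggregated, key=aggregated.get)
print(best, round(aggregated[best], 2))   # "paris" wins on aggregated strength
```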
1 code implementation • 31 Aug 2017 • Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bo-Wen Zhou, Jing Jiang
Second, we propose a novel method that jointly trains the Ranker along with an answer-generation Reader model, based on reinforcement learning.
Ranked #4 on Open-Domain Question Answering on Quasar
2 code implementations • 6 Nov 2016 • Shuohang Wang, Jing Jiang
We particularly focus on the different comparison functions we can use to match two vectors.
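Typical comparison functions for matching two vectors include element-wise multiplication, element-wise difference, and a small feed-forward network over the concatenation; the sketch below uses common formulations and assumed names, not necessarily the paper's exact definitions:

```python
# Sketch: three comparison functions that map two vectors to a matching representation.
import torch
import torch.nn as nn

def mult_compare(a, b):            # element-wise product
    return a * b

def sub_compare(a, b):             # element-wise squared difference
    return (a - b) ** 2

class NNCompare(nn.Module):        # learned comparison on the concatenation
    def __init__(self, dim: int):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def forward(self, a, b):
        return self.ff(torch.cat([a, b], dim=-1))

a, b = torch.randn(4, 50), torch.randn(4, 50)
print(mult_compare(a, b).shape, sub_compare(a, b).shape, NNCompare(50)(a, b).shape)
```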
5 code implementations • 29 Aug 2016 • Shuohang Wang, Jing Jiang
We propose two ways of using Pointer Net for our task.
Ranked #48 on Question Answering on SQuAD1.1 dev
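One of the two ways, a boundary-style answer pointer that predicts a start and an end position over passage states, can be sketched as below; the scorers are simplified stand-ins (the actual pointer network conditions the end prediction on the start), and the passage encoder is assumed:

```python
# Sketch: boundary answer pointer predicting start/end positions over passage states.
import torch
import torch.nn as nn

class BoundaryPointer(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.start_scorer = nn.Linear(dim, 1)
        self.end_scorer = nn.Linear(dim, 1)

    def forward(self, passage_states):
        # passage_states: (batch, passage_len, dim), e.g. from a match-LSTM encoder
        start_logits = self.start_scorer(passage_states).squeeze(-1)
        end_logits = self.end_scorer(passage_states).squeeze(-1)
        return start_logits.argmax(-1), end_logits.argmax(-1)

pointer = BoundaryPointer(dim=64)
start, end = pointer(torch.randn(2, 30, 64))
print(start, end)   # predicted answer-span boundaries for each example
```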
4 code implementations • NAACL 2016 • Shuohang Wang, Jing Jiang
On the SNLI corpus, our model achieves an accuracy of 86.1%, outperforming the state of the art.
Ranked #58 on Natural Language Inference on SNLI