1 code implementation • 19 Jun 2023 • Liliang Ren, Yang Liu, Shuohang Wang, Yichong Xu, Chenguang Zhu, ChengXiang Zhai
Linear State Space Models (SSMs) have demonstrated strong performance in a variety of sequence modeling tasks due to their efficient encoding of the recurrent structure (a minimal linear recurrence is sketched below).
Ranked #2 on Long-range modeling on LRA
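As a rough illustration of the recurrence behind this entry, the sketch below runs a discrete linear state space model x_k = A x_{k-1} + B u_k, y_k = C x_k in NumPy. The matrices A, B, C are toy values assumed for the example; the paper's actual parameterization and discretization are not reproduced here.

```python
import numpy as np

def linear_ssm_scan(A, B, C, u):
    """Run a discrete linear state space model over an input sequence.

    x_k = A @ x_{k-1} + B @ u_k      (state recurrence)
    y_k = C @ x_k                    (readout)

    A: (d_state, d_state), B: (d_state, d_in), C: (d_out, d_state)
    u: (seq_len, d_in) input sequence.  Returns y: (seq_len, d_out).
    """
    x = np.zeros(A.shape[0])
    ys = []
    for u_k in u:
        x = A @ x + B @ u_k
        ys.append(C @ x)
    return np.stack(ys)

# Toy usage: a stable random SSM over a length-16 sequence.
rng = np.random.default_rng(0)
A = 0.9 * np.eye(4)                       # contractive state transition
B = rng.normal(size=(4, 2))
C = rng.normal(size=(3, 4))
u = rng.normal(size=(16, 2))
print(linear_ssm_scan(A, B, C, u).shape)  # (16, 3)
```

For time-invariant A, B, C this scan is equivalent to convolving the input with the kernel (CB, CAB, CA^2B, ...), which is one reason SSM layers scale well to long sequences.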
no code implementations • 24 May 2023 • Dan Iter, Reid Pryzant, Ruochen Xu, Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu
Our method is based on the observation that the effectiveness of in-context demonstrations negatively correlates with the perplexity of the test example by a language model that was finetuned on that demonstration.
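As a sketch of the perplexity signal described above (not the full method, which finetunes a model per demonstration), the snippet below scores candidate demonstrations by the perplexity of the test example with the demonstration prepended as context under an off-the-shelf causal LM, and keeps the lowest-perplexity one.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Stand-in model; the paper finetunes on each demonstration, which is skipped
# here for brevity -- this only illustrates the perplexity-scoring step.
tok = AutoTokenizer.from_pretrained("gpt2")
lm = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def perplexity(text: str) -> float:
    """Perplexity of `text` under the causal language model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = lm(ids, labels=ids).loss
    return float(torch.exp(loss))

def select_demonstration(demos: list[str], test_example: str) -> str:
    """Pick the demonstration whose context gives the test example the lowest
    perplexity (lower perplexity ~ more helpful demonstration)."""
    return min(demos, key=lambda d: perplexity(d + "\n" + test_example))
```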
1 code implementation • 23 May 2023 • Simeng Sun, Yang Liu, Shuohang Wang, Chenguang Zhu, Mohit Iyyer
PEARL outperforms zero-shot and chain-of-thought prompting on this dataset, and ablation experiments show that each stage of PEARL is critical to its performance.
no code implementations • 22 May 2023 • Yichong Xu, Ruochen Xu, Dan Iter, Yang Liu, Shuohang Wang, Chenguang Zhu, Michael Zeng
While large models such as GPT-3 demonstrate exceptional performance in zero-shot and few-shot summarization tasks, their extensive serving and fine-tuning costs hinder their utilization in various applications.
no code implementations • 22 May 2023 • Ruochen Xu, Song Wang, Yang Liu, Shuohang Wang, Yichong Xu, Dan Iter, Chenguang Zhu, Michael Zeng
We hypothesize that there is a hidden query for each summary sentence in a generic summarization annotation, and we utilize a large-scale pretrained language model to recover it.
1 code implementation • 15 May 2023 • Canwen Xu, Yichong Xu, Shuohang Wang, Yang Liu, Chenguang Zhu, Julian McAuley
Large language models (LLMs) such as GPT-3 and GPT-4 are powerful, but their weights are often publicly unavailable and their immense sizes make them difficult to tune on common hardware.
1 code implementation • 29 Mar 2023 • Yang Liu, Dan Iter, Yichong Xu, Shuohang Wang, Ruochen Xu, Chenguang Zhu
In this work, we present G-Eval, a framework that uses large language models with chain-of-thought (CoT) and a form-filling paradigm to assess the quality of NLG outputs.
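A minimal sketch of this kind of LLM-based, form-filling evaluation is shown below; `call_llm` is a placeholder for whatever completion API is used, and the prompt wording is illustrative rather than the released G-Eval prompts.

```python
import re
from typing import Callable

def geval_score(call_llm: Callable[[str], str],
                source: str, summary: str,
                criterion: str = "coherence") -> float:
    """Ask an LLM to reason step by step (CoT), then fill in a 1-5 score.

    `call_llm` maps a prompt string to the model's reply (assumed interface).
    """
    prompt = (
        f"You will rate the {criterion} of a summary on a 1-5 scale.\n"
        "First explain your reasoning step by step, then output a final line "
        "of the form 'Score: <number>'.\n\n"
        f"Source:\n{source}\n\nSummary:\n{summary}\n"
    )
    reply = call_llm(prompt)
    match = re.search(r"Score:\s*([1-5](?:\.\d+)?)", reply)
    return float(match.group(1)) if match else float("nan")
```

Any LLM client can be plugged in as `call_llm`; the numeric score is parsed from the model's final "Score:" line.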
no code implementations • 19 Dec 2022 • Soumya Sanyal, Yichong Xu, Shuohang Wang, ZiYi Yang, Reid Pryzant, Wenhao Yu, Chenguang Zhu, Xiang Ren
Logical reasoning over text is an important ability that requires understanding the information present in the text and its interconnections, and then reasoning over them to infer new conclusions.
no code implementations • 15 Nov 2022 • Ziniu Hu, Yichong Xu, Wenhao Yu, Shuohang Wang, ZiYi Yang, Chenguang Zhu, Kai-Wei Chang, Yizhou Sun
Answering open-domain questions requires world knowledge about in-context entities.
1 code implementation • 23 Oct 2022 • Wenhao Yu, Chenguang Zhu, Zhihan Zhang, Shuohang Wang, Zhuosheng Zhang, Yuwei Fang, Meng Jiang
However, applying such methods to commonsense reasoning tasks faces two unique challenges, i.e., the lack of a general large-scale corpus for retrieval and a corresponding effective commonsense retriever.
1 code implementation • 17 Oct 2022 • Chenglei Si, Zhe Gan, Zhengyuan Yang, Shuohang Wang, JianFeng Wang, Jordan Boyd-Graber, Lijuan Wang
While reliability is a broad and vaguely defined term, we decompose reliability into four main facets that correspond to the existing framework of ML safety and are well-recognized to be important: generalizability, social biases, calibration, and factuality.
1 code implementation • 12 Oct 2022 • Zhuosheng Zhang, Shuohang Wang, Yichong Xu, Yuwei Fang, Wenhao Yu, Yang Liu, Hai Zhao, Chenguang Zhu, Michael Zeng
Leveraging task-aware annotated data as supervised signals to assist with self-supervised learning on large-scale unlabeled data has become a new trend in pre-training language models.
1 code implementation • 21 Sep 2022 • Wenhao Yu, Dan Iter, Shuohang Wang, Yichong Xu, Mingxuan Ju, Soumya Sanyal, Chenguang Zhu, Michael Zeng, Meng Jiang
We call our method generate-then-read (GenRead), which first prompts a large language model to generate contextual documents based on a given question, and then reads the generated documents to produce the final answer.
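The two prompting steps can be sketched as follows; `call_llm` is again a placeholder prompt-to-completion function, and the prompt wording is illustrative, not the paper's.

```python
from typing import Callable

def generate_then_read(call_llm: Callable[[str], str],
                       question: str, n_docs: int = 3) -> str:
    """Generate-then-read in two prompting steps.

    Step 1: ask the model to write short background documents for the question.
    Step 2: read those generated documents to produce the final answer.
    """
    docs = [
        call_llm("Write a short background document that helps answer the "
                 f"question:\n{question}\nDocument:")
        for _ in range(n_docs)
    ]
    context = "\n\n".join(docs)
    return call_llm("Based on the documents below, answer the question.\n\n"
                    f"{context}\n\nQuestion: {question}\nAnswer:")
```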
1 code implementation • 22 May 2022 • Zhenhailong Wang, Manling Li, Ruochen Xu, Luowei Zhou, Jie Lei, Xudong Lin, Shuohang Wang, ZiYi Yang, Chenguang Zhu, Derek Hoiem, Shih-Fu Chang, Mohit Bansal, Heng Ji
The goal of this work is to build flexible video-language models that can generalize to various video-to-text tasks from few examples, such as domain-specific captioning, question answering, and future event prediction.
Automatic Speech Recognition (ASR), +5
1 code implementation • ACL 2022 • Shuohang Wang, Yichong Xu, Yuwei Fang, Yang Liu, Siqi Sun, Ruochen Xu, Chenguang Zhu, Michael Zeng
Surprisingly, we found that simply REtrieving from the traINing datA (REINA) can lead to significant gains on multiple NLG and NLU tasks.
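A toy version of retrieving from the training data might look like the following; a crude token-overlap score stands in for a real retriever such as BM25, and the input format is purely illustrative.

```python
def retrieve_from_training_data(query: str,
                                train_pairs: list[tuple[str, str]],
                                k: int = 2) -> str:
    """Augment `query` with the most lexically similar training examples.

    `train_pairs` are (input, output) pairs from the task's own training set;
    a simple token-overlap score stands in for a proper retriever.
    """
    q_tokens = set(query.lower().split())

    def overlap(pair: tuple[str, str]) -> int:
        return len(q_tokens & set(pair[0].lower().split()))

    top = sorted(train_pairs, key=overlap, reverse=True)[:k]
    retrieved = " ".join(f"{x} => {y}" for x, y in top)
    # The retrieved input/output pairs are concatenated to the original input
    # before it is fed to the downstream NLG/NLU model.
    return f"{query} [RETRIEVED] {retrieved}"
```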
no code implementations • 10 Feb 2022 • Yulong Chen, Yang Liu, Li Dong, Shuohang Wang, Chenguang Zhu, Michael Zeng, Yue Zhang
However, for prompt learning, there are still two salient gaps between NLP tasks and pretraining.
1 code implementation • CVPR 2022 • Manling Li, Ruochen Xu, Shuohang Wang, Luowei Zhou, Xudong Lin, Chenguang Zhu, Michael Zeng, Heng Ji, Shih-Fu Chang
Vision-language (V+L) pretraining models have achieved great success in supporting multimedia applications by understanding the alignments between images and text.
1 code implementation • 8 Dec 2021 • Yixin Nie, Linjie Li, Zhe Gan, Shuohang Wang, Chenguang Zhu, Michael Zeng, Zicheng Liu, Mohit Bansal, Lijuan Wang
Based on this, we ask an even bolder question: can we have an all-MLP architecture for VL modeling, where both VL fusion and the vision encoder are replaced with MLPs?
2 code implementations • 6 Dec 2021 • Yichong Xu, Chenguang Zhu, Shuohang Wang, Siqi Sun, Hao Cheng, Xiaodong Liu, Jianfeng Gao, Pengcheng He, Michael Zeng, Xuedong Huang
In particular, we focus on the task of Commonsense Reasoning, demonstrating that the proposed external attention mechanism can augment existing transformer models and significantly improve the model's reasoning capabilities.
Ranked #1 on Common Sense Reasoning on CommonsenseQA (using extra training data)
no code implementations • 4 Nov 2021 • Boxin Wang, Chejian Xu, Shuohang Wang, Zhe Gan, Yu Cheng, Jianfeng Gao, Ahmed Hassan Awadallah, Bo Li
In this paper, we present Adversarial GLUE (AdvGLUE), a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks.
Ranked #1 on Adversarial Robustness on AdvGLUE
2 code implementations • CVPR 2022 • Zi-Yi Dou, Yichong Xu, Zhe Gan, JianFeng Wang, Shuohang Wang, Lijuan Wang, Chenguang Zhu, Pengchuan Zhang, Lu Yuan, Nanyun Peng, Zicheng Liu, Michael Zeng
Vision-and-language (VL) pre-training has proven to be highly effective on various VL downstream tasks.
Ranked #18 on Cross-Modal Retrieval on COCO 2014
no code implementations • Findings (ACL) 2022 • Yuwei Fang, Shuohang Wang, Yichong Xu, Ruochen Xu, Siqi Sun, Chenguang Zhu, Michael Zeng
We then utilize a diverse set of 4 English knowledge sources to provide more comprehensive coverage of knowledge in different formats.
1 code implementation • Findings (ACL) 2022 • Wenhao Yu, Chenguang Zhu, Yuwei Fang, Donghan Yu, Shuohang Wang, Yichong Xu, Michael Zeng, Meng Jiang
In addition to training with the masked language modeling objective, we propose two novel self-supervised pre-training tasks on word- and sentence-level alignment between the input text sequence and rare word definitions to enhance language modeling representations with dictionary knowledge.
no code implementations • ACL 2022 • Donghan Yu, Chenguang Zhu, Yuwei Fang, Wenhao Yu, Shuohang Wang, Yichong Xu, Xiang Ren, Yiming Yang, Michael Zeng
The recently proposed Fusion-in-Decoder (FiD), which is built on top of the pretrained generative model T5, achieves state-of-the-art performance in the reading module.
1 code implementation • Findings (EMNLP) 2021 • Qiyuan Zhang, Lei Wang, Sicheng Yu, Shuohang Wang, Yang Wang, Jing Jiang, Ee-Peng Lim
While diverse question answering (QA) datasets have been proposed and contributed significantly to the development of deep learning models for QA tasks, the existing datasets fall short in two aspects.
1 code implementation • Findings (EMNLP) 2021 • Shuohang Wang, Yang Liu, Yichong Xu, Chenguang Zhu, Michael Zeng
Data annotation is a time-consuming and labor-intensive process for many NLP tasks.
no code implementations • ACL 2021 • Aston Zhang, Alvin Chan, Yi Tay, Jie Fu, Shuohang Wang, Shuai Zhang, Huajie Shao, Shuochao Yao, Roy Ka-Wei Lee
Orthogonality constraints encourage matrices to be orthogonal for numerical stability.
no code implementations • 23 Apr 2021 • Zhe Gan, Yen-Chun Chen, Linjie Li, Tianlong Chen, Yu Cheng, Shuohang Wang, Jingjing Liu, Lijuan Wang, Zicheng Liu
However, we can find "relaxed" winning tickets at 50%-70% sparsity that maintain 99% of the full accuracy.
no code implementations • CVPR 2021 • Mingyang Zhou, Luowei Zhou, Shuohang Wang, Yu Cheng, Linjie Li, Zhou Yu, Jingjing Liu
Vision-and-language pre-training has achieved impressive success in learning multimodal representations between vision and language.
1 code implementation • NeurIPS 2021 • Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Jingjing Liu, Zhangyang Wang
Based on these results, we articulate the Elastic Lottery Ticket Hypothesis (E-LTH): by mindfully replicating (or dropping) and re-ordering layers for one network, its corresponding winning ticket could be stretched (or squeezed) into a subnetwork for another deeper (or shallower) network from the same family, whose performance is nearly as competitive as that of the latter's winning ticket found directly by IMP.
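A minimal sketch of the "stretch" direction is given below: per-layer sparsity masks found for a shallower network are replicated to cover a deeper sibling. The paper's careful choices about which layers to replicate, drop, or reorder are not modeled, and layer widths are assumed to match across depths.

```python
def stretch_ticket_masks(layer_masks: list, target_depth: int) -> list:
    """Stretch a per-layer winning-ticket mask list to a deeper network.

    `layer_masks[i]` is the sparsity mask for layer i of the shallower model.
    Source layers are replicated as evenly as possible until `target_depth`
    masks exist (same layer widths assumed across the model family).
    """
    src_depth = len(layer_masks)
    assert target_depth >= src_depth
    stretched = []
    for j in range(target_depth):
        # Map target layer j back to a source layer and reuse its mask.
        src_idx = min(j * src_depth // target_depth, src_depth - 1)
        stretched.append(layer_masks[src_idx])
    return stretched

# E.g. masks found for a 6-layer model reused for a 12-layer sibling:
# twelve_layer_masks = stretch_ticket_masks(six_layer_masks, target_depth=12)
```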
2 code implementations • NAACL 2021 • Siqi Sun, Yen-Chun Chen, Linjie Li, Shuohang Wang, Yuwei Fang, Jingjing Liu
Multimodal pre-training has propelled great advancement in vision-and-language research.
no code implementations • 1 Jan 2021 • Minhao Cheng, Zhe Gan, Yu Cheng, Shuohang Wang, Cho-Jui Hsieh, Jingjing Liu
By incorporating different feature maps after the masking, we can distill better features to help model generalization.
no code implementations • Findings (ACL) 2021 • Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu
Transformer has become ubiquitous in the deep learning field.
1 code implementation • ACL 2021 • Xiaohan Chen, Yu Cheng, Shuohang Wang, Zhe Gan, Zhangyang Wang, Jingjing Liu
Heavily overparameterized language models such as BERT, XLNet and T5 have achieved impressive success in many NLP tasks.
1 code implementation • 12 Oct 2020 • Sicheng Yu, Yulei Niu, Shuohang Wang, Jing Jiang, Qianru Sun
We then apply two novel CVC inference methods (on trained models) to capture the effect of comprehensive reasoning as the final prediction.
1 code implementation • EMNLP 2020 • Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jing Jiang, Jingjing Liu
In this paper, we propose Cross-Thought, a novel approach to pre-training a sequence encoder, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering.
no code implementations • EMNLP 2020 • Yue Dong, Shuohang Wang, Zhe Gan, Yu Cheng, Jackie Chi Kit Cheung, Jingjing Liu
Pre-trained neural abstractive summarization systems have dominated extractive strategies on news summarization performance, at least in terms of ROUGE.
2 code implementations • ICLR 2021 • Boxin Wang, Shuohang Wang, Yu Cheng, Zhe Gan, Ruoxi Jia, Bo Li, Jingjing Liu
Large-scale language models such as BERT have achieved state-of-the-art performance across a wide range of NLP tasks.
Ranked #1 on Natural Language Inference on ANLI test (using extra training data)
1 code implementation • EMNLP 2020 • Siqi Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu
Existing language model compression methods mostly use a simple L2 loss to distill knowledge in the intermediate representations of a large BERT model to a smaller one.
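For reference, the baseline L2 objective mentioned in this entry (not the paper's own method) can be sketched in PyTorch as an MSE between projected student hidden states and the corresponding teacher hidden states.

```python
import torch
import torch.nn.functional as F

def l2_intermediate_distillation_loss(student_hiddens: list[torch.Tensor],
                                      teacher_hiddens: list[torch.Tensor],
                                      proj: torch.nn.Linear) -> torch.Tensor:
    """Baseline L2 (MSE) distillation between intermediate representations.

    `student_hiddens` / `teacher_hiddens`: lists of (batch, seq, dim) tensors,
    already paired layer-to-layer (e.g. every other teacher layer).
    `proj` maps the student hidden size to the teacher hidden size.
    """
    losses = [F.mse_loss(proj(s), t.detach())
              for s, t in zip(student_hiddens, teacher_hiddens)]
    return torch.stack(losses).mean()
```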
no code implementations • 13 Sep 2020 • Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, Siqi Sun, Yu Cheng, Jingjing Liu
Transformer has become ubiquitous in the deep learning field.
Ranked #1 on Open-Domain Question Answering on SearchQA
no code implementations • 10 Sep 2020 • Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu, Chenguang Zhu
Although deep neural networks have achieved tremendous success for question answering (QA), they still suffer from heavy computational and energy costs in real product deployment.
1 code implementation • 10 Sep 2020 • Yuwei Fang, Shuohang Wang, Zhe Gan, Siqi Sun, Jingjing Liu
During inference, the model makes predictions based on the text input in the target language and its translation in the source language.
Ranked #18 on Zero-Shot Cross-Lingual Transfer on XTREME
no code implementations • 20 Jan 2020 • Shuohang Wang, Yunshi Lan, Yi Tay, Jing Jiang, Jingjing Liu
Transformer has been successfully applied to many natural language processing tasks.
3 code implementations • EMNLP 2020 • Boxin Wang, Hengzhi Pei, Boyuan Pan, Qian Chen, Shuohang Wang, Bo Li
In particular, we propose a tree-based autoencoder to embed the discrete text data into a continuous representation space, upon which we optimize the adversarial perturbation.
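The tree-based autoencoder itself is not reproduced here, but the "optimize an adversarial perturbation in a continuous representation space" step can be sketched generically as a projected-gradient attack on a latent vector, assuming an L-infinity budget and a differentiable victim loss.

```python
import torch

def perturb_latent(latent: torch.Tensor, loss_fn, steps: int = 5,
                   step_size: float = 0.01, eps: float = 0.1) -> torch.Tensor:
    """Generic PGD-style perturbation of a continuous representation.

    `latent`: the continuous embedding of the input (e.g. an autoencoder code).
    `loss_fn`: maps a perturbed latent to the victim model's loss, which the
    attack tries to maximize while staying inside an L-inf ball of radius eps.
    """
    delta = torch.zeros_like(latent, requires_grad=True)
    for _ in range(steps):
        loss = loss_fn(latent + delta)
        loss.backward()
        with torch.no_grad():
            delta += step_size * delta.grad.sign()   # ascend the loss
            delta.clamp_(-eps, eps)                  # project back into the ball
        delta.grad.zero_()
    return (latent + delta).detach()
```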
no code implementations • NeurIPS 2019 • Yi Tay, Anh Tuan Luu, Aston Zhang, Shuohang Wang, Siu Cheung Hui
Attentional models are distinctly characterized by their ability to learn relative importance, i.e., assigning a different weight to input values.
1 code implementation • EMNLP 2020 • Yuwei Fang, Siqi Sun, Zhe Gan, Rohit Pillai, Shuohang Wang, Jingjing Liu
In this paper, we present Hierarchical Graph Network (HGN) for multi-hop question answering.
Ranked #35 on Question Answering on HotpotQA
no code implementations • 28 Oct 2019 • Chenglei Si, Shuohang Wang, Min-Yen Kan, Jing Jiang
Based on our experiments on the 5 key MCRC datasets - RACE, MCTest, MCScript, MCScript2.0, DREAM - we observe that 1) fine-tuned BERT mainly learns how keywords lead to correct predictions, rather than learning semantic understanding and reasoning; 2) BERT does not need correct syntactic information to solve the task; and 3) there exist artifacts in these datasets such that they can be solved even without the full context.
1 code implementation • ACL 2019 • Yi Tay, Aston Zhang, Luu Anh Tuan, Jinfeng Rao, Shuai Zhang, Shuohang Wang, Jie Fu, Siu Cheung Hui
Many state-of-the-art neural models for NLP are heavily parameterized and thus memory inefficient.
no code implementations • ACL 2019 • Yi Tay, Shuohang Wang, Luu Anh Tuan, Jie Fu, Minh C. Phan, Xingdi Yuan, Jinfeng Rao, Siu Cheung Hui, Aston Zhang
This paper tackles the problem of reading comprehension over long narratives where documents easily span over thousands of tokens.
no code implementations • NAACL 2019 • Shuohang Wang, Sheng Zhang, Yelong Shen, Xiaodong Liu, Jingjing Liu, Jianfeng Gao, Jing Jiang
Commonsense reasoning is fundamental to natural language understanding.
Ranked #3 on Natural Language Understanding on PDP60
1 code implementation • ACL 2018 • Shuohang Wang, Mo Yu, Shiyu Chang, Jing Jiang
Multi-choice reading comprehension is a challenging task, which involves matching a passage with a question-answer pair.
1 code implementation • ICLR 2018 • Shuohang Wang, Mo Yu, Jing Jiang, Wei zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, Murray Campbell
We propose two methods, namely, strength-based re-ranking and coverage-based re-ranking, to make use of the aggregated evidence from different passages to better determine the answer, as sketched below.
Ranked #1 on Open-Domain Question Answering on Quasar
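A minimal sketch of the strength-based variant follows: per-passage answer candidates and scores from a base reader are aggregated, and the answer with the strongest total evidence wins. Coverage-based re-ranking is not shown, and the data format is assumed for illustration.

```python
from collections import defaultdict

def strength_rerank(passage_candidates: list[list[tuple[str, float]]]) -> str:
    """Strength-based re-ranking: aggregate per-passage answer scores.

    `passage_candidates[i]` holds (candidate_answer, score) pairs extracted
    from passage i by a base reader. The answer whose summed evidence across
    all passages is strongest wins.
    """
    total = defaultdict(float)
    for candidates in passage_candidates:
        for answer, score in candidates:
            total[answer.strip().lower()] += score
    return max(total, key=total.get)

# Example: two passages weakly support "paris", one strongly supports "lyon".
print(strength_rerank([[("Paris", 0.6)], [("paris", 0.5)], [("Lyon", 0.9)]]))
# -> "paris"
```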
1 code implementation • 31 Aug 2017 • Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei zhang, Shiyu Chang, Gerald Tesauro, Bo-Wen Zhou, Jing Jiang
Second, we propose a novel method that jointly trains the Ranker along with an answer-generation Reader model, based on reinforcement learning.
Ranked #4 on Open-Domain Question Answering on Quasar
2 code implementations • 6 Nov 2016 • Shuohang Wang, Jing Jiang
We particularly focus on the different comparison functions we can use to match two vectors.
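A few typical comparison functions for matching two vectors are sketched in PyTorch below; the exact set studied in the paper is not reproduced here.

```python
import torch
import torch.nn as nn

class CompareFunctions(nn.Module):
    """Several comparison functions for matching two vectors a and b."""

    def __init__(self, dim: int):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU())

    def subtraction(self, a, b):        # element-wise (squared) difference
        return (a - b) ** 2

    def multiplication(self, a, b):     # element-wise product
        return a * b

    def neural_net(self, a, b):         # learned comparison over [a; b]
        return self.mlp(torch.cat([a, b], dim=-1))

cmp = CompareFunctions(dim=8)
a, b = torch.randn(4, 8), torch.randn(4, 8)
print(cmp.subtraction(a, b).shape, cmp.multiplication(a, b).shape,
      cmp.neural_net(a, b).shape)   # all torch.Size([4, 8])
```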
5 code implementations • 29 Aug 2016 • Shuohang Wang, Jing Jiang
We propose two ways of using Pointer Net for our task; a generic boundary-style pointer is sketched below.
Ranked #47 on Question Answering on SQuAD1.1 dev
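A generic boundary-style pointer can be sketched as attention over passage hidden states conditioned on a question vector, producing start and end distributions; the shapes and the simple concatenation scoring are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryPointer(nn.Module):
    """Point to answer-span boundaries in a passage (generic sketch).

    Given passage hidden states H (batch, len, dim) and a question summary
    vector q (batch, dim), produce start/end probability distributions over
    passage positions by attending over H.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.start_attn = nn.Linear(2 * dim, 1)
        self.end_attn = nn.Linear(2 * dim, 1)

    def forward(self, H: torch.Tensor, q: torch.Tensor):
        q_exp = q.unsqueeze(1).expand(-1, H.size(1), -1)            # (B, L, dim)
        start = self.start_attn(torch.cat([H, q_exp], -1)).squeeze(-1)
        end = self.end_attn(torch.cat([H, q_exp], -1)).squeeze(-1)
        return F.softmax(start, dim=-1), F.softmax(end, dim=-1)

ptr = BoundaryPointer(dim=16)
H, q = torch.randn(2, 10, 16), torch.randn(2, 16)
p_start, p_end = ptr(H, q)
print(p_start.shape, p_end.shape)   # torch.Size([2, 10]) each
```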
4 code implementations • NAACL 2016 • Shuohang Wang, Jing Jiang
On the SNLI corpus, our model achieves an accuracy of 86.1%, outperforming the state of the art.
Ranked #61 on Natural Language Inference on SNLI