no code implementations • 20 Apr 2022 • Sheng Shen, Chunyuan Li, Xiaowei Hu, Yujia Xie, Jianwei Yang, Pengchuan Zhang, Anna Rohrbach, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Jianfeng Gao
In this paper, we propose K-LITE (Knowledge-augmented Language-Image Training and Evaluation), a simple strategy that leverages external knowledge to build transferable visual systems. In training, it enriches entities in natural language with WordNet and Wiktionary knowledge, yielding an efficient and scalable approach to learning image representations that understand both visual concepts and their knowledge. In evaluation, the natural language is likewise augmented with external knowledge and then used to reference learned visual concepts (or describe new ones), enabling zero-shot and few-shot transfer of the pre-trained models.
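The enrichment step is easy to picture: class names or captions get their dictionary glosses appended before being fed to the text encoder. Below is a minimal sketch of that idea using NLTK's WordNet interface; the helper name and fallback behavior are illustrative assumptions, not the K-LITE codebase.

```python
# Minimal sketch of WordNet-based entity enrichment (illustrative; not the
# K-LITE implementation). Assumes NLTK with the 'wordnet' corpus downloaded.
from nltk.corpus import wordnet as wn

def enrich_with_wordnet(label: str) -> str:
    """Append the first WordNet gloss to a class label, K-LITE style."""
    synsets = wn.synsets(label.replace(" ", "_"))
    if not synsets:
        return label  # fall back to the raw label if no knowledge is found
    return f"{label}, {synsets[0].definition()}"

print(enrich_with_wordnet("mastiff"))
# e.g. "mastiff, an old breed of powerful deep-chested smooth-coated dog ..."
```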
no code implementations • 13 Mar 2022 • Dayong Ye, Sheng Shen, Tianqing Zhu, Bo Liu, Wanlei Zhou
The experimental results show the method to be an effective and timely defense against both membership inference and model inversion attacks with no reduction in accuracy.
1 code implementation • 11 Mar 2022 • Sheng Shen, Pete Walsh, Kurt Keutzer, Jesse Dodge, Matthew Peters, Iz Beltagy
As an alternative, we consider a staged training setup that begins with a small model and incrementally increases the amount of compute used for training by applying a "growth operator" to increase the model depth and width.
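As a rough illustration of what a depth "growth operator" might look like in PyTorch, the sketch below doubles a stack of layers by cloning each one. The paper's actual operators are constructed to preserve the loss and training dynamics across the growth step, which naive duplication does not guarantee.

```python
import copy
import torch.nn as nn

def grow_depth(layers: nn.ModuleList) -> nn.ModuleList:
    """Double model depth by interleaving a copy of each layer after itself.

    A deliberately naive growth operator: the paper designs operators that
    preserve the loss across the growth step, which plain duplication does not.
    """
    grown = []
    for layer in layers:
        grown.append(layer)
        grown.append(copy.deepcopy(layer))  # new layer starts as a clone
    return nn.ModuleList(grown)
```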
1 code implementation • NeurIPS 2021 • Jingyu Yang, Sheng Shen, Huanjing Yue, Kun Li
Nowadays, there is an explosive growth of screen contents due to the wide application of screen sharing, remote cooperation, and online education.
1 code implementation • 27 Oct 2021 • Xuanlin Li, Brandon Trabucco, Dong Huk Park, Michael Luo, Sheng Shen, Trevor Darrell, Yang Gao
Permutations then serve as target generation orders for training an insertion-based Transformer language model.
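The conversion from a permutation to insertion-Transformer training targets can be sketched in a few lines of Python: at each step, the next token in the generation order is inserted into the partially built canvas at the slot determined by its original position. This is an illustrative reconstruction, not the paper's data pipeline.

```python
def insertion_targets(tokens, order):
    """Turn a generation order (a permutation of token indices) into
    insertion-Transformer training triples: (partial canvas, slot, token)."""
    triples = []
    canvas_idx = []  # indices already generated, kept in left-to-right order
    for idx in order:
        slot = sum(1 for j in canvas_idx if j < idx)  # where idx inserts
        canvas = [tokens[j] for j in canvas_idx]
        triples.append((canvas, slot, tokens[idx]))
        canvas_idx.insert(slot, idx)
    return triples

# Example: generate "the cat sat" middle-out.
for canvas, slot, tok in insertion_targets(["the", "cat", "sat"], [1, 0, 2]):
    print(canvas, slot, tok)
# [] 0 cat
# ['cat'] 0 the
# ['the', 'cat'] 2 sat
```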
3 code implementations • ICLR 2022 • Victor Sanh, Albert Webson, Colin Raffel, Stephen H. Bach, Lintang Sutawika, Zaid Alyafeai, Antoine Chaffin, Arnaud Stiegler, Teven Le Scao, Arun Raja, Manan Dey, M Saiful Bari, Canwen Xu, Urmish Thakker, Shanya Sharma Sharma, Eliza Szczechla, Taewoon Kim, Gunjan Chhablani, Nihal Nayak, Debajyoti Datta, Jonathan Chang, Mike Tian-Jian Jiang, Han Wang, Matteo Manica, Sheng Shen, Zheng Xin Yong, Harshit Pandey, Rachel Bawden, Thomas Wang, Trishala Neeraj, Jos Rozen, Abheesht Sharma, Andrea Santilli, Thibault Fevry, Jason Alan Fries, Ryan Teehan, Tali Bers, Stella Biderman, Leo Gao, Thomas Wolf, Alexander M. Rush
Large language models have recently been shown to attain reasonable zero-shot generalization on a diverse set of tasks (Brown et al., 2020).
1 code implementation • 8 Sep 2021 • Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, Michael W. Mahoney
Hidden within a one-layer randomly weighted Transformer, we find subnetworks that can achieve 29.45/17.29 BLEU on IWSLT14/WMT14.
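Subnetworks inside fixed random weights are commonly found with an edge-popup-style procedure: a score is learned for every frozen weight, and the top-scoring fraction forms the mask. The sketch below shows that trick for a single linear layer; it is an assumption about the search procedure, not necessarily this paper's exact method.

```python
import torch
import torch.nn as nn

class MaskedLinear(nn.Module):
    """Linear layer with frozen random weights: only per-weight scores are
    trained, and the top-k fraction of scores forms the subnetwork
    (edge-popup-style straight-through estimator). A sketch, not the
    paper's exact code.
    """
    def __init__(self, in_f, out_f, keep=0.5):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f), requires_grad=False)
        self.scores = nn.Parameter(torch.randn(out_f, in_f))
        self.keep = keep

    def forward(self, x):
        k = int(self.scores.numel() * self.keep)
        threshold = self.scores.flatten().kthvalue(self.scores.numel() - k + 1).values
        hard = (self.scores >= threshold).float()
        # Straight-through: hard mask in the forward pass, identity gradient
        # to the scores in the backward pass.
        mask = hard + self.scores - self.scores.detach()
        return nn.functional.linear(x, self.weight * mask)
```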
2 code implementations • 13 Jul 2021 • Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, Kurt Keutzer
Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using a relatively small set of manually-annotated data (as compared to web-crawled data), to perceive the visual world.
Ranked #5 on Visual Entailment on SNLI-VE val (using extra training data)
1 code implementation • 2 Jul 2021 • Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
We extensively test the performance of LTP on GLUE tasks and show that our method outperforms prior state-of-the-art token pruning methods, achieving up to ~2.5% higher accuracy with the same number of FLOPs.
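The core of threshold-based token pruning is small enough to sketch: a token's importance is the average attention it receives, and tokens whose importance falls below a learned threshold are (softly) masked out during training. Function name, shapes, and the temperature below are illustrative assumptions.

```python
import torch

def ltp_soft_mask(attn_probs, threshold, temperature=0.01):
    """Soft token-pruning mask in the spirit of Learned Token Pruning.

    attn_probs: (batch, heads, seq, seq) attention probabilities.
    A token's importance is the average attention it receives across heads
    and queries; tokens below a learned per-layer threshold are softly
    dropped via a sigmoid so the threshold stays differentiable.
    """
    importance = attn_probs.mean(dim=(1, 2))          # (batch, seq)
    return torch.sigmoid((importance - threshold) / temperature)
```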
1 code implementation • 30 May 2021 • Zhewei Yao, Xiaoxia Wu, Linjian Ma, Sheng Shen, Kurt Keutzer, Michael W. Mahoney, Yuxiong He
Moreover, in order to reduce hyperparameter tuning, a novel adaptive regularization coefficient is deployed to control the regularization penalty adaptively.
1 code implementation • ICLR 2021 • Xuanlin Li, Brandon Trabucco, Dong Huk Park, Michael Luo, Sheng Shen, Trevor Darrell, Yang Gao
One strategy to recover this information is to decode both the content and location of tokens.
no code implementations • ACL 2021 • Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela
We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated.
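Operationally, the "never updated" part amounts to freezing a random subset of layers at initialization, as in the hypothetical helper below.

```python
import random
import torch.nn as nn

def freeze_random_layers(layers: nn.ModuleList, frac=0.5, seed=0):
    """Leave a random subset of layers at their initialization and never
    update them (a sketch of the frozen-layer setup, not the paper's code)."""
    rng = random.Random(seed)
    frozen = rng.sample(range(len(layers)), int(len(layers) * frac))
    for i in frozen:
        for p in layers[i].parameters():
            p.requires_grad_(False)
    return frozen
```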
no code implementations • 19 Oct 2020 • Sheng Shen, Tianqing Zhu, Di Wu, Wei Wang, Wanlei Zhou
Federated learning is an improved version of distributed machine learning that further offloads operations which would usually be performed by a central server.
Distributed, Parallel, and Cluster Computing
1 code implementation • EMNLP 2020 • Qinxin Wang, Hao Tan, Sheng Shen, Michael W. Mahoney, Zhewei Yao
Phrase localization is a task that studies the mapping from textual phrases to regions of an image.
1 code implementation • NAACL 2021 • Yang Liu, Sheng Shen, Mirella Lapata
In this paper we apply self-knowledge distillation to text summarization which we argue can alleviate problems with maximum-likelihood training on single reference and noisy datasets.
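A generic self-distillation loss of the kind this line of work builds on mixes the usual NLL against the reference with a KL term toward the model's own (teacher) predictions; the sketch below shows that shape. The mixing weight, temperature, and teacher choice here are assumptions, not the paper's settings.

```python
import torch.nn.functional as F

def self_distill_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Mix reference NLL with a KL term toward the model's own (teacher)
    predictions. A generic self-distillation sketch."""
    nll = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale gradients for the softened distributions
    return (1 - alpha) * nll + alpha * kd
```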
no code implementations • 16 Aug 2020 • Dayong Ye, Tianqing Zhu, Sheng Shen, Wanlei Zhou, Philip S. Yu
To the best of our knowledge, this paper is the first to apply differential privacy to the field of multi-agent planning as a means of preserving the privacy of agents for logistic-like problems.
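The differential-privacy primitive underlying such schemes is the standard Laplace mechanism: before an agent shares a quantity, calibrated noise is added, as in the generic sketch below (a DP building block, not the paper's specific planning protocol).

```python
import numpy as np

def laplace_mechanism(value, sensitivity, epsilon, rng=None):
    """Standard epsilon-differentially-private Laplace mechanism: add
    Laplace(sensitivity/epsilon) noise before releasing a quantity."""
    rng = rng or np.random.default_rng()
    return value + rng.laplace(scale=sensitivity / epsilon)
```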
4 code implementations • 1 Jun 2020 • Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney
We introduce ADAHESSIAN, a second order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the HESSIAN.
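The estimator at the heart of this is Hutchinson's method: with Rademacher vectors z, diag(H) ≈ E[z ⊙ Hz], computable from one extra backward pass per step. A minimal PyTorch sketch (the full optimizer adds momentum and spatial averaging on top):

```python
import torch

def hutchinson_diag(loss, params):
    """Hutchinson estimate of the Hessian diagonal via one
    Hessian-vector product: diag(H) ~ E[z * (H z)], z Rademacher."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    zs = [torch.randint_like(p, 0, 2) * 2.0 - 1.0 for p in params]  # +/-1
    hvs = torch.autograd.grad(grads, params, grad_outputs=zs)       # H z
    return [z * hv for z, hv in zip(zs, hvs)]
```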
1 code implementation • ICML 2020 • Sheng Shen, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
To address this, we propose Power Normalization (PN), a novel normalization scheme that resolves this issue by (i) relaxing zero-mean normalization in BN, (ii) incorporating a running quadratic mean instead of per batch statistics to stabilize fluctuations, and (iii) using an approximate backpropagation for incorporating the running statistics in the forward pass.
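A simplified forward pass makes components (i) and (ii) concrete: no mean subtraction, and a running quadratic mean in place of per-batch statistics. The sketch below omits the approximate-backprop correction in (iii), so it is illustrative only.

```python
import torch
import torch.nn as nn

class PowerNormSketch(nn.Module):
    """Simplified Power Normalization forward pass: no mean subtraction,
    and a running quadratic mean (not per-batch statistics) scales the
    activations. Omits the paper's approximate-backprop correction."""
    def __init__(self, dim, momentum=0.1, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.register_buffer("running_psi", torch.ones(dim))
        self.momentum, self.eps = momentum, eps

    def forward(self, x):                      # x: (batch, dim)
        if self.training:
            psi = (x * x).mean(dim=0)          # quadratic mean, no centering
            self.running_psi.mul_(1 - self.momentum).add_(self.momentum * psi.detach())
        out = x / torch.sqrt(self.running_psi + self.eps)
        return self.gamma * out + self.beta
```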
2 code implementations • 26 Feb 2020 • Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez
Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.
no code implementations • 12 Sep 2019 • Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
In particular, we propose a new group-wise quantization scheme, and we use a Hessian-based mixed-precision method to compress the model further.
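Group-wise quantization just means each group of rows in a weight matrix gets its own scale rather than the whole tensor sharing one. A fake-quantization sketch follows; the group count and bit-width are illustrative, and the Hessian-based bit allocation is not shown.

```python
import torch

def groupwise_quantize(w, n_groups=4, bits=8):
    """Symmetric uniform fake-quantization with one scale per group of
    rows, in the spirit of a group-wise quantization scheme."""
    qmax = 2 ** (bits - 1) - 1
    groups = []
    for g in w.chunk(n_groups, dim=0):          # each group gets its own scale
        scale = g.abs().max().clamp(min=1e-8) / qmax
        groups.append(torch.round(g / scale).clamp(-qmax, qmax) * scale)
    return torch.cat(groups, dim=0)
```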
2 code implementations • NAACL 2019 • David Bamman, Sejal Popat, Sheng Shen
We present a new dataset comprised of 210,532 tokens evenly drawn from 100 different English-language literary texts annotated for ACE entity categories (person, location, geo-political entity, facility, organization, and vehicle).
2 code implementations • NAACL 2019 • Sheng Shen, Daniel Fried, Jacob Andreas, Dan Klein
We improve the informativeness of models for conditional text generation using techniques from computational pragmatics.
Ranked #1 on Data-to-Text Generation on E2E NLG Challenge
Abstractive Text Summarization • Conditional Text Generation • +3
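In reconstructor-style pragmatics, generated candidates are scored not only by the speaker model but also by how well a listener model can recover the input from them. A one-line reranker sketch, where speaker_lp and listener_lp are assumed callables returning log-probabilities:

```python
def pragmatic_rerank(candidates, speaker_lp, listener_lp, lam=1.0):
    """Pick the candidate maximizing a combined speaker/listener score,
    score(t) = log p_S0(t | x) + lam * log p_L(x | t).
    Illustrative sketch of a reconstructor-based pragmatic objective."""
    return max(candidates, key=lambda t: speaker_lp(t) + lam * listener_lp(t))
```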
no code implementations • 1 Nov 2018 • Sheng Shen, Yaliang Li, Nan Du, Xian Wu, Yusheng Xie, Shen Ge, Tao Yang, Kai Wang, Xingzheng Liang, Wei Fan
Question answering (QA) has recently seen promising progress.
1 code implementation • 7 Jun 2018 • Zhenpeng Chen, Sheng Shen, Ziniu Hu, Xuan Lu, Qiaozhu Mei, Xuanzhe Liu
To tackle this problem, cross-lingual sentiment classification approaches aim to transfer knowledge learned from one language that has abundant labeled examples (i.e., the source language, usually English) to another language with fewer labels (i.e., the target language).