Search Results for author: Sheng Shen

Found 26 papers, 18 papers with code

What’s Hidden in a One-layer Randomly Weighted Transformer?

1 code implementation EMNLP 2021 Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, Michael Mahoney

Hidden within a one-layer randomly weighted Transformer, we find subnetworks that can achieve 29.45/17.29 BLEU on IWSLT14/WMT14.

Machine Translation · Translation
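As a concrete illustration of the finding above, here is a minimal sketch of the general "supermask" recipe for uncovering subnetworks in randomly weighted layers: the weights stay frozen at their random initialization and only a score mask is trained. The layer name, straight-through trick, and 50% keep ratio are illustrative assumptions, not the paper's exact procedure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedLinear(nn.Linear):
    """Linear layer whose random weights are frozen; only mask scores train."""
    def __init__(self, in_features, out_features, keep_ratio=0.5):
        super().__init__(in_features, out_features, bias=False)
        self.weight.requires_grad = False              # weights stay random
        self.scores = nn.Parameter(torch.randn_like(self.weight))
        self.keep_ratio = keep_ratio

    def forward(self, x):
        # Keep the top-scoring fraction of weights; the straight-through
        # estimator lets gradients reach the (binary) mask's scores.
        k = int(self.scores.numel() * self.keep_ratio)
        thresh = self.scores.abs().flatten().kthvalue(
            self.scores.numel() - k + 1).values
        mask = (self.scores.abs() >= thresh).float()
        mask = mask + self.scores - self.scores.detach()  # straight-through
        return F.linear(x, self.weight * mask)
```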

Train Big, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

no code implementations ICML 2020 Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph Gonzalez

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.

Machine Translation · Quantization +1

K-LITE: Learning Transferable Visual Models with External Knowledge

no code implementations · 20 Apr 2022 · Sheng Shen, Chunyuan Li, Xiaowei Hu, Yujia Xie, Jianwei Yang, Pengchuan Zhang, Anna Rohrbach, Zhe Gan, Lijuan Wang, Lu Yuan, Ce Liu, Kurt Keutzer, Trevor Darrell, Jianfeng Gao

In this paper, we propose K-LITE (Knowledge-augmented Language-Image Training and Evaluation), a simple strategy to leverage external knowledge to build transferable visual systems. In training, it enriches entities in natural language with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that can understand both visual concepts and their knowledge. In evaluation, the natural language is also augmented with external knowledge and then used to reference learned visual concepts (or describe new ones), enabling zero-shot and few-shot transfer of the pre-trained models.

Image Classification · object-detection +2
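The WordNet side of this recipe is easy to picture. Below is a minimal sketch (assuming NLTK's WordNet interface and a hypothetical prompt template) of enriching a concept name with its gloss before it reaches the text encoder.

```python
# Requires: pip install nltk, then nltk.download('wordnet') once.
from nltk.corpus import wordnet

def enrich_with_knowledge(concept: str) -> str:
    """Append a WordNet gloss to a class-name prompt (illustrative only)."""
    synsets = wordnet.synsets(concept.replace(" ", "_"))
    if not synsets:
        return f"a photo of a {concept}."
    gloss = synsets[0].definition()      # external knowledge for the entity
    return f"a photo of a {concept}, which is {gloss}."

print(enrich_with_knowledge("mahogany"))
```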

One Parameter Defense -- Defending against Data Inference Attacks via Differential Privacy

no code implementations · 13 Mar 2022 · Dayong Ye, Sheng Shen, Tianqing Zhu, Bo Liu, Wanlei Zhou

The experimental results show that the method is an effective and timely defense against both membership inference and model inversion attacks, with no reduction in accuracy.
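As a rough illustration of defending against inference attacks by perturbing model outputs with differential privacy, the sketch below adds Laplace noise to a confidence vector and re-normalizes. The Laplace mechanism and the epsilon value are illustrative assumptions, not the paper's one-parameter method.

```python
import numpy as np

def dp_confidences(probs: np.ndarray, epsilon: float = 1.0) -> np.ndarray:
    """Perturb a confidence vector with Laplace noise, then re-normalize."""
    noisy = probs + np.random.laplace(scale=1.0 / epsilon, size=probs.shape)
    noisy = np.clip(noisy, 1e-8, None)   # keep probabilities positive
    return noisy / noisy.sum()

print(dp_confidences(np.array([0.7, 0.2, 0.1]), epsilon=2.0))
```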

Staged Training for Transformer Language Models

1 code implementation · 11 Mar 2022 · Sheng Shen, Pete Walsh, Kurt Keutzer, Jesse Dodge, Matthew Peters, Iz Beltagy

As an alternative, we consider a staged training setup that begins with a small model and incrementally increases the amount of compute used for training by applying a "growth operator" to increase the model depth and width.
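A depth growth operator is simple to sketch: duplicate each trained layer so the grown model starts from the small model's solution. A minimal PyTorch sketch follows; the paper's operators also preserve the loss and training dynamics, which is omitted here.

```python
import copy
import torch.nn as nn

def grow_depth(layers: nn.ModuleList) -> nn.ModuleList:
    """Double model depth by interleaving copies of existing layers."""
    grown = []
    for layer in layers:
        grown.append(layer)
        grown.append(copy.deepcopy(layer))  # new layer starts from the old one
    return nn.ModuleList(grown)
```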

Implicit Transformer Network for Screen Content Image Continuous Super-Resolution

1 code implementation NeurIPS 2021 Jingyu Yang, Sheng Shen, Huanjing Yue, Kun Li

Nowadays, there is explosive growth in screen content due to the wide application of screen sharing, remote cooperation, and online education.

Super-Resolution

What's Hidden in a One-layer Randomly Weighted Transformer?

1 code implementation · 8 Sep 2021 · Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, Michael W. Mahoney

Hidden within a one-layer randomly weighted Transformer, we find subnetworks that can achieve 29.45/17.29 BLEU on IWSLT14/WMT14.

Machine Translation · Translation

How Much Can CLIP Benefit Vision-and-Language Tasks?

2 code implementations · 13 Jul 2021 · Sheng Shen, Liunian Harold Li, Hao Tan, Mohit Bansal, Anna Rohrbach, Kai-Wei Chang, Zhewei Yao, Kurt Keutzer

Most existing Vision-and-Language (V&L) models rely on pre-trained visual encoders, using a relatively small set of manually-annotated data (as compared to web-crawled data), to perceive the visual world.

Ranked #5 on Visual Entailment on SNLI-VE val (using extra training data)

Question Answering · Visual Entailment +1
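Plugging CLIP in as the visual encoder looks roughly like the sketch below, using the openai `clip` package; the downstream V&L head is omitted and the file name is a placeholder.

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

image = preprocess(Image.open("example.jpg")).unsqueeze(0).to(device)
with torch.no_grad():
    visual_feats = model.encode_image(image)   # fed to the V&L model's head
print(visual_feats.shape)                      # (1, 512) for ViT-B/32
```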

Learned Token Pruning for Transformers

1 code implementation · 2 Jul 2021 · Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer

We extensively test the performance of LTP on GLUE tasks and show that our method outperforms the prior state-of-the-art token pruning methods by up to ~2.5% higher accuracy with the same amount of FLOPs.
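The core of threshold-based token pruning can be sketched in a few lines: score each token by the attention it receives and drop tokens below a threshold (which LTP learns per layer). The importance definition and the fixed threshold below are illustrative assumptions.

```python
import torch

def prune_tokens(hidden, attn_probs, threshold=0.01):
    # attn_probs: (batch, heads, queries, keys). Token importance is the
    # attention a token receives, averaged over heads and query positions.
    importance = attn_probs.mean(dim=(1, 2))       # (batch, keys)
    keep = importance >= threshold
    return hidden[:, keep[0], :]                   # batch of 1 for simplicity

hidden = torch.randn(1, 8, 16)
attn = torch.softmax(torch.randn(1, 2, 8, 8), dim=-1)
print(prune_tokens(hidden, attn).shape)
```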

LEAP: Learnable Pruning for Transformer-based Models

1 code implementation · 30 May 2021 · Zhewei Yao, Xiaoxia Wu, Linjian Ma, Sheng Shen, Kurt Keutzer, Michael W. Mahoney, Yuxiong He

Moreover, to reduce hyperparameter tuning, a novel adaptive regularization coefficient is deployed to control the regularization penalty.

Natural Language Processing
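One way to picture an adaptive regularization coefficient is a feedback rule that strengthens the penalty while the model is denser than the target and relaxes it afterwards; the multiplicative update below is an illustrative assumption, not LEAP's exact schedule.

```python
def update_reg_coeff(lam, current_sparsity, target_sparsity, step=0.01):
    """Nudge the penalty coefficient toward a target sparsity (illustrative)."""
    if current_sparsity < target_sparsity:
        lam *= 1.0 + step     # model still too dense: penalize harder
    else:
        lam *= 1.0 - step     # target reached: ease off
    return lam
```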

Reservoir Transformers

no code implementations ACL 2021 Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela

We demonstrate that transformers obtain impressive performance even when some of the layers are randomly initialized and never updated.

Language Modelling · Machine Translation +1
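In PyTorch terms, a "reservoir" layer is simply one whose parameters are frozen at their random initialization, as in this minimal sketch (the layer indices are illustrative):

```python
import torch.nn as nn

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=512, nhead=8), num_layers=6)

for i, layer in enumerate(encoder.layers):
    if i in {2, 4}:                    # reservoir layers, chosen arbitrarily
        for p in layer.parameters():
            p.requires_grad = False    # keep random init, never update
```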

From Distributed Machine Learning To Federated Learning: In The View Of Data Privacy And Security

no code implementations · 19 Oct 2020 · Sheng Shen, Tianqing Zhu, Di Wu, Wei Wang, Wanlei Zhou

Federated learning is an improved version of distributed machine learning that further offloads operations which would usually be performed by a central server.

Distributed, Parallel, and Cluster Computing

Noisy Self-Knowledge Distillation for Text Summarization

1 code implementation NAACL 2021 Yang Liu, Sheng Shen, Mirella Lapata

In this paper we apply self-knowledge distillation to text summarization, which we argue can alleviate problems with maximum-likelihood training on single-reference and noisy datasets.

Self-Knowledge Distillation · Text Summarization
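A noisy self-distillation objective can be sketched as a mix of the usual NLL term and a KL term against a perturbed teacher (an earlier copy of the student). The Gaussian logit noise and 0.5 weighting are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def self_distill_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, noise_std=0.1):
    # Teacher targets are perturbed with noise before distillation.
    noisy_teacher = teacher_logits + noise_std * torch.randn_like(teacher_logits)
    kd = F.kl_div(F.log_softmax(student_logits, dim=-1),
                  F.softmax(noisy_teacher, dim=-1), reduction="batchmean")
    nll = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * nll
```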

Differentially Private Multi-Agent Planning for Logistic-like Problems

no code implementations · 16 Aug 2020 · Dayong Ye, Tianqing Zhu, Sheng Shen, Wanlei Zhou, Philip S. Yu

To the best of our knowledge, this paper is the first to apply differential privacy to the field of multi-agent planning as a means of preserving the privacy of agents for logistic-like problems.

Privacy Preserving

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning

4 code implementations · 1 Jun 2020 · Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney

We introduce ADAHESSIAN, a second order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the HESSIAN.

Second-order methods · Stochastic Optimization
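The Hessian information AdaHessian builds on comes from Hutchinson's trick: with a random +/-1 vector z, E[z * (Hz)] equals the Hessian diagonal, and Hz costs only one extra backward pass. A minimal sketch:

```python
import torch

def hessian_diag_estimate(loss, params):
    """One-sample Hutchinson estimate of the Hessian diagonal."""
    grads = torch.autograd.grad(loss, params, create_graph=True)
    zs = [torch.randint_like(p, 0, 2) * 2.0 - 1.0 for p in params]  # +/-1
    h_zs = torch.autograd.grad(grads, params, grad_outputs=zs)      # Hz
    return [z * hz for z, hz in zip(zs, h_zs)]                      # ~diag(H)
```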

PowerNorm: Rethinking Batch Normalization in Transformers

1 code implementation ICML 2020 Sheng Shen, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

To address this, we propose Power Normalization (PN), a novel normalization scheme that resolves this issue by (i) relaxing zero-mean normalization in BN, (ii) incorporating a running quadratic mean instead of per batch statistics to stabilize fluctuations, and (iii) using an approximate backpropagation for incorporating the running statistics in the forward pass.

Natural Language Processing
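Ingredients (i) and (ii) are easy to sketch: skip mean subtraction and divide by a running quadratic mean. The module below is illustrative only; it omits the paper's approximate backward pass, ingredient (iii).

```python
import torch
import torch.nn as nn

class SimplePowerNorm(nn.Module):
    def __init__(self, dim, momentum=0.1, eps=1e-5):
        super().__init__()
        self.register_buffer("running_quad", torch.ones(dim))
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.momentum, self.eps = momentum, eps

    def forward(self, x):                        # x: (..., dim)
        if self.training:                        # update running 2nd moment
            quad = (x ** 2).mean(dim=tuple(range(x.dim() - 1)))
            self.running_quad.mul_(1 - self.momentum).add_(
                quad.detach(), alpha=self.momentum)
        x_hat = x / torch.sqrt(self.running_quad + self.eps)  # no mean sub
        return self.gamma * x_hat + self.beta
```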

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers

2 code implementations · 26 Feb 2020 · Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez

Since hardware resources are limited, the objective of training deep learning models is typically to maximize accuracy subject to the time and memory constraints of training and inference.

Machine Translation · Quantization +1

Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT

no code implementations · 12 Sep 2019 · Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer

In particular, we propose a new group-wise quantization scheme, and we use a Hessian-based mixed-precision method to compress the model further.

Natural Language Processing · Quantization
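Group-wise quantization is straightforward to sketch: split a weight matrix into groups, give each its own scale, and round symmetrically. Assigning different bit widths per group from Hessian sensitivity is the mixed-precision step, omitted here; the group count and bit width below are illustrative.

```python
import torch

def groupwise_quantize(weight, num_groups=4, bits=4):
    """Symmetric fake-quantization with one scale per row group."""
    rows = weight.shape[0] // num_groups
    qmax = 2 ** (bits - 1) - 1
    out = torch.empty_like(weight)
    for g in range(num_groups):
        block = weight[g * rows:(g + 1) * rows]
        scale = block.abs().max() / qmax + 1e-12   # per-group scale
        q = torch.clamp(torch.round(block / scale), -qmax - 1, qmax)
        out[g * rows:(g + 1) * rows] = q * scale
    return out
```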

An annotated dataset of literary entities

2 code implementations NAACL 2019 David Bamman, Sejal Popat, Sheng Shen

We present a new dataset comprising 210,532 tokens evenly drawn from 100 different English-language literary texts, annotated for ACE entity categories (person, location, geo-political entity, facility, organization, and vehicle).

Emoji-Powered Representation Learning for Cross-Lingual Sentiment Classification

1 code implementation · 7 Jun 2018 · Zhenpeng Chen, Sheng Shen, Ziniu Hu, Xuan Lu, Qiaozhu Mei, Xuanzhe Liu

To tackle this problem, cross-lingual sentiment classification approaches aim to transfer knowledge learned from one language that has abundant labeled examples (i.e., the source language, usually English) to another language with fewer labels (i.e., the target language).

Classification · Cross-Lingual Sentiment Classification +4
