Search Results for author: Xiaodan Song

Found 15 papers, 7 papers with code

Token Dropping for Efficient BERT Pretraining

no code implementations • ACL 2022 • Le Hou, Richard Yuanzhe Pang, Tianyi Zhou, Yuexin Wu, Xinying Song, Xiaodan Song, Denny Zhou

Transformer-based models generally allocate the same amount of computation for each token in a given sequence.

Language Modelling Masked Language Modeling

Paper
Add Code

Auto-scaling Vision Transformers without Training

1 code implementation • ICLR 2022 • Wuyang Chen, Wei Huang, Xianzhi Du, Xiaodan Song, Zhangyang Wang, Denny Zhou

The motivation comes from two pain spots: 1) the lack of efficient and principled methods for designing and scaling ViTs; 2) the tremendous computational cost of training ViT that is much heavier than its convolution counterpart.

Paper
Code

A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation

3 code implementations • 17 Dec 2021 • Wuyang Chen, Xianzhi Du, Fan Yang, Lucas Beyer, Xiaohua Zhai, Tsung-Yi Lin, Huizhong Chen, Jing Li, Xiaodan Song, Zhangyang Wang, Denny Zhou

In this paper, we comprehensively study three architecture design choices on ViT -- spatial reduction, doubled channels, and multiscale features -- and demonstrate that a vanilla ViT architecture can fulfill this goal without handcrafting multiscale features, maintaining the original ViT design philosophy.

Image Classification Instance Segmentation +6

76,628

Paper
Code

Speeding up Deep Model Training by Sharing Weights and Then Unsharing

no code implementations • 8 Oct 2021 • Shuo Yang, Le Hou, Xiaodan Song, Qiang Liu, Denny Zhou

Our approach exploits the special structure of BERT that contains a stack of repeated modules (i. e., transformer encoders).

Paper
Add Code

A new communication paradigm: from bit accuracy to semantic fidelity

no code implementations • 29 Jan 2021 • Guangming Shi, Dahua Gao, Xiaodan Song, Jingxuan Chai, Minxi Yang, Xuemei Xie, Leida Li, Xuyang Li

In this article, we deploy semantics to solve the spectrum and power bottleneck and propose a first understanding and then transmission framework with high semantic fidelity.

Networking and Internet Architecture

Paper
Add Code

Speeding up Deep Learning Training by Sharing Weights and Then Unsharing

no code implementations • 1 Jan 2021 • Shuo Yang, Le Hou, Xiaodan Song, Qiang Liu, Denny Zhou

It has been widely observed that increasing deep learning model sizes often leads to significant performance improvements on a variety of natural language processing and computer vision tasks.

Paper
Add Code

Efficient Scale-Permuted Backbone with Learned Resource Distribution

no code implementations • ECCV 2020 • Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Yin Cui, Mingxing Tan, Quoc Le, Xiaodan Song

Furthermore, SpineNet is built with a uniform resource distribution over operations.

General Classification Image Classification +3

Paper
Add Code

Go Wide, Then Narrow: Efficient Training of Deep Thin Networks

no code implementations • ICML 2020 • Denny Zhou, Mao Ye, Chen Chen, Tianjian Meng, Mingxing Tan, Xiaodan Song, Quoc Le, Qiang Liu, Dale Schuurmans

This is achieved by layerwise imitation, that is, forcing the thin network to mimic the intermediate outputs of the wide network from layer to layer.

Computational Efficiency Model Compression

Paper
Add Code

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices

5 code implementations • ACL 2020 • Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou

Then, we conduct knowledge transfer from this teacher to MobileBERT.

Ranked #20 on Semantic Textual Similarity on MRPC

Natural Language Inference Question Answering +2

71,989

Paper
Code

BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models

1 code implementation • ECCV 2020 • Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Ruoming Pang, Quoc Le

Without extra retraining or post-processing steps, we are able to train a single set of shared weights on ImageNet and use these weights to obtain child models whose sizes range from 200 to 1000 MFLOPs.

Ranked #30 on Neural Architecture Search on ImageNet

Neural Architecture Search

Paper
Code

SpineNet: Learning Scale-Permuted Backbone for Recognition and Localization

13 code implementations • CVPR 2020 • Xianzhi Du, Tsung-Yi Lin, Pengchong Jin, Golnaz Ghiasi, Mingxing Tan, Yin Cui, Quoc V. Le, Xiaodan Song

We propose SpineNet, a backbone with scale-permuted intermediate features and cross-scale connections that is learned on an object detection task by Neural Architecture Search.

Ranked #9 on Image Classification on iNaturalist

Decoder General Classification +6

73,120

Paper
Code

Scaling Up Neural Architecture Search with Big Single-Stage Models

no code implementations • 25 Sep 2019 • Jiahui Yu, Pengchong Jin, Hanxiao Liu, Gabriel Bender, Pieter-Jan Kindermans, Mingxing Tan, Thomas Huang, Xiaodan Song, Quoc Le

In this work, we propose BigNAS, an approach that simplifies this workflow and scales up neural architecture search to target a wide range of model sizes simultaneously.

Neural Architecture Search

Paper
Add Code

High Resolution Medical Image Analysis with Spatial Partitioning

1 code implementation • 6 Sep 2019 • Le Hou, Youlong Cheng, Noam Shazeer, Niki Parmar, Yeqing Li, Panagiotis Korfiatis, Travis M. Drucker, Daniel J. Blezek, Xiaodan Song

It is infeasible to train CNN models directly on such high resolution images, because neural activations of a single image do not fit in the memory of a single GPU/TPU, and naive data and model parallelism approaches do not work.

Vocal Bursts Intensity Prediction

1,557

Paper
Code