Search Results for author: Song Han

Found 70 papers, 39 papers with code

DataMix: Efficient Privacy-Preserving Edge-Cloud Inference

no code implementations ECCV 2020 Zhijian Liu, Zhanghao Wu, Chuang Gan, Ligeng Zhu, Song Han

Third, our solution is efficient on the edge since the majority of the workload is delegated to the cloud, and our mixing and de-mixing processes introduce very few extra computations.

Speech Recognition

Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

1 code implementation • 3 May 2022 Yihan Wang, Muyang Li, Han Cai, Wei-Ming Chen, Song Han

Inspired by this finding, we design LitePose, an efficient single-branch architecture for pose estimation, and introduce two simple approaches to enhance the capacity of LitePose, including Fusion Deconv Head and Large Kernel Convs.

Multi-Person Pose Estimation

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

no code implementations • 25 Apr 2022 Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition.

Model Compression • Neural Architecture Search • +2

TorchSparse: Efficient Point Cloud Inference Engine

1 code implementation • 21 Apr 2022 Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, Song Han

TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement.

Autonomous Driving

QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning

2 code implementations • 26 Feb 2022 Hanrui Wang, Zirui Li, Jiaqi Gu, Yongshan Ding, David Z. Pan, Song Han

Nevertheless, we find that due to the significant quantum errors (noise) on real machines, gradients obtained from the naive parameter shift have low fidelity and thus degrade the training accuracy.

Image Classification
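
For context, the parameter-shift rule mentioned above evaluates the circuit at shifted angles to obtain analytic gradients, which can then be pruned. The NumPy sketch below illustrates both steps; the `expectation` callable (standing in for a noisy on-chip measurement), the fixed pi/2 shift, and the magnitude-based `keep_ratio` heuristic are assumptions for illustration, not the paper's exact setup.

```python
import numpy as np

def parameter_shift_grad(expectation, theta, shift=np.pi / 2):
    """Estimate dE/d(theta_i) via the parameter-shift rule:
    g_i = (E(theta + s*e_i) - E(theta - s*e_i)) / 2."""
    grad = np.zeros_like(theta)
    for i in range(len(theta)):
        plus, minus = theta.copy(), theta.copy()
        plus[i] += shift
        minus[i] -= shift
        grad[i] = (expectation(plus) - expectation(minus)) / 2.0
    return grad

def prune_gradient(grad, keep_ratio=0.5):
    """Illustrative gradient pruning: keep only the largest-magnitude
    entries, zeroing the small ones most corrupted by device noise."""
    k = max(1, int(len(grad) * keep_ratio))
    keep = np.argsort(np.abs(grad))[-k:]
    pruned = np.zeros_like(grad)
    pruned[keep] = grad[keep]
    return pruned
```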

AET-SGD: Asynchronous Event-triggered Stochastic Gradient Descent

1 code implementation • 27 Dec 2021 Nhuong Nguyen, Song Han

In this paper, we propose an Asynchronous Event-triggered Stochastic Gradient Descent (SGD) framework, called AET-SGD, to i) reduce the communication cost among the compute nodes, and ii) mitigate the impact of the delay.
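
To make the "event-triggered" idea concrete, here is a toy single-worker sketch in NumPy: the worker takes cheap local SGD steps and only communicates once its parameters have drifted past a threshold since the last broadcast. The drift-based trigger and all names here are illustrative simplifications, not AET-SGD's actual event condition or its delay handling.

```python
import numpy as np

def event_triggered_sgd(grad_stream, dim, lr=0.01, threshold=0.5):
    """Toy event-triggered SGD: many local steps, rare communications."""
    w = np.zeros(dim)
    last_sent = w.copy()
    messages = 0
    for g in grad_stream:                  # stream of stochastic gradients
        w -= lr * g                        # cheap local step
        if np.linalg.norm(w - last_sent) > threshold:
            last_sent = w.copy()           # event fires: broadcast to peers
            messages += 1                  # (asynchronously, in the real system)
    return w, messages
```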

Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning

no code implementations NeurIPS 2021 Ligeng Zhu, Hongzhou Lin, Yao Lu, Yujun Lin, Song Han

Federated Learning is an emerging direction in distributed machine learning that enables jointly training a model without sharing the data.

Federated Learning

Memory-efficient Patch-based Inference for Tiny Deep Learning

no code implementations NeurIPS 2021 Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

We further propose receptive field redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.

Image Classification • Neural Architecture Search • +1

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

no code implementations • 28 Oct 2021 Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.

Image Classification • Neural Architecture Search • +1

QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization

2 code implementations • 21 Oct 2021 Hanrui Wang, Jiaqi Gu, Yongshan Ding, Zirui Li, Frederic T. Chong, David Z. Pan, Song Han

Furthermore, to improve the robustness against noise, we propose injecting noise into the training process by inserting quantum error gates into the PQC according to realistic noise models of quantum hardware.

Denoising • Quantization

Network Augmentation for Tiny Deep Learning

no code implementations ICLR 2022 Han Cai, Chuang Gan, Ji Lin, Song Han

We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks.

Data Augmentation • Image Classification • +1

RoDesigner: Variation-Aware Optimization for Robust Analog Design with Multi-Task RL

no code implementations • 29 Sep 2021 Wei Shi, Hanrui Wang, Jiaqi Gu, Mingjie Liu, David Z. Pan, Song Han, Nan Sun

Specifically, circuit optimizations under different variations are considered as a set of tasks.

Towards Efficient On-Chip Training of Quantum Neural Networks

no code implementations • 29 Sep 2021 Hanrui Wang, Zirui Li, Jiaqi Gu, Yongshan Ding, David Z. Pan, Song Han

The results demonstrate that our on-chip training achieves over 90% and 60% accuracy for 2-class and 4-class image classification tasks, respectively.

Image Classification

TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device

1 code implementation • 27 Sep 2021 Ji Lin, Chuang Gan, Kuan Wang, Song Han

Secondly, TSM has high efficiency; it achieves a high frame rate of 74 fps and 29 fps for online video recognition on Jetson Nano and Galaxy Note8.

Video Recognition • Video Understanding
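
The shift operation behind TSM is simple enough to state in a few lines. The sketch below follows the commonly described bi-directional (offline) variant, in which a fraction of channels is shifted one step along the time axis in each direction; shapes and the `fold_div=8` default follow the usual description, so treat this as a sketch rather than the repository's exact code.

```python
import torch

def temporal_shift(x, fold_div=8):
    """x: [N, T, C, H, W]. Shift 1/fold_div of channels to the previous
    frame, 1/fold_div to the next frame, and leave the rest in place."""
    n, t, c, h, w = x.size()
    fold = c // fold_div
    out = torch.zeros_like(x)
    out[:, :-1, :fold] = x[:, 1:, :fold]                  # shift backward in time
    out[:, 1:, fold:2 * fold] = x[:, :-1, fold:2 * fold]  # shift forward in time
    out[:, :, 2 * fold:] = x[:, :, 2 * fold:]             # untouched channels
    return out
```

The online variant used for the latency numbers above shifts in one direction only, caching features from past frames instead of looking ahead.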

LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision

no code implementations ICCV 2021 Zhijian Liu, Simon Stent, Jie Li, John Gideon, Song Han

Computer vision tasks such as object detection and semantic/instance segmentation rely on the painstaking annotation of large training datasets.

Image Classification • Instance Segmentation • +2

QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits

2 code implementations • 22 Jul 2021 Hanrui Wang, Yongshan Ding, Jiaqi Gu, Zirui Li, Yujun Lin, David Z. Pan, Frederic T. Chong, Song Han

Extensively evaluated with 12 QML and VQE benchmarks on 14 quantum computers, QuantumNAS significantly outperforms baselines.

NAAS: Neural Accelerator Architecture Search

no code implementations • 27 May 2021 Yujun Lin, Mengtian Yang, Song Han

Data-driven, automatic design space exploration of neural accelerator architecture is desirable for specialization and productivity.

Efficient and Robust LiDAR-Based End-to-End Navigation

no code implementations • 20 May 2021 Zhijian Liu, Alexander Amini, Sibo Zhu, Sertac Karaman, Song Han, Daniela Rus

On the other hand, increasing the robustness of these systems is also critical; however, even estimating the model's uncertainty is very challenging due to the cost of sampling-based methods.

PatchNet -- Short-range Template Matching for Efficient Video Processing

1 code implementation • 10 Mar 2021 Huizi Mao, Sibo Zhu, Song Han, William J. Dally

Object recognition is a fundamental problem in many video processing tasks; accurately locating seen objects at low computation cost paves the way for on-device video recognition.

Object Recognition • Template Matching • +3

SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

no code implementations • 17 Dec 2020 Hanrui Wang, Zhekai Zhang, Song Han

Inspired by the high redundancy of human languages, we propose the novel cascade token pruning to prune away unimportant tokens in the sentence.

Quantization
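
As a rough illustration of cascade token pruning, the sketch below scores each token by the attention it accumulates and keeps only the top fraction; pruned tokens stay removed for all subsequent layers, which is the "cascade" part. The scoring rule and `keep_ratio` are simplified assumptions, not the accelerator's exact importance computation.

```python
import torch

def cascade_token_prune(tokens, attn_probs, keep_ratio=0.5):
    """tokens: [seq, dim]; attn_probs: [heads, seq, seq].
    Keep tokens that receive the highest cumulative attention."""
    importance = attn_probs.sum(dim=(0, 1))        # attention each token receives
    k = max(1, int(tokens.size(0) * keep_ratio))
    keep = torch.topk(importance, k).indices.sort().values
    return tokens[keep], keep                      # pruned tokens never return
```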

IOS: Inter-Operator Scheduler for CNN Acceleration

1 code implementation • 2 Nov 2020 Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han

To accelerate CNN inference, existing deep learning frameworks focus on optimizing intra-operator parallelization.

Hardware-Centric AutoML for Mixed-Precision Quantization

no code implementations • 11 Aug 2020 Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han

Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.

AutoML • Quantization

TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning

1 code implementation NeurIPS 2020 Han Cai, Chuang Gan, Ligeng Zhu, Song Han

Furthermore, combined with feature extractor adaptation, TinyTL provides 7.3-12.9x memory saving without sacrificing accuracy compared to fine-tuning the full Inception-V3.

Transfer Learning
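
The memory argument behind TinyTL is that bias updates, unlike weight updates, do not require storing the layer's input activations for the backward pass. A minimal sketch of that idea, freezing all weights and training only biases (the paper additionally inserts lite residual modules and trains the classifier head; the helper name here is illustrative):

```python
import torch.nn as nn

def freeze_weights_keep_biases(model: nn.Module) -> nn.Module:
    """Train only bias parameters: their gradients do not depend on the
    input activations, so intermediate activations need not be stored,
    which is where the training-memory saving comes from."""
    for name, param in model.named_parameters():
        param.requires_grad = name.endswith("bias")
    return model
```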

MCUNet: Tiny Deep Learning on IoT Devices

no code implementations NeurIPS 2020 Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han

Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than that of mobile phones.

Neural Architecture Search

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

1 code implementation CVPR 2020 Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Song Han

However, training this quantization-aware accuracy predictor requires collecting a large number of quantized <model, accuracy> pairs, which involves quantization-aware finetuning and thus is highly time-consuming.

Quantization

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

4 code implementations ACL 2020 Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han

To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search.

Machine Translation • Neural Architecture Search • +1

MicroNet for Efficient Language Modeling

1 code implementation • 16 May 2020 Zhongxia Yan, Hanrui Wang, Demi Guo, Song Han

In this paper, we provide the winning solution to the NeurIPS 2019 MicroNet Challenge in the language modeling track.

Knowledge Distillation • Language Modelling • +3

Once for All: Train One Network and Specialize it for Efficient Deployment

1 code implementation ICLR 2020 Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han

Most of the traditional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally expensive and unscalable.

Neural Architecture Search

Lite Transformer with Long-Short Range Attention

2 code implementations ICLR 2020 Zhanghao Wu, Zhijian Liu, Ji Lin, Yujun Lin, Song Han

For language modeling, Lite Transformer achieves 1.8 lower perplexity than the transformer at around 500M MACs.

Abstractive Text Summarization • AutoML • +5

GAN Compression: Efficient Architectures for Interactive Conditional GANs

1 code implementation CVPR 2020 Muyang Li, Ji Lin, Yaoyao Ding, Zhijian Liu, Jun-Yan Zhu, Song Han

Directly applying existing compression methods yields poor performance due to the difficulty of GAN training and the differences in generator architectures.

Image Generation • Neural Architecture Search

SpArch: Efficient Architecture for Sparse Matrix Multiplication

no code implementations • 20 Feb 2020 Zhekai Zhang, Hanrui Wang, Song Han, William J. Dally

We then propose a condensed matrix representation that reduces the number of partial matrices by three orders of magnitude and thus reduces DRAM access by 5.4x.

Hardware Architecture • Distributed, Parallel, and Cluster Computing

Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos

no code implementations • 1 Oct 2019 Ji Lin, Chuang Gan, Song Han

With such hardware-aware model design, we are able to scale up the training on Summit supercomputer and reduce the training time on Kinetics dataset from 49 hours 55 minutes to 14 minutes 13 seconds, achieving a top-1 accuracy of 74.0%, which is 1.6x and 2.9x faster than previous 3D video models with higher accuracy.

Video Recognition

Distributed Training Across the World

no code implementations • 25 Sep 2019 Ligeng Zhu, Yao Lu, Yujun Lin, Song Han

Traditional synchronous distributed training is performed inside a cluster, since it requires a high-bandwidth, low-latency network (e.g., 25Gb Ethernet or InfiniBand).

Once-for-All: Train One Network and Specialize it for Efficient Deployment

8 code implementations • 26 Aug 2019 Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han

On diverse edge devices, OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3, or the same accuracy but 1.5x faster than MobileNetV3 and 2.6x faster than EfficientNet w.r.t. measured latency) while reducing GPU hours and $CO_2$ emission by many orders of magnitude.

Neural Architecture Search

Point-Voxel CNN for Efficient 3D Deep Learning

3 code implementations NeurIPS 2019 Zhijian Liu, Haotian Tang, Yujun Lin, Song Han

The computation cost and memory footprints of the voxel-based models grow cubically with the input resolution, making it memory-prohibitive to scale up the resolution.

3D Object Detection • 3D Semantic Segmentation • +1

Deep Leakage from Gradients

5 code implementations NeurIPS 2019 Ligeng Zhu, Zhijian Liu, Song Han

Exchanging gradients is a widely used method in modern multi-node machine learning systems (e.g., distributed training, collaborative learning).
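
The attack built on this observation can be summarized as gradient matching: optimize a dummy input/label pair until the gradient it induces matches the one that was shared. A condensed sketch follows; variable names are illustrative, `opt` would typically be L-BFGS over the dummy tensors, and both dummy tensors must have requires_grad=True.

```python
import torch
import torch.nn.functional as F

def dlg_step(model, target_grads, dummy_x, dummy_label, opt):
    """One gradient-matching step: minimize ||grad(dummy) - grad(real)||^2."""
    def closure():
        opt.zero_grad()
        pred = model(dummy_x)
        # a soft dummy label keeps the loss differentiable w.r.t. the label
        loss = torch.sum(-F.softmax(dummy_label, dim=-1) * F.log_softmax(pred, dim=-1))
        grads = torch.autograd.grad(loss, model.parameters(), create_graph=True)
        diff = sum(((g - t) ** 2).sum() for g, t in zip(grads, target_grads))
        diff.backward()            # gradients flow into dummy_x / dummy_label
        return diff
    return opt.step(closure)
```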

Design Automation for Efficient Deep Learning Computing

no code implementations • 24 Apr 2019 Song Han, Han Cai, Ligeng Zhu, Ji Lin, Kuan Wang, Zhijian Liu, Yujun Lin

Moreover, we shorten the design cycle by 200x compared with previous work, so that we can afford to design specialized neural network models for different hardware platforms.

Quantization

Defensive Quantization: When Efficiency Meets Robustness

no code implementations ICLR 2019 Ji Lin, Chuang Gan, Song Han

This paper aims to raise awareness about the security of quantized models, and we design a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models.

Adversarial Attack • Quantization

Learning to Design Circuits

no code implementations • 5 Dec 2018 Hanrui Wang, Jiacheng Yang, Hae-Seung Lee, Song Han

We propose Learning to Design Circuits (L2DC), which leverages reinforcement learning to efficiently generate new circuit data and to optimize circuits.

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware

16 code implementations ICLR 2019 Han Cai, Ligeng Zhu, Song Han

We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level of regular training while still allowing a large candidate set.

Image Classification • Neural Architecture Search

HAQ: Hardware-Aware Automated Quantization with Mixed Precision

10 code implementations CVPR 2019 Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han

Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.

Quantization

TSM: Temporal Shift Module for Efficient Video Understanding

9 code implementations ICCV 2019 Ji Lin, Chuang Gan, Song Han

The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.

Action Classification • Action Recognition • +3

Adaptive Mixture of Low-Rank Factorizations for Compact Neural Modeling

no code implementations NIPS Workshop CDNNRIA 2018 Ting Chen, Ji Lin, Tian Lin, Song Han, Chong Wang, Denny Zhou

Modern deep neural networks have a large number of weights, which makes them difficult to deploy on computation-constrained devices such as mobile phones.

Image Classification • Language Modelling

Fast inference of deep neural networks in FPGAs for particle physics

2 code implementations • 16 Apr 2018 Javier Duarte, Song Han, Philip Harris, Sergo Jindariani, Edward Kreinar, Benjamin Kreis, Jennifer Ngadiuba, Maurizio Pierini, Ryan Rivera, Nhan Tran, Zhenbin Wu

For our example jet substructure model, we fit well within the available resources of modern FPGAs with a latency on the scale of 100 ns.

Efficient Sparse-Winograd Convolutional Neural Networks

1 code implementation ICLR 2018 Xingyu Liu, Jeff Pool, Song Han, William J. Dally

First, we move the ReLU operation into the Winograd domain to increase the sparsity of the transformed activations.

Network Pruning

AMC: AutoML for Model Compression and Acceleration on Mobile Devices

11 code implementations ECCV 2018 Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han

Model compression is a critical technique to efficiently deploy neural network models on mobile devices, which have limited computation resources and tight power budgets.

Model Compression • Neural Architecture Search

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

1 code implementation The International Conference on Learning Representations 2017 Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally

Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure.

Federated Learning • Image Classification • +1

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

2 code implementations ICLR 2018 Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections.

Federated Learning • Image Classification • +1
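
The core of deep gradient compression is top-k sparsification with local accumulation: only the largest updates are transmitted, and the remainder is kept locally so it is eventually sent. A minimal sketch of that step follows; momentum correction, gradient clipping, and warm-up, which the method also uses, are omitted.

```python
import torch

def dgc_compress(grad, residual, sparsity=0.999):
    """Return (sparse update to send, new local residual)."""
    acc = residual + grad                          # accumulate unsent gradient
    k = max(1, int(acc.numel() * (1.0 - sparsity)))
    _, idx = torch.topk(acc.abs().flatten(), k)    # indices of the largest updates
    mask = torch.zeros_like(acc.flatten())
    mask[idx] = 1.0
    mask = mask.view_as(acc)
    return acc * mask, acc * (1.0 - mask)          # send top-k, keep the rest
```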

Deep Generative Adversarial Networks for Compressed Sensing Automates MRI

2 code implementations • 31 May 2017 Morteza Mardani, Enhao Gong, Joseph Y. Cheng, Shreyas Vasanawala, Greg Zaharchuk, Marcus Alley, Neil Thakur, Song Han, William Dally, John M. Pauly, Lei Xing

A multilayer convolutional neural network is then jointly trained based on diagnostic quality images to discriminate the projection quality.

MRI Reconstruction

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

no code implementations • 24 May 2017 Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally

Since memory reference is more than two orders of magnitude more expensive than arithmetic operations, the regularity of sparse structure leads to more efficient hardware design.

Classification of Neurological Gait Disorders Using Multi-task Feature Learning

no code implementations • 8 Dec 2016 Ioannis Papavasileiou, Wenlong Zhang, Xin Wang, Jinbo Bi, Li Zhang, Song Han

An advanced machine learning method, multi-task feature learning (MTFL), is used to jointly train classification models of a subject's gait in three classes: post-stroke, Parkinson's disease (PD), and healthy gait.

General Classification

Trained Ternary Quantization

4 code implementations • 4 Dec 2016 Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally

To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values.

Quantization
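
In TTQ the two scaling factors are trained jointly with latent full-precision weights; quantization itself is a simple threshold rule. Below is a minimal sketch of the forward quantization step; the threshold fraction `t` and the names are illustrative, and the backward pass (straight-through-style gradients updating `wp`, `wn`, and the latent weights) is omitted.

```python
import torch

def ternarize(w, wp, wn, t=0.05):
    """Map full-precision weights w to {+wp, 0, -wn} using a threshold
    proportional to the largest weight magnitude."""
    delta = t * w.abs().max()
    pos = (w > delta).float()      # weights quantized to +wp
    neg = (w < -delta).float()     # weights quantized to -wn
    return wp * pos - wn * neg     # everything inside (-delta, delta) becomes 0
```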

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA

no code implementations • 1 Dec 2016 Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally

Evaluated on the LSTM for speech recognition benchmark, ESE is 43x and 3x faster than Core i7 5930k CPU and Pascal Titan X GPU implementations.

Quantization • Speech Recognition

Generate Image Descriptions based on Deep RNN and Memory Cells for Images Features

no code implementations • 5 Feb 2016 Shijian Tang, Song Han

Generating natural language descriptions for images is a challenging task.

EIE: Efficient Inference Engine on Compressed Deep Neural Network

4 code implementations • 4 Feb 2016 Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally

EIE has a processing power of 102 GOPS/s working directly on a compressed network, corresponding to 3 TOPS/s on an uncompressed network, and processes FC layers of AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600 mW.

Learning both Weights and Connections for Efficient Neural Network

no code implementations NeurIPS 2015 Song Han, Jeff Pool, John Tran, William Dally

On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9x, from 61 million to 6.7 million, without incurring accuracy loss.

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

14 code implementations • 1 Oct 2015 Song Han, Huizi Mao, William J. Dally

To address this limitation, we introduce "deep compression", a three-stage pipeline: pruning, trained quantization and Huffman coding, which work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.

Network Pruning • Quantization
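
Stage two of the pipeline, trained quantization, is weight sharing: cluster a layer's surviving weights into 2^bits centroids and store only the small codebook plus a per-weight index (the centroids are then fine-tuned, and the indices are Huffman-coded in stage three). A minimal sketch of the clustering step, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.cluster import KMeans

def weight_share(weights, bits=4):
    """Cluster weights into 2**bits shared values; return codebook + indices."""
    flat = weights.reshape(-1, 1)
    km = KMeans(n_clusters=2 ** bits, n_init=4).fit(flat)
    codebook = km.cluster_centers_.flatten()   # 2**bits fp32 centroids
    indices = km.labels_.astype(np.uint8)      # a `bits`-bit index per weight
    return codebook, indices.reshape(weights.shape)
```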

Learning both Weights and Connections for Efficient Neural Networks

7 code implementations NeurIPS 2015 Song Han, Jeff Pool, John Tran, William J. Dally

On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9x, from 61 million to 6.7 million, without incurring accuracy loss.
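
The "learning connections" step is magnitude pruning followed by retraining: remove the smallest weights, then fine-tune the survivors with the mask held fixed, and iterate. A minimal PyTorch sketch of the pruning step (the `sparsity` default is illustrative):

```python
import torch

def magnitude_prune(weight, sparsity=0.9):
    """Zero out the `sparsity` fraction of smallest-magnitude weights;
    the returned mask keeps them at zero during retraining."""
    k = max(1, int(weight.numel() * sparsity))
    threshold = weight.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
    mask = (weight.abs() > threshold).float()
    return weight * mask, mask
```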

Robust Face Recognition using Local Illumination Normalization and Discriminant Feature Point Selection

no code implementations • 11 Dec 2012 Song Han, Jinsong Kim, Cholhun Kim, Jongchol Jo, Sunam Han

Face recognition systems must be robust to the variation of various factors such as facial expression, illumination, head pose and aging.

Face Detection • Face Recognition • +1
