Search Results for author: Song Han

Found 101 papers, 60 papers with code

DataMix: Efficient Privacy-Preserving Edge-Cloud Inference

no code implementations ECCV 2020 Zhijian Liu, Zhanghao Wu, Chuang Gan, Ligeng Zhu, Song Han

Third, our solution is efficient on the edge since the majority of the workload is delegated to the cloud, and our mixing and de-mixing processes introduce very few extra computations.

Privacy Preserving speech-recognition +1

Condition-Aware Neural Network for Controlled Image Generation

no code implementations1 Apr 2024 Han Cai, Muyang Li, Zhuoyang Zhang, Qinsheng Zhang, Ming-Yu Liu, Song Han

In parallel to prior conditional control methods, CAN controls the image generation process by dynamically manipulating the weight of the neural network.

Conditional Image Generation Text-to-Image Generation

Tiny Machine Learning: Progress and Futures

1 code implementation28 Mar 2024 Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Song Han

By squeezing deep learning models into billions of IoT devices and microcontrollers (MCUs), we expand the scope of AI applications and enable ubiquitous intelligence.

DistriFusion: Distributed Parallel Inference for High-Resolution Diffusion Models

1 code implementation29 Feb 2024 Muyang Li, Tianle Cai, Jiaxin Cao, Qinsheng Zhang, Han Cai, Junjie Bai, Yangqing Jia, Ming-Yu Liu, Kai Li, Song Han

To overcome this dilemma, we observe the high similarity between the input from adjacent diffusion steps and propose displaced patch parallelism, which takes advantage of the sequential nature of the diffusion process by reusing the pre-computed feature maps from the previous timestep to provide context for the current step.

BitDelta: Your Fine-Tune May Only Be Worth One Bit

1 code implementation15 Feb 2024 James Liu, Guangxuan Xiao, Kai Li, Jason D. Lee, Song Han, Tri Dao, Tianle Cai

Large Language Models (LLMs) are typically trained in two phases: pre-training on large internet-scale datasets, and fine-tuning for downstream tasks.

InfLLM: Unveiling the Intrinsic Capacity of LLMs for Understanding Extremely Long Sequences with Training-Free Memory

no code implementations7 Feb 2024 Chaojun Xiao, Pengle Zhang, Xu Han, Guangxuan Xiao, Yankai Lin, Zhengyan Zhang, Zhiyuan Liu, Song Han, Maosong Sun

To alleviate these issues, existing efforts employ sliding attention windows and discard distant tokens to achieve the processing of extremely long sequences.

EfficientViT-SAM: Accelerated Segment Anything Model Without Performance Loss

1 code implementation7 Feb 2024 Zhuoyang Zhang, Han Cai, Song Han

For training, we begin with knowledge distillation from the SAM-ViT-H image encoder to EfficientViT.

Knowledge Distillation Zero-Shot Instance Segmentation

QuantumSEA: In-Time Sparse Exploration for Noise Adaptive Quantum Circuits

1 code implementation10 Jan 2024 Tianlong Chen, Zhenyu Zhang, Hanrui Wang, Jiaqi Gu, Zirui Li, David Z. Pan, Frederic T. Chong, Song Han, Zhangyang Wang

To address these two pain points, we propose QuantumSEA, an in-time sparse exploration for noise-adaptive quantum circuits, aiming to achieve two key objectives: (1) implicit circuit capacity during training, by dynamically exploring the circuit's sparse connectivity while sticking to a fixed, small number of quantum gates throughout training, which satisfies the coherence time, suffers only light noise, and enables feasible execution on real quantum devices; (2) noise robustness, by jointly optimizing the topology and parameters of quantum circuits under real device noise models.

Quantum Machine Learning

DGR: Tackling Drifted and Correlated Noise in Quantum Error Correction via Decoding Graph Re-weighting

no code implementations27 Nov 2023 Hanrui Wang, Pengyu Liu, Yilian Liu, Jiaqi Gu, Jonathan Baker, Frederic T. Chong, Song Han

By counting the occurrences of edges and edge pairs in decoded matchings, we can statistically estimate the up-to-date probabilities of each edge and the correlations between them.
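A minimal sketch of the counting statistics described here, assuming decoded matchings are given as sets of edge identifiers (all names and the toy data are illustrative):

```python
from collections import Counter
from itertools import combinations

def edge_statistics(matchings):
    """Estimate per-edge probabilities and pairwise co-occurrence rates
    from a list of decoded matchings (each a set of edge identifiers)."""
    n = len(matchings)
    edge_counts, pair_counts = Counter(), Counter()
    for matching in matchings:
        edge_counts.update(matching)                            # edge occurrences
        pair_counts.update(combinations(sorted(matching), 2))   # edge-pair occurrences
    p_edge = {e: c / n for e, c in edge_counts.items()}
    p_pair = {pair: c / n for pair, c in pair_counts.items()}
    return p_edge, p_pair

matchings = [{"e1", "e2"}, {"e1"}, {"e1", "e3"}]
p_edge, p_pair = edge_statistics(matchings)
print(p_edge["e1"], p_pair[("e1", "e2")])  # 1.0 0.333...
```

Estimates of this kind are what would then drive the re-weighting of the decoding graph.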

RobustState: Boosting Fidelity of Quantum State Preparation via Noise-Aware Variational Training

no code implementations27 Nov 2023 Hanrui Wang, Yilian Liu, Pengyu Liu, Jiaqi Gu, Zirui Li, Zhiding Liang, Jinglei Cheng, Yongshan Ding, Xuehai Qian, Yiyu Shi, David Z. Pan, Frederic T. Chong, Song Han

Arbitrary state preparation algorithms can be broadly categorized into arithmetic decomposition (AD) and variational quantum state preparation (VQSP).

Transformer-QEC: Quantum Error Correction Code Decoding with Transferable Transformers

no code implementations27 Nov 2023 Hanrui Wang, Pengyu Liu, Kevin Shao, Dantong Li, Jiaqi Gu, David Z. Pan, Yongshan Ding, Song Han

Quantum Error Correction (QEC) mitigates this by employing redundancy, distributing quantum information across multiple data qubits and utilizing syndrome qubits to monitor their states for errors.

Transfer Learning

Machine learning's own Industrial Revolution

no code implementations4 Nov 2023 Yuan Luo, Song Han, Jingjing Liu

Machine learning is expected to enable the next Industrial Revolution.

Translation

PockEngine: Sparse and Efficient Fine-tuning in a Pocket

no code implementations26 Oct 2023 Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han

On-device learning and efficient fine-tuning enable continuous and privacy-preserving customization (e.g., locally fine-tuning large language models on personalized data).

Privacy Preserving

TorchSparse++: Efficient Training and Inference Framework for Sparse Convolution on GPUs

1 code implementation25 Oct 2023 Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han

On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads.

Autonomous Driving Recommendation Systems

Efficient Streaming Language Models with Attention Sinks

5 code implementations29 Sep 2023 Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis

In this paper, we first demonstrate that the emergence of attention sink is due to the strong attention scores towards initial tokens as a "sink" even if they are not semantically important.

Language Modelling
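A minimal sketch of the KV-cache policy this observation motivates: always retain a few initial "sink" tokens plus a recent window, and evict everything in between (the sizes below are illustrative, not the paper's settings):

```python
def positions_to_keep(num_tokens, n_sink=4, window=1020):
    """Keep the first n_sink token positions plus the most recent `window` positions."""
    positions = list(range(num_tokens))
    if num_tokens <= n_sink + window:
        return positions
    return positions[:n_sink] + positions[-window:]

kept = positions_to_keep(5000)
assert kept[:4] == [0, 1, 2, 3] and len(kept) == 1024  # sink tokens survive indefinitely
```

Because the cache size stays constant, memory no longer grows with the length of the generated stream.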

LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models

2 code implementations21 Sep 2023 Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia

For example, training at a context length of 8192 requires 16x the computational cost in self-attention layers compared with a context length of 2048.

4k Instruction Following +2
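The 16x figure quoted above follows directly from the quadratic scaling of self-attention with context length:

```latex
% Self-attention cost scales as O(L^2) in sequence length L, so
\[
\frac{\mathrm{cost}(8192)}{\mathrm{cost}(2048)} \approx \left(\frac{8192}{2048}\right)^{2} = 4^{2} = 16 .
\]
```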

AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration

5 code implementations1 Jun 2023 Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Xingyu Dang, Chuang Gan, Song Han

Large language models (LLMs) have shown excellent performance on various tasks, but the astronomical model size raises the hardware barrier for serving (memory size) and slows down token generation (memory bandwidth).

Common Sense Reasoning Language Modelling +1

FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention

1 code implementation17 May 2023 Guangxuan Xiao, Tianwei Yin, William T. Freeman, Frédo Durand, Song Han

FastComposer proposes delayed subject conditioning in the denoising step to maintain both identity and editability in subject-driven image generation.

Denoising Diffusion Personalization Tuning Free +1

Offsite-Tuning: Transfer Learning without Full Model

1 code implementation9 Feb 2023 Guangxuan Xiao, Ji Lin, Song Han

In this paper, we propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation models to downstream data without access to the full model.

Privacy Preserving Transfer Learning

EfficientViT: Lightweight Multi-Scale Attention for High-Resolution Dense Prediction

no code implementations ICCV 2023 Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han

Without performance loss on Cityscapes, our EfficientViT provides up to 8.8x and 3.8x GPU latency reduction over SegFormer and SegNeXt, respectively.

Autonomous Driving Super-Resolution

SmoothQuant: Accurate and Efficient Post-Training Quantization for Large Language Models

3 code implementations18 Nov 2022 Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han

We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs.

Quantization
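A simplified numpy sketch of the per-channel smoothing idea: migrate activation outliers into the weights with a scale $s$ so that both tensors become easier to quantize to 8 bits. The shapes, calibration data, and $\alpha$ below are illustrative, not the paper's exact recipe:

```python
import numpy as np

def smooth(X, W, alpha=0.5, eps=1e-8):
    """X: (tokens, in_channels) calibration activations, W: (in_channels, out_channels)."""
    act_max = np.abs(X).max(axis=0) + eps           # per-input-channel activation range
    w_max = np.abs(W).max(axis=1) + eps             # per-input-channel weight range
    s = act_max ** alpha / w_max ** (1.0 - alpha)   # smoothing scale per channel
    return X / s, W * s[:, None]                    # the product X @ W is unchanged

X = np.random.randn(16, 8) * np.array([1, 1, 1, 1, 1, 1, 1, 50.0])  # one outlier channel
W = np.random.randn(8, 4)
Xs, Ws = smooth(X, W)
assert np.allclose(X @ W, Xs @ Ws)  # mathematically equivalent, but ranges are balanced
```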

Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models

1 code implementation3 Nov 2022 Muyang Li, Ji Lin, Chenlin Meng, Stefano Ermon, Song Han, Jun-Yan Zhu

With about $1\%$-area edits, SIGE accelerates DDPM by $3.0\times$ on NVIDIA RTX 3090 and $4.6\times$ on Apple M1 Pro GPU, Stable Diffusion by $7.2\times$ on 3090, and GauGAN by $5.6\times$ on 3090 and $5.2\times$ on M1 Pro GPU.

QuEst: Graph Transformer for Quantum Circuit Reliability Estimation

1 code implementation30 Oct 2022 Hanrui Wang, Pengyu Liu, Jinglei Cheng, Zhiding Liang, Jiaqi Gu, Zirui Li, Yongshan Ding, Weiwen Jiang, Yiyu Shi, Xuehai Qian, David Z. Pan, Frederic T. Chong, Song Han

Specifically, the TorchQuantum library also supports using data-driven ML models to solve problems in quantum system research, such as predicting the impact of quantum noise on circuit fidelity and improving the quantum circuit compilation efficiency.

RobustAnalog: Fast Variation-Aware Analog Circuit Design Via Multi-task RL

no code implementations13 Jul 2022 Wei Shi, Hanrui Wang, Jiaqi Gu, Mingjie Liu, David Pan, Song Han, Nan Sun

To address the challenge, we present RobustAnalog, a robust circuit design framework that involves the variation information in the optimization process.

Bayesian Optimization

On-Device Training Under 256KB Memory

1 code implementation30 Jun 2022 Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han

To reduce the memory footprint, we propose Sparse Update to skip the gradient computation of less important layers and sub-tensors.

Quantization Transfer Learning
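A hedged PyTorch sketch of the general mechanism of skipping gradient computation by freezing layers; which layers stay trainable is chosen arbitrarily here for illustration, not by the paper's importance analysis:

```python
import torch
import torch.nn as nn

# Toy model; in practice this would be the deployed backbone.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10),
)

trainable = {"2", "6"}  # illustrative choice: second conv and the classifier
for name, param in model.named_parameters():
    param.requires_grad = name.split(".")[0] in trainable

# Only trainable parameters get gradients and optimizer state, which is
# what shrinks the training-time memory footprint.
optimizer = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=0.01)
```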

MME-CRS: Multi-Metric Evaluation Based on Correlation Re-Scaling for Evaluating Open-Domain Dialogue

no code implementations19 Jun 2022 Pengfei Zhang, Xiaohui Hu, Kaidong Yu, Jian Wang, Song Han, Cao Liu, Chunyang Yuan

Firstly, we build an evaluation metric composed of 5 groups of parallel sub-metrics called Multi-Metric Evaluation (MME) to evaluate the quality of dialogue comprehensively.

Dialogue Evaluation

EfficientViT: Multi-Scale Linear Attention for High-Resolution Dense Prediction

5 code implementations29 May 2022 Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han

Without performance loss on Cityscapes, our EfficientViT provides up to 13.9$\times$ and 6.2$\times$ GPU latency reduction over SegFormer and SegNeXt, respectively.

Autonomous Driving Image Classification +7

Lite Pose: Efficient Architecture Design for 2D Human Pose Estimation

1 code implementation CVPR 2022 Yihan Wang, Muyang Li, Han Cai, Wei-Ming Chen, Song Han

Inspired by this finding, we design LitePose, an efficient single-branch architecture for pose estimation, and introduce two simple approaches to enhance the capacity of LitePose, including Fusion Deconv Head and Large Kernel Convs.

Ranked #5 on Multi-Person Pose Estimation on MS COCO (Validation AP metric)

2D Human Pose Estimation Multi-Person Pose Estimation

Enable Deep Learning on Mobile Devices: Methods, Systems, and Applications

no code implementations25 Apr 2022 Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han

Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition.

Model Compression Neural Architecture Search +3

TorchSparse: Efficient Point Cloud Inference Engine

1 code implementation21 Apr 2022 Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, Song Han

TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement.

Autonomous Driving

QOC: Quantum On-Chip Training with Parameter Shift and Gradient Pruning

1 code implementation26 Feb 2022 Hanrui Wang, Zirui Li, Jiaqi Gu, Yongshan Ding, David Z. Pan, Song Han

Nevertheless, we find that due to the significant quantum errors (noise) on real machines, gradients obtained from naive parameter shift have low fidelity and thus degrade the training accuracy.

Image Classification

AET-SGD: Asynchronous Event-triggered Stochastic Gradient Descent

1 code implementation27 Dec 2021 Nhuong Nguyen, Song Han

In this paper, we propose an Asynchronous Event-triggered Stochastic Gradient Descent (SGD) framework, called AET-SGD, to i) reduce the communication cost among the compute nodes, and ii) mitigate the impact of the delay.

Delayed Gradient Averaging: Tolerate the Communication Latency for Federated Learning

no code implementations NeurIPS 2021 Ligeng Zhu, Hongzhou Lin, Yao Lu, Yujun Lin, Song Han

Federated Learning is an emerging direction in distributed machine learning that enables jointly training a model without sharing the data.

Federated Learning

Memory-efficient Patch-based Inference for Tiny Deep Learning

no code implementations NeurIPS 2021 Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

We further propose receptive field redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.

Image Classification Neural Architecture Search +3

MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning

1 code implementation28 Oct 2021 Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han

We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.

Image Classification Neural Architecture Search +3

QuantumNAT: Quantum Noise-Aware Training with Noise Injection, Quantization and Normalization

2 code implementations21 Oct 2021 Hanrui Wang, Jiaqi Gu, Yongshan Ding, Zirui Li, Frederic T. Chong, David Z. Pan, Song Han

Furthermore, to improve the robustness against noise, we propose noise injection to the training process by inserting quantum error gates to PQC according to realistic noise models of quantum hardware.

Denoising Quantization

Network Augmentation for Tiny Deep Learning

no code implementations ICLR 2022 Han Cai, Chuang Gan, Ji Lin, Song Han

We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks.

Data Augmentation Image Classification +2

Towards Efficient On-Chip Training of Quantum Neural Networks

no code implementations29 Sep 2021 Hanrui Wang, Zirui Li, Jiaqi Gu, Yongshan Ding, David Z. Pan, Song Han

The results demonstrate that our on-chip training achieves over 90% and 60% accuracy for 2-class and 4-class image classification tasks.

Image Classification

TSM: Temporal Shift Module for Efficient and Scalable Video Understanding on Edge Device

4 code implementations27 Sep 2021 Ji Lin, Chuang Gan, Kuan Wang, Song Han

Secondly, TSM has high efficiency; it achieves a high frame rate of 74fps and 29fps for online video recognition on Jetson Nano and Galaxy Note8.

Video Recognition Video Understanding

LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision

no code implementations ICCV 2021 Zhijian Liu, Simon Stent, Jie Li, John Gideon, Song Han

Computer vision tasks such as object detection and semantic/instance segmentation rely on the painstaking annotation of large training datasets.

Image Classification Instance Segmentation +3

QuantumNAS: Noise-Adaptive Search for Robust Quantum Circuits

2 code implementations22 Jul 2021 Hanrui Wang, Yongshan Ding, Jiaqi Gu, Zirui Li, Yujun Lin, David Z. Pan, Frederic T. Chong, Song Han

Extensively evaluated with 12 QML and VQE benchmarks on 14 quantum computers, QuantumNAS significantly outperforms baselines.

NAAS: Neural Accelerator Architecture Search

no code implementations27 May 2021 Yujun Lin, Mengtian Yang, Song Han

Data-driven, automatic design space exploration of neural accelerator architecture is desirable for specialization and productivity.

Efficient and Robust LiDAR-Based End-to-End Navigation

no code implementations20 May 2021 Zhijian Liu, Alexander Amini, Sibo Zhu, Sertac Karaman, Song Han, Daniela Rus

On the other hand, increasing the robustness of these systems is also critical; however, even estimating the model's uncertainty is very challenging due to the cost of sampling-based methods.

PatchNet -- Short-range Template Matching for Efficient Video Processing

1 code implementation10 Mar 2021 Huizi Mao, Sibo Zhu, Song Han, William J. Dally

Object recognition is a fundamental problem in many video processing tasks; accurately locating seen objects at low computation cost paves the way for on-device video recognition.

Object object-detection +5

Anycost GANs for Interactive Image Synthesis and Editing

1 code implementation CVPR 2021 Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zhu

Generative adversarial networks (GANs) have enabled photorealistic image synthesis and editing.

Image Generation

SpAtten: Efficient Sparse Attention Architecture with Cascade Token and Head Pruning

no code implementations17 Dec 2020 Hanrui Wang, Zhekai Zhang, Song Han

Inspired by the high redundancy of human languages, we propose the novel cascade token pruning to prune away unimportant tokens in the sentence.

Quantization Sentence
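An illustrative sketch of token pruning driven by accumulated attention: score each token by the total attention probability it receives and keep only the top fraction, so later layers process fewer tokens. The scoring and keep ratio are assumptions for illustration, not the paper's hardware algorithm:

```python
import numpy as np

def prune_tokens(attn, keep_ratio=0.5):
    """attn: (heads, queries, keys) attention probabilities from one layer."""
    importance = attn.sum(axis=(0, 1))            # cumulative attention per token
    k = max(1, int(importance.size * keep_ratio))
    keep = np.sort(np.argsort(importance)[-k:])   # indices of the tokens retained
    return keep

attn = np.random.rand(8, 32, 32)
attn /= attn.sum(axis=-1, keepdims=True)          # normalize into probabilities
print(prune_tokens(attn, keep_ratio=0.25))        # 8 surviving token indices
```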

IOS: Inter-Operator Scheduler for CNN Acceleration

1 code implementation2 Nov 2020 Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han

To accelerate CNN inference, existing deep learning frameworks focus on optimizing intra-operator parallelization.

Hardware-Centric AutoML for Mixed-Precision Quantization

no code implementations11 Aug 2020 Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han

Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.

AutoML Quantization

TinyTL: Reduce Activations, Not Trainable Parameters for Efficient On-Device Learning

1 code implementation NeurIPS 2020 Han Cai, Chuang Gan, Ligeng Zhu, Song Han

Furthermore, combined with feature extractor adaptation, TinyTL provides 7.3-12.9x memory saving without sacrificing accuracy compared to fine-tuning the full Inception-V3.

Transfer Learning

MCUNet: Tiny Deep Learning on IoT Devices

1 code implementation NeurIPS 2020 Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han

Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller even than mobile phones.

BIG-bench Machine Learning Neural Architecture Search +1

APQ: Joint Search for Network Architecture, Pruning and Quantization Policy

1 code implementation CVPR 2020 Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Song Han

However, training this quantization-aware accuracy predictor requires collecting a large number of quantized <model, accuracy> pairs, which involves quantization-aware finetuning and thus is highly time-consuming.

Quantization

HAT: Hardware-Aware Transformers for Efficient Natural Language Processing

4 code implementations ACL 2020 Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han

To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search.

Machine Translation Neural Architecture Search +1

MicroNet for Efficient Language Modeling

1 code implementation16 May 2020 Zhongxia Yan, Hanrui Wang, Demi Guo, Song Han

In this paper, we provide the winning solution to the NeurIPS 2019 MicroNet Challenge in the language modeling track.

Knowledge Distillation Language Modelling +3

Once for All: Train One Network and Specialize it for Efficient Deployment

1 code implementation ICLR 2020 Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han

Most of the traditional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally expensive and unscalable.

Neural Architecture Search

GAN Compression: Efficient Architectures for Interactive Conditional GANs

1 code implementation CVPR 2020 Muyang Li, Ji Lin, Yaoyao Ding, Zhijian Liu, Jun-Yan Zhu, Song Han

Directly applying existing compression methods yields poor performance due to the difficulty of GAN training and the differences in generator architectures.

Image Generation Neural Architecture Search

SpArch: Efficient Architecture for Sparse Matrix Multiplication

no code implementations20 Feb 2020 Zhekai Zhang, Hanrui Wang, Song Han, William J. Dally

We then propose a condensed matrix representation that reduces the number of partial matrices by three orders of magnitude and thus reduces DRAM access by 5.4x.

Hardware Architecture Distributed, Parallel, and Cluster Computing

Training Kinetics in 15 Minutes: Large-scale Distributed Training on Videos

no code implementations1 Oct 2019 Ji Lin, Chuang Gan, Song Han

With such hardware-aware model design, we are able to scale up the training on Summit supercomputer and reduce the training time on Kinetics dataset from 49 hours 55 minutes to 14 minutes 13 seconds, achieving a top-1 accuracy of 74.0%, which is 1.6x and 2.9x faster than previous 3D video models with higher accuracy.

Video Recognition

Distributed Training Across the World

no code implementations25 Sep 2019 Ligeng Zhu, Yao Lu, Yujun Lin, Song Han

Traditional synchronous distributed training is performed inside a cluster, since it requires a high-bandwidth, low-latency network (e.g., 25Gb Ethernet or InfiniBand).

Once-for-All: Train One Network and Specialize it for Efficient Deployment

10 code implementations26 Aug 2019 Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han

On diverse edge devices, OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3, or the same accuracy but 1.5x faster than MobileNetV3 and 2.6x faster than EfficientNet w.r.t. measured latency) while reducing GPU hours and $CO_2$ emission by many orders of magnitude.

Neural Architecture Search

Point-Voxel CNN for Efficient 3D Deep Learning

4 code implementations NeurIPS 2019 Zhijian Liu, Haotian Tang, Yujun Lin, Song Han

The computation cost and memory footprints of the voxel-based models grow cubically with the input resolution, making it memory-prohibitive to scale up the resolution.

3D Object Detection 3D Semantic Segmentation +2

Deep Leakage from Gradients

7 code implementations NeurIPS 2019 Ligeng Zhu, Zhijian Liu, Song Han

Exchanging gradients is a widely used method in modern multi-node machine learning systems (e.g., distributed training, collaborative learning).

Design Automation for Efficient Deep Learning Computing

no code implementations24 Apr 2019 Song Han, Han Cai, Ligeng Zhu, Ji Lin, Kuan Wang, Zhijian Liu, Yujun Lin

Moreover, we shorten the design cycle by 200x compared with previous work, so that we can afford to design specialized neural network models for different hardware platforms.

Quantization

Defensive Quantization: When Efficiency Meets Robustness

no code implementations ICLR 2019 Ji Lin, Chuang Gan, Song Han

This paper aims to raise people's awareness about the security of the quantized models, and we designed a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models.

Adversarial Attack Quantization

Learning to Design Circuits

no code implementations5 Dec 2018 Hanrui Wang, Jiacheng Yang, Hae-Seung Lee, Song Han

We propose Learning to Design Circuits (L2DC), which leverages reinforcement learning to efficiently generate new circuit data and to optimize circuits.

Bayesian Optimization

ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware

23 code implementations ICLR 2019 Han Cai, Ligeng Zhu, Song Han

We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level of regular training while still allowing a large candidate set.

Image Classification Neural Architecture Search

HAQ: Hardware-Aware Automated Quantization with Mixed Precision

11 code implementations CVPR 2019 Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han

Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.

Quantization

TSM: Temporal Shift Module for Efficient Video Understanding

13 code implementations ICCV 2019 Ji Lin, Chuang Gan, Song Han

The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.

3D Action Recognition Action Classification +6

Adaptive Mixture of Low-Rank Factorizations for Compact Neural Modeling

no code implementations NIPS Workshop CDNNRIA 2018 Ting Chen, Ji Lin, Tian Lin, Song Han, Chong Wang, Denny Zhou

Modern deep neural networks have a large amount of weights, which make them difficult to deploy on computation constrained devices such as mobile phones.

Image Classification Language Modelling

Fast inference of deep neural networks in FPGAs for particle physics

2 code implementations16 Apr 2018 Javier Duarte, Song Han, Philip Harris, Sergo Jindariani, Edward Kreinar, Benjamin Kreis, Jennifer Ngadiuba, Maurizio Pierini, Ryan Rivera, Nhan Tran, Zhenbin Wu

For our example jet substructure model, we fit well within the available resources of modern FPGAs with a latency on the scale of 100 ns.

BIG-bench Machine Learning

Efficient Sparse-Winograd Convolutional Neural Networks

1 code implementation ICLR 2018 Xingyu Liu, Jeff Pool, Song Han, William J. Dally

First, we move the ReLU operation into the Winograd domain to increase the sparsity of the transformed activations.

Network Pruning

AMC: AutoML for Model Compression and Acceleration on Mobile Devices

12 code implementations ECCV 2018 Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han

Model compression is a critical technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets.

Model Compression Neural Architecture Search

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

1 code implementation The International Conference on Learning Representations 2017 Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally

Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure.

Federated Learning Image Classification +3

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

3 code implementations ICLR 2018 Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections.

Federated Learning Image Classification +3

Deep Generative Adversarial Networks for Compressed Sensing Automates MRI

2 code implementations31 May 2017 Morteza Mardani, Enhao Gong, Joseph Y. Cheng, Shreyas Vasanawala, Greg Zaharchuk, Marcus Alley, Neil Thakur, Song Han, William Dally, John M. Pauly, Lei Xing

A multilayer convolutional neural network is then jointly trained based on diagnostic quality images to discriminate the projection quality.

MRI Reconstruction

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

no code implementations24 May 2017 Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally

Since memory reference is more than two orders of magnitude more expensive than arithmetic operations, the regularity of sparse structure leads to more efficient hardware design.

Classification of Neurological Gait Disorders Using Multi-task Feature Learning

no code implementations8 Dec 2016 Ioannis Papavasileiou, Wenlong Zhang, Xin Wang, Jinbo Bi, Li Zhang, Song Han

An advanced machine learning method, multi-task feature learning (MTFL), is used to jointly train classification models of a subject's gait in three classes, post-stroke, PD and healthy gait.

Classification General Classification

Trained Ternary Quantization

4 code implementations4 Dec 2016 Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally

To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values.

Quantization
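A simplified ternarization sketch for intuition: weights are mapped to three values {-Wn, 0, +Wp} via a magnitude threshold. TTQ itself learns the two scales during training; here they are simply set from the surviving weights, and the threshold is an illustrative choice:

```python
import numpy as np

def ternarize(w, t=0.05):
    """Map a float weight tensor to at most three values using threshold t * max|w|."""
    thresh = t * np.abs(w).max()
    pos, neg = w > thresh, w < -thresh
    Wp = w[pos].mean() if pos.any() else 0.0       # positive scale (learned in TTQ)
    Wn = -w[neg].mean() if neg.any() else 0.0      # negative scale (learned in TTQ)
    q = np.zeros_like(w)
    q[pos] = Wp
    q[neg] = -Wn
    return q

w = np.random.randn(64, 64) * 0.1
print(np.unique(ternarize(w)))   # at most three distinct values
```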

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA

no code implementations1 Dec 2016 Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally

Evaluated on the LSTM for speech recognition benchmark, ESE is 43x and 3x faster than Core i7 5930k CPU and Pascal Titan X GPU implementations.

Quantization speech-recognition +1

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

2 code implementations15 Jul 2016 Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally

We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance.

8k Caption Generation +3

Generate Image Descriptions based on Deep RNN and Memory Cells for Images Features

no code implementations5 Feb 2016 Shijian Tang, Song Han

Generating natural language descriptions for images is a challenging task.

EIE: Efficient Inference Engine on Compressed Deep Neural Network

4 code implementations4 Feb 2016 Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally

EIE has a processing power of 102 GOPS/s working directly on a compressed network, corresponding to 3 TOPS/s on an uncompressed network, and processes FC layers of AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600mW.

Learning both Weights and Connections for Efficient Neural Network

no code implementations NeurIPS 2015 Song Han, Jeff Pool, John Tran, William Dally

On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9×, from 61 million to 6.7 million, without incurring accuracy loss.

Efficient Neural Network

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

15 code implementations1 Oct 2015 Song Han, Huizi Mao, William J. Dally

To address this limitation, we introduce "deep compression", a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.

Network Pruning Quantization
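A toy numpy sketch of the first two stages named above, magnitude pruning followed by weight sharing via k-means over the surviving weights; Huffman coding of the resulting cluster indices (the third stage) is omitted, and the sparsity and cluster count are illustrative:

```python
import numpy as np

def prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude weights."""
    thresh = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) > thresh, w, 0.0)

def share_weights(w, n_clusters=16, iters=20):
    """Cluster surviving weights so each can be stored as a small cluster index."""
    vals = w[w != 0]
    centroids = np.linspace(vals.min(), vals.max(), n_clusters)  # linear init
    for _ in range(iters):
        assign = np.argmin(np.abs(vals[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            if (assign == c).any():
                centroids[c] = vals[assign == c].mean()
    shared = w.copy()
    shared[w != 0] = centroids[assign]
    return shared

w = np.random.randn(256, 256) * 0.05
compressed = share_weights(prune(w))
print((compressed == 0).mean(), np.unique(compressed).size)  # ~0.9 sparsity, <=17 values
```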

Learning both Weights and Connections for Efficient Neural Networks

7 code implementations NeurIPS 2015 Song Han, Jeff Pool, John Tran, William J. Dally

On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9x, from 61 million to 6.7 million, without incurring accuracy loss.

Robust Face Recognition using Local Illumination Normalization and Discriminant Feature Point Selection

no code implementations11 Dec 2012 Song Han, Jinsong Kim, Cholhun Kim, Jongchol Jo, Sunam Han

Face recognition systems must be robust to the variation of various factors such as facial expression, illumination, head pose and aging.

Face Detection Face Recognition +1
