no code implementations • ECCV 2020 • Zhijian Liu, Zhanghao Wu, Chuang Gan, Ligeng Zhu, Song Han
Third, our solution is efficient on the edge since the majority of the workload is delegated to the cloud, and our mixing and de-mixing processes introduce very few extra computations.
no code implementations • 27 Nov 2023 • Hanrui Wang, Pengyu Liu, Yilian Liu, Jiaqi Gu, Jonathan Baker, Frederic T. Chong, Song Han
By counting the occurrences of edges and edge pairs in decoded matchings, we can statistically estimate the up-to-date probabilities of each edge and the correlations between them.
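As a rough illustration of this counting step, the sketch below estimates per-edge probabilities and a simple pairwise-correlation proxy from decoded matchings. The input format (sets of integer edge IDs), the function name, and the covariance-style correlation measure are assumptions for illustration, not the paper's exact estimator.

```python
from collections import Counter
from itertools import combinations

def estimate_edge_statistics(decoded_matchings):
    """decoded_matchings: iterable of sets of integer edge IDs (one set per decoded matching)."""
    n = 0
    edge_counts, pair_counts = Counter(), Counter()
    for matching in decoded_matchings:
        n += 1
        edge_counts.update(matching)                         # how often each edge fires
        pair_counts.update(combinations(sorted(matching), 2))  # how often each edge pair co-fires
    # Empirical probability of each edge, and an excess-joint-probability proxy
    # for the correlation between edge pairs.
    probs = {e: c / n for e, c in edge_counts.items()}
    correlations = {(a, b): pair_counts[(a, b)] / n - probs[a] * probs[b]
                    for (a, b) in pair_counts}
    return probs, correlations

# Example with toy matchings:
# probs, corr = estimate_edge_statistics([{0, 3}, {0, 1}, {3}])
```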
no code implementations • 27 Nov 2023 • Hanrui Wang, Yilian Liu, Pengyu Liu, Jiaqi Gu, Zirui Li, Zhiding Liang, Jinglei Cheng, Yongshan Ding, Xuehai Qian, Yiyu Shi, David Z. Pan, Frederic T. Chong, Song Han
Arbitrary state preparation algorithms can be broadly categorized into arithmetic decomposition (AD) and variational quantum state preparation (VQSP).
no code implementations • 27 Nov 2023 • Hanrui Wang, Pengyu Liu, Kevin Shao, Dantong Li, Jiaqi Gu, David Z. Pan, Yongshan Ding, Song Han
Quantum Error Correction (QEC) mitigates this by employing redundancy, distributing quantum information across multiple data qubits and utilizing syndrome qubits to monitor their states for errors.
no code implementations • 4 Nov 2023 • Yuan Luo, Song Han, Jingjing Liu
Machine learning is expected to enable the next Industrial Revolution.
no code implementations • 26 Oct 2023 • Ligeng Zhu, Lanxiang Hu, Ji Lin, Wei-Chen Wang, Wei-Ming Chen, Chuang Gan, Song Han
On-device learning and efficient fine-tuning enable continuous and privacy-preserving customization (e.g., locally fine-tuning large language models on personalized data).
1 code implementation • 25 Oct 2023 • Haotian Tang, Shang Yang, Zhijian Liu, Ke Hong, Zhongming Yu, Xiuyu Li, Guohao Dai, Yu Wang, Song Han
On top of this, we design the Sparse Autotuner, which extends the design space of existing sparse convolution libraries and searches for the best dataflow configurations for training and inference workloads.
3 code implementations • 29 Sep 2023 • Guangxuan Xiao, Yuandong Tian, Beidi Chen, Song Han, Mike Lewis
In this paper, we first demonstrate that the emergence of the attention sink is due to the strong attention scores towards initial tokens, which act as a "sink" even if they are not semantically important.
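A toy cache-eviction rule consistent with this observation, keeping a handful of initial "sink" tokens plus a sliding window of recent tokens, is sketched below; the function name and the default sizes are illustrative, not the paper's exact configuration.

```python
def evict_kv_cache(cache_positions, num_sink=4, window=1020):
    """cache_positions: token positions currently in the KV cache, oldest first.
    Keeps the first `num_sink` tokens ever generated plus the most recent
    `window` tokens; everything in between is evicted."""
    sinks = cache_positions[:num_sink]
    recent = cache_positions[num_sink:][-window:]
    return sinks + recent
```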
1 code implementation • 21 Sep 2023 • Yukang Chen, Shengju Qian, Haotian Tang, Xin Lai, Zhijian Liu, Song Han, Jiaya Jia
LongLoRA extends LLaMA2 7B from a 4k context to 100k, and LLaMA2 70B to 32k, on a single 8x A100 machine.
2 code implementations • 1 Jun 2023 • Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Xingyu Dang, Chuang Gan, Song Han
Large language models (LLMs) have shown excellent performance on various tasks, but the astronomical model size raises the hardware barrier for serving (memory size) and slows down token generation (memory bandwidth).
1 code implementation • 17 May 2023 • Guangxuan Xiao, Tianwei Yin, William T. Freeman, Frédo Durand, Song Han
FastComposer proposes delayed subject conditioning in the denoising step to maintain both identity and editability in subject-driven image generation.
1 code implementation • CVPR 2023 • Xuanyao Chen, Zhijian Liu, Haotian Tang, Li Yi, Hang Zhao, Song Han
High-resolution images enable neural networks to learn richer visual representations.
1 code implementation • 9 Feb 2023 • Guangxuan Xiao, Ji Lin, Song Han
In this paper, we propose Offsite-Tuning, a privacy-preserving and efficient transfer learning framework that can adapt billion-parameter foundation models to downstream data without access to the full model.
no code implementations • CVPR 2023 • Zhijian Liu, Xinyu Yang, Haotian Tang, Shang Yang, Song Han
Transformer, as an alternative to CNN, has been proven effective in many modalities (e.g., texts and images).
no code implementations • ICCV 2023 • Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han
Without performance loss on Cityscapes, our EfficientViT provides up to 8.8x and 3.8x GPU latency reduction over SegFormer and SegNeXt, respectively.
2 code implementations • 18 Nov 2022 • Guangxuan Xiao, Ji Lin, Mickael Seznec, Hao Wu, Julien Demouth, Song Han
We propose SmoothQuant, a training-free, accuracy-preserving, and general-purpose post-training quantization (PTQ) solution to enable 8-bit weight, 8-bit activation (W8A8) quantization for LLMs.
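A minimal sketch of the per-channel smoothing idea is given below, assuming a linear layer with activations X and weights W; the smoothing exponent alpha, the per-tensor int8 quantizer, and the function names are illustrative simplifications rather than the released implementation.

```python
import numpy as np

def smooth_and_quantize(X, W, alpha=0.5):
    """X: activations (tokens, in_features); W: weights (in_features, out_features)."""
    act_scale = np.maximum(np.abs(X).max(axis=0), 1e-8)   # per-input-channel activation range
    w_scale = np.maximum(np.abs(W).max(axis=1), 1e-8)     # per-input-channel weight range
    s = act_scale ** alpha / w_scale ** (1 - alpha)       # migrate outliers from X into W
    X_s, W_s = X / s, W * s[:, None]                      # (X / s) @ (s * W) == X @ W
    def quantize_int8(t):                                 # symmetric per-tensor int8
        scale = np.abs(t).max() / 127.0
        return np.clip(np.round(t / scale), -127, 127).astype(np.int8), scale
    return quantize_int8(X_s), quantize_int8(W_s)
```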
1 code implementation • 3 Nov 2022 • Muyang Li, Ji Lin, Chenlin Meng, Stefano Ermon, Song Han, Jun-Yan Zhu
With about $1\%$-area edits, SIGE accelerates DDPM by $3.0\times$ on NVIDIA RTX 3090 and $4.6\times$ on Apple M1 Pro GPU, Stable Diffusion by $7.2\times$ on 3090, and GauGAN by $5.6\times$ on 3090 and $5.2\times$ on M1 Pro GPU.
1 code implementation • 30 Oct 2022 • Hanrui Wang, Pengyu Liu, Jinglei Cheng, Zhiding Liang, Jiaqi Gu, Zirui Li, Yongshan Ding, Weiwen Jiang, Yiyu Shi, Xuehai Qian, David Z. Pan, Frederic T. Chong, Song Han
Specifically, the TorchQuantum library also supports using data-driven ML models to solve problems in quantum system research, such as predicting the impact of quantum noise on circuit fidelity and improving the quantum circuit compilation efficiency.
no code implementations • 2 Aug 2022 • Zhiding Liang, Jinglei Cheng, Hang Ren, Hanrui Wang, Fei Hua, Zhixin Song, Yongshan Ding, Fred Chong, Song Han, Yiyu Shi, Xuehai Qian
In the case of VQAs, this procedure will introduce redundancy, but the variational properties of VQAs can naturally handle problems of over-rotation and under-rotation by updating the amplitude and frequency parameters.
no code implementations • 13 Jul 2022 • Wei Shi, Hanrui Wang, Jiaqi Gu, Mingjie Liu, David Pan, Song Han, Nan Sun
To address the challenge, we present RobustAnalog, a robust circuit design framework that involves the variation information in the optimization process.
1 code implementation • 30 Jun 2022 • Ji Lin, Ligeng Zhu, Wei-Ming Chen, Wei-Chen Wang, Chuang Gan, Song Han
To reduce the memory footprint, we propose Sparse Update to skip the gradient computation of less important layers and sub-tensors.
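As a rough PyTorch-style illustration (not the paper's contribution-analysis-based selection policy), the sketch below freezes most parameters and leaves gradients enabled only for a chosen set of layers and all biases, so that backward passes and optimizer state stay within a small memory budget.

```python
import torch.nn as nn

def apply_sparse_update(model: nn.Module, trainable_layers):
    """Freeze everything except biases and parameters whose names contain one
    of the strings in `trainable_layers`; gradient computation is skipped for the rest."""
    for name, param in model.named_parameters():
        keep = name.endswith("bias") or any(layer in name for layer in trainable_layers)
        param.requires_grad_(keep)

# Example: only the classifier head and the last block get weight updates.
# apply_sparse_update(model, trainable_layers=["classifier", "blocks.11"])
```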
no code implementations • 19 Jun 2022 • Pengfei Zhang, Xiaohui Hu, Kaidong Yu, Jian Wang, Song Han, Cao Liu, Chunyang Yuan
First, we build an evaluation metric, Multi-Metric Evaluation (MME), composed of 5 groups of parallel sub-metrics, to comprehensively evaluate the quality of dialogue.
4 code implementations • 29 May 2022 • Han Cai, Junyan Li, Muyan Hu, Chuang Gan, Song Han
Unlike prior high-resolution dense prediction models that rely on heavy softmax attention, hardware-inefficient large-kernel convolution, or complicated topology structure to obtain good performances, our multi-scale linear attention achieves the global receptive field and multi-scale learning (two desirable features for high-resolution dense prediction) with only lightweight and hardware-efficient operations.
Ranked #20 on Semantic Segmentation on Cityscapes val
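A minimal numpy sketch of ReLU-based linear attention, the building block behind the multi-scale linear attention described above, is shown below; the multi-scale token aggregation and all other model details are omitted, and the function name is illustrative.

```python
import numpy as np

def relu_linear_attention(Q, K, V, eps=1e-6):
    """Q, K: (n, d); V: (n, d_v). Cost is O(n * d * d_v) instead of O(n^2 * d)."""
    Qr, Kr = np.maximum(Q, 0), np.maximum(K, 0)
    kv = Kr.T @ V                      # (d, d_v): values aggregated once, globally
    z = Qr @ Kr.sum(axis=0) + eps      # (n,): per-query normalization
    return (Qr @ kv) / z[:, None]      # global receptive field without softmax
```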
no code implementations • 26 May 2022 • Zhijian Liu, Haotian Tang, Alexander Amini, Xinyu Yang, Huizi Mao, Daniela Rus, Song Han
Multi-sensor fusion is essential for an accurate and reliable autonomous driving system.
Ranked #4 on 3D Object Detection on nuScenes
1 code implementation • CVPR 2022 • Yihan Wang, Muyang Li, Han Cai, Wei-Ming Chen, Song Han
Inspired by this finding, we design LitePose, an efficient single-branch architecture for pose estimation, and introduce two simple approaches to enhance the capacity of LitePose, including Fusion Deconv Head and Large Kernel Convs.
Ranked #5 on Multi-Person Pose Estimation on COCO (Validation AP metric)
no code implementations • 25 Apr 2022 • Han Cai, Ji Lin, Yujun Lin, Zhijian Liu, Haotian Tang, Hanrui Wang, Ligeng Zhu, Song Han
Deep neural networks (DNNs) have achieved unprecedented success in the field of artificial intelligence (AI), including computer vision, natural language processing and speech recognition.
no code implementations • 25 Apr 2022 • Zhijian Liu, Haotian Tang, Shengyu Zhao, Kevin Shao, Song Han
3D neural networks are widely used in real-world applications (e.g., AR/VR headsets, self-driving cars).
1 code implementation • 21 Apr 2022 • Haotian Tang, Zhijian Liu, Xiuyu Li, Yujun Lin, Song Han
TorchSparse directly optimizes the two bottlenecks of sparse convolution: irregular computation and data movement.
1 code implementation • 26 Feb 2022 • Hanrui Wang, Zirui Li, Jiaqi Gu, Yongshan Ding, David Z. Pan, Song Han
Nevertheless, we find that due to the significant quantum errors (noise) on real machines, gradients obtained from the naive parameter shift have low fidelity and thus degrade the training accuracy.
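For context, the standard two-point parameter-shift rule referenced here can be written as a one-liner; `expectation` below is a placeholder for a circuit evaluation on hardware or a simulator, and on noisy devices each such evaluation is itself a statistical estimate, which is where the fidelity problem enters.

```python
import numpy as np

# Two-point parameter-shift rule: dE/dtheta = (E(theta + pi/2) - E(theta - pi/2)) / 2.
def parameter_shift_gradient(expectation, theta, shift=np.pi / 2):
    return 0.5 * (expectation(theta + shift) - expectation(theta - shift))
```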
1 code implementation • 27 Dec 2021 • Nhuong Nguyen, Song Han
In this paper, we propose an Asynchronous Event-triggered Stochastic Gradient Descent (SGD) framework, called AET-SGD, to i) reduce the communication cost among the compute nodes, and ii) mitigate the impact of the delay.
no code implementations • NeurIPS 2021 • Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han
We further propose receptive field redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.
no code implementations • NeurIPS 2021 • Ligeng Zhu, Hongzhou Lin, Yao Lu, Yujun Lin, Song Han
Federated Learning is an emerging direction in distributed machine learning that enables jointly training a model without sharing the data.
no code implementations • 23 Nov 2021 • Alexander Amini, Tsun-Hsuan Wang, Igor Gilitschenski, Wilko Schwarting, Zhijian Liu, Song Han, Sertac Karaman, Daniela Rus
Simulation has the potential to transform the development of robust algorithms for mobile agents deployed in safety-critical scenarios.
1 code implementation • 28 Oct 2021 • Ji Lin, Wei-Ming Chen, Han Cai, Chuang Gan, Song Han
We further propose network redistribution to shift the receptive field and FLOPs to the later stage and reduce the computation overhead.
2 code implementations • 21 Oct 2021 • Hanrui Wang, Jiaqi Gu, Yongshan Ding, Zirui Li, Frederic T. Chong, David Z. Pan, Song Han
Furthermore, to improve the robustness against noise, we propose noise injection to the training process by inserting quantum error gates to PQC according to realistic noise models of quantum hardware.
no code implementations • ICLR 2022 • Han Cai, Chuang Gan, Ji Lin, Song Han
We introduce Network Augmentation (NetAug), a new training method for improving the performance of tiny neural networks.
no code implementations • 29 Sep 2021 • Hanrui Wang, Zirui Li, Jiaqi Gu, Yongshan Ding, David Z. Pan, Song Han
The results demonstrate that our on-chip training achieves over 90% and 60% accuracy for 2-class and 4-class image classification tasks.
no code implementations • 29 Sep 2021 • Wei Shi, Hanrui Wang, Jiaqi Gu, Mingjie Liu, David Z. Pan, Song Han, Nan Sun
Specifically, circuit optimizations under different variations are considered as a set of tasks.
2 code implementations • 27 Sep 2021 • Ji Lin, Chuang Gan, Kuan Wang, Song Han
Second, TSM has high efficiency: it achieves online video recognition at 74 fps on Jetson Nano and 29 fps on Galaxy Note8.
no code implementations • ICCV 2021 • Zhijian Liu, Simon Stent, Jie Li, John Gideon, Song Han
Computer vision tasks such as object detection and semantic/instance segmentation rely on the painstaking annotation of large training datasets.
2 code implementations • 22 Jul 2021 • Hanrui Wang, Yongshan Ding, Jiaqi Gu, Zirui Li, Yujun Lin, David Z. Pan, Frederic T. Chong, Song Han
Extensively evaluated with 12 QML and VQE benchmarks on 14 quantum computers, QuantumNAS significantly outperforms baselines.
no code implementations • 27 May 2021 • Yujun Lin, Mengtian Yang, Song Han
Data-driven, automatic design space exploration of neural accelerator architecture is desirable for specialization and productivity.
no code implementations • 20 May 2021 • Zhijian Liu, Alexander Amini, Sibo Zhu, Sertac Karaman, Song Han, Daniela Rus
On the other hand, increasing the robustness of these systems is also critical; however, even estimating the model's uncertainty is very challenging due to the cost of sampling-based methods.
1 code implementation • 10 Mar 2021 • Huizi Mao, Sibo Zhu, Song Han, William J. Dally
Object recognition is a fundamental problem in many video processing tasks; accurately locating seen objects at low computation cost paves the way for on-device video recognition.
1 code implementation • CVPR 2021 • Ji Lin, Richard Zhang, Frieder Ganz, Song Han, Jun-Yan Zhu
Generative adversarial networks (GANs) have enabled photorealistic image synthesis and editing.
no code implementations • 17 Dec 2020 • Hanrui Wang, Zhekai Zhang, Song Han
Inspired by the high redundancy of human languages, we propose the novel cascade token pruning to prune away unimportant tokens in the sentence.
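A rough sketch of this kind of token pruning is shown below: tokens are ranked by the attention probability they accumulate, only the top fraction survives, and tokens pruned at one layer stay pruned in later layers (hence "cascade"). The scoring rule, keep ratio, and names are illustrative, not the paper's exact algorithm.

```python
import numpy as np

def cascade_prune(attention_probs, kept_idx, keep_ratio=0.5):
    """attention_probs: (heads, q_len, k_len) over the currently kept tokens;
    kept_idx: np.ndarray of original token indices that are still alive.
    Returns the surviving indices; tokens dropped here never come back."""
    importance = attention_probs.sum(axis=(0, 1))        # total probability each token receives
    num_keep = max(1, int(len(kept_idx) * keep_ratio))
    top = np.sort(np.argsort(importance)[-num_keep:])    # keep the most-attended tokens, in order
    return kept_idx[top]
```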
1 code implementation • 2 Nov 2020 • Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han
To accelerate CNN inference, existing deep learning frameworks focus on optimizing intra-operator parallelization.
no code implementations • 11 Aug 2020 • Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han
Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.
5 code implementations • ECCV 2020 • Haotian Tang, Zhijian Liu, Shengyu Zhao, Yujun Lin, Ji Lin, Hanrui Wang, Song Han
Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely.
Ranked #1 on Robust 3D Semantic Segmentation on SemanticKITTI-C
1 code implementation • NeurIPS 2020 • Han Cai, Chuang Gan, Ligeng Zhu, Song Han
Furthermore, combined with feature extractor adaptation, TinyTL provides 7.3-12.9x memory saving without sacrificing accuracy compared to fine-tuning the full Inception-V3.
1 code implementation • NeurIPS 2020 • Ji Lin, Wei-Ming Chen, Yujun Lin, John Cohn, Chuang Gan, Song Han
Machine learning on tiny IoT devices based on microcontroller units (MCU) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller than even that of mobile phones.
10 code implementations • NeurIPS 2020 • Shengyu Zhao, Zhijian Liu, Ji Lin, Jun-Yan Zhu, Song Han
Furthermore, with only 20% training data, we can match the top performance on CIFAR-10 and CIFAR-100.
Ranked #1 on Image Generation on CIFAR-10 (20% data)
1 code implementation • CVPR 2020 • Tianzhe Wang, Kuan Wang, Han Cai, Ji Lin, Zhijian Liu, Song Han
However, training this quantization-aware accuracy predictor requires collecting a large number of quantized <model, accuracy> pairs, which involves quantization-aware finetuning and thus is highly time-consuming.
4 code implementations • ACL 2020 • Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han
To enable low-latency inference on resource-constrained hardware platforms, we propose to design Hardware-Aware Transformers (HAT) with neural architecture search.
Ranked #21 on Machine Translation on WMT2014 English-French
1 code implementation • 16 May 2020 • Zhongxia Yan, Hanrui Wang, Demi Guo, Song Han
In this paper, we provide the winning solution to the NeurIPS 2019 MicroNet Challenge in the language modeling track.
1 code implementation • ICLR 2020 • Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han
Most of the traditional approaches either manually design or use neural architecture search (NAS) to find a specialized neural network and train it from scratch for each case, which is computationally expensive and unscalable.
no code implementations • 30 Apr 2020 • Hanrui Wang, Kuan Wang, Jiacheng Yang, Linxiao Shen, Nan Sun, Hae-Seung Lee, Song Han
Our transferable optimization method makes transistor sizing and design porting more effective and efficient.
2 code implementations • ICLR 2020 • Zhanghao Wu, Zhijian Liu, Ji Lin, Yujun Lin, Song Han
For language modeling, Lite Transformer achieves 1.8 lower perplexity than the transformer at around 500M MACs.
Ranked #36 on Machine Translation on WMT2014 English-French
1 code implementation • CVPR 2020 • Muyang Li, Ji Lin, Yaoyao Ding, Zhijian Liu, Jun-Yan Zhu, Song Han
Directly applying existing compression methods yields poor performance due to the difficulty of GAN training and the differences in generator architectures.
no code implementations • 20 Feb 2020 • Zhekai Zhang, Hanrui Wang, Song Han, William J. Dally
We then propose a condensed matrix representation that reduces the number of partial matrices by three orders of magnitude and thus reduces DRAM access by 5.4x.
Hardware Architecture • Distributed, Parallel, and Cluster Computing
1 code implementation • NeurIPS 2019 • Hongzi Mao, Parimarjan Negi, Akshay Narayan, Hanrui Wang, Jiacheng Yang, Haonan Wang, Ryan Marcus, Ravichandra Addanki, Mehrdad Khani Shirkoohi, Songtao He, Vikram Nathan, Frank Cangialosi, Shaileshh Venkatakrishnan, Wei-Hung Weng, Song Han, Tim Kraska, Dr.Mohammad Alizadeh
We present Park, a platform for researchers to experiment with Reinforcement Learning (RL) for computer systems.
no code implementations • 1 Oct 2019 • Ji Lin, Chuang Gan, Song Han
With such hardware-aware model design, we are able to scale up the training on the Summit supercomputer and reduce the training time on the Kinetics dataset from 49 hours 55 minutes to 14 minutes 13 seconds, achieving a top-1 accuracy of 74.0%, which is 1.6x and 2.9x faster than previous 3D video models while also reaching higher accuracy.
no code implementations • 25 Sep 2019 • Ligeng Zhu, Yao Lu, Yujun Lin, Song Han
Traditional synchronous distributed training is performed inside a cluster, since it requires a high-bandwidth, low-latency network (e.g., 25Gb Ethernet or InfiniBand).
10 code implementations • 26 Aug 2019 • Han Cai, Chuang Gan, Tianzhe Wang, Zhekai Zhang, Song Han
On diverse edge devices, OFA consistently outperforms state-of-the-art (SOTA) NAS methods (up to 4.0% ImageNet top-1 accuracy improvement over MobileNetV3, or the same accuracy but 1.5x faster than MobileNetV3 and 2.6x faster than EfficientNet w.r.t. measured latency) while reducing GPU hours and $CO_2$ emission by many orders of magnitude.
Ranked #76 on Neural Architecture Search on ImageNet
3 code implementations • NeurIPS 2019 • Zhijian Liu, Haotian Tang, Yujun Lin, Song Han
The computation cost and memory footprints of the voxel-based models grow cubically with the input resolution, making it memory-prohibitive to scale up the resolution.
Ranked #1 on 3D Object Detection on KITTI Pedestrian Easy val
6 code implementations • NeurIPS 2019 • Ligeng Zhu, Zhijian Liu, Song Han
Exchanging gradients is a widely used method in modern multi-node machine learning systems (e.g., distributed training, collaborative learning).
no code implementations • 24 Apr 2019 • Song Han, Han Cai, Ligeng Zhu, Ji Lin, Kuan Wang, Zhijian Liu, Yujun Lin
Moreover, we shorten the design cycle by 200x compared with previous work, so that we can afford to design specialized neural network models for different hardware platforms.
no code implementations • ICLR 2019 • Ji Lin, Chuang Gan, Song Han
This paper aims to raise people's awareness about the security of the quantized models, and we designed a novel quantization methodology to jointly optimize the efficiency and robustness of deep learning models.
no code implementations • 29 Mar 2019 • Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Jennifer Chayes, Eric Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim Hazelwood, Furong Huang, Martin Jaggi, Kevin Jamieson, Michael. I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konečný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Aparna Lakshmiratan, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Murray, Kunle Olukotun, Dimitris Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar
Machine learning (ML) techniques are enjoying rapidly increasing adoption.
no code implementations • 5 Dec 2018 • Hanrui Wang, Jiacheng Yang, Hae-Seung Lee, Song Han
We propose Learning to Design Circuits (L2DC) to leverage reinforcement learning that learns to efficiently generate new circuits data and to optimize circuits.
21 code implementations • ICLR 2019 • Han Cai, Ligeng Zhu, Song Han
We address the high memory consumption issue of differentiable NAS and reduce the computational cost (GPU hours and GPU memory) to the same level of regular training while still allowing a large candidate set.
Ranked #6 on Neural Architecture Search on CIFAR-10 Image Classification (using extra training data)
11 code implementations • CVPR 2019 • Kuan Wang, Zhijian Liu, Yujun Lin, Ji Lin, Song Han
Compared with conventional methods, our framework is fully automated and can specialize the quantization policy for different neural network architectures and hardware architectures.
12 code implementations • ICCV 2019 • Ji Lin, Chuang Gan, Song Han
The explosive growth in video streaming gives rise to challenges on performing video understanding at high accuracy and low computation cost.
Ranked #24 on Video Object Detection on ImageNet VID
no code implementations • NIPS Workshop CDNNRIA 2018 • Ting Chen, Ji Lin, Tian Lin, Song Han, Chong Wang, Denny Zhou
Modern deep neural networks have a large amount of weights, which make them difficult to deploy on computation constrained devices such as mobile phones.
3 code implementations • ICML 2018 • Han Cai, Jiacheng Yang, Wei-Nan Zhang, Song Han, Yong Yu
We introduce a new function-preserving transformation for efficient neural architecture search.
2 code implementations • 16 Apr 2018 • Javier Duarte, Song Han, Philip Harris, Sergo Jindariani, Edward Kreinar, Benjamin Kreis, Jennifer Ngadiuba, Maurizio Pierini, Ryan Rivera, Nhan Tran, Zhenbin Wu
For our example jet substructure model, we fit well within the available resources of modern FPGAs with a latency on the scale of 100 ns.
1 code implementation • ICLR 2018 • Xingyu Liu, Jeff Pool, Song Han, William J. Dally
First, we move the ReLU operation into the Winograd domain to increase the sparsity of the transformed activations.
12 code implementations • ECCV 2018 • Yihui He, Ji Lin, Zhijian Liu, Hanrui Wang, Li-Jia Li, Song Han
Model compression is a critical technique to efficiently deploy neural network models on mobile devices which have limited computation resources and tight power budgets.
3 code implementations • ICLR 2018 • Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally
The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections.
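As background for the gradient-compression approach this entry describes, the sketch below shows one common top-k sparsification scheme with local accumulation: only the largest-magnitude gradient entries are communicated, and the remainder is kept locally and added back next step. Momentum correction, warm-up, and the exact sparsity schedule are omitted, and all names are illustrative.

```python
import numpy as np

def sparsify_gradient(grad, residual, sparsity=0.999):
    """Send only the largest-magnitude entries; keep the rest locally as residual."""
    acc = residual + grad                          # accumulate what was not sent before
    flat = acc.ravel()
    k = max(1, int(flat.size * (1 - sparsity)))
    idx = np.argpartition(np.abs(flat), -k)[-k:]   # indices of the k largest magnitudes
    sparse = np.zeros_like(flat)
    sparse[idx] = flat[idx]                        # the only values communicated
    new_residual = (flat - sparse).reshape(acc.shape)
    return sparse.reshape(acc.shape), new_residual
```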
1 code implementation • The International Conference on Learning Representations 2017 • Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally
Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure.
2 code implementations • 31 May 2017 • Morteza Mardani, Enhao Gong, Joseph Y. Cheng, Shreyas Vasanawala, Greg Zaharchuk, Marcus Alley, Neil Thakur, Song Han, William Dally, John M. Pauly, Lei Xing
A multilayer convolutional neural network is then jointly trained based on diagnostic quality images to discriminate the projection quality.
no code implementations • 24 May 2017 • Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally
Since memory reference is more than two orders of magnitude more expensive than arithmetic operations, the regularity of sparse structure leads to more efficient hardware design.
no code implementations • 8 Dec 2016 • Ioannis Papavasileiou, Wenlong Zhang, Xin Wang, Jinbo Bi, Li Zhang, Song Han
An advanced machine learning method, multi-task feature learning (MTFL), is used to jointly train classification models of a subject's gait in three classes, post-stroke, PD and healthy gait.
4 code implementations • 4 Dec 2016 • Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally
To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values.
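As a toy illustration of ternarization (not TTQ itself, whose positive and negative scaling factors are learned during training), the sketch below thresholds weights into {-Wn, 0, +Wp}, with the two scales simply set to the mean magnitudes of the weights they replace.

```python
import numpy as np

def ternarize(w, threshold_factor=0.05):
    """Map weights to {-Wn, 0, +Wp}; here Wp and Wn are mean magnitudes, whereas
    TTQ treats them as trainable parameters."""
    delta = threshold_factor * np.abs(w).max()     # threshold around zero
    pos, neg = w > delta, w < -delta
    wp = np.abs(w[pos]).mean() if pos.any() else 0.0
    wn = np.abs(w[neg]).mean() if neg.any() else 0.0
    return wp * pos - wn * neg
```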
no code implementations • 1 Dec 2016 • Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally
Evaluated on the LSTM for speech recognition benchmark, ESE is 43x and 3x faster than Core i7 5930k CPU and Pascal Titan X GPU implementations.
2 code implementations • 15 Jul 2016 • Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally
We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance.
56 code implementations • 24 Feb 2016 • Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer
(2) Smaller DNNs require less bandwidth to export a new model from the cloud to an autonomous car.
Ranked #1 on Image Classification on ImageNet-P
no code implementations • 5 Feb 2016 • Shijian Tang, Song Han
Generating natural language descriptions for images is a challenging task.
4 code implementations • 4 Feb 2016 • Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally
EIE has a processing power of 102 GOPS/s working directly on a compressed network, corresponding to 3 TOPS/s on an uncompressed network, and processes FC layers of AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600 mW.
no code implementations • NeurIPS 2015 • Song Han, Jeff Pool, John Tran, William Dally
On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9×, from 61 million to 6.7 million, without incurring accuracy loss.
15 code implementations • 1 Oct 2015 • Song Han, Huizi Mao, William J. Dally
To address this limitation, we introduce "deep compression", a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
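A toy end-to-end illustration of the three stages is sketched below; the real pipeline uses k-means weight sharing with retraining between stages and Huffman-codes the cluster indices, all of which are simplified or omitted here, and the names are illustrative.

```python
import numpy as np

def compress(weights, prune_ratio=0.9, bits=4):
    # 1) Pruning: zero out the smallest-magnitude weights.
    thresh = np.quantile(np.abs(weights), prune_ratio)
    pruned = np.where(np.abs(weights) >= thresh, weights, 0.0)
    # 2) Quantization via weight sharing (a fixed linear codebook of 2**bits
    #    values stands in for trained k-means centroids).
    nonzero = pruned[pruned != 0]
    codebook = np.linspace(nonzero.min(), nonzero.max(), 2 ** bits)
    indices = np.abs(nonzero[:, None] - codebook[None, :]).argmin(axis=1)
    # 3) Huffman coding of `indices` and the sparsity pattern would follow; omitted here.
    return pruned, codebook, indices
```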
8 code implementations • NeurIPS 2015 • Song Han, Jeff Pool, John Tran, William J. Dally
On the ImageNet dataset, our method reduced the number of parameters of AlexNet by a factor of 9x, from 61 million to 6.7 million, without incurring accuracy loss.
no code implementations • 11 Dec 2012 • Song Han, Jinsong Kim, Cholhun Kim, Jongchol Jo, Sunam Han
Face recognition systems must be robust to the variation of various factors such as facial expression, illumination, head pose and aging.