Search Results for author: Huizi Mao

Found 13 papers, 10 papers with code

PatchNet -- Short-range Template Matching for Efficient Video Processing

1 code implementation • 10 Mar 2021 • Huizi Mao, Sibo Zhu, Song Han, William J. Dally

Object recognition is a fundamental problem in many video processing tasks; accurately locating seen objects at low computation cost paves the way for on-device video recognition.

Object Detection +5
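The core idea of short-range template matching, locating a previously seen object in a small search window of the next frame, can be sketched as follows. This is a minimal illustration using a sum-of-squared-differences score, not PatchNet's actual learned matching; the function name, the SSD metric, and the search-window parameterization are assumptions for the sketch.

```python
import numpy as np

def match_template(frame, template, center, search_radius=8):
    """Find the best match for `template` in a small window of `frame`
    centered at `center` (row, col), scored by sum of squared
    differences (SSD).  Illustrative only, not PatchNet's method."""
    th, tw = template.shape
    best_score, best_pos = np.inf, center
    r0, c0 = center
    for dr in range(-search_radius, search_radius + 1):
        for dc in range(-search_radius, search_radius + 1):
            r, c = r0 + dr, c0 + dc
            # Skip candidate positions that fall outside the frame.
            if r < 0 or c < 0 or r + th > frame.shape[0] or c + tw > frame.shape[1]:
                continue
            patch = frame[r:r + th, c:c + tw]
            score = np.sum((patch - template) ** 2)
            if score < best_score:
                best_score, best_pos = score, (r, c)
    return best_pos, best_score
```

Because the search is restricted to a small radius around the previous location, the cost is far below running a full detector on every frame, which is the efficiency argument the paper builds on.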

CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video

no code implementations • 30 Sep 2018 • Huizi Mao, Taeyoung Kong, William J. Dally

Experiments on the KITTI dataset show that CaTDet reduces operation count by 5.1-8.7x with the same mean Average Precision (mAP) as the single-model Faster R-CNN detector and incurs an additional delay of 0.3 frames.

Object Detection

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

1 code implementation • The International Conference on Learning Representations 2017 • Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally

Large-scale distributed training requires significant communication bandwidth for gradient exchange that limits the scalability of multi-node training, and requires expensive high-bandwidth network infrastructure.

Federated Learning Image Classification +3

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training

3 code implementations • ICLR 2018 • Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally

The situation gets even worse with distributed training on mobile devices (federated learning), which suffers from higher latency, lower throughput, and intermittent poor connections.

Federated Learning Image Classification +3
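The bandwidth reduction comes from sending only the largest-magnitude gradient entries each step and accumulating the rest locally until they grow large enough to transmit. Below is a minimal sketch of that top-k sparsification with residual accumulation; the function name and the threshold rule are illustrative, and the paper's additional refinements (momentum correction, gradient clipping, warm-up) are omitted.

```python
import numpy as np

def sparsify_gradient(grad, residual, sparsity=0.99):
    """Transmit only the top-k gradient entries by magnitude; keep the
    rest in a local residual for later steps.  Simplified sketch of the
    core idea behind Deep Gradient Compression."""
    acc = grad + residual                        # add locally accumulated gradient
    k = max(1, int(acc.size * (1 - sparsity)))   # number of entries to transmit
    threshold = np.sort(np.abs(acc))[-k]         # k-th largest magnitude
    mask = np.abs(acc) >= threshold
    sent = np.where(mask, acc, 0.0)              # sparse update actually communicated
    new_residual = acc - sent                    # everything else stays local
    return sent, new_residual
```

At 99% sparsity only 1% of the gradient values cross the wire each iteration, which is where the claimed communication savings come from; the residual ensures small gradients are delayed rather than dropped.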

Exploring the Regularity of Sparse Structure in Convolutional Neural Networks

no code implementations • 24 May 2017 • Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally

Since memory reference is more than two orders of magnitude more expensive than arithmetic operations, the regularity of sparse structure leads to more efficient hardware design.

Trained Ternary Quantization

4 code implementations • 4 Dec 2016 • Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally

To solve this problem, we propose Trained Ternary Quantization (TTQ), a method that can reduce the precision of weights in neural networks to ternary values.

Quantization
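In TTQ, each layer's weights are quantized to three values, a trained positive scale, zero, and a trained negative scale, using a magnitude threshold. The forward quantization step can be sketched as below; the threshold heuristic `t * max|w|` follows the paper's thresholding rule, while the function name and fixed (rather than trained) scales here are simplifications.

```python
import numpy as np

def ternarize(weights, wp, wn, t=0.05):
    """Quantize full-precision weights to {+wp, 0, -wn}.  In TTQ the
    scales wp and wn are learned during training; here they are passed
    in as fixed values for illustration."""
    delta = t * np.max(np.abs(weights))  # magnitude threshold
    q = np.zeros_like(weights)
    q[weights > delta] = wp
    q[weights < -delta] = -wn
    return q
```

Because the two scales are trained independently per layer, the ternary network can recover accuracy that fixed-scale ternarization loses, which is the paper's central claim.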

ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA

no code implementations • 1 Dec 2016 • Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally

Evaluated on the LSTM for speech recognition benchmark, ESE is 43x and 3x faster than Core i7 5930k CPU and Pascal Titan X GPU implementations.

Quantization speech-recognition +1

DSD: Dense-Sparse-Dense Training for Deep Neural Networks

2 code implementations • 15 Jul 2016 • Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally

We propose DSD, a dense-sparse-dense training flow, for regularizing deep neural networks and achieving better optimization performance.

8k Caption Generation +3
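The dense-sparse-dense flow trains a dense network, prunes the smallest-magnitude weights and retrains under that sparsity constraint, then restores the pruned connections (initialized to zero) and retrains densely. A minimal sketch of the two primitives involved, the pruning mask and a masked SGD step, assuming plain SGD and illustrative function names:

```python
import numpy as np

def prune_mask(weights, sparsity):
    """Boolean mask that zeroes out the smallest-magnitude `sparsity`
    fraction of weights (the 'S' phase of DSD)."""
    k = int(weights.size * sparsity)              # how many weights to prune
    flat = np.sort(np.abs(weights).ravel())
    threshold = flat[k - 1] if k > 0 else -np.inf
    return np.abs(weights) > threshold

def sgd_step(weights, grad, lr=0.1, mask=None):
    """One SGD update.  In the sparse phase the mask keeps pruned
    weights frozen at zero; in the final dense phase the mask is
    dropped so pruned connections are re-learned from zero."""
    new_w = weights - lr * grad
    return new_w * mask if mask is not None else new_w
```

The sparse phase acts as a regularizer, and the final dense phase lets the freed-up connections escape the pruned solution, which is why DSD is framed as an optimization aid rather than a compression method.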

EIE: Efficient Inference Engine on Compressed Deep Neural Network

4 code implementations • 4 Feb 2016 • Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally

EIE has a processing power of 102 GOPS/s working directly on a compressed network, corresponding to 3 TOPS/s on an uncompressed network, and processes FC layers of AlexNet at 1.88x10^4 frames/sec with a power dissipation of only 600 mW.

Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding

15 code implementations • 1 Oct 2015 • Song Han, Huizi Mao, William J. Dally

To address this limitation, we introduce "deep compression", a three stage pipeline: pruning, trained quantization and Huffman coding, that work together to reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.

Network Pruning Quantization
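The first two stages of the pipeline, magnitude pruning and weight sharing via k-means clustering of the surviving weights, can be sketched as below. This is a toy illustration with assumed function names: the paper additionally fine-tunes the shared centroids during retraining, and the third stage (Huffman coding of the cluster indices and sparse-matrix metadata) is omitted.

```python
import numpy as np

def prune(weights, threshold):
    """Stage 1: zero out weights below a magnitude threshold."""
    return np.where(np.abs(weights) > threshold, weights, 0.0)

def quantize_shared(weights, n_clusters=4, iters=10):
    """Stage 2: weight sharing via a tiny k-means over the nonzero
    weights, with linear centroid initialization as in the paper.
    Returns the quantized weights and the shared centroids."""
    nz = weights[weights != 0]
    centroids = np.linspace(nz.min(), nz.max(), n_clusters)  # linear init
    for _ in range(iters):
        # Assign each weight to its nearest centroid, then update centroids.
        idx = np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)
        for c in range(n_clusters):
            if np.any(idx == c):
                centroids[c] = nz[idx == c].mean()
    out = weights.copy()
    out[weights != 0] = centroids[
        np.argmin(np.abs(nz[:, None] - centroids[None, :]), axis=1)]
    return out, centroids
```

After these two stages each nonzero weight is stored as a small cluster index instead of a 32-bit float, and Huffman coding then exploits the skewed index distribution, which together yield the 35x-49x storage reduction the abstract cites.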
