Search Results for author: Deming Chen

Found 30 papers, 15 papers with code

HiKonv: High Throughput Quantized Convolution With Novel Bit-wise Management and Computation

no code implementations 28 Dec 2021 Xinheng Liu, Yao Chen, Prakhar Ganesh, Junhao Pan, JinJun Xiong, Deming Chen

Quantization for Convolutional Neural Network (CNN) has shown significant progress with the intention of reducing the cost of computation and storage with low-bitwidth data inputs.

Quantization

EH-DNAS: End-to-End Hardware-aware Differentiable Neural Architecture Search

1 code implementation 24 Nov 2021 Qian Jiang, Xiaofan Zhang, Deming Chen, Minh N. Do, Raymond A. Yeh

In this work, we propose End-to-end Hardware-aware DNAS (EH-DNAS), a seamless integration of end-to-end hardware benchmarking, and fully automated DNAS to deliver hardware-efficient deep neural networks on various platforms, including Edge GPUs, Edge TPUs, Mobile CPUs, and customized accelerators.

Neural Architecture Search

YOLO-ReT: Towards High Accuracy Real-time Object Detection on Edge GPUs

2 code implementations 26 Oct 2021 Prakhar Ganesh, Yao Chen, Yin Yang, Deming Chen, Marianne Winslett

Performance of object detection models has been growing rapidly on two major fronts, model accuracy and efficiency.

Real-Time Object Detection, Transfer Learning

Generic Neural Architecture Search via Regression

1 code implementation NeurIPS 2021 Yuhong Li, Cong Hao, Pan Li, JinJun Xiong, Deming Chen

Such a self-supervised regression task can effectively evaluate the intrinsic power of an architecture to capture and transform the input signal patterns, and allow more sufficient usage of training samples.

Image Classification, Neural Architecture Search

WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs

1 code implementation 9 Jul 2021 Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, Deming Chen

We implement our proposed accelerator on multiple FPGAs, which outperforms the state-of-the-art designs in terms of both throughput and DSP efficiency.

Software/Hardware Co-design for Multi-modal Multi-task Learning in Autonomous Systems

no code implementations 8 Apr 2021 Cong Hao, Deming Chen

We formulate the MMMT model and heterogeneous hardware implementation co-design as a differentiable optimization problem, with the objective of improving the solution quality and reducing the overall power consumption and critical path latency.

Multi-Task Learning

Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Co-design

no code implementations 25 Mar 2021 Cong Hao, Jordan Dotzel, JinJun Xiong, Luca Benini, Zhiru Zhang, Deming Chen

Artificial intelligence (AI) technologies have dramatically advanced in recent years, resulting in revolutionary changes in people's lives.

Edge-computing

F-CAD: A Framework to Explore Hardware Accelerators for Codec Avatar Decoding

no code implementations 8 Mar 2021 Xiaofan Zhang, Dawei Wang, Pierce Chuang, Shugao Ma, Deming Chen, Yuecheng Li

Creating virtual avatars with realistic rendering is one of the most essential and challenging tasks to provide highly immersive virtual reality (VR) experiences.

Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture

1 code implementation 4 Mar 2021 Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, JinJun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu

In this work, we propose a novel GPU-oriented data communication approach for GCN training, where GPU threads directly access sparse features in host memory through zero-copy accesses without much CPU help.

Graph Convolutional Network, Recommendation Systems

PyTorch-Direct: Enabling GPU Centric Data Access for Very Large Graph Neural Network Training with Irregular Accesses

1 code implementation 20 Jan 2021 Seung Won Min, Kun Wu, Sitao Huang, Mert Hidayetoğlu, JinJun Xiong, Eiman Ebrahimi, Deming Chen, Wen-mei Hwu

While this process accounts for a significant portion of the training time, we find existing GNN implementations using popular deep neural network (DNN) libraries such as PyTorch are limited to a CPU-centric approach for the entire data preparation step.

TwinDNN: A Tale of Two Deep Neural Networks

no code implementations 1 Jan 2021 Hyunmin Jeong, Deming Chen

This is the first work that considers using a highly compressed DNN along with the original DNN in parallel to improve latency significantly while effectively maintaining the original model accuracy.

Image Classification, Model Compression

Improving Random-Sampling Neural Architecture Search by Evolving the Proxy Search Space

1 code implementation 1 Jan 2021 Yuhong Li, Cong Hao, Xiaofan Zhang, JinJun Xiong, Wen-mei Hwu, Deming Chen

This raises the question of whether we can find an effective proxy search space (PS) that is only a small subset of GS to dramatically improve RandomNAS’s search efficiency while at the same time keeping a good correlation for the top-performing architectures.

Image Classification, Neural Architecture Search

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices

no code implementations 14 Oct 2020 Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, JinJun Xiong, Wen-mei Hwu, Deming Chen

High quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators.

Comprehensive assessment of error correction methods for high-throughput sequencing data

no code implementations 10 Jul 2020 Yun Heo, Gowthami Manikandan, Anand Ramachandran, Deming Chen

We also present a compilation of sequencing datasets for Illumina, PacBio and ONT platforms that present challenging scenarios for error-correction tools.

VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization

1 code implementation 18 May 2020 Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, Deming Chen

Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs.

Model Compression, Object Detection

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions

no code implementations 6 May 2020 Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen

We formulate the co-search problem by fusing DNN search variables and hardware implementation variables into one solution space, and maximize both algorithm accuracy and hardware implementation quality.

Neural Architecture Search

HybridDNN: A Framework for High-Performance Hybrid DNN Accelerator Design and Implementation

no code implementations 8 Apr 2020 Hanchen Ye, Xiaofan Zhang, Zhize Huang, Gengsheng Chen, Deming Chen

To speed up Deep Neural Network (DNN) accelerator design and enable effective implementation, we propose HybridDNN, a framework for building high-performance hybrid DNN accelerators and delivering FPGA-based hardware implementations.

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT

no code implementations 27 Feb 2020 Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett

Pre-trained Transformer-based models have achieved state-of-the-art performance for various Natural Language Processing (NLP) tasks.

Model Compression

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs

1 code implementation 6 Jan 2020 Pengfei Xu, Xiaofan Zhang, Cong Hao, Yang Zhao, Yongan Zhang, Yue Wang, Chaojian Li, Zetong Guan, Deming Chen, Yingyan Lin

Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator's energy, throughput, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balancing, etc.

NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving

no code implementations 18 Nov 2019 Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, JinJun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen

The rapidly growing demands for powerful AI algorithms in many application domains have motivated massive investment in both high-quality deep neural network (DNN) models and high-efficiency implementations.

Autonomous Driving

SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection

1 code implementation 25 Jun 2019 Xiaofan Zhang, Cong Hao, Haoming Lu, Jiachen Li, Yuhong Li, Yuchen Fan, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen

Developing artificial intelligence (AI) at the edge is always challenging, since edge devices have limited computation capability and memory resources but need to meet demanding requirements, such as real-time processing, high throughput performance, and high inference accuracy.

Object Detection

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices

2 code implementations 20 May 2019 Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen

Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption.

Object Detection

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge

2 code implementations 9 Apr 2019 Cong Hao, Xiaofan Zhang, Yuhong Li, Sitao Huang, JinJun Xiong, Kyle Rupnow, Wen-mei Hwu, Deming Chen

While embedded FPGAs are attractive platforms for DNN acceleration on edge-devices due to their low latency and high energy efficiency, the scarcity of resources of edge-scale FPGA devices also makes it challenging for DNN deployment.

Object Detection

When CTC Training Meets Acoustic Landmarks

no code implementations 5 Nov 2018 Di He, Xuesong Yang, Boon Pang Lim, Yi Liang, Mark Hasegawa-Johnson, Deming Chen

In this paper, the convergence properties of CTC are improved by incorporating acoustic landmarks.

Speech Recognition

Design Flow of Accelerating Hybrid Extremely Low Bit-width Neural Network in Embedded FPGA

no code implementations 31 Jul 2018 Junsong Wang, Qiuwen Lou, Xiaofan Zhang, Chao Zhu, Yonghua Lin, Deming Chen

To create such accelerators, we propose a design flow for accelerating the extremely low bit-width neural network (ELB-NN) in embedded FPGAs with hybrid quantization schemes.

Edge-computing, Quantization

Improved ASR for Under-Resourced Languages Through Multi-Task Learning with Acoustic Landmarks

no code implementations 15 May 2018 Di He, Boon Pang Lim, Xuesong Yang, Mark Hasegawa-Johnson, Deming Chen

Furui first demonstrated that the identity of both consonant and vowel can be perceived from the C-V transition; later, Stevens proposed that acoustic landmarks are the primary cues for speech perception, and that steady-state regions are secondary or supplemental.

Multi-Task Learning, Speech Recognition

Face Recognition with Hybrid Efficient Convolution Algorithms on FPGAs

no code implementations 23 Mar 2018 Chuanhao Zhuge, Xinheng Liu, Xiaofan Zhang, Sudeep Gummadi, JinJun Xiong, Deming Chen

Deep Convolutional Neural Networks have become a Swiss knife in solving critical artificial intelligence tasks.

Face Recognition

CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

10 code implementations CVPR 2018 Yuhong Li, Xiaofan Zhang, Deming Chen

We demonstrate CSRNet on four datasets (ShanghaiTech dataset, the UCF_CC_50 dataset, the WorldEXPO'10 dataset, and the UCSD dataset) and we deliver the state-of-the-art performance.

Crowd Counting, Scene Recognition
