Search Results for author: Cong Hao

Found 24 papers, 13 papers with code

H2H: Heterogeneous Model to Heterogeneous System Mapping with Computation and Communication Awareness

1 code implementation • 29 Apr 2022 • Xinyi Zhang, Cong Hao, Peipei Zhou, Alex Jones, Jingtong Hu

The heterogeneity in ML models comes from multi-sensor perceiving and multi-task learning, i.e., multi-modality multi-task (MMMT), resulting in diverse deep neural network (DNN) layers and computation patterns.

Multi-Task Learning

FlowGNN: A Dataflow Architecture for Universal Graph Neural Network Inference via Multi-Queue Streaming

1 code implementation • 27 Apr 2022 • Rishov Sarkar, Stefan Abi-Karam, Yuqi He, Lakshmi Sathidevi, Cong Hao

First, we propose a novel and scalable dataflow architecture, which flexibly supports a wide range of GNN models with message-passing mechanism.

Drug Discovery

Hybrid Graph Models for Logic Optimization via Spatio-Temporal Information

no code implementations • 20 Jan 2022 • Nan Wu, Jiwon Lee, Yuan Xie, Cong Hao

Despite the stride made by machine learning (ML) based performance modeling, two major concerns that may impede production-ready ML applications in EDA are stringent accuracy requirements and generalization capability.

GenGNN: A Generic FPGA Framework for Graph Neural Network Acceleration

1 code implementation • 20 Jan 2022 • Stefan Abi-Karam, Yuqi He, Rishov Sarkar, Lakshmi Sathidevi, Zihang Qiao, Cong Hao

Second, we aim to support a diverse set of GNN models with the extensibility to flexibly adapt to new models.

Drug Discovery

Program-to-Circuit: Exploiting GNNs for Program Representation and Circuit Translation

no code implementations • 13 Sep 2021 • Nan Wu, Huake He, Yuan Xie, Pan Li, Cong Hao

Pioneering in this direction, we expect more GNN endeavors to revolutionize this high-demand Program-to-Circuit problem and to enrich the expressiveness of GNNs on programs.

Transfer Learning, Translation

Generic Neural Architecture Search via Regression

1 code implementation • NeurIPS 2021 • Yuhong Li, Cong Hao, Pan Li, JinJun Xiong, Deming Chen

Such a self-supervised regression task can effectively evaluate the intrinsic power of an architecture to capture and transform the input signal patterns, and allow more sufficient usage of training samples.

Image Classification, Neural Architecture Search

WinoCNN: Kernel Sharing Winograd Systolic Array for Efficient Convolutional Neural Network Acceleration on FPGAs

1 code implementation • 9 Jul 2021 • Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, Deming Chen

We implement our proposed accelerator on multiple FPGAs, which outperforms the state-of-the-art designs in terms of both throughput and DSP efficiency.

Adversarial Graph Augmentation to Improve Graph Contrastive Learning

1 code implementation • NeurIPS 2021 • Susheel Suresh, Pan Li, Cong Hao, Jennifer Neville

Self-supervised learning of graph neural networks (GNN) is in great need because of the widespread label scarcity issue in real-world graph/network data.

Contrastive Learning, Self-Supervised Learning

3U-EdgeAI: Ultra-Low Memory Training, Ultra-Low Bitwidth Quantization, and Ultra-Low Latency Acceleration

no code implementations • 11 May 2021 • Yao Chen, Cole Hawkins, Kaiqi Zhang, Zheng Zhang, Cong Hao

This paper emphasizes the importance and efficacy of training, quantization and accelerator design, and calls for more research breakthroughs in the area for AI on the edge.

Model Compression, Quantization

Software/Hardware Co-design for Multi-modal Multi-task Learning in Autonomous Systems

no code implementations • 8 Apr 2021 • Cong Hao, Deming Chen

We formulate the MMMT model and heterogeneous hardware implementation co-design as a differentiable optimization problem, with the objective of improving the solution quality and reducing the overall power consumption and critical path latency.

Multi-Task Learning

Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Co-design

no code implementations • 25 Mar 2021 • Cong Hao, Jordan Dotzel, JinJun Xiong, Luca Benini, Zhiru Zhang, Deming Chen

Artificial intelligence (AI) technologies have dramatically advanced in recent years, resulting in revolutionary changes in people's lives.

Edge Computing

IronMan: GNN-assisted Design Space Exploration in High-Level Synthesis via Reinforcement Learning

no code implementations • 16 Feb 2021 • Nan Wu, Yuan Xie, Cong Hao

Despite the great success of High-Level Synthesis (HLS) tools, we observe several unresolved challenges: 1) the high-level abstraction of programming styles in HLS sometimes conceals optimization opportunities; 2) existing HLS tools do not provide flexible trade-off (Pareto) solutions among different objectives and constraints; 3) the actual quality of the resulting RTL designs is hard to predict.

Reinforcement Learning

HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark

no code implementations • ICLR 2021 • Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Cong Hao, Yingyan Lin

To design HW-NAS-Bench, we carefully collected the measured/estimated hardware performance (e.g., energy cost and latency) of all the networks in the search space of both NAS-Bench-201 and FBNet, considering six hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC).

Neural Architecture Search

Improving Random-Sampling Neural Architecture Search by Evolving the Proxy Search Space

1 code implementation • 1 Jan 2021 • Yuhong Li, Cong Hao, Xiaofan Zhang, JinJun Xiong, Wen-mei Hwu, Deming Chen

This raises the question of whether we can find an effective proxy search space (PS) that is only a small subset of GS to dramatically improve RandomNAS’s search efficiency while at the same time keeping a good correlation for the top-performing architectures.

Image Classification, Neural Architecture Search

Effective Algorithm-Accelerator Co-design for AI Solutions on Edge Devices

no code implementations • 14 Oct 2020 • Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, JinJun Xiong, Wen-mei Hwu, Deming Chen

High quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators.

VecQ: Minimal Loss DNN Model Compression With Vectorized Weight Quantization

1 code implementation • 18 May 2020 • Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, Deming Chen

Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs.

Model Compression Object Detection +1

EDD: Efficient Differentiable DNN Architecture and Implementation Co-search for Embedded AI Solutions

no code implementations • 6 May 2020 • Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen

We formulate the co-search problem by fusing DNN search variables and hardware implementation variables into one solution space, and maximize both algorithm accuracy and hardware implementation quality.

Neural Architecture Search

AutoDNNchip: An Automated DNN Chip Predictor and Builder for Both FPGAs and ASICs

1 code implementation • 6 Jan 2020 • Pengfei Xu, Xiaofan Zhang, Cong Hao, Yang Zhao, Yongan Zhang, Yue Wang, Chaojian Li, Zetong Guan, Deming Chen, Yingyan Lin

Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator's energy, throughput, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balancing, etc.

NAIS: Neural Architecture and Implementation Search and its Applications in Autonomous Driving

no code implementations • 18 Nov 2019 • Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, JinJun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen

The rapidly growing demands for powerful AI algorithms in many application domains have motivated massive investment in both high-quality deep neural network (DNN) models and high-efficiency implementations.

Autonomous Driving

SkyNet: A Champion Model for DAC-SDC on Low Power Object Detection

1 code implementation • 25 Jun 2019 • Xiaofan Zhang, Cong Hao, Haoming Lu, Jiachen Li, Yuhong Li, Yuchen Fan, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen

Developing artificial intelligence (AI) at the edge is always challenging, since edge devices have limited computation capability and memory resources but need to meet demanding requirements, such as real-time processing, high throughput performance, and high inference accuracy.

Object Detection

A Bi-Directional Co-Design Approach to Enable Deep Learning on IoT Devices

2 code implementations • 20 May 2019 • Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen

Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption.

Object Detection

FPGA/DNN Co-Design: An Efficient Design Methodology for IoT Intelligence on the Edge

2 code implementations • 9 Apr 2019 • Cong Hao, Xiaofan Zhang, Yuhong Li, Sitao Huang, JinJun Xiong, Kyle Rupnow, Wen-mei Hwu, Deming Chen

While embedded FPGAs are attractive platforms for DNN acceleration on edge-devices due to their low latency and high energy efficiency, the scarcity of resources of edge-scale FPGA devices also makes it challenging for DNN deployment.

Object Detection
