no code implementations • 10 Oct 2024 • Yaoyao Long, Zhenming Liu, Cong Hao, Farrokh Ayazi
Gyroscopes are crucial for accurate angular velocity measurements in navigation, stabilization, and control systems.
1 code implementation • 10 Aug 2024 • Hanqiu Chen, Xuebin Yao, Pradeep Subedi, Cong Hao
However, as the scale of the edge computing system grows, communication among devices becomes the bottleneck, because the limited bandwidth of wireless communication leads to large data transfer latency.
1 code implementation • 1 May 2024 • Stefan Abi-Karam, Rishov Sarkar, Allison Seigler, Sean Lowe, Zhigang Wei, Hanqiu Chen, Nanditha Rao, Lizy John, Aman Arora, Cong Hao
HLSFactory has three main stages: 1) a design space expansion stage to elaborate single HLS designs into large design spaces using various optimization directives across multiple vendor tools, 2) a design synthesis stage to execute HLS and FPGA tool flows concurrently across designs, and 3) a data aggregation stage for extracting standardized data into packaged datasets for ML usage.
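The three stages above can be sketched as a simple pipeline. The directive names, the stubbed synthesis step, and the toy cost model below are illustrative placeholders, not HLSFactory's actual API:

```python
from itertools import product

# Hypothetical sketch of the three HLSFactory stages; the directive names
# and the latency stub are illustrative placeholders, not the real API.

def expand_design_space(design, directives):
    """Stage 1: elaborate one design into a design space by crossing directives."""
    keys = sorted(directives)
    return [
        {**design, **dict(zip(keys, combo))}
        for combo in product(*(directives[k] for k in keys))
    ]

def synthesize(designs):
    """Stage 2: run HLS on each design point (stubbed with a toy cost model)."""
    return [{**d, "latency_cycles": 100 // d["unroll"]} for d in designs]

def aggregate(results):
    """Stage 3: collect standardized records into a dataset for ML usage."""
    return sorted(results, key=lambda r: r["latency_cycles"])

points = expand_design_space(
    {"kernel": "gemm"},
    {"unroll": [1, 2, 4], "pipeline": [True, False]},
)
dataset = aggregate(synthesize(points))
```

In the real framework, stage 2 dispatches vendor HLS and FPGA tool flows concurrently; the stub here only mimics the shape of the data flowing between stages.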
1 code implementation • 11 Aug 2023 • Stefan Abi-Karam, Rishov Sarkar, Dejia Xu, Zhiwen Fan, Zhangyang Wang, Cong Hao
In this work, we introduce INR-Arch, a framework that transforms the computation graph of an nth-order gradient into a hardware-optimized dataflow architecture.
1 code implementation • 29 Jun 2023 • Hanqiu Chen, Hang Yang, Stephen Fitzmeyer, Cong Hao
This paper introduces Rapid-INR, a novel approach that utilizes INR for encoding and compressing images, thereby accelerating neural network training in computer vision tasks.
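As a rough illustration of the implicit neural representation (INR) idea, and not the paper's actual network, the sketch below stores a tiny image as the parameters of a function from pixel coordinates to intensities, using a least-squares fit over random Fourier features in place of a trained MLP:

```python
import numpy as np

# Assumed simplification of the INR idea: an image becomes the weights of a
# coordinate-to-intensity function. A linear least-squares fit over random
# Fourier features stands in for the trained MLP used in practice.

rng = np.random.default_rng(0)
H = W = 4
ys, xs = np.meshgrid(np.linspace(0, 1, H), np.linspace(0, 1, W), indexing="ij")
coords = np.stack([ys.ravel(), xs.ravel()], axis=1)      # (16, 2) pixel coords
image = (np.sin(4 * xs) * np.cos(3 * ys)).ravel()        # toy grayscale image

B = rng.normal(scale=3.0, size=(2, 32))                  # random Fourier basis
feats = np.concatenate([np.sin(coords @ B), np.cos(coords @ B)], axis=1)

w, *_ = np.linalg.lstsq(feats, image, rcond=None)        # "encode" the image
recon = feats @ w                                        # "decode" pixels back
```

Compression in a real INR comes from using far fewer network parameters than pixels; this toy fit only demonstrates the encode/decode round trip.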
1 code implementation • 30 May 2023 • Rishov Sarkar, Hanxue Liang, Zhiwen Fan, Zhangyang Wang, Cong Hao
Computer vision researchers are embracing two promising paradigms: Vision Transformers (ViTs) and Multi-task Learning (MTL), which both show great performance but are computation-intensive, given the quadratic complexity of self-attention in ViT and the need to activate an entire large MTL model for one task.
1 code implementation • 13 Apr 2023 • Hanqiu Chen, Cong Hao
The experiment results demonstrate that DGNN-Booster can achieve a speedup of up to 5.6x compared to the CPU baseline (6226R), 8.4x compared to the GPU baseline (A6000), and 2.1x compared to the FPGA baseline without the optimizations proposed in this paper.
1 code implementation • 29 Mar 2023 • Stefan Abi-Karam, Cong Hao
Therefore, in this work, we propose GNNBuilder, the first automated, generic, end-to-end GNN accelerator generation framework.
no code implementations • ICCV 2023 • Yuhong Li, Jiajie Li, Cong Hao, Pan Li, JinJun Xiong, Deming Chen
We further propose a Discrete Proxy Search (DPS) method to find the optimized training settings for Eproxy with only a handful of benchmarked architectures on the target tasks.
1 code implementation • 26 Oct 2022 • Hanxue Liang, Zhiwen Fan, Rishov Sarkar, Ziyu Jiang, Tianlong Chen, Kai Zou, Yu Cheng, Cong Hao, Zhangyang Wang
However, when deploying MTL onto those real-world systems that are often resource-constrained or latency-sensitive, two prominent challenges arise: (i) during training, simultaneously optimizing all tasks is often difficult due to gradient conflicts across tasks; (ii) at inference, current MTL regimes have to activate nearly the entire model even to just execute a single task.
no code implementations • 16 Oct 2022 • Yimeng Zhang, Akshay Karkal Kamath, Qiucheng Wu, Zhiwen Fan, Wuyang Chen, Zhangyang Wang, Shiyu Chang, Sijia Liu, Cong Hao
In this paper, we propose a data-model-hardware tri-design framework for high-throughput, low-cost, and high-accuracy multi-object tracking (MOT) on High-Definition (HD) video stream.
no code implementations • 8 Oct 2022 • Hanqiu Chen, Yahya Alhinai, Yihan Jiang, Eunjee Na, Cong Hao
A variety of dynamic graph neural networks designed from algorithmic perspectives have succeeded in incorporating temporal information into graph processing.
1 code implementation • 13 Jul 2022 • Haoyu Wang, Nan Wu, Hang Yang, Cong Hao, Pan Li
Using machine learning to solve combinatorial optimization (CO) problems is challenging, especially when the data is unlabeled.
no code implementations • 8 Jun 2022 • Qing Lu, Xiaowei Xu, Shunjie Dong, Cong Hao, Lei Yang, Cheng Zhuo, Yiyu Shi
Accurately segmenting temporal frames of cine magnetic resonance imaging (MRI) is a crucial step in various real-time MRI guided cardiac interventions.
no code implementations • 6 Jun 2022 • Xiaofan Zhang, Yao Chen, Cong Hao, Sitao Huang, Yuhong Li, Deming Chen
Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inference solutions in computer vision, natural language processing, virtual reality, and beyond.
1 code implementation • 29 Apr 2022 • Xinyi Zhang, Cong Hao, Peipei Zhou, Alex Jones, Jingtong Hu
The heterogeneity in ML models comes from multi-sensor perceiving and multi-task learning, i.e., multi-modality multi-task (MMMT), resulting in diverse deep neural network (DNN) layers and computation patterns.
1 code implementation • 27 Apr 2022 • Rishov Sarkar, Stefan Abi-Karam, Yuqi He, Lakshmi Sathidevi, Cong Hao
First, we propose a novel and scalable dataflow architecture that generally supports a wide range of GNN models with the message-passing mechanism.
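The message-passing pattern such an accelerator targets can be illustrated as a generic gather/aggregate/update layer; this sketch is not the paper's actual dataflow architecture:

```python
import numpy as np

# Generic message-passing layer (illustrative, not the paper's dataflow):
# gather neighbor features, aggregate by sum, then apply a shared update.

def message_passing_layer(node_feats, edges, weight):
    agg = np.zeros_like(node_feats)
    for src, dst in edges:                  # gather + scatter-add
        agg[dst] += node_feats[src]         # message = source node's feature
    return np.maximum((node_feats + agg) @ weight, 0.0)  # update + ReLU

feats = np.eye(3)                           # 3 nodes with one-hot features
edges = [(0, 1), (1, 2), (2, 0)]            # a directed ring
out = message_passing_layer(feats, edges, np.ones((3, 3)))
```

A hardware dataflow implementation pipelines the gather, aggregate, and update steps across nodes instead of looping sequentially as done here.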
1 code implementation • 20 Jan 2022 • Stefan Abi-Karam, Yuqi He, Rishov Sarkar, Lakshmi Sathidevi, Zihang Qiao, Cong Hao
Second, we aim to support a diverse set of GNN models with the extensibility to flexibly adapt to new models.
1 code implementation • 20 Jan 2022 • Nan Wu, Jiwon Lee, Yuan Xie, Cong Hao
Despite the stride made by machine learning (ML) based performance modeling, two major concerns that may impede production-ready ML applications in EDA are stringent accuracy requirements and generalization capability.
no code implementations • 18 Jan 2022 • Nan Wu, Hang Yang, Yuan Xie, Pan Li, Cong Hao
The contribution of this work is three-fold.
no code implementations • 13 Sep 2021 • Nan Wu, Huake He, Yuan Xie, Pan Li, Cong Hao
Pioneering in this direction, we expect more GNN endeavors to revolutionize this high-demand Program-to-Circuit problem and to enrich the expressiveness of GNNs on programs.
2 code implementations • NeurIPS 2021 • Yuhong Li, Cong Hao, Pan Li, JinJun Xiong, Deming Chen
Such a self-supervised regression task can effectively evaluate the intrinsic power of an architecture to capture and transform the input signal patterns, and allows fuller use of training samples.
Ranked #1 on Neural Architecture Search on NAS-Bench-101 (Spearman Correlation metric)
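A toy version of scoring architectures via a self-supervised regression proxy might look like the following; the synthetic signal, the candidate feature maps, and the least-squares fit are all illustrative stand-ins, not the paper's method:

```python
import numpy as np

# Toy illustration of a regression-based proxy (not the actual method):
# rank candidate "architectures" by how well their features regress a
# synthetic signal; a lower residual means a better proxy score.

rng = np.random.default_rng(1)
x = np.linspace(-1, 1, 64)[:, None]
target = np.sin(3 * np.pi * x).ravel()              # synthetic signal pattern

def proxy_score(feature_map):
    F = feature_map(x)
    w, *_ = np.linalg.lstsq(F, target, rcond=None)  # cheap regression fit
    return float(np.mean((F @ w - target) ** 2))    # residual = proxy score

W1 = rng.normal(size=(1, 32))
shallow = lambda v: np.hstack([v, v ** 2])          # weak candidate features
deep = lambda v: np.tanh(4 * v @ W1)                # richer candidate features

scores = {"shallow": proxy_score(shallow), "deep": proxy_score(deep)}
```

The richer feature map fits the oscillating signal better, so the proxy ranks it higher without any labeled training data.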
1 code implementation • 9 Jul 2021 • Xinheng Liu, Yao Chen, Cong Hao, Ashutosh Dhar, Deming Chen
We implement our proposed accelerator on multiple FPGAs, which outperforms the state-of-the-art designs in terms of both throughput and DSP efficiency.
1 code implementation • NeurIPS 2021 • Susheel Suresh, Pan Li, Cong Hao, Jennifer Neville
Self-supervised learning of graph neural networks (GNNs) is greatly needed because of the widespread label scarcity issue in real-world graph/network data.
no code implementations • 11 May 2021 • Yao Chen, Cole Hawkins, Kaiqi Zhang, Zheng Zhang, Cong Hao
This paper emphasizes the importance and efficacy of training, quantization and accelerator design, and calls for more research breakthroughs in the area for AI on the edge.
no code implementations • 8 Apr 2021 • Cong Hao, Deming Chen
We formulate the MMMT model and heterogeneous hardware implementation co-design as a differentiable optimization problem, with the objective of improving the solution quality and reducing the overall power consumption and critical path latency.
no code implementations • 25 Mar 2021 • Cong Hao, Jordan Dotzel, JinJun Xiong, Luca Benini, Zhiru Zhang, Deming Chen
Artificial intelligence (AI) technologies have dramatically advanced in recent years, resulting in revolutionary changes in people's lives.
no code implementations • 16 Feb 2021 • Nan Wu, Yuan Xie, Cong Hao
Despite the great success of High-Level Synthesis (HLS) tools, we observe several unresolved challenges: 1) the high-level abstraction of programming styles in HLS sometimes conceals optimization opportunities; 2) existing HLS tools do not provide flexible trade-off (Pareto) solutions among different objectives and constraints; 3) the actual quality of the resulting RTL designs is hard to predict.
no code implementations • ICLR 2021 • Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Cong Hao, Yingyan Lin
To design HW-NAS-Bench, we carefully collected the measured/estimated hardware performance (e.g., energy cost and latency) of all the networks in the search space of both NAS-Bench-201 and FBNet, considering six hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC).
1 code implementation • 1 Jan 2021 • Yuhong Li, Cong Hao, Xiaofan Zhang, JinJun Xiong, Wen-mei Hwu, Deming Chen
This raises the question of whether we can find an effective proxy search space (PS) that is only a small subset of GS to dramatically improve RandomNAS’s search efficiency while at the same time keeping a good correlation for the top-performing architectures.
no code implementations • 14 Oct 2020 • Cong Hao, Yao Chen, Xiaofan Zhang, Yuhong Li, JinJun Xiong, Wen-mei Hwu, Deming Chen
High quality AI solutions require joint optimization of AI algorithms, such as deep neural networks (DNNs), and their hardware accelerators.
1 code implementation • 18 May 2020 • Cheng Gong, Yao Chen, Ye Lu, Tao Li, Cong Hao, Deming Chen
Quantization has been proven to be an effective method for reducing the computing and/or storage cost of DNNs.
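For context, a minimal, generic sketch of symmetric uniform quantization is shown below; the paper's own quantization scheme is not reproduced here:

```python
import numpy as np

# Minimal, generic symmetric uniform quantization (illustrative only; the
# paper's specific scheme differs). Works for bit widths up to 8 given the
# int8 storage type used here.

def quantize(weights, bits=8):
    qmax = 2 ** (bits - 1) - 1                      # e.g. 127 for 8 bits
    scale = np.abs(weights).max() / qmax
    q = np.clip(np.round(weights / scale), -qmax, qmax)
    return q.astype(np.int8), scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.array([0.51, -0.98, 0.24, 0.0], dtype=np.float32)
q, s = quantize(w)
w_hat = dequantize(q, s)                            # error bounded by scale/2
```

Storage drops from 32 bits to 8 bits per weight plus one shared scale, at the cost of a bounded rounding error per weight.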
no code implementations • 6 May 2020 • Yuhong Li, Cong Hao, Xiaofan Zhang, Xinheng Liu, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen
We formulate the co-search problem by fusing DNN search variables and hardware implementation variables into one solution space, and maximize both algorithm accuracy and hardware implementation quality.
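The idea of fusing algorithm and hardware variables into one solution space can be caricatured with a toy grid search; the variables, cost models, and the 0.05 weight below are made up, and the paper uses a differentiable formulation rather than the brute-force enumeration shown here:

```python
import itertools

# Caricature of a fused DNN/hardware search space with a combined objective.
# All names and cost models are invented for illustration; the real co-search
# relaxes this differentiably instead of enumerating candidates.

depths = [4, 8, 12]        # DNN search variable
parallelism = [1, 2, 4]    # hardware implementation variable

def accuracy(depth):       # toy model: deeper is better, with saturation
    return 1 - 1 / depth

def latency(depth, par):   # toy model: work divided by parallelism
    return depth / par

best = max(
    itertools.product(depths, parallelism),
    key=lambda dp: accuracy(dp[0]) - 0.05 * latency(*dp),
)
```

Because accuracy saturates with depth while latency keeps growing, the combined objective settles on a mid-sized model at maximum parallelism rather than the deepest model.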
1 code implementation • 6 Jan 2020 • Pengfei Xu, Xiaofan Zhang, Cong Hao, Yang Zhao, Yongan Zhang, Yue Wang, Chaojian Li, Zetong Guan, Deming Chen, Yingyan Lin
Specifically, AutoDNNchip consists of two integrated enablers: (1) a Chip Predictor, built on top of a graph-based accelerator representation, which can accurately and efficiently predict a DNN accelerator's energy, throughput, and area based on the DNN model parameters, hardware configuration, technology-based IPs, and platform constraints; and (2) a Chip Builder, which can automatically explore the design space of DNN chips (including IP selection, block configuration, resource balancing, etc.).
no code implementations • 18 Nov 2019 • Cong Hao, Yao Chen, Xinheng Liu, Atif Sarwari, Daryl Sew, Ashutosh Dhar, Bryan Wu, Dongdong Fu, JinJun Xiong, Wen-mei Hwu, Junli Gu, Deming Chen
The rapidly growing demands for powerful AI algorithms in many application domains have motivated massive investment in both high-quality deep neural network (DNN) models and high-efficiency implementations.
2 code implementations • 20 Sep 2019 • Xiaofan Zhang, Haoming Lu, Cong Hao, Jiachen Li, Bowen Cheng, Yuhong Li, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen
Object detection and tracking are challenging tasks for resource-constrained embedded systems.
2 code implementations • 25 Jun 2019 • Xiaofan Zhang, Cong Hao, Haoming Lu, Jiachen Li, Yuhong Li, Yuchen Fan, Kyle Rupnow, JinJun Xiong, Thomas Huang, Honghui Shi, Wen-mei Hwu, Deming Chen
Developing artificial intelligence (AI) at the edge is always challenging, since edge devices have limited computation capability and memory resources but need to meet demanding requirements, such as real-time processing, high throughput performance, and high inference accuracy.
3 code implementations • 20 May 2019 • Xiaofan Zhang, Cong Hao, Yuhong Li, Yao Chen, JinJun Xiong, Wen-mei Hwu, Deming Chen
Developing deep learning models for resource-constrained Internet-of-Things (IoT) devices is challenging, as it is difficult to achieve both good quality of results (QoR), such as DNN model inference accuracy, and quality of service (QoS), such as inference latency, throughput, and power consumption.
3 code implementations • 9 Apr 2019 • Cong Hao, Xiaofan Zhang, Yuhong Li, Sitao Huang, JinJun Xiong, Kyle Rupnow, Wen-mei Hwu, Deming Chen
While embedded FPGAs are attractive platforms for DNN acceleration on edge-devices due to their low latency and high energy efficiency, the scarcity of resources of edge-scale FPGA devices also makes it challenging for DNN deployment.