no code implementations • 17 Jul 2024 • Vint Lee, Chun Deng, Leena Elzeiny, Pieter Abbeel, John Wawrzynek
Macro placement is a vital step in digital circuit design that defines the physical location of large collections of components, known as macros, on a 2-dimensional chip.
no code implementations • 5 May 2021 • Qijing Huang, Minwoo Kang, Grace Dinh, Thomas Norell, Aravind Kalaiah, James Demmel, John Wawrzynek, Yakun Sophia Shao
Recent advances in Deep Neural Networks (DNNs) have led to active development of specialized DNN accelerators, many of which feature a large number of processing elements laid out spatially, together with a multi-level memory hierarchy and flexible interconnect.
no code implementations • 26 Apr 2021 • Zhen Dong, Yizhao Gao, Qijing Huang, John Wawrzynek, Hayden K. H. So, Kurt Keutzer
Automatic algorithm-hardware co-design for DNN has shown great success in improving the performance of DNNs on FPGAs.
Hardware Aware Neural Architecture Search Image Classification +2
3 code implementations • 12 Jun 2020 • Zhen Dong, Dequan Wang, Qijing Huang, Yizhao Gao, Yaohui Cai, Tian Li, Bichen Wu, Kurt Keutzer, John Wawrzynek
Deploying deep learning models on embedded systems has been challenging due to limited computing resources.
no code implementations • 27 May 2020 • Ameer Haj-Ali, Hasan Genc, Qijing Huang, William Moses, John Wawrzynek, Krste Asanović, Ion Stoica
We explore applying the Monte Carlo Tree Search (MCTS) algorithm in a notoriously difficult task: tuning programs for high-performance deep learning and image processing.
1 code implementation • 2 Mar 2020 • Qijing Huang, Ameer Haj-Ali, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek
We compare the performance of AutoPhase to state-of-the-art algorithms that address the phase-ordering problem.
2 code implementations • 19 Feb 2020 • Qijing Huang, Dequan Wang, Yizhao Gao, Yaohui Cai, Zhen Dong, Bichen Wu, Kurt Keutzer, John Wawrzynek
In this work, we first investigate the overhead of the deformable convolution on embedded FPGA SoCs, and then show the accuracy-latency tradeoffs for a set of algorithm modifications including full versus depthwise, fixed-shape, and limited-range.
1 code implementation • 15 Jan 2019 • Ameer Haj-Ali, Qijing Huang, William Moses, John Xiang, Ion Stoica, Krste Asanovic, John Wawrzynek
We implement a framework in the context of the LLVM compiler to optimize the ordering for HLS programs and compare the performance of deep reinforcement learning to state-of-the-art algorithms that address the phase-ordering problem.
1 code implementation • 21 Nov 2018 • Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek, Kurt Keutzer
DiracDeltaNet achieves competitive accuracy on ImageNet (88. 7\% top-5), but with 42$\times$ fewer parameters and 48$\times$ fewer OPs than VGG16.
no code implementations • 28 Apr 2017 • Hayden Kwok-Hay So, John Wawrzynek
The 3rd International Workshop on Overlay Architectures for FPGAs (OLAF 2017) was held on 22 Feb, 2017 as a co-located workshop at the 25th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA 2017).
Hardware Architecture C.0; C.1; B.5.2; B.6.3; B.7.2