1 code implementation • 15 Nov 2024 • Yonggan Fu, Zhongzhi Yu, Junwei Li, Jiayi Qian, Yongan Zhang, Xiangchi Yuan, Dachuan Shi, Roman Yakunin, Yingyan Celine Lin
Motivated by the transformative capabilities of large language models (LLMs) across various natural language tasks, there has been a growing demand to deploy these models effectively across diverse real-world applications and platforms.
no code implementations • 11 Jul 2024 • Zheng Wang, Boxiao Jin, Zhongzhi Yu, Minjia Zhang
To mitigate computational costs, LLMs often employ the KV cache technique to improve generation speed.
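For context on what the KV cache buys, here is a minimal, self-contained sketch (all names are illustrative, not from this paper's code): each decoding step appends the new token's key/value projections to a cache and attends against the cached tensors, so earlier tokens' projections are never recomputed.

```python
import torch

def attend_with_kv_cache(q_new, k_new, v_new, cache):
    """One decoding step: append the new token's K/V to the cache,
    then attend the new query against all cached keys/values."""
    if cache is None:
        k_all, v_all = k_new, v_new
    else:
        k_all = torch.cat([cache[0], k_new], dim=0)       # (t, d)
        v_all = torch.cat([cache[1], v_new], dim=0)
    scores = (q_new @ k_all.T) / k_all.shape[-1] ** 0.5   # (1, t)
    out = torch.softmax(scores, dim=-1) @ v_all           # (1, d)
    return out, (k_all, v_all)

d = 64
cache = None
for step in range(5):                      # autoregressive decoding loop
    q = torch.randn(1, d)                  # projections for the new token only
    k, v = torch.randn(1, d), torch.randn(1, d)
    out, cache = attend_with_kv_cache(q, k, v, cache)
print(cache[0].shape)  # torch.Size([5, 64]): keys for all 5 generated tokens
```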
1 code implementation • 2 Jul 2024 • Yongan Zhang, Zhongzhi Yu, Yonggan Fu, Cheng Wan, Yingyan Celine Lin
Furthermore, to fully exploit the potential of the MG-Verilog dataset, which varies in complexity and detail, we introduce a balanced fine-tuning scheme.
1 code implementation • 22 Jun 2024 • Zhongzhi Yu, Zheng Wang, Yonggan Fu, Huihong Shi, Khalid Shaikh, Yingyan Celine Lin
Specifically, this work begins with comprehensive visualizations of the attention distributions in LLMs during inference across various inputs and tasks.
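As a rough illustration of how such attention distributions can be gathered during inference, here is a sketch assuming the Hugging Face transformers API and a small GPT-2 model (not the paper's actual pipeline):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2", output_attentions=True)

inputs = tok("The quick brown fox jumps over the lazy dog", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# out.attentions: one tensor per layer, shaped (batch, heads, seq, seq)
for layer, attn in enumerate(out.attentions):
    probs = attn[0]                                   # (heads, seq, seq)
    # entropy of each head's distribution over keys, averaged over queries
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(-1).mean(-1)
    print(f"layer {layer}: mean head entropy = {entropy.mean():.3f}")
```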
1 code implementation • 22 Jun 2024 • Zhongzhi Yu, Zheng Wang, Yuhan Li, Haoran You, Ruijie Gao, Xiaoya Zhou, Sreenidhi Reedy Bommu, Yang Katie Zhao, Yingyan Celine Lin
Efficient adaptation of large language models (LLMs) on edge devices is essential for applications requiring continuous and privacy-preserving adaptation and inference.
no code implementations • 19 Sep 2023 • Yonggan Fu, Yongan Zhang, Zhongzhi Yu, Sixu Li, Zhifan Ye, Chaojian Li, Cheng Wan, Yingyan Celine Lin
To our knowledge, this work is the first to demonstrate an effective pipeline for LLM-powered automated AI accelerator generation.
no code implementations • 23 Jun 2023 • Zhongzhi Yu, Yang Zhang, Kaizhi Qian, Yonggan Fu, Yingyan Lin
Despite the impressive performance recently achieved by automatic speech recognition (ASR), we observe two primary challenges that hinder its broader application: (1) the difficulty of scaling the model to support more languages with limited training, inference, and storage overhead; and (2) the difficulty of adapting to low-resource languages effectively while avoiding overfitting and catastrophic forgetting.
Automatic Speech Recognition (ASR)
no code implementations • 23 Jun 2023 • Zhongzhi Yu, Yonggan Fu, Jiayi Yuan, Haoran You, Yingyan Lin
Tiny deep learning has attracted increasing attention driven by the substantial demand for deploying deep learning on numerous intelligent Internet-of-Things devices.
1 code implementation • CVPR 2023 • Zhongzhi Yu, Shang Wu, Yonggan Fu, Shunyao Zhang, Yingyan Lin
To tackle this challenge, we first identify an opportunity for FViTs in few-shot tuning: pretrained FViTs themselves have already learned highly representative features from large-scale pretraining data, which are fully preserved during widely used parameter-efficient tuning.
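A minimal sketch of the parameter-efficient tuning setup this observation relies on: freeze the pretrained ViT backbone so its features are fully preserved, and train only a small head. The torchvision model and the 5-way few-shot task below are illustrative assumptions, not the paper's method.

```python
import torch
import torch.nn as nn
from torchvision.models import vit_b_16, ViT_B_16_Weights

model = vit_b_16(weights=ViT_B_16_Weights.IMAGENET1K_V1)

# Freeze the pretrained backbone so its learned features stay intact.
for p in model.parameters():
    p.requires_grad = False

# Replace and train only a lightweight classification head.
model.heads = nn.Linear(768, 5)  # 5-way few-shot task; 768 = ViT-B hidden dim

trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=1e-3)
print(sum(p.numel() for p in trainable), "trainable parameters")
```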
1 code implementation • 2 Nov 2022 • Yonggan Fu, Yang Zhang, Kaizhi Qian, Zhifan Ye, Zhongzhi Yu, Cheng-I Lai, Yingyan Celine Lin
We believe S³-Router provides a new perspective on the practical deployment of speech SSL models.
Automatic Speech Recognition (ASR)
1 code implementation • 18 Oct 2022 • Haoran You, Zhanyi Sun, Huihong Shi, Zhongzhi Yu, Yang Zhao, Yongan Zhang, Chaojian Li, Baopu Li, Yingyan Celine Lin
Specifically, on the algorithm level, ViTCoD prunes and polarizes the attention maps into either denser or sparser fixed patterns, regularizing two levels of workloads without hurting accuracy; this largely reduces the attention computations while leaving room to alleviate the remaining dominant data movements. On top of that, we further integrate a lightweight, learnable auto-encoder module that trades these dominant high-cost data movements for lower-cost computations.
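To make the polarization idea concrete, here is a toy sketch (not ViTCoD's actual algorithm): each attention row is split into a regular "denser" top-k region and a prunable "sparser" remainder, giving the hardware two fixed patterns instead of one irregular workload.

```python
import torch

def polarize(attn, keep_ratio=0.25):
    """Illustrative polarization: split each attention row into a 'denser'
    fixed pattern (top-k entries kept) and a 'sparser' remainder."""
    seq = attn.shape[-1]
    k = max(1, int(keep_ratio * seq))
    topk = attn.topk(k, dim=-1).indices
    dense_mask = torch.zeros_like(attn, dtype=torch.bool).scatter_(-1, topk, True)
    denser = attn * dense_mask          # regular, high-value region
    sparser = attn * ~dense_mask        # low-value remainder (can be pruned)
    return denser, sparser

attn = torch.softmax(torch.randn(8, 8), dim=-1)  # toy 8-token attention map
denser, sparser = polarize(attn)
print((denser > 0).float().mean().item())  # 0.25 of entries in the dense pattern
```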
no code implementations • 15 Mar 2022 • Zhongzhi Yu, Yonggan Fu, Shang Wu, Mengquan Li, Haoran You, Yingyan Lin
While existing works mostly fix the model precision during the whole training process, a few pioneering works have shown that dynamic precision schedules help DNNs converge to better accuracy at a lower training cost than their static-precision training counterparts.
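A hedged sketch of what a dynamic precision schedule can look like: train early epochs at a low bit-width (cheap), then step the precision up as training progresses. The staged schedule and uniform fake quantization below are illustrative assumptions, not the paper's method.

```python
import torch

def precision_schedule(epoch, total_epochs, bits=(4, 6, 8)):
    """Illustrative schedule: low bit-width early, higher precision later."""
    stage = min(len(bits) - 1, epoch * len(bits) // total_epochs)
    return bits[stage]

def fake_quantize(w, num_bits):
    """Uniform symmetric fake quantization of a weight tensor."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = w.abs().max() / qmax
    return (w / scale).round().clamp(-qmax, qmax) * scale

w = torch.randn(16, 16)
for epoch in range(12):
    b = precision_schedule(epoch, 12)
    w_q = fake_quantize(w, b)          # use w_q in the forward pass
    if epoch % 4 == 0:
        print(f"epoch {epoch}: {b}-bit, quant error {(w - w_q).abs().mean():.4f}")
```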
no code implementations • 21 Dec 2021 • Zhongzhi Yu, Yonggan Fu, Sicheng Li, Chaojian Li, Yingyan Lin
ViTs are often too computationally expensive to fit onto real-world resource-constrained devices, due to (1) their complexity, which grows quadratically with the number of input tokens, and (2) their overparameterized self-attention heads and model depth.
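The quadratic term is easy to see with a back-of-envelope count (the dimensions below are illustrative, not tied to a specific ViT):

```python
# The QK^T and softmax(..)V products each take O(n^2 * d) multiply-accumulates
# for n tokens of per-head width d, so 4x more tokens costs ~16x more compute.
def attn_macs(n_tokens, d=64, heads=12):
    return 2 * heads * n_tokens ** 2 * d

for n in (196, 784, 3136):   # e.g., 14x14, 28x28, 56x56 patch grids
    print(f"{n:5d} tokens -> {attn_macs(n) / 1e9:.2f} GMACs per layer")
```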
no code implementations • 24 Aug 2021 • Gang Yu, Zhongzhi Yu, Yemin Shi, Yingshuo Wang, Xiaoqing Liu, Zheming Li, Yonggen Zhao, Fenglei Sun, Yizhou Yu, Qiang Shu
The first stage structuralizes test results by extracting relevant numerical values from clinical notes, and the disease identification stage provides a diagnosis based on text-form clinical notes and the structured data obtained from the first stage.
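A toy sketch of the first-stage structuralization idea, using a regular expression in place of the paper's learned extractor; the pattern and the example note are invented for illustration:

```python
import re

# Hypothetical extractor: pull (test name, value, unit) triples out of
# free-text clinical notes to produce structured data for the second stage.
PATTERN = re.compile(
    r"(?P<test>[A-Za-z][A-Za-z ]+?)\s*[:=]\s*"
    r"(?P<value>\d+(?:\.\d+)?)\s*(?P<unit>[a-zA-Z/%]+)?"
)

note = "WBC: 11.2, temperature = 38.5 C, CRP: 46 mg/L"
structured = [(m["test"].strip(), float(m["value"]), m["unit"])
              for m in PATTERN.finditer(note)]
print(structured)
# [('WBC', 11.2, None), ('temperature', 38.5, 'C'), ('CRP', 46.0, 'mg/L')]
```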
no code implementations • 17 Aug 2021 • Mengquan Li, Zhongzhi Yu, Yongan Zhang, Yonggan Fu, Yingyan Lin
The recent breakthroughs and prohibitive complexities of Deep Neural Networks (DNNs) have excited extensive interest in domain-specific DNN accelerators, among which optical DNN accelerators are particularly promising thanks to their unprecedented potential of achieving superior performance-per-watt.
no code implementations • 11 Jun 2021 • Yonggan Fu, Yongan Zhang, Chaojian Li, Zhongzhi Yu, Yingyan Celine Lin
Driven by the explosive interest in applying deep reinforcement learning (DRL) agents to numerous real-time control and decision-making applications, there has been a growing demand to deploy DRL agents to empower daily-life intelligent devices, while the prohibitive complexity of DRL stands at odds with limited on-device resources.
1 code implementation • 22 Apr 2021 • Yonggan Fu, Zhongzhi Yu, Yongan Zhang, Yifan Jiang, Chaojian Li, Yongyuan Liang, Mingchao Jiang, Zhangyang Wang, Yingyan Celine Lin
The promise of Deep Neural Network (DNN) powered Internet of Things (IoT) devices has motivated a tremendous demand for automated solutions to enable fast development and deployment of efficient (1) DNNs equipped with instantaneous accuracy-efficiency trade-off capability to accommodate the time-varying resources at IoT devices and (2) dataflows to optimize DNNs' execution efficiency on different devices.
1 code implementation • 19 Mar 2021 • Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Yingyan Lin
To design HW-NAS-Bench, we carefully collected the measured/estimated hardware performance of all the networks in the search spaces of both NAS-Bench-201 and FBNet, on six hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC).
Hardware Aware Neural Architecture Search
Neural Architecture Search
no code implementations • ICLR 2021 • Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Cong Hao, Yingyan Lin
To design HW-NAS-Bench, we carefully collected the measured/estimated hardware performance (e.g., energy cost and latency) of all the networks in the search space of both NAS-Bench-201 and FBNet, considering six hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC).
Hardware Aware Neural Architecture Search
Neural Architecture Search
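The workflow such a benchmark enables can be sketched with a hypothetical lookup table (the dictionary layout, names, and all numbers below are invented for illustration and are not HW-NAS-Bench's real API or data): pre-measured hardware cost lets a NAS loop score candidate architectures without touching any device.

```python
# Hypothetical pre-measured costs in the spirit of HW-NAS-Bench.
COST = {
    # arch_id: {device: (latency_ms, energy_mj)}  -- illustrative values
    "nasbench201-0001": {"edgegpu": (5.1, 24.1), "fpga": (2.9, 7.9)},
    "nasbench201-0002": {"edgegpu": (3.4, 15.6), "fpga": (2.1, 5.3)},
}
ACCURACY = {"nasbench201-0001": 93.8, "nasbench201-0002": 91.2}

def score(arch, device, latency_budget_ms):
    """Hardware-aware objective: accuracy if the latency budget is met."""
    lat, _ = COST[arch][device]
    return ACCURACY[arch] if lat <= latency_budget_ms else float("-inf")

best = max(COST, key=lambda a: score(a, "edgegpu", 4.0))
print(best)  # the most accurate architecture meeting the latency budget
```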
no code implementations • 24 Dec 2020 • Yonggan Fu, Zhongzhi Yu, Yongan Zhang, Yingyan Celine Lin
We therefore propose an Auto-Agent-Distiller (A2D) framework, which, to the best of our knowledge, is the first application of neural architecture search (NAS) to DRL, automatically searching for optimal DRL agents for various tasks while optimizing both test scores and efficiency.
no code implementations • 11 Mar 2020 • Zhongzhi Yu, Yemin Shi, Tiejun Huang, Yizhou Yu
Thus, KQ can represent the weight tensor of a convolution layer with low-bit indexes and a kernel codebook of limited size, which enables KQ to achieve a significant compression ratio.
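A minimal sketch of codebook-style kernel quantization in this spirit, using k-means to build the codebook; the layer shape, codebook size, and clustering choice are illustrative assumptions, not the paper's exact method.

```python
import numpy as np
from sklearn.cluster import KMeans

# Cluster a conv layer's 3x3 kernels into a small codebook, then store
# each kernel as a low-bit index into that codebook.
rng = np.random.default_rng(0)
weights = rng.standard_normal((256, 64, 3, 3))        # (out_ch, in_ch, kH, kW)
kernels = weights.reshape(-1, 9)                      # one row per 3x3 kernel

n_codes = 64                                          # 64 codes -> 6-bit indexes
km = KMeans(n_clusters=n_codes, n_init=4, random_state=0).fit(kernels)
indexes, codebook = km.labels_, km.cluster_centers_

reconstructed = codebook[indexes].reshape(weights.shape)
orig_bits = kernels.size * 32                         # fp32 baseline
kq_bits = indexes.size * 6 + codebook.size * 32       # indexes + codebook
print(f"compression ratio: {orig_bits / kq_bits:.1f}x")
```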
no code implementations • SEMEVAL 2018 • Qiang Ning, Zhongzhi Yu, Chuchu Fan, Dan Roth
As a result, only a small number of documents are typically annotated, limiting the coverage of various lexical/semantic phenomena.
no code implementations • 18 Apr 2018 • Qiang Ning, Zhongzhi Yu, Chuchu Fan, Dan Roth
As a result, only a small number of documents are typically annotated, limiting the coverage of various lexical/semantic phenomena.