1 code implementation • ICLR 2022 • Aviral Kumar, Amir Yazdanbakhsh, Milad Hashemi, Kevin Swersky, Sergey Levine
An alternative paradigm is a "data-driven", offline approach that uses logged simulation data to architect hardware accelerators, without requiring any further simulation.
1 code implementation • 20 Feb 2021 • Kiran Seshadri, Berkin Akin, James Laudon, Ravi Narayanaswami, Amir Yazdanbakhsh
Then, we extensively evaluate three classes of Edge TPUs, covering different computing ecosystems, that are either currently deployed in Google products or are in the product pipeline, across 423K unique convolutional neural networks.
2 code implementations • NeurIPS 2023 • Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, Shashank Gupta, Bodhisattwa Prasad Majumder, Katherine Hermann, Sean Welleck, Amir Yazdanbakhsh, Peter Clark
Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement.
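The iterative feedback-and-refinement loop described above can be sketched as follows. This is an illustrative toy, not the paper's implementation; the `model` callable is a hypothetical stand-in for an LLM API call, and the `DONE` stop token is an assumption for the sketch.

```python
# Toy sketch of a Self-Refine-style loop: the same model generates an output,
# critiques it, and refines it using its own feedback, until the critique
# signals the answer is good enough or an iteration budget is exhausted.

def self_refine(model, task_prompt, max_iters=3, stop_token="DONE"):
    """Iteratively improve an initial output using the model's own feedback."""
    output = model(f"Task: {task_prompt}\nAnswer:")
    for _ in range(max_iters):
        feedback = model(
            f"Task: {task_prompt}\nAnswer: {output}\nCritique this answer:"
        )
        if stop_token in feedback:  # model judges the answer good enough
            break
        output = model(
            f"Task: {task_prompt}\nAnswer: {output}\n"
            f"Feedback: {feedback}\nImproved answer:"
        )
    return output
```

The key design point is that generator, critic, and refiner are one and the same model, distinguished only by prompting.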
1 code implementation • 27 Apr 2023 • Joo Hyung Lee, Wonpyo Park, Nicole Mitchell, Jonathan Pilault, Johan Obando-Ceron, Han-Byul Kim, Namhoon Lee, Elias Frantar, Yun Long, Amir Yazdanbakhsh, Shivani Agrawal, Suvinay Subramanian, Xin Wang, Sheng-Chun Kao, Xingyao Zhang, Trevor Gale, Aart Bik, Woohyun Han, Milen Ferev, Zhonglin Han, Hong-Seok Kim, Yann Dauphin, Gintare Karolina Dziugaite, Pablo Samuel Castro, Utku Evci
This paper introduces JaxPruner, an open-source JAX-based pruning and sparse training library for machine learning research.
2 code implementations • 15 Feb 2023 • Alexander Shypula, Aman Madaan, Yimeng Zeng, Uri Alon, Jacob Gardner, Milad Hashemi, Graham Neubig, Parthasarathy Ranganathan, Osbert Bastani, Amir Yazdanbakhsh
Next, we propose a broad range of adaptation strategies for code optimization; for prompting, these include retrieval-based few-shot prompting and chain-of-thought, and for finetuning, these include performance-conditioned generation and synthetic data augmentation based on self-play.
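Retrieval-based few-shot prompting for code optimization can be sketched as below. This is a hypothetical illustration, not the paper's pipeline: the token-overlap similarity and the prompt layout are assumptions made for the sketch.

```python
# Toy sketch of retrieval-based few-shot prompting for code optimization:
# retrieve the (slow, fast) program pairs most similar to the query program
# and prepend them as in-context examples. Similarity is naive token overlap.

def retrieve_examples(query, bank, k=2):
    def overlap(a, b):
        return len(set(a.split()) & set(b.split()))
    return sorted(bank, key=lambda pair: overlap(query, pair[0]), reverse=True)[:k]

def build_prompt(query, bank):
    shots = retrieve_examples(query, bank)
    demo = "\n".join(f"# slow:\n{s}\n# fast:\n{f}" for s, f in shots)
    return f"{demo}\n# slow:\n{query}\n# fast:\n"
```

In practice the retriever would use embeddings rather than token overlap; the structure of the prompt is what matters here.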
1 code implementation • 8 Oct 2022 • Ondrej Sykora, Phitchaya Mangpo Phothilimthana, Charith Mendis, Amir Yazdanbakhsh
In this paper, we introduce GRANITE, a new machine learning model that estimates the throughput of basic blocks across different microarchitectures.
1 code implementation • 7 Feb 2024 • Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna
In this work, we study the effectiveness of existing sparse training recipes at high-sparsity regions and argue that these methods fail to sustain model quality on par with low-sparsity regions.
no code implementations • 10 May 2018 • Amir Yazdanbakhsh, Hajar Falahati, Philip J. Wolfe, Kambiz Samadi, Nam Sung Kim, Hadi Esmaeilzadeh
Even though there is a convolution stage in this operator, the inserted zeros lead to underutilization of the compute resources when a conventional convolution accelerator is employed.
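The underutilization comes from the zero-insertion step of the transposed (de)convolution: the input is upsampled by interleaving zeros before a regular convolution runs, so many multiply-accumulates are spent on zero operands. A minimal 1-D sketch:

```python
import numpy as np

# Sketch of the zero-insertion step in a stride-2 transposed convolution:
# (stride - 1) zeros are interleaved between input samples, after which a
# conventional convolution wastes a large fraction of its MACs on zeros.

def insert_zeros_1d(x, stride=2):
    """Upsample a 1-D signal by inserting (stride - 1) zeros between samples."""
    out = np.zeros(len(x) * stride - (stride - 1), dtype=x.dtype)
    out[::stride] = x
    return out

x = np.array([1, 2, 3])
up = insert_zeros_1d(x)          # zeros now sit between the original samples
zero_frac = np.mean(up == 0)     # fraction of operands that are guaranteed zero
```

For stride 2, nearly half of all operands fed to a conventional accelerator are structurally zero, which is the inefficiency the paper targets.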
no code implementations • 5 Nov 2018 • Ahmed T. Elthakeb, Prannoy Pilligundla, FatemehSadat Mireshghallah, Amir Yazdanbakhsh, Hadi Esmaeilzadeh
We show how ReLeQ can balance speed and quality, and provide an asymmetric general solution for quantization of a large variety of deep networks (AlexNet, CIFAR-10, LeNet, MobileNet-V1, ResNet-20, SVHN, and VGG-11) that virtually preserves the accuracy (≤ 0.3% loss) while minimizing the computation and storage cost.
no code implementations • 27 Jun 2019 • Soroush Ghodrati, Hardik Sharma, Sean Kinzer, Amir Yazdanbakhsh, Kambiz Samadi, Nam Sung Kim, Doug Burger, Hadi Esmaeilzadeh
The low-power potential of mixed-signal design makes it an alluring option for accelerating Deep Neural Networks (DNNs).
Hardware Architecture
no code implementations • ICLR 2020 • Amir Yazdanbakhsh, Ebrahim Songhori, Robert Ormandi, Anna Goldie, Azalia Mirhoseini
In our experiments, we use PPO as our baseline policy optimization algorithm.
1 code implementation • ICLR 2020 • Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, Hadi Esmaeilzadeh
This solution, dubbed Chameleon, leverages reinforcement learning, which converges in fewer steps, and develops an adaptive sampling algorithm that not only focuses the costly samples (real hardware measurements) on representative points but also uses domain-knowledge-inspired logic to improve the samples themselves.
no code implementations • 1 Jan 2021 • Yanqi Zhou, Xuanyi Dong, Daiyi Peng, Ethan Zhu, Amir Yazdanbakhsh, Berkin Akin, Mingxing Tan, James Laudon
In this paper, we study the importance of co-designing neural architectures and hardware accelerators.
no code implementations • 2 Feb 2021 • Amir Yazdanbakhsh, Christof Angermueller, Berkin Akin, Yanqi Zhou, Albin Jones, Milad Hashemi, Kevin Swersky, Satrajit Chatterjee, Ravi Narayanaswami, James Laudon
We further show that by transferring knowledge between target architectures with different design constraints, Apollo is able to find optimal configurations faster and often with better objective value (up to 25% improvements).
no code implementations • 17 Feb 2021 • Yanqi Zhou, Xuanyi Dong, Berkin Akin, Mingxing Tan, Daiyi Peng, Tianjian Meng, Amir Yazdanbakhsh, Da Huang, Ravi Narayanaswami, James Laudon
In our work, we target the optimization of hardware and software configurations on an industry-standard edge accelerator.
no code implementations • 13 Jul 2021 • Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, Tushar Krishna
In contrast, FLAT unblocks transformer models for inputs with up to 64K elements.
no code implementations • 25 Sep 2019 • Jialin Song, Joe Wenjie Jiang, Amir Yazdanbakhsh, Ebrahim Songhori, Anna Goldie, Navdeep Jaitly, Azalia Mirhoseini
On the other end of the spectrum, approaches rooted in Policy Iteration, such as Dual Policy Iteration, do not choose next-step actions based on an expert, but instead use planning or search over the policy to choose an action distribution to train towards.
no code implementations • 7 Apr 2022 • Zheng Li, Soroush Ghodrati, Amir Yazdanbakhsh, Hadi Esmaeilzadeh, Mingu Kang
To best utilize this mathematical innovation, we devise a bit-serial architecture, dubbed LeOPArd, for transformer language models with a bit-level early-termination microarchitectural mechanism.
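The idea of bit-level early termination can be illustrated in software. This is a toy sketch, not LeOPArd's microarchitecture: keys are unsigned integers processed one bit-plane at a time (MSB first), and the accumulation stops once even the best-case remaining contribution cannot reach a pruning threshold. The bound used here assumes nonnegative query weights.

```python
import numpy as np

# Toy sketch of a bit-serial dot product with early termination: accumulate
# the contribution of each bit-plane of the key vector from MSB to LSB, and
# stop once the running upper bound falls below the pruning threshold.

def bit_serial_score(q, k, bits=4, threshold=10):
    """q: nonnegative int weights; k: unsigned `bits`-bit int key vector.
    Returns (partial_score, terminated_early)."""
    acc = 0
    for p in range(bits - 1, -1, -1):          # MSB plane first
        plane = (k >> p) & 1                   # current bit of every element
        acc += (1 << p) * int(q @ plane)
        # upper bound if all remaining lower-order bits were 1
        bound = acc + ((1 << p) - 1) * int(q.sum())
        if bound < threshold:
            return acc, True                   # score pruned early
    return acc, False
```

Because high-order bits dominate the score, low-relevance keys can be rejected after only a few bit-serial cycles instead of a full-precision multiply.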
no code implementations • 1 Sep 2022 • Amir Yazdanbakhsh, Ashkan Moradifirouzabadi, Zheng Li, Mingu Kang
The combined in-memory pruning and on-chip recompute of the relevant attention scores enables SPRINT to transform quadratic complexity to a merely linear one.
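The quadratic-to-linear reduction can be illustrated with a software sketch (not the SPRINT hardware design itself): approximate scores select the top-k keys per query, and exact scores are then recomputed only for those survivors, so per-query work scales with k rather than with the full sequence length.

```python
import numpy as np

# Illustrative sketch of prune-then-recompute attention: keep only the top-k
# keys per query by (approximate) score, then recompute exact scores and
# softmax weights over just those k keys -- O(n*k) instead of O(n^2) work.

def pruned_attention_scores(q, K, k=2):
    scores = K @ q                       # stand-in for approximate in-memory scoring
    keep = np.argsort(scores)[-k:]       # indices of the k most relevant keys
    exact = K[keep] @ q                  # "recompute" exact scores for survivors only
    weights = np.exp(exact - exact.max())  # numerically stable softmax
    return keep, weights / weights.sum()
```

In the real design the first pass runs approximately in memory and only the surviving scores are recomputed on-chip; this sketch uses exact scores for both passes for simplicity.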
no code implementations • 16 Sep 2022 • Aman Madaan, Amir Yazdanbakhsh
Our empirical and qualitative analysis reveals that a symbiotic relationship between text and patterns explains the success of few-shot prompting: text helps extract commonsense from the question to help patterns, and patterns enforce task understanding and direct text generation.
no code implementations • 15 Sep 2022 • Sheng-Chun Kao, Amir Yazdanbakhsh, Suvinay Subramanian, Shivani Agrawal, Utku Evci, Tushar Krishna
In this work, we focus on N:M sparsity and extensively study and evaluate various training recipes for N:M sparsity in terms of the trade-off between model accuracy and compute cost (FLOPs).
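N:M structured sparsity means that in every contiguous group of M weights at most N are nonzero. A minimal magnitude-based sketch of the 2:4 case (an illustration of the sparsity pattern, not any particular training recipe from the paper):

```python
import numpy as np

# Minimal sketch of N:M structured sparsity (here 2:4): in every contiguous
# group of M=4 weights, keep the N=2 largest-magnitude entries, zero the rest.

def nm_prune(w, n=2, m=4):
    w = w.reshape(-1, m).copy()
    # indices of the (m - n) smallest-magnitude weights in each group
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(-1)

w = np.array([0.1, -0.5, 0.3, -0.05, 1.0, 0.2, -0.7, 0.4])
```

The fixed per-group pattern is what hardware such as sparse tensor cores exploits: the position metadata per group is small and the compute skips the zeros deterministically.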
no code implementations • 2 Feb 2023 • Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh
Recent innovations in hardware (e.g., Nvidia A100) have motivated learning N:M structured sparsity masks from scratch for fast model inference.
no code implementations • 13 Dec 2023 • Shaojin Ding, David Qiu, David Rim, Yanzhang He, Oleg Rybakov, Bo Li, Rohit Prabhavalkar, Weiran Wang, Tara N. Sainath, Zhonglin Han, Jian Li, Amir Yazdanbakhsh, Shivani Agrawal
We conducted extensive experiments with a 2-billion parameter USM on a large-scale voice search dataset to evaluate our proposed method.
Automatic Speech Recognition (ASR) +3
no code implementations • 21 Mar 2024 • Yoonsung Kim, Changhun Oh, Jinwoo Hwang, Wonung Kim, Seongryong Oh, Yubin Lee, Hardik Sharma, Amir Yazdanbakhsh, Jongse Park
Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots.
no code implementations • 16 Apr 2024 • Santosh Pandey, Amir Yazdanbakhsh, Hang Liu
Microarchitecture simulators are indispensable tools for microarchitecture designers to validate, estimate, and optimize new hardware that meets specific design requirements.