Search Results for author: Amir Yazdanbakhsh

Found 29 papers, 11 papers with code

When Linear Attention Meets Autoregressive Decoding: Towards More Effective and Efficient Linearized Large Language Models

1 code implementation • 11 Jun 2024 • Haoran You, Yichao Fu, Zheng Wang, Amir Yazdanbakhsh, Yingyan Lin

Autoregressive Large Language Models (LLMs) have achieved impressive performance in language tasks but face two significant bottlenecks: (1) quadratic complexity in the attention module as the number of tokens increases, and (2) limited efficiency due to the sequential processing nature of autoregressive LLMs during generation.

Decoder

ShiftAddLLM: Accelerating Pretrained LLMs via Post-Training Multiplication-Less Reparameterization

1 code implementation • 10 Jun 2024 • Haoran You, Yipin Guo, Yichao Fu, Wei Zhou, Huihong Shi, Xiaofan Zhang, Souvik Kundu, Amir Yazdanbakhsh, Yingyan Lin

Experiments on five LLM families and eight tasks consistently validate the effectiveness of ShiftAddLLM, achieving average perplexity improvements of 5.6 and 22.7 points at comparable or lower latency compared to the most competitive quantized LLMs at 3 and 2 bits, respectively, and more than 80% memory and energy reductions over the original LLMs.

Effective Interplay between Sparsity and Quantization: From Theory to Practice

no code implementations • 31 May 2024 • Simla Burcu Harma, Ayan Chakraborty, Elizaveta Kostenok, Danila Mishin, Dongho Ha, Babak Falsafi, Martin Jaggi, Ming Liu, Yunho Oh, Suvinay Subramanian, Amir Yazdanbakhsh

In addition, through rigorous analysis, we demonstrate that sparsity and quantization are not orthogonal; their interaction can significantly harm model accuracy, with quantization error playing a dominant role in this degradation.

Computational Efficiency, Model Compression, +1

SLoPe: Double-Pruned Sparse Plus Lazy Low-Rank Adapter Pretraining of LLMs

no code implementations • 25 May 2024 • Mohammad Mozaffari, Amir Yazdanbakhsh, Zhao Zhang, Maryam Mehri Dehnavi

We propose SLoPe, a Double-Pruned Sparse Plus Lazy Low-rank Adapter Pretraining method for LLMs that improves the accuracy of sparse LLMs while accelerating their pretraining and inference and reducing their memory footprint.

Tao: Re-Thinking DL-based Microarchitecture Simulation

no code implementations • 16 Apr 2024 • Santosh Pandey, Amir Yazdanbakhsh, Hang Liu

Microarchitecture simulators are indispensable tools for microarchitecture designers to validate, estimate, and optimize new hardware that meets specific design requirements.

Transfer Learning

DaCapo: Accelerating Continuous Learning in Autonomous Systems for Video Analytics

1 code implementation • 21 Mar 2024 • Yoonsung Kim, Changhun Oh, Jinwoo Hwang, Wonung Kim, Seongryong Oh, Yubin Lee, Hardik Sharma, Amir Yazdanbakhsh, Jongse Park

Deep neural network (DNN) video analytics is crucial for autonomous systems such as self-driving vehicles, unmanned aerial vehicles (UAVs), and security robots.

Progressive Gradient Flow for Robust N:M Sparsity Training in Transformers

1 code implementation • 7 Feb 2024 • Abhimanyu Rajeshkumar Bambhaniya, Amir Yazdanbakhsh, Suvinay Subramanian, Sheng-Chun Kao, Shivani Agrawal, Utku Evci, Tushar Krishna

In this work, we study the effectiveness of existing sparse training recipes at high-sparsity regions and argue that these methods fail to sustain the model quality on par with low-sparsity regions.

Learning Performance-Improving Code Edits

2 code implementations • 15 Feb 2023 • Alexander Shypula, Aman Madaan, Yimeng Zeng, Uri Alon, Jacob Gardner, Milad Hashemi, Graham Neubig, Parthasarathy Ranganathan, Osbert Bastani, Amir Yazdanbakhsh

Next, we propose a broad range of adaptation strategies for code optimization; for prompting, these include retrieval-based few-shot prompting and chain-of-thought, and for finetuning, these include performance-conditioned generation and synthetic data augmentation based on self-play.

Code Generation, Code Repair, +2

STEP: Learning N:M Structured Sparsity Masks from Scratch with Precondition

no code implementations • 2 Feb 2023 • Yucheng Lu, Shivani Agrawal, Suvinay Subramanian, Oleg Rybakov, Christopher De Sa, Amir Yazdanbakhsh

Recent innovations on hardware (e.g., Nvidia A100) have motivated learning N:M structured sparsity masks from scratch for fast model inference.

Machine Translation

GRANITE: A Graph Neural Network Model for Basic Block Throughput Estimation

1 code implementation • 8 Oct 2022 • Ondrej Sykora, Phitchaya Mangpo Phothilimthana, Charith Mendis, Amir Yazdanbakhsh

In this paper, we introduce GRANITE, a new machine learning model that estimates the throughput of basic blocks across different microarchitectures.

Decoder, Graph Neural Network, +1

Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango

no code implementations • 16 Sep 2022 • Aman Madaan, Amir Yazdanbakhsh

Our empirical and qualitative analysis reveals that a symbiotic relationship between text and patterns explains the success of few-shot prompting: text helps extract commonsense from the question to help patterns, and patterns enforce task understanding and direct text generation.

Code Completion, counterfactual, +1

Training Recipe for N:M Structured Sparsity with Decaying Pruning Mask

no code implementations • 15 Sep 2022 • Sheng-Chun Kao, Amir Yazdanbakhsh, Suvinay Subramanian, Shivani Agrawal, Utku Evci, Tushar Krishna

In this work, we focus on N:M sparsity and extensively study and evaluate various training recipes for N:M sparsity in terms of the trade-off between model accuracy and compute cost (FLOPs).

Sparse Attention Acceleration with Synergistic In-Memory Pruning and On-Chip Recomputation

no code implementations • 1 Sep 2022 • Amir Yazdanbakhsh, Ashkan Moradifirouzabadi, Zheng Li, Mingu Kang

The combined in-memory pruning and on-chip recompute of the relevant attention scores enables SPRINT to transform quadratic complexity to a merely linear one.

Accelerating Attention through Gradient-Based Learned Runtime Pruning

no code implementations • 7 Apr 2022 • Zheng Li, Soroush Ghodrati, Amir Yazdanbakhsh, Hadi Esmaeilzadeh, Mingu Kang

To best utilize this mathematical innovation, we devise a bit-serial architecture, dubbed LeOPArd, for transformer language models with a bit-level early-termination microarchitectural mechanism.

Sentence

Data-Driven Offline Optimization For Architecting Hardware Accelerators

1 code implementation • ICLR 2022 • Aviral Kumar, Amir Yazdanbakhsh, Milad Hashemi, Kevin Swersky, Sergey Levine

An alternative paradigm is to use a "data-driven", offline approach that utilizes logged simulation data to architect hardware accelerators, without needing any form of simulation.

Computer Architecture and Systems

An Evaluation of Edge TPU Accelerators for Convolutional Neural Networks

1 code implementation • 20 Feb 2021 • Kiran Seshadri, Berkin Akin, James Laudon, Ravi Narayanaswami, Amir Yazdanbakhsh

Then, we extensively evaluate three classes of Edge TPUs, covering different computing ecosystems, that are either currently deployed in Google products or are in the product pipeline, across 423K unique convolutional neural networks.

Apollo: Transferable Architecture Exploration

no code implementations • 2 Feb 2021 • Amir Yazdanbakhsh, Christof Angermueller, Berkin Akin, Yanqi Zhou, Albin Jones, Milad Hashemi, Kevin Swersky, Satrajit Chatterjee, Ravi Narayanaswami, James Laudon

We further show that by transferring knowledge between target architectures with different design constraints, Apollo is able to find optimal configurations faster and often with better objective value (up to 25% improvements).

Chameleon: Adaptive Code Optimization for Expedited Deep Neural Network Compilation

1 code implementation • ICLR 2020 • Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, Hadi Esmaeilzadeh

This solution, dubbed Chameleon, leverages reinforcement learning whose solution takes fewer steps to converge, and develops an adaptive sampling algorithm that not only focuses the costly samples (real hardware measurements) on representative points but also uses domain-knowledge-inspired logic to improve the samples themselves.

Diversity

Policy Optimization by Local Improvement through Search

no code implementations • 25 Sep 2019 • Jialin Song, Joe Wenjie Jiang, Amir Yazdanbakhsh, Ebrahim Songhori, Anna Goldie, Navdeep Jaitly, Azalia Mirhoseini

On the other end of the spectrum, approaches rooted in Policy Iteration, such as Dual Policy Iteration, do not choose next-step actions based on an expert, but instead use planning or search over the policy to choose an action distribution to train towards.

Imitation Learning, reinforcement-learning, +1

Mixed-Signal Charge-Domain Acceleration of Deep Neural networks through Interleaved Bit-Partitioned Arithmetic

no code implementations • 27 Jun 2019 • Soroush Ghodrati, Hardik Sharma, Sean Kinzer, Amir Yazdanbakhsh, Kambiz Samadi, Nam Sung Kim, Doug Burger, Hadi Esmaeilzadeh

The low-power potential of mixed-signal design makes it an alluring option for accelerating Deep Neural Networks (DNNs).

Hardware Architecture

ReLeQ: A Reinforcement Learning Approach for Deep Quantization of Neural Networks

no code implementations • 5 Nov 2018 • Ahmed T. Elthakeb, Prannoy Pilligundla, FatemehSadat Mireshghallah, Amir Yazdanbakhsh, Hadi Esmaeilzadeh

We show how ReLeQ can balance speed and quality, and provide an asymmetric general solution for quantization of a large variety of deep networks (AlexNet, CIFAR-10, LeNet, MobileNet-V1, ResNet-20, SVHN, and VGG-11) that virtually preserves the accuracy (≤ 0.3% loss) while minimizing the computation and storage cost.

Quantization, reinforcement-learning, +1

GANAX: A Unified MIMD-SIMD Acceleration for Generative Adversarial Networks

no code implementations • 10 May 2018 • Amir Yazdanbakhsh, Hajar Falahati, Philip J. Wolfe, Kambiz Samadi, Nam Sung Kim, Hadi Esmaeilzadeh

Even though there is a convolution stage in this operator, the inserted zeros lead to underutilization of the compute resources when a conventional convolution accelerator is employed.
