Search Results for author: Arash Ardakani

Found 9 papers, 1 paper with code

SlimFit: Memory-Efficient Fine-Tuning of Transformer-based Models Using Training Dynamics

no code implementations • 29 May 2023 • Arash Ardakani, Altan Haan, Shangyin Tan, Doru Thom Popovici, Alvin Cheung, Costin Iancu, Koushik Sen

This allows SlimFit to freeze up to 95% of layers and reduce the overall on-device GPU memory usage of transformer-based models such as ViT and BERT by an average of 2.2x across different NLP and CV benchmarks/datasets such as GLUE, SQuAD 2.0, CIFAR-10, CIFAR-100 and ImageNet, with an average degradation of 0.2% in accuracy.

Quantization Scheduling
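
The layer-freezing idea above can be sketched in a few lines of PyTorch: frozen layers carry no gradients or optimizer state, which is where the memory savings come from. Freezing a fixed fraction of the earliest encoder layers is an illustrative simplification; SlimFit itself chooses which layers to freeze from training dynamics.

```python
# Minimal sketch, assuming a BERT-style encoder: freeze a fraction of
# layers so their gradients and optimizer states are never allocated.
# The "freeze the earliest layers" rule is an illustrative assumption,
# not SlimFit's actual dynamics-based scheduling.
import torch
from transformers import BertModel

model = BertModel.from_pretrained("bert-base-uncased")

freeze_ratio = 0.95
layers = model.encoder.layer
num_frozen = int(len(layers) * freeze_ratio)

for layer in layers[:num_frozen]:
    for param in layer.parameters():
        param.requires_grad = False   # no grad buffers or AdamW states kept

# Hand only the remaining trainable parameters to the optimizer.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=2e-5)
```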

Standard Deviation-Based Quantization for Deep Neural Networks

no code implementations • 24 Feb 2022 • Amir Ardakani, Arash Ardakani, Brett Meyer, James J. Clark, Warren J. Gross

Quantization of deep neural networks is a promising approach that reduces the inference cost, making it feasible to run deep networks on resource-restricted devices.

Quantization
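
As a rough illustration of the idea in the title (not the paper's exact scheme), the quantization step can be derived from the standard deviation of the weight tensor; the symmetric 7-level grid below is an assumption made for the example.

```python
# Minimal sketch: quantize weights on a uniform grid whose step size is
# the tensor's standard deviation. The 7-level symmetric grid and the
# step = std(w) choice are illustrative assumptions, not the paper's
# exact quantization scheme.
import torch

def std_quantize(w: torch.Tensor, num_levels: int = 7) -> torch.Tensor:
    step = w.std()                               # std-based quantization step
    half = (num_levels - 1) // 2
    q = torch.clamp(torch.round(w / step), -half, half)
    return q * step                              # fake-quantized weights

w = torch.randn(256, 256)
w_q = std_quantize(w)
print(torch.unique(torch.round(w_q / w.std())))  # the quantized levels, in units of std
```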

Training Linear Finite-State Machines

no code implementations • NeurIPS 2020 • Arash Ardakani, Amir Ardakani, Warren Gross

Therefore, our FSM-based model can learn extremely long-term dependencies, as it requires only 1/l of the memory storage of LSTMs during training, where l is the number of time steps.

Language Modelling • Time Series Analysis
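
The 1/l memory claim above appears to come down to simple arithmetic: backpropagation through time stores a hidden state per time step, whereas a training rule that only needs the current state stores one. A toy comparison, with illustrative sizes only:

```python
# Toy arithmetic behind the 1/l memory claim: BPTT keeps every hidden
# state of the unrolled sequence, while a single-state update keeps one,
# independent of sequence length. Sizes below are illustrative only.
hidden_size = 1024
seq_len = 10_000            # l time steps
bytes_per_float = 4

bptt_storage = seq_len * hidden_size * bytes_per_float   # grows with l
single_state = hidden_size * bytes_per_float             # constant in l

print(f"BPTT activation storage : {bptt_storage / 1e6:.1f} MB")
print(f"single-state storage    : {single_state / 1e6:.3f} MB  (1/{seq_len} of BPTT)")
```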

The Synthesis of XNOR Recurrent Neural Networks with Stochastic Logic

no code implementations • NeurIPS 2019 • Arash Ardakani, Zhengyun Ji, Amir Ardakani, Warren Gross

XNOR networks have emerged to reduce the model size and computational cost of neural networks for deployment on specialized hardware requiring real-time processing with limited hardware resources.

Quantization
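
The arithmetic that makes XNOR networks cheap can be shown directly: with weights and activations binarized to ±1, a dot product reduces to an XNOR followed by a popcount. This is background on XNOR arithmetic, not the paper's stochastic-logic RNN synthesis.

```python
# Minimal sketch: for +/-1 vectors, dot(a, w) = 2 * popcount(XNOR(a, w)) - n,
# so multiply-accumulates become bitwise logic. Background on XNOR
# arithmetic only, not the paper's stochastic-logic synthesis method.
import numpy as np

rng = np.random.default_rng(0)
n = 64
a = np.sign(rng.standard_normal(n))              # +/-1 activations
w = np.sign(rng.standard_normal(n))              # +/-1 weights

bits_a = a > 0                                   # encode +1 -> 1, -1 -> 0
bits_w = w > 0
matches = np.count_nonzero(~(bits_a ^ bits_w))   # XNOR + popcount
dot_via_xnor = 2 * matches - n

assert dot_via_xnor == int(a @ w)
print(dot_via_xnor)
```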

Learning to Skip Ineffectual Recurrent Computations in LSTMs

no code implementations • 9 Nov 2018 • Arash Ardakani, Zhengyun Ji, Warren J. Gross

This observation suggests that a large fraction of the recurrent computations are ineffectual and can be skipped to speed up inference, as they involve non-contributory multiplications/accumulations with zero-valued states.
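
A toy sketch of the skipping idea: zero entries of the previous hidden state contribute nothing to the recurrent matrix-vector product, so the matching columns can be dropped without changing the result. The comparison below is an illustration only, not the paper's hardware mechanism.

```python
# Minimal sketch: columns of the recurrent weight matrix that multiply
# zero-valued states are ineffectual and can be skipped; the result is
# unchanged. Illustration of the idea only, not the paper's mechanism.
import numpy as np

rng = np.random.default_rng(0)
hidden = 8
W_h = rng.standard_normal((hidden, hidden))   # recurrent weights
h_prev = rng.standard_normal(hidden)
h_prev[rng.random(hidden) < 0.6] = 0.0        # many zero-valued states

dense = W_h @ h_prev                          # full recurrent product

nz = np.flatnonzero(h_prev)                   # effectual state indices
skipped = W_h[:, nz] @ h_prev[nz]             # only effectual columns

assert np.allclose(dense, skipped)
print(f"computed {nz.size}/{hidden} columns, result identical")
```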

Learning Recurrent Binary/Ternary Weights

1 code implementation • ICLR 2019 • Arash Ardakani, Zhengyun Ji, Sean C. Smithson, Brett H. Meyer, Warren J. Gross

On the software side, we evaluate the performance (in terms of accuracy) of our method using long short-term memories (LSTMs) on various sequential tasks, including sequence classification and language modeling.

Language Modelling
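
For context on what learning binary/ternary weights involves, a common recipe is to ternarize weights in the forward pass and pass gradients straight through to a full-precision copy. The threshold below is an illustrative assumption, not necessarily the paper's training recipe.

```python
# Minimal sketch of weight ternarization with a straight-through
# estimator. The 0.7 * mean|w| threshold is an illustrative assumption,
# not necessarily the recipe used in the paper.
import torch

class TernarizeSTE(torch.autograd.Function):
    @staticmethod
    def forward(ctx, w):
        threshold = 0.7 * w.abs().mean()
        return torch.where(w.abs() > threshold, torch.sign(w), torch.zeros_like(w))

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output            # straight-through estimator

w = torch.randn(4, 4, requires_grad=True)
w_ternary = TernarizeSTE.apply(w)     # values in {-1, 0, +1}
w_ternary.sum().backward()            # gradients flow to the real-valued copy
print(w_ternary)
```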

Multi-Mode Inference Engine for Convolutional Neural Networks

no code implementations • 11 Dec 2017 • Arash Ardakani, Carlo Condo, Warren J. Gross

The performance efficiency of existing accelerators is limited to less than 55% on average, which leads to unnecessarily high processing latency and silicon area.

Hardware Architecture

Sparsely-Connected Neural Networks: Towards Efficient VLSI Implementation of Deep Neural Networks

no code implementations • 4 Nov 2016 • Arash Ardakani, Carlo Condo, Warren J. Gross

The proposed architecture can save up to 90% of memory compared to the conventional implementations of fully-connected neural networks.
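
A small sketch of the memory argument: if most connections of a fully-connected layer are removed by a fixed mask, only the surviving weights need to be stored. The sizes and 10% density below are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch: a fully-connected layer with a fixed random mask that
# keeps 10% of the connections, so roughly 90% of weight storage is
# saved. Illustration only, not the paper's VLSI architecture.
import numpy as np

rng = np.random.default_rng(0)
in_dim, out_dim, density = 1024, 1024, 0.10

mask = rng.random((out_dim, in_dim)) < density       # fixed connectivity pattern
weights = rng.standard_normal((out_dim, in_dim)) * mask

dense_params = in_dim * out_dim
sparse_params = int(mask.sum())
print(f"stored weights: {sparse_params} of {dense_params} "
      f"({1 - sparse_params / dense_params:.0%} saved)")

x = rng.standard_normal(in_dim)
y = weights @ x                                      # forward pass with sparse weights
```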

VLSI Implementation of Deep Neural Network Using Integral Stochastic Computing

no code implementations • 29 Sep 2015 • Arash Ardakani, François Leduc-Primeau, Naoya Onizawa, Takahiro Hanyu, Warren J. Gross

We also synthesize the circuits in a 65 nm CMOS technology and show that the proposed integral stochastic architecture results in up to 21% reduction in energy consumption compared to the binary radix implementation at the same misclassification rate.
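
As background on why stochastic computing maps well to tiny circuits (the paper's integral variant builds on this representation), a value in [0, 1] can be encoded as a Bernoulli bit stream, and two independent streams are multiplied by a single AND gate. The sketch below shows ordinary stochastic computing only, not the integral-stochastic hardware design.

```python
# Minimal sketch of ordinary stochastic computing: p in [0, 1] is encoded
# as a bit stream with P(bit = 1) = p, and multiplying two independent
# streams is a bitwise AND. Background only, not the paper's integral
# stochastic architecture.
import numpy as np

rng = np.random.default_rng(0)
stream_len = 100_000
a, b = 0.6, 0.3

stream_a = rng.random(stream_len) < a    # unary bit stream encoding a
stream_b = rng.random(stream_len) < b    # unary bit stream encoding b

product_stream = stream_a & stream_b     # one AND gate per bit multiplies the values
print(product_stream.mean())             # ~0.18 = a * b
```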
