no code implementations • 29 May 2023 • Arash Ardakani, Altan Haan, Shangyin Tan, Doru Thom Popovici, Alvin Cheung, Costin Iancu, Koushik Sen
This allows SlimFit to freeze up to 95% of layers and reduce the overall on-device GPU memory usage of transformer-based models such as ViT and BERT by an average of 2.2x across different NLP and CV benchmarks/datasets such as GLUE, SQuAD 2.0, CIFAR-10, CIFAR-100, and ImageNet, with an average accuracy degradation of 0.2%.
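SlimFit's actual freezing schedule is not reproduced here; as a rough illustration of the general idea, the sketch below freezes a fixed fraction of a PyTorch model's parameter-holding layers so that no gradient buffers or optimizer state are kept for them (the `freeze_fraction` helper and the toy model are hypothetical):

```python
# Minimal sketch of fractional layer freezing in PyTorch (illustrative,
# not SlimFit's actual selection scheme).
import torch
from torch import nn

def freeze_fraction(model: nn.Module, fraction: float = 0.95) -> None:
    """Disable gradients for the first `fraction` of parameter-holding layers."""
    layers = [m for m in model.modules() if isinstance(m, (nn.Linear, nn.LayerNorm))]
    n_frozen = int(len(layers) * fraction)
    for layer in layers[:n_frozen]:
        for p in layer.parameters():
            p.requires_grad = False  # no grad/optimizer buffers -> lower GPU memory

model = nn.Sequential(*[nn.Linear(128, 128) for _ in range(20)])
freeze_fraction(model, 0.95)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"trainable parameters: {trainable}")
```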
no code implementations • 24 Feb 2022 • Amir Ardakani, Arash Ardakani, Brett Meyer, James J. Clark, Warren J. Gross
Quantization of deep neural networks is a promising approach that reduces the inference cost, making it feasible to run deep networks on resource-restricted devices.
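As a hedged, generic illustration of why quantization cuts inference cost (a standard post-training uniform scheme, not necessarily this paper's method), the sketch below maps float weights to 8-bit integers with a single scale factor, so arithmetic can run in cheap integer units:

```python
# Generic symmetric uniform quantization sketch (assumed scheme, for
# illustration only): w ~= scale * q, with q stored as int8.
import numpy as np

def quantize_uniform(w: np.ndarray, n_bits: int = 8):
    """Quantize a float tensor to signed n-bit integers plus one scale."""
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max() / qmax
    q = np.clip(np.round(w / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_uniform(w)
print("max abs error:", np.abs(w - scale * q.astype(np.float32)).max())
```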
no code implementations • NeurIPS 2020 • Arash Ardakani, Amir Ardakani, Warren Gross
Therefore, our FSM-based model can learn extremely long-term dependencies, as it requires only 1/l of the memory that LSTMs need during training, where l is the number of time steps.
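A back-of-envelope sketch of where that 1/l factor comes from (illustrative arithmetic, not the paper's derivation): backpropagation through time stores one hidden activation per time step, while a finite-state update only carries the current state forward.

```python
# Illustrative memory comparison (assumed model of activation storage):
# BPTT through an LSTM keeps l hidden vectors; an FSM keeps one state.
def bptt_activation_memory(l: int, hidden: int) -> int:
    return l * hidden            # one activation vector per time step

def fsm_activation_memory(l: int, hidden: int) -> int:
    return hidden                # only the current state is kept

l, hidden = 1000, 256
print(bptt_activation_memory(l, hidden) / fsm_activation_memory(l, hidden))  # -> l
```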
no code implementations • NeurIPS 2019 • Arash Ardakani, Zhengyun Ji, Amir Ardakani, Warren Gross
XNOR networks seek to reduce the model size and computational cost of neural networks, enabling their deployment on specialized hardware that must perform real-time processing with limited hardware resources.
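The cost reduction rests on a standard trick worth making concrete (a well-known identity, shown here as a sketch rather than the paper's implementation): when weights and activations are constrained to {-1, +1} and packed as bits, a dot product collapses to XOR/XNOR plus a popcount.

```python
# XNOR-popcount sketch: with bit i encoding element i (1 -> +1, 0 -> -1),
# sum(a_i * b_i) = n - 2 * popcount(a XOR b).
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Dot product of two n-element {-1, +1} vectors packed into ints."""
    differing = bin((a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * differing

# Example: a = [+1, -1, +1, +1], b = [+1, +1, -1, +1]; expected dot = 0.
a = 0b1101  # elements 0..3 packed into bits 0..3
b = 0b1011
print(binary_dot(a, b, 4))  # -> 0
```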
no code implementations • 9 Nov 2018 • Arash Ardakani, Zhengyun Ji, Warren J. Gross
This observation suggests that a large fraction of the recurrent computations are ineffectual and can be avoided to speed up inference, as they involve noncontributory multiplications/accumulations with zero-valued states.
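To make the skipping idea concrete (a minimal sketch under the assumption of a dense recurrent matrix-vector product, not the paper's hardware design): columns of the recurrent weight matrix paired with zero-valued state entries contribute nothing and need no multiplies or accumulates.

```python
# Sketch of skipping ineffectual recurrent computations: only columns
# of W paired with nonzero state entries are touched.
import numpy as np

def recurrent_matvec_skip(W: np.ndarray, h: np.ndarray) -> np.ndarray:
    """Compute W @ h using only the columns where h is nonzero."""
    nz = np.flatnonzero(h)       # indices of contributing state entries
    return W[:, nz] @ h[nz]      # skipped columns cost nothing

W = np.random.randn(4, 6)
h = np.array([0.0, 1.2, 0.0, 0.0, -0.7, 0.0])
assert np.allclose(recurrent_matvec_skip(W, h), W @ h)
```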
1 code implementation • ICLR 2019 • Arash Ardakani, Zhengyun Ji, Sean C. Smithson, Brett H. Meyer, Warren J. Gross
On the software side, we evaluate the performance (in terms of accuracy) of our method using long short-term memories (LSTMs) on various sequential tasks, including sequence classification and language modeling.
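For orientation, the sequence-classification setup mentioned above typically looks like the generic PyTorch sketch below (a plain full-precision LSTM, not the paper's binary/ternary-weight method; the class name and dimensions are hypothetical):

```python
# Generic LSTM sequence classifier: read a sequence, classify from the
# final hidden state.
import torch
from torch import nn

class LSTMClassifier(nn.Module):
    def __init__(self, in_dim=16, hidden=64, n_classes=2):
        super().__init__()
        self.lstm = nn.LSTM(in_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):                # x: (batch, time, in_dim)
        _, (h_n, _) = self.lstm(x)       # h_n: (1, batch, hidden)
        return self.head(h_n[-1])        # logits: (batch, n_classes)

model = LSTMClassifier()
print(model(torch.randn(8, 20, 16)).shape)  # torch.Size([8, 2])
```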
no code implementations • 11 Dec 2017 • Arash Ardakani, Carlo Condo, Warren J. Gross
Their performance efficiency is limited to less than 55% on average, which results in unnecessarily high processing latency and silicon area usage.
Hardware Architecture
no code implementations • 4 Nov 2016 • Arash Ardakani, Carlo Condo, Warren J. Gross
The proposed architecture can save up to 90% of memory compared to conventional implementations of fully-connected neural networks.
no code implementations • 29 Sep 2015 • Arash Ardakani, François Leduc-Primeau, Naoya Onizawa, Takahiro Hanyu, Warren J. Gross
We also synthesize the circuits in 65 nm CMOS technology and show that the proposed integral stochastic architecture reduces energy consumption by up to 21% compared to the binary radix implementation at the same misclassification rate.
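For readers unfamiliar with the representation, here is a hedged software sketch of the stochastic-computing basics behind the integral design (illustrative only, not the paper's circuits): a value x in [0, 1] becomes a Bernoulli bitstream with mean x, ANDing two independent unipolar streams multiplies their values, and integral stochastic computing generalizes each stream element to a small integer (the sum of s parallel bits), covering [0, s] with shorter streams.

```python
# Stochastic computing sketch (assumed encoding, for illustration):
# unipolar streams multiply via a bitwise AND; integral streams carry
# integer elements whose mean is s * x.
import numpy as np

rng = np.random.default_rng(0)
N = 10_000                                  # stream length

def stream(x: float, s: int = 1) -> np.ndarray:
    """Integral stochastic stream: each element sums s Bernoulli bits."""
    return rng.binomial(s, x, size=N)

a, b = stream(0.6), stream(0.5)
print((a & b).mean())                       # AND-gate product, ~0.6 * 0.5 = 0.3
print(stream(0.6, s=4).mean() / 4)          # integral stream decodes to ~0.6
```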