In-memory computing (IMC) on a monolithic chip for deep learning faces significant challenges in area, yield, and on-chip interconnect cost as model sizes continue to grow.
In this technique, we use analytical models of the network-on-chip (NoC) to evaluate the end-to-end communication latency of any given DNN.
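As a rough illustration of what such an analytical model can look like, the sketch below estimates zero-load latency on a 2D-mesh NoC. The topology, per-hop delays, and layer-to-tile mapping here are illustrative assumptions, not the paper's actual model:

```python
# Minimal analytical latency model for a 2D-mesh NoC (illustrative assumptions):
# latency of one packet = hops * per-hop delay + serialization time of its flits.

def hops(src, dst):
    """Manhattan distance between tiles on a 2D mesh (XY routing)."""
    return abs(src[0] - dst[0]) + abs(src[1] - dst[1])

def packet_latency(src, dst, n_flits, router_delay=2, link_delay=1):
    """Cycles for one packet under zero-load (no contention) assumptions."""
    return hops(src, dst) * (router_delay + link_delay) + n_flits

def layer_latency(transfers, **kw):
    """End-to-end latency of one DNN layer = slowest of its tile-to-tile transfers."""
    return max(packet_latency(s, d, f, **kw) for (s, d, f) in transfers)

# Example: a layer's activations sent from tile (0, 0) to two consumer tiles.
transfers = [((0, 0), (2, 1), 64), ((0, 0), (1, 3), 128)]
print(layer_latency(transfers))  # the worst-case transfer dominates the layer
```

Summing such per-layer terms over a DNN's schedule gives the end-to-end communication latency without cycle-accurate simulation.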
Beyond recovering inference accuracy, our RA-BNN also shows significantly higher resistance to bit-flip attacks (BFA) after the growing step.
We propose a hybrid in-memory computing (HIC) architecture for training DNNs on hardware accelerators that yields memory-efficient inference and exceeds baseline software accuracy on benchmark tasks.
1 code implementation • 10 Mar 2020 • Colby R. Banbury, Vijay Janapa Reddi, Max Lam, William Fu, Amin Fazel, Jeremy Holleman, Xinyuan Huang, Robert Hurtado, David Kanter, Anton Lokhmotov, David Patterson, Danilo Pau, Jae-sun Seo, Jeff Sieracki, Urmish Thakker, Marian Verhelst, Poonam Yadav
In this position paper, we present the current landscape of TinyML and discuss the challenges and directions toward developing a fair and useful hardware benchmark for TinyML workloads.
For ResNet50 on ImageNet, this yields a winning ticket with 75.02% Top-1 accuracy at an 80% pruning rate, found in only 22% of the total epochs required by iterative pruning.
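For context, a winning ticket is the sparse subnetwork left after pruning the lowest-magnitude weights; only the survivors are then retrained. A minimal sketch of one-shot magnitude pruning at an 80% rate (the function names are illustrative, not the paper's code):

```python
import numpy as np

def magnitude_prune_mask(weights, prune_rate=0.8):
    """Mask that zeroes the lowest-magnitude fraction of weights (one-shot).
    A winning ticket retrains only the surviving weights, typically rewound
    to their early-training values."""
    threshold = np.quantile(np.abs(weights).ravel(), prune_rate)
    return (np.abs(weights) > threshold).astype(weights.dtype)

w = np.random.randn(256, 256).astype(np.float32)
mask = magnitude_prune_mask(w, prune_rate=0.8)
print(mask.mean())  # ~0.2: an 80% pruning rate keeps the top 20% of weights
```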
In this work, we demonstrate a scalable RRAM-based in-memory computing design, termed XNOR-RRAM, fabricated in 90nm CMOS technology with monolithic integration of RRAM devices between metal layers 1 and 2.
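The arithmetic such an array accelerates is the XNOR-popcount form of a binary dot product. A small sketch of the digital equivalent (illustrative only, not the chip's analog circuit):

```python
import numpy as np

def xnor_popcount_dot(a_bits, w_bits):
    """Binary dot product as computed by an XNOR-RRAM style array.
    Inputs are {0, 1} encodings of {-1, +1} values; XNOR counts bit matches,
    and the bipolar dot product is 2 * matches - n."""
    n = a_bits.size
    matches = np.sum(a_bits == w_bits)  # XNOR = 1 where the bits agree
    return 2 * int(matches) - n

a = np.random.randint(0, 2, 128)
w = np.random.randint(0, 2, 128)
# Equals the dot product of the corresponding {-1, +1} vectors:
assert xnor_popcount_dot(a, w) == np.dot(2 * a - 1, 2 * w - 1)
```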
Training convolutional neural networks (CNNs) on embedded platforms to support on-device learning has become increasingly important.
Over a suite of six datasets, we trained models via transfer learning with an accuracy loss of $<1\%$, achieving up to 11.2 TOPS/W, nearly $2\times$ more efficient than a conventional programmable CNN accelerator of the same area.
no code implementations • 23 May 2018 • Chetan Singh Thakur, Jamal Molin, Gert Cauwenberghs, Giacomo Indiveri, Kundan Kumar, Ning Qiao, Johannes Schemmel, Runchun Wang, Elisabetta Chicca, Jennifer Olson Hasler, Jae-sun Seo, Shimeng Yu, Yu Cao, André van Schaik, Ralph Etienne-Cummings
Neuromorphic engineering (NE) encompasses a diverse range of approaches to information processing that are inspired by neurobiological systems, and this feature distinguishes neuromorphic systems from conventional computing systems.
Deep learning algorithms have shown tremendous success in many recognition tasks; however, these algorithms typically include a deep neural network (DNN) structure and a large number of parameters, which makes it challenging to implement them on power/area-constrained embedded platforms.
We present a new backpropagation-based training algorithm for discrete-time spiking neural networks (SNNs).
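Backpropagation through the non-differentiable spike function is typically made possible with a surrogate gradient. A minimal sketch of that idea for one leaky integrate-and-fire step (the surrogate shape and constants below are common choices, assumed here rather than taken from the paper):

```python
import numpy as np

def lif_step(v, x, v_th=1.0, decay=0.9):
    """One discrete-time leaky integrate-and-fire update."""
    v = decay * v + x                      # leak, then integrate input current
    spike = (v >= v_th).astype(x.dtype)    # fire where the threshold is crossed
    v = v - spike * v_th                   # soft reset after a spike
    return v, spike

def surrogate_grad(v, v_th=1.0, alpha=2.0):
    """d(spike)/d(v) replacement: the true derivative is zero almost
    everywhere, so backprop uses a smooth pulse centered at the threshold."""
    return alpha / (2 * (1 + (np.pi / 2 * alpha * (v - v_th)) ** 2))

v = np.zeros(4)
v, s = lif_step(v, np.array([0.5, 1.2, 0.9, 2.0]))
print(s, surrogate_grad(v))
```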
Many-core GPU architectures offer superior performance, but they consume substantial power and face memory constraints due to inconsistencies between the cache and main memory.
In this paper, we propose a method to compress deep neural networks by using the Fisher Information metric, which we estimate through a stochastic optimization method that keeps track of second-order information in the network.
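A common way to estimate the diagonal Fisher Information stochastically is to accumulate squared minibatch gradients and rank parameters by a Fisher-weighted saliency. A minimal sketch under those assumptions (not necessarily the paper's exact estimator):

```python
import numpy as np

def diagonal_fisher(grads):
    """Stochastic estimate of the diagonal Fisher Information: the mean of
    squared gradients, F_ii ~ E[(dL/dw_i)^2], accumulated over minibatches."""
    return np.mean(np.square(grads), axis=0)

def fisher_prune_mask(weights, fisher, prune_rate=0.5):
    """Prune the parameters with the smallest estimated loss increase.
    Second-order saliency of removing w_i is ~ 0.5 * F_ii * w_i^2."""
    saliency = 0.5 * fisher * np.square(weights)
    threshold = np.quantile(saliency, prune_rate)
    return (saliency > threshold).astype(weights.dtype)

w = np.random.randn(1000).astype(np.float32)
grads = np.random.randn(32, 1000).astype(np.float32)  # per-minibatch gradients
mask = fisher_prune_mask(w, diagonal_fisher(grads))
print(mask.mean())  # ~half the parameters survive at prune_rate=0.5
```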