Afterward, we discuss how to further improve the performance (latency) and the energy efficiency of Edge AI systems through HW/SW-level optimizations, such as pruning, quantization, and approximation.
We successfully identify Pareto-optimal designs, which reduce the storage overhead of the DNN by ~30 MB for a quality loss of less than 0.5%.
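As an illustration of how such Pareto-optimal points can be identified, the sketch below filters dominated design points over two objectives (storage and quality loss); the design points and the brute-force dominance check are illustrative, not taken from our evaluation.

```python
import numpy as np

def pareto_front(points):
    """Return indices of Pareto-optimal points, where both objectives
    (storage in MB, quality loss in %) are to be minimized."""
    points = np.asarray(points, dtype=float)
    optimal = []
    for i, p in enumerate(points):
        # p is dominated if another point is <= in both objectives and < in at least one
        dominated = np.any(
            np.all(points <= p, axis=1) & np.any(points < p, axis=1)
        )
        if not dominated:
            optimal.append(i)
    return optimal

# Illustrative design points: (storage in MB, accuracy loss in %)
designs = [(250.0, 0.0), (220.0, 0.4), (210.0, 1.2), (230.0, 0.3), (245.0, 0.1)]
print([designs[i] for i in pareto_front(designs)])
```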
Spiking Neural Networks (SNNs) aim at providing energy-efficient learning capabilities when implemented on neuromorphic chips with event-based Dynamic Vision Sensors (DVS).
Since recent works still focus on fault modeling and random fault injection in SNNs, the impact of memory faults in SNN hardware architectures on accuracy and the respective fault-mitigation techniques are not thoroughly explored.
A prominent technique for reducing the memory footprint of Spiking Neural Networks (SNNs) without decreasing the accuracy significantly is quantization.
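A minimal sketch of this idea, assuming uniform symmetric post-training quantization of the synaptic weights to 8-bit integers (the layer shape and bit-width are illustrative, not a specific SNN design):

```python
import numpy as np

def quantize_weights(w, n_bits=8):
    """Uniform symmetric quantization of synaptic weights to n_bits integers.
    A minimal sketch; real SNN quantization flows also consider membrane
    potentials and per-layer bit-widths."""
    scale = np.max(np.abs(w)) / (2 ** (n_bits - 1) - 1)
    q = np.round(w / scale).astype(np.int8 if n_bits <= 8 else np.int16)
    return q, scale  # stored as integers; dequantize with q * scale

weights = np.random.default_rng(0).normal(size=(128, 784)).astype(np.float32)  # hypothetical layer
q, scale = quantize_weights(weights, n_bits=8)
print(weights.nbytes, "->", q.nbytes, "bytes")  # 4x memory reduction vs. float32
```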
Spiking Neural Networks (SNNs), despite being energy-efficient when implemented on neuromorphic hardware and coupled with event-based Dynamic Vision Sensors (DVS), are vulnerable to security threats, such as adversarial attacks, i.e., small perturbations added to the input for inducing a misclassification.
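To make such a perturbation concrete, the sketch below applies an FGSM-style step to a toy linear softmax classifier; the classifier, epsilon, and input are hypothetical stand-ins for the attacked SNN/DNN rather than the actual attack setup.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def fgsm_perturbation(x, y_true, W, b, eps=0.05):
    """FGSM-style perturbation for a linear softmax classifier:
    x_adv = x + eps * sign(d loss / d x). A minimal sketch of the
    'small perturbation inducing misclassification' idea."""
    p = softmax(W @ x + b)
    grad_logits = p.copy()
    grad_logits[y_true] -= 1.0          # d(cross-entropy)/d(logits)
    grad_x = W.T @ grad_logits          # chain rule back to the input
    return np.clip(x + eps * np.sign(grad_x), 0.0, 1.0)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(10, 784)), np.zeros(10)   # hypothetical classifier
x = rng.random(784)
x_adv = fgsm_perturbation(x, y_true=3, W=W, b=b)
print(np.abs(x_adv - x).max())   # perturbation magnitude bounded by eps
```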
Our best experiment achieves 86% accuracy in the offline implementation, which drops to 83% when ported onto the Loihi chip.
Continual learning is essential for many real-world applications, as frozen pre-trained models cannot effectively deal with non-stationary data distributions.
From tiny pacemaker chips to aircraft collision avoidance systems, the state-of-the-art Cyber-Physical Systems (CPS) have increasingly started to rely on Deep Neural Networks (DNNs).
Spiking Neural Networks (SNNs) bear the potential of efficient unsupervised and continual learning capabilities because of their biological plausibility, but their complexity still poses a serious research challenge to enable their energy-efficient design for resource-constrained scenarios (like embedded systems, IoT-Edge, etc.).
The key mechanisms of SparkXD are: (1) improving the SNN error tolerance through fault-aware training that considers bit errors from approximate DRAM, (2) analyzing the error tolerance of the improved SNN model to find the maximum tolerable bit error rate (BER) that meets the targeted accuracy constraint, and (3) energy-efficient DRAM data mapping for the resilient SNN model that maps the weights in the appropriate DRAM location to minimize the DRAM access energy.
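A minimal sketch of mechanism (1), assuming 8-bit quantized weights and a uniform per-bit error rate; during fault-aware training, such faulty weights would be used in the forward pass so the network learns to tolerate them (the BER value and layer size are illustrative):

```python
import numpy as np

def inject_bit_errors(weights_q, ber, rng):
    """Flip random bits in quantized (int8) weights with probability `ber`,
    emulating bit errors from approximate DRAM operated at reduced
    refresh rate or voltage."""
    w = weights_q.view(np.uint8).copy()
    flips = rng.random((w.size, 8)) < ber                      # per-bit error mask
    masks = (flips * (1 << np.arange(8))).sum(axis=1).astype(np.uint8)
    return (w ^ masks.reshape(w.shape)).view(np.int8)

rng = np.random.default_rng(1)
w_q = rng.integers(-128, 128, size=(64, 64), dtype=np.int8)    # hypothetical layer
w_faulty = inject_bit_errors(w_q, ber=1e-3, rng=rng)
print(np.mean(w_faulty != w_q))   # fraction of weights hit by at least one bit flip
```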
We propose DNN-Life, a specialized aging analysis and mitigation framework for DNNs, which jointly exploits hardware- and software-level knowledge to improve the lifetime of a DNN weight memory with reduced energy overhead.
Machine Learning (ML) techniques have been rapidly adopted by smart Cyber-Physical Systems (CPS) and Internet-of-Things (IoT) due to their powerful decision-making capabilities.
Currently, Machine Learning (ML) is becoming ubiquitous in everyday life.
Training of the policy is supported by Machine Learning-based analytical models for quick performance estimation, thereby drastically reducing the time spent for dynamic profiling.
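As an illustrative stand-in for such an analytical model, the sketch below fits a linear regression from configuration features to measured latency; the feature set, latency numbers, and the use of scikit-learn are assumptions for the example only, not the models used to train the policy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic profiling data. Features per configuration (hypothetical):
# (num_layers, num_MACs, on_chip_buffer_KB)
X = np.array([[8, 0.7e9, 128], [16, 1.5e9, 256], [32, 4.0e9, 256],
              [50, 3.8e9, 512], [18, 1.9e9, 128]])
y = np.array([4.1, 8.3, 21.0, 19.5, 10.2])   # measured latency in ms (synthetic)

# Fit once on profiled configurations, then query instead of re-profiling.
model = LinearRegression().fit(X, y)
candidate = np.array([[24, 2.5e9, 256]])
print("predicted latency (ms):", model.predict(candidate)[0])
```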
We thoroughly study the security of SNNs under different adversarial attacks in the strong white-box setting, with different noise budgets and variable spiking parameters.
To reduce the overhead of data acquisition, we propose a single power-port current acquisition block using current sensors in time-division multiplexing, which increases accuracy while incurring reduced area overhead.
We analyze the corresponding on-chip memory requirements and leverage this analysis to propose a novel methodology for exploring different scratchpad memory designs and their energy/area trade-offs.
Deep Neural Networks (DNNs) have achieved significant accuracy improvements, enabling their deployment in a wide variety of Machine Learning (ML) applications.
FSpiNN reduces the computational requirements by reducing the number of neuronal operations, the STDP-based synaptic weight updates, and the STDP complexity.
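For context, a single pair-based STDP update looks like the sketch below; reducing how often, and for how many synapses, such updates are performed is what cuts the computational cost (the time constant and learning rates are illustrative, not FSpiNN's parameters):

```python
import numpy as np

def stdp_update(w, dt, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Pair-based STDP: potentiate when the presynaptic spike precedes the
    postsynaptic spike (dt > 0), depress otherwise."""
    if dt > 0:
        dw = a_plus * np.exp(-dt / tau)
    else:
        dw = -a_minus * np.exp(dt / tau)
    return np.clip(w + dw, 0.0, 1.0)

w = 0.5
print(stdp_update(w, dt=+5.0))   # pre before post -> potentiation
print(stdp_update(w, dt=-5.0))   # post before pre -> depression
```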
Due to their proven efficiency, machine-learning systems are deployed in a wide range of complex real-life problems.
Towards the conversion from a DNN to an SNN, we perform a comprehensive analysis of this process, specifically targeting the Intel Loihi, and show our methodology for designing an SNN that achieves nearly the same accuracy as its corresponding DNN.
Many convolutional neural network (CNN) accelerators face performance- and energy-efficiency challenges which are crucial for embedded implementations, due to high DRAM access latency and energy.
Capsule Networks (CapsNets), recently proposed by the Google Brain team, have superior learning capabilities in machine learning tasks, like image classification, compared to the traditional CNNs.
With a constant improvement in the network architectures and training methodologies, Neural Networks (NNs) are increasingly being deployed in real-world Machine Learning systems.
To the best of our knowledge, this is the first proof-of-concept for employing approximations on the specialized CapsNet hardware.
In this paper, we perform a comprehensive error resilience analysis of DNNs subjected to hardware faults (e.g., permanent faults) in the weight memory.
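A minimal sketch of such a fault-injection campaign, assuming an 8-bit weight memory and randomly placed stuck-at faults (the fault rate and memory size are illustrative; real fault maps are typically taken per device):

```python
import numpy as np

def inject_stuck_at_faults(weights_q, fault_rate, rng, stuck_value=1):
    """Model permanent stuck-at faults in an int8 weight memory: a randomly
    chosen bit of a randomly chosen fraction of weight cells is forced to
    0 or 1. The corrupted weights are then used to evaluate accuracy."""
    w = weights_q.view(np.uint8).copy()
    faulty = rng.random(w.shape) < fault_rate
    bit = rng.integers(0, 8, size=w.shape).astype(np.uint8)
    mask = np.uint8(1) << bit
    if stuck_value == 1:
        w = np.where(faulty, w | mask, w)     # stuck-at-1
    else:
        w = np.where(faulty, w & ~mask, w)    # stuck-at-0
    return w.astype(np.uint8).view(np.int8)

rng = np.random.default_rng(2)
w_q = rng.integers(-128, 128, size=(256, 256), dtype=np.int8)
w_faulty = inject_stuck_at_faults(w_q, fault_rate=1e-4, rng=rng)
print(np.count_nonzero(w_faulty != w_q), "weights corrupted")
```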
A suitable approximate multiplier is then selected for each computing element from a library of approximate multipliers in such a way that (i) one approximate multiplier serves several layers, and (ii) the overall classification error and energy consumption are minimized.
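The sketch below illustrates the selection step with a toy, hypothetical multiplier library and an exhaustive search over per-layer-group assignments, here cast as minimizing energy under an accuracy-drop budget; real flows obtain the accuracy-drop numbers by simulation, and the actual optimization formulation may differ.

```python
import itertools

# Hypothetical library: multiplier -> (relative energy, accuracy drop in %
# when used for a layer group). Values are illustrative only.
library = {"exact":    (1.00, 0.0),
           "approx_A": (0.72, 0.3),
           "approx_B": (0.55, 1.1)}
layer_groups = ["conv_1-3", "conv_4-7", "fc"]

def energy(assignment):
    return sum(library[m][0] for m in assignment)

def acc_drop(assignment):
    return sum(library[m][1] for m in assignment)

budget = 1.0   # maximum tolerated accuracy drop in %
feasible = [a for a in itertools.product(library, repeat=len(layer_groups))
            if acc_drop(a) <= budget]
best = min(feasible, key=energy)
print(dict(zip(layer_groups, best)), "energy =", round(energy(best), 2))
```

The exhaustive enumeration is only viable for a handful of layer groups and multipliers, which is precisely why larger libraries require smarter search strategies.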
The goal is to reduce the hardware requirements of CapsNets by removing unused/redundant connections and capsules, while maintaining high accuracy by testing different learning rate policies and batch sizes.
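A minimal sketch of the connection-removal part, assuming magnitude-based pruning of hypothetical capsule coupling weights (pruning whole capsules would instead zero entire rows/columns of these tensors):

```python
import numpy as np

def prune_by_magnitude(w, sparsity=0.5):
    """Zero out the smallest-magnitude connections, keeping the rest."""
    threshold = np.quantile(np.abs(w), sparsity)
    mask = np.abs(w) >= threshold
    return w * mask, mask

rng = np.random.default_rng(3)
w = rng.normal(size=(1152, 10, 16, 8))     # hypothetical capsule coupling weights
w_pruned, mask = prune_by_magnitude(w, sparsity=0.5)
print("kept connections:", mask.mean())
```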
Because these libraries contain tens to thousands of approximate implementations for a single arithmetic operation, it is intractable to find an optimal combination of approximate circuits in the library, even for an application consisting of a few operations.
We perform an in-depth evaluation for a Spiking Deep Belief Network (SDBN) and a DNN having the same number of layers and neurons (to obtain a fair comparison), in order to study the efficiency of our methodology and to understand the differences between SNNs and DNNs w.r.t.
Our experimental results show that the ROMANet saves DRAM access energy by 12% for the AlexNet, by 36% for the VGG-16, and by 46% for the MobileNet, while also improving the DRAM throughput by 10%, as compared to the state-of-the-art.
By leveraging this analysis, we propose a methodology to explore different on-chip memory designs and a power-gating technique to further reduce the energy consumption, depending upon the utilization across different operations of a CapsuleNet.
To address this limitation, decision-based attacks have been proposed, which can estimate the model but require several thousand queries to generate a single untargeted attack image.
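To make the query cost concrete, the sketch below shows a deliberately naive hard-label attack that only observes the predicted class and counts queries until the label flips; real decision-based attacks (e.g., the Boundary Attack) are far more query-efficient, and the toy model here is purely illustrative.

```python
import numpy as np

def decision_based_attack(x, label, predict, eps=0.5, max_queries=5000, rng=None):
    """Hard-label (decision-based) attack sketch: the attacker only sees the
    predicted class, so it randomly perturbs the input until the label flips,
    counting the queries spent along the way."""
    rng = rng or np.random.default_rng()
    for q in range(1, max_queries + 1):
        x_try = np.clip(x + eps * rng.uniform(-1, 1, size=x.shape), 0, 1)
        if predict(x_try) != label:       # one query to the black-box model
            return x_try, q
    return None, max_queries

# Toy black-box model: thresholds the mean pixel value
predict = lambda x: int(x.mean() > 0.5)
x = np.full(784, 0.52)
x_adv, queries = decision_based_attack(x, label=predict(x), predict=predict)
print("queries needed:", queries)
```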
Capsule Networks preserve the hierarchical spatial relationships between objects, and thereby bear the potential to surpass the performance of traditional Convolutional Neural Networks (CNNs) in performing tasks like image classification.
Therefore, computing paradigms are evolving towards machine learning (ML)-based systems because of their ability to efficiently and accurately process the enormous amount of data.
Adversarial examples have emerged as a significant threat to machine learning algorithms, especially to the convolutional neural networks (CNNs).
In this paper, we introduce a novel technique based on Secure Selective Convolutional (SSC) techniques in the training loop that increases the robustness of a given DNN by allowing it to learn the data distribution based on the important edges in the input image.
We present SIMCom, a run-time methodology for Hardware Trojan (HT) detection that employs multi-parameter statistical traffic modeling of the communication channel in a given System-on-Chip (SoC).
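A minimal sketch of the underlying idea, assuming a Gaussian per-parameter model of trusted traffic and a simple k-sigma deviation check; the traffic parameters, threshold, and values are illustrative assumptions and not SIMCom's actual model.

```python
import numpy as np

def fit_traffic_model(trusted_trace):
    """Fit simple per-parameter statistics (mean, std) of communication
    traffic observed during a trusted execution window."""
    return trusted_trace.mean(axis=0), trusted_trace.std(axis=0)

def is_anomalous(sample, mean, std, k=3.0):
    """Flag a traffic sample if any parameter deviates by more than
    k standard deviations from the trusted model."""
    return bool(np.any(np.abs(sample - mean) > k * std))

rng = np.random.default_rng(4)
# Columns (hypothetical): packet injection rate, hop count, inter-arrival time
trusted = rng.normal([0.30, 3.0, 12.0], [0.02, 0.4, 1.5], size=(1000, 3))
mean, std = fit_traffic_model(trusted)
print(is_anomalous(np.array([0.31, 3.2, 11.0]), mean, std))   # normal traffic -> False
print(is_anomalous(np.array([0.55, 3.1, 12.5]), mean, std))   # HT-like burst -> True
```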
Deep neural networks (DNN)-based machine learning (ML) algorithms have recently emerged as the leading ML paradigm particularly for the task of classification due to their superior capability of learning efficiently from large datasets.
In this paper, we propose CapsAcc, the first specialized CMOS-based hardware architecture to perform CapsuleNets inference with high performance and energy efficiency.
Most data manipulation attacks on deep neural networks (DNNs) during the training stage introduce perceptible noise that can be mitigated by preprocessing during inference or identified during the validation phase.
The state-of-the-art accelerators for Convolutional Neural Networks (CNNs) typically focus on accelerating only the convolutional layers, giving little attention to the fully-connected layers.
Furthermore, this paper presents the main challenges in building a resilient IoT for CPS, which is crucial in the era of smart CPS with enhanced connectivity (an excellent example of such a system is connected autonomous vehicles).