The Winograd-enhanced DSA achieves up to 1.85x gain in energy efficiency and up to 1.83x end-to-end speed-up for state-of-the-art segmentation and detection networks.
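The abstract does not detail the accelerator itself, but the multiplication savings it exploits come from Winograd's minimal filtering algorithm. As a minimal sketch (the function names are illustrative, not from the paper), the 1-D F(2,3) case computes two outputs of a 3-tap filter with 4 multiplications instead of the 6 a direct sliding window needs:

```python
def winograd_f23(d, g):
    """Winograd F(2,3): 2 outputs of a 3-tap filter over 4 inputs,
    using 4 multiplications (m0..m3) instead of 6."""
    m0 = (d[0] - d[2]) * g[0]
    m1 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
    m2 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
    m3 = (d[1] - d[3]) * g[2]
    return [m0 + m1 + m2, m1 - m2 - m3]

def direct_f23(d, g):
    """Reference: plain sliding-window correlation, 6 multiplications."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

The filter transform (the terms involving only `g`) can be precomputed once per layer, so in steady state only the data transform and the 4 products remain per output pair; 2-D variants such as F(2x2, 3x3) compound the saving.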
The accuracy of a keyword spotting model deployed on embedded devices often degrades when the system is exposed to real environments with significant noise.
To quantify the overall system power, including I/O power, we built Vau da Muntanialas, to the best of our knowledge the first demonstration of a systolic multi-chip-on-PCB array of RNN accelerators.
Motor imagery brain--machine interfaces enable us to control machines by merely thinking of performing a motor action.
Logic optimization is an NP-hard problem commonly approached through hand-engineered heuristics.
With 9.91 GMAC/s/W, it is 23.0 times more energy-efficient and 46.85 times faster than an implementation on the ARM Cortex-M4F (0.43 GMAC/s/W).
With Motor-Imagery (MI) Brain--Machine Interfaces (BMIs) we may control machines by merely thinking of performing a motor action.
This BNN reaches 77.9% accuracy, just 7% lower than the full-precision version, while requiring 58 kB for the weights (7.2 times less) and 262 kB of memory in total (2.4 times less).
We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks.
Experimental results on the BCI Competition IV-2a dataset show that EEG-TCNet achieves 77.35% classification accuracy in 4-class MI.
Furthermore, it can perform inference on a binarized ResNet-18 trained with 8-bases Group-Net to achieve a 67.5% Top-1 accuracy with only 3.0 mJ/frame -- at an accuracy drop of merely 1.8% from the full-precision ResNet-18.
We quantize weights and activations to 8-bit fixed-point with a negligible accuracy loss of 0.4% on 4-class MI, and present an energy-efficient hardware-aware implementation on the Mr. Wolf parallel ultra-low power (PULP) System-on-Chip (SoC) by utilizing its custom RISC-V ISA extensions and 8-core compute cluster.
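The abstract does not state which fixed-point format is used, but the core operation of 8-bit fixed-point quantization can be sketched generically. Assuming a signed Q-format with a chosen number of fractional bits (`quantize_q8` and `frac_bits` are illustrative names, not from the paper):

```python
def quantize_q8(x, frac_bits=7):
    """Quantize a float to signed 8-bit fixed-point (Q-format with
    `frac_bits` fractional bits). Returns the integer code in
    [-128, 127] and the real value that code represents."""
    scale = 1 << frac_bits          # step size is 1/scale
    q = max(-128, min(127, round(x * scale)))  # round, then saturate
    return q, q / scale
```

For example, `quantize_q8(0.5)` yields the code 64 representing exactly 0.5, while values outside [-1, 127/128] saturate to the range limits; the per-layer choice of `frac_bits` trades range against resolution.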
This work presents InfiniWolf, a novel multi-sensor smartwatch that achieves self-sustainability by exploiting thermal and solar energy harvesting while performing computationally demanding tasks.
We present Random Partition Relaxation (RPR), a method for strong quantization of neural network weights to binary (+1/-1) and ternary (+1/0/-1) values.
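RPR's relaxation procedure itself is not described in this excerpt, so the sketch below shows only the ternary target grid it quantizes onto, using plain threshold-based ternarization (an assumption for illustration; `ternarize` and `threshold` are hypothetical names, not RPR's algorithm):

```python
def ternarize(weights, threshold=0.05):
    """Project weights onto the ternary grid {-1, 0, +1}:
    zero out small magnitudes, keep only the sign of the rest.
    Illustrates the quantization target, not the RPR optimization."""
    return [0 if abs(w) < threshold else (1 if w > 0 else -1)
            for w in weights]
```

The point of methods like RPR is precisely that a naive one-shot projection like this loses accuracy, so the projection is interleaved with continued training of relaxed weight partitions.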
Synthetic aperture radar (SAR) data is becoming increasingly available to a wide range of users through commercial service providers with resolutions reaching 0.5 m/px.
The growing number of low-power smart devices in the Internet of Things is coupled with the concept of "Edge Computing", i.e., moving some of the intelligence, especially machine learning, towards the edge of the network.
In the wake of the success of convolutional neural networks in image classification, object recognition, speech recognition, etc., the demand for deploying these compute-intensive ML models on embedded and mobile systems with tight power and energy constraints at low cost, as well as for boosting throughput in data centers, is growing rapidly.
We present a theoretical and experimental investigation of the quantization problem for artificial neural networks.
After the tremendous success of convolutional neural networks in image classification, object detection, speech recognition, etc., there is now rising demand for deployment of these compute-intensive ML models on tightly power constrained embedded and mobile systems at low cost as well as for pushing the throughput in data centers.
The last few years have brought advances in computer vision at an amazing pace, grounded on new findings in deep neural network construction and training as well as the availability of large labeled datasets.
Accurate, fast, and reliable multiclass classification of electroencephalography (EEG) signals is a challenging task towards the development of motor imagery brain-computer interface (MI-BCI) systems.
Deploying state-of-the-art CNNs requires power-hungry processors and off-chip memory.
Design automation in general, and logic synthesis in particular, can play a key role in enabling the design of application-specific Binarized Neural Networks (BNNs).
Recurrent neural networks (RNNs) are state-of-the-art in voice awareness/understanding and speech recognition.
Wireless distributed systems, as used in sensor networks, the Internet of Things, and cyber-physical systems, impose high requirements on resource efficiency.
Extracting per-frame features using convolutional neural networks for real-time processing of video data is currently mainly performed on powerful GPU-accelerated workstations and compute clusters.
We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy.
Lossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media.
The required communication links and archiving of the video data are still expensive, and this setup precludes preemptive actions in response to imminent threats.
We propose a highly structured neural network architecture for semantic segmentation with an extremely small model size, suitable for low-power embedded and mobile platforms.
Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy.
An ever-increasing number of computer vision and image/video processing challenges are being approached using deep convolutional neural networks, obtaining state-of-the-art results in object recognition and detection, semantic segmentation, action recognition, optical flow, and super-resolution.