Search Results for author: Luca Benini

Found 82 papers, 30 papers with code

A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks

no code implementations • 4 Jan 2022 • Angelo Garofalo, Gianmarco Ottavi, Francesco Conti, Geethan Karunaratne, Irem Boybat, Luca Benini, Davide Rossi

Furthermore, we explore the requirements for end-to-end inference of a full mobile-grade DNN (MobileNetV2) in terms of IMC array resources, by scaling up our heterogeneous architecture to a multi-array accelerator.

Sub-100uW Multispectral Riemannian Classification for EEG-based Brain-Machine Interfaces

no code implementations • 18 Dec 2021 • Xiaying Wang, Lukas Cavigelli, Tibor Schneider, Luca Benini

Motor imagery brain-machine interfaces enable us to control machines by merely thinking of performing a motor action.


A TinyML Platform for On-Device Continual Learning with Quantized Latent Replays

no code implementations • 20 Oct 2021 • Leonardo Ravaglia, Manuele Rusci, Davide Nadalini, Alessandro Capotondi, Francesco Conti, Luca Benini

In this work, we introduce a HW/SW platform for end-to-end CL based on a 10-core FP32-enabled parallel ultra-low-power (PULP) processor.

Continual Learning Quantization

A Fully-Integrated 5mW, 0.8Gbps Energy-Efficient Chip-to-Chip Data Link for Ultra-Low-Power IoT End-Nodes in 65-nm CMOS

no code implementations • 5 Sep 2021 • Hayate Okuhara, Ahmed Elnaqib, Martino Dazzi, Pierpaolo Palestri, Simone Benatti, Luca Benini, Davide Rossi

The increasing complexity of Internet-of-Things (IoT) applications and near-sensor processing algorithms is pushing the computational power of low-power, battery-operated end-node systems.

Memory-Aware Partitioning of Machine Learning Applications for Optimal Energy Use in Batteryless Systems

no code implementations • 5 Aug 2021 • Andres Gomez, Andreas Tretter, Pascal Alexander Hager, Praveenth Sanmugarajah, Luca Benini, Lothar Thiele

By leveraging interkernel data dependencies, these energy-bounded execution cycles minimize the number of system activations and nonvolatile data transfers, and thus the total energy overhead.

NN2CAM: Automated Neural Network Mapping for Multi-Precision Edge Processing on FPGA-Based Cameras

no code implementations • 24 Jun 2021 • Petar Jokic, Stephane Emery, Luca Benini

The record-breaking achievements of deep neural networks (DNNs) in image classification and detection tasks resulted in a surge of new computer vision applications during the past years.

Image Classification

A Construction Kit for Efficient Low Power Neural Network Accelerator Designs

no code implementations • 24 Jun 2021 • Petar Jokic, Erfan Azarkhish, Andrea Bonetti, Marc Pons, Stephane Emery, Luca Benini

This work provides a survey of neural network accelerator optimization approaches that have been used in recent works and reports their individual effects on edge processing performance.

Towards Long-term Non-invasive Monitoring for Epilepsy via Wearable EEG Devices

no code implementations • 15 Jun 2021 • Thorir Mar Ingolfsson, Andrea Cossettini, Xiaying Wang, Enrico Tabanelli, Giuseppe Tagliavini, Philippe Ryvlin, Luca Benini, Simone Benatti

We present the implementation of seizure detection algorithms based on a minimal number of EEG channels on a parallel ultra-low-power embedded platform.

EEG Seizure Detection

Trimming Feature Extraction and Inference for MCU-based Edge NILM: a Systematic Approach

no code implementations • 21 May 2021 • Enrico Tabanelli, Davide Brunelli, Andrea Acquaviva, Luca Benini

State-of-the-Art approaches are based on Machine Learning methods and exploit the fusion of time- and frequency-domain features from current and voltage sensors.

Non-Intrusive Load Monitoring

Implementing CNN Layers on the Manticore Cluster-Based Many-Core Architecture

no code implementations • 16 Apr 2021 • Andreas Kurth, Fabian Schuiki, Luca Benini

This document presents implementations of fundamental convolutional neural network (CNN) layers on the Manticore cluster-based many-core architecture and discusses their characteristics and trade-offs.

ECG-TCN: Wearable Cardiac Arrhythmia Detection with a Temporal Convolutional Network

1 code implementation • 25 Mar 2021 • Thorir Mar Ingolfsson, Xiaying Wang, Michael Hersche, Alessio Burrello, Lukas Cavigelli, Luca Benini

With 9.91 GMAC/s/W, it is 23.0 times more energy-efficient and 46.85 times faster than an implementation on the ARM Cortex M4F (0.43 GMAC/s/W).

Arrhythmia Detection

Enabling Design Methodologies and Future Trends for Edge AI: Specialization and Co-design

no code implementations • 25 Mar 2021 • Cong Hao, Jordan Dotzel, JinJun Xiong, Luca Benini, Zhiru Zhang, Deming Chen

Artificial intelligence (AI) technologies have dramatically advanced in recent years, resulting in revolutionary changes in people's lives.


A 5 μW Standard Cell Memory-based Configurable Hyperdimensional Computing Accelerator for Always-on Smart Sensing

no code implementations • 4 Feb 2021 • Manuel Eggimann, Abbas Rahimi, Luca Benini

Hyperdimensional computing (HDC) is a brain-inspired computing paradigm based on high-dimensional holistic representations of vectors.

Fault Detection Gesture Recognition

Sound Event Detection with Binary Neural Networks on Tightly Power-Constrained IoT Devices

no code implementations • 12 Jan 2021 • Gianmarco Cerutti, Renzo Andri, Lukas Cavigelli, Michele Magno, Elisabetta Farella, Luca Benini

This BNN reaches a 77.9% accuracy, just 7% lower than the full-precision version, with 58 kB (7.2 times less) for the weights and 262 kB (2.4 times less) memory in total.

Event Detection Object Recognition +2

CUTIE: Beyond PetaOp/s/W Ternary DNN Inference Acceleration with Better-than-Binary Energy Efficiency

no code implementations • 3 Nov 2020 • Moritz Scherer, Georg Rutishauser, Lukas Cavigelli, Luca Benini

We present a 3.1 POp/s/W fully digital hardware accelerator for ternary neural networks.

Hardware Architecture

Binarization Methods for Motor-Imagery Brain-Computer Interface Classification

no code implementations • 14 Oct 2020 • Michael Hersche, Luca Benini, Abbas Rahimi

Our first method, based on sparse bipolar random projection, projects a large number of real-valued Riemannian covariance features to a binary space, where a linear SVM classifier can be learned with binary weights too.

Binarization General Classification

Robust High-dimensional Memory-augmented Neural Networks

no code implementations • 5 Oct 2020 • Geethan Karunaratne, Manuel Schmuck, Manuel Le Gallo, Giovanni Cherubini, Luca Benini, Abu Sebastian, Abbas Rahimi

Traditional neural networks require enormous amounts of data to build their complex mappings during a slow training procedure that hinders their abilities for relearning and adapting to new data.

Few-Shot Image Classification

Leveraging Automated Mixed-Low-Precision Quantization for tiny edge microcontrollers

no code implementations • 12 Aug 2020 • Manuele Rusci, Marco Fariselli, Alessandro Capotondi, Luca Benini

The severe on-chip memory limitations are currently preventing the deployment of the most accurate Deep Neural Network (DNN) models on tiny MicroController Units (MCUs), even if leveraging an effective 8-bit quantization scheme.


Improving Memory Utilization in Convolutional Neural Network Accelerators

no code implementations • 20 Jul 2020 • Petar Jokic, Stephane Emery, Luca Benini

While the accuracy of convolutional neural networks has achieved vast improvements by introducing larger and deeper network architectures, the memory footprint for storing their parameters and activations has also increased.

Always-On 674uW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node

no code implementations • 17 Jul 2020 • Alfio Di Mauro, Francesco Conti, Pasquale Davide Schiavone, Davide Rossi, Luca Benini

On a prototype in 22nm FDX technology, we demonstrate that both the logic and SRAM voltage can be dropped to 0.5 V without any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving energy efficiency by 2.2X w.r.t.


A 0.5GHz 0.35mW LDO-Powered Constant-Slope Phase Interpolator with 0.22% INL

no code implementations • 15 Jul 2020 • Ahmed Elnaqib, Hayate Okuhara, Taekwang Jang, Davide Rossi, Luca Benini

Clock generators are an essential and critical building block of any communication link, whether it be wired or wireless, and they are increasingly critical given the push for lower I/O power and higher bandwidth in Systems-on-Chip (SoCs) for the Internet-of-Things (IoT).

TinyRadarNN: Combining Spatial and Temporal Convolutional Neural Networks for Embedded Gesture Recognition with Short Range Radars

1 code implementation • 25 Jun 2020 • Moritz Scherer, Michele Magno, Jonas Erb, Philipp Mayer, Manuel Eggimann, Luca Benini

Furthermore, the gesture recognition classifier has been implemented on a Parallel Ultra-Low Power Processor, demonstrating that real-time prediction is feasible with only 21 mW of power consumption for the full TCN sequence prediction network, while a system-level power consumption of less than 100 mW is achieved.

Hand Gesture Recognition Hand-Gesture Recognition

Automated Design Space Exploration for optimised Deployment of DNN on Arm Cortex-A CPUs

no code implementations • 9 Jun 2020 • Miguel de Prado, Andrew Mundy, Rabia Saeed, Maurizio Denna, Nuria Pazos, Luca Benini

The framework relies on a Reinforcement Learning search that, combined with a deep learning inference framework, automatically explores the design space and learns an optimised solution that speeds up the performance and reduces the memory on embedded CPU platforms.

ChewBaccaNN: A Flexible 223 TOPS/W BNN Accelerator

no code implementations • 12 May 2020 • Renzo Andri, Geethan Karunaratne, Lukas Cavigelli, Luca Benini

Furthermore, it can perform inference on a binarized ResNet-18 trained with 8-bases Group-Net to achieve a 67.5% Top-1 accuracy with only 3.0 mJ/frame, at an accuracy drop of merely 1.8% from the full-precision ResNet-18.

Optimizing Temporal Convolutional Network inference on FPGA-based accelerators

no code implementations • 7 May 2020 • Marco Carreras, Gianfranco Deriu, Luigi Raffo, Luca Benini, Paolo Meloni

Convolutional Neural Networks are extensively used in a wide range of applications, commonly including computer vision tasks like image and video classification, recognition, and segmentation.

Time Series Video Classification

Q-EEGNet: an Energy-Efficient 8-bit Quantized Parallel EEGNet Implementation for Edge Motor-Imagery Brain-Machine Interfaces

1 code implementation • 24 Apr 2020 • Tibor Schneider, Xiaying Wang, Michael Hersche, Lukas Cavigelli, Luca Benini

We quantize weights and activations to 8-bit fixed-point with a negligible accuracy loss of 0.4% on 4-class MI, and present an energy-efficient hardware-aware implementation on the Mr. Wolf parallel ultra-low power (PULP) System-on-Chip (SoC) by utilizing its custom RISC-V ISA extensions and 8-core compute cluster.


pAElla: Edge-AI based Real-Time Malware Detection in Data Centers

1 code implementation • 7 Apr 2020 • Antonio Libri, Andrea Bartolini, Luca Benini

The method, called pAElla, targets real-time Malware Detection (MD); it runs on an out-of-band IoT-based monitoring system for DCs/SCs and involves the Power Spectral Density of power measurements, along with AutoEncoders.

Anomaly Detection Edge-computing +1

LLHD: A Multi-level Intermediate Representation for Hardware Description Languages

1 code implementation • 7 Apr 2020 • Fabian Schuiki, Andreas Kurth, Tobias Grosser, Luca Benini

These tools are monolithic and mostly proprietary, disagree in their implementation of HDLs, and while many redundant IRs exist, no IR today can be used through the entire circuit design flow.

Programming Languages

An Accurate EEGNet-based Motor-Imagery Brain-Computer Interface for Low-Power Edge Computing

no code implementations • 31 Mar 2020 • Xiaying Wang, Michael Hersche, Batuhan Tömekce, Burak Kaya, Michele Magno, Luca Benini

Our novel method further scales down the standard EEGNet at a negligible accuracy loss of 0.31% with 7.6x memory footprint reduction and a small accuracy loss of 2.51% with 15x reduction.

Edge-computing EEG +1

Extending the RISC-V ISA for Efficient RNN-based 5G Radio Resource Management

no code implementations • 27 Feb 2020 • Renzo Andri, Tomas Henriksson, Luca Benini

Radio Resource Management (RRM) in 5G mobile communication is a challenging problem for which Recurrent Neural Networks (RNN) have shown promising results.

Combining Learning and Optimization for Transprecision Computing

2 code implementations • 24 Feb 2020 • Andrea Borghesi, Giuseppe Tagliavini, Michele Lombardi, Luca Benini, Michela Milano

The ML model learns the relation between variable precision and the output error; this information is then embedded in the MP focused on minimizing the number of bits.

Distributed, Parallel, and Cluster Computing

RPR: Random Partition Relaxation for Training Binary and Ternary Weight Neural Networks

no code implementations • 4 Jan 2020 • Lukas Cavigelli, Luca Benini

We present Random Partition Relaxation (RPR), a method for strong quantization of neural network weights to binary (+1/-1) and ternary (+1/0/-1) values.


HR-SAR-Net: A Deep Neural Network for Urban Scene Segmentation from High-Resolution SAR Data

no code implementations • 10 Dec 2019 • Xiaying Wang, Lukas Cavigelli, Manuel Eggimann, Michele Magno, Luca Benini

Synthetic aperture radar (SAR) data is becoming increasingly available to a wide range of users through commercial service providers with resolutions reaching 0.5 m/px.

Scene Segmentation

Constrained deep neural network architecture search for IoT devices accounting for hardware calibration

no code implementations • NeurIPS 2019 • Florian Scheidegger, Luca Benini, Costas Bekas, A. Cristiano I. Malossi

The narrow-space search of floating-point models improves the accuracy on CIFAR10 of an established IoT model from 70.64% to 74.87% within the same memory constraints.

General Classification Image Classification

FANN-on-MCU: An Open-Source Toolkit for Energy-Efficient Neural Network Inference at the Edge of the Internet of Things

1 code implementation • 8 Nov 2019 • Xiaying Wang, Michele Magno, Lukas Cavigelli, Luca Benini

The growing number of low-power smart devices in the Internet of Things is coupled with the concept of "Edge Computing", that is moving some of the intelligence, especially machine learning, towards the edge of the network.


Constrained deep neural network architecture search for IoT devices accounting hardware calibration

no code implementations • 24 Sep 2019 • Florian Scheidegger, Luca Benini, Costas Bekas, Cristiano Malossi

We further improve the accuracy to 82.07% by including 16-bit half types and we obtain the best accuracy of 83.45% by extending the search with model optimized IEEE 754 reduced types.

General Classification Image Classification

EBPC: Extended Bit-Plane Compression for Deep Neural Network Inference and Training Accelerators

2 code implementations • 30 Aug 2019 • Lukas Cavigelli, Georg Rutishauser, Luca Benini

In the wake of the success of convolutional neural networks in image classification, object recognition, speech recognition, etc., the demand for deploying these compute-intensive ML models on embedded and mobile systems with tight power and energy constraints at low cost, as well as for boosting throughput in data centers, is growing rapidly.

Image Classification Object Recognition +1

PULP-NN: Accelerating Quantized Neural Networks on Parallel Ultra-Low-Power RISC-V Processors

1 code implementation • 29 Aug 2019 • Angelo Garofalo, Manuele Rusci, Francesco Conti, Davide Rossi, Luca Benini

We present PULP-NN, an optimized computing library for a parallel ultra-low-power tightly coupled cluster of RISC-V processors.


5 Parallel Prism: A topology for pipelined implementations of convolutional neural networks using computational memory

no code implementations • 8 Jun 2019 • Martino Dazzi, Abu Sebastian, Pier Andrea Francese, Thomas Parnell, Luca Benini, Evangelos Eleftheriou

We show that this communication fabric facilitates the pipelined execution of all state-of-the-art CNNs by proving the existence of a homomorphism between one graph representation of these networks and the proposed graph topology.

In-memory hyperdimensional computing

no code implementations • 4 Jun 2019 • Geethan Karunaratne, Manuel Le Gallo, Giovanni Cherubini, Luca Benini, Abbas Rahimi, Abu Sebastian

Hyperdimensional computing (HDC) is an emerging computational framework that takes inspiration from attributes of neuronal circuits such as hyperdimensionality, fully distributed holographic representation, and (pseudo)randomness.

General Classification Hand Gesture Recognition +2

Ara: A 1 GHz+ Scalable and Energy-Efficient RISC-V Vector Processor with Multi-Precision Floating Point Support in 22 nm FD-SOI

no code implementations • 2 Jun 2019 • Matheus Cavalcante, Fabian Schuiki, Florian Zaruba, Michael Schaffner, Luca Benini

In this paper, we present Ara, a 64-bit vector processor based on the version 0.5 draft of RISC-V's vector extension, implemented in GlobalFoundries 22FDX FD-SOI technology.

Hardware Architecture

Memory-Driven Mixed Low Precision Quantization For Enabling Deep Network Inference On Microcontrollers

2 code implementations • 30 May 2019 • Manuele Rusci, Alessandro Capotondi, Luca Benini

To fit the memory and computational limitations of resource-constrained edge-devices, we exploit mixed low-bitwidth compression, featuring 8, 4 or 2-bit uniform quantization, and we model the inference graph with integer-only operations.


An Open Source and Open Hardware Deep Learning-powered Visual Navigation Engine for Autonomous Nano-UAVs

2 code implementations • 10 May 2019 • Daniele Palossi, Francesco Conti, Luca Benini

Nano-size unmanned aerial vehicles (UAVs), with a few centimeters of diameter and sub-10 Watts of total power budget, have so far been considered incapable of running sophisticated visual-based autonomous navigation software without external aid from base-stations, ad-hoc local positioning infrastructure, and powerful external computation servers.

Autonomous Navigation Visual Navigation

Online Anomaly Detection in HPC Systems

1 code implementation • 22 Feb 2019 • Andrea Borghesi, Antonio Libri, Luca Benini, Andrea Bartolini

Reliability is a cumbersome problem in High Performance Computing Systems and Data Centers evolution.

Distributed, Parallel, and Cluster Computing

Optimally Scheduling CNN Convolutions for Efficient Memory Access

no code implementations • 4 Feb 2019 • Arthur Stoutchinin, Francesco Conti, Luca Benini

Embedded inference engines for convolutional networks must be parsimonious in memory bandwidth and buffer sizing to meet power and cost constraints.

Learning to infer: RL-based search for DNN primitive selection on Heterogeneous Embedded Systems

no code implementations • 18 Nov 2018 • Miguel de Prado, Nuria Pazos, Luca Benini

In this work, we present QS-DNN, a fully automatic search based on Reinforcement Learning which, combined with an inference engine optimizer, efficiently explores the design space and empirically finds the optimal combinations of libraries and primitives to speed up the inference of CNNs on heterogeneous embedded devices.

QUENN: QUantization Engine for low-power Neural Networks

no code implementations • 14 Nov 2018 • Miguel de Prado, Maurizio Denna, Luca Benini, Nuria Pazos

Deep Learning is moving to edge devices, ushering in a new age of distributed Artificial Intelligence (AI).


Anomaly Detection using Autoencoders in High Performance Computing Systems

5 code implementations • 13 Nov 2018 • Andrea Borghesi, Andrea Bartolini, Michele Lombardi, Michela Milano, Luca Benini

Anomaly detection in supercomputers is a very difficult problem due to the large scale of the systems and the high number of components.

Anomaly Detection

Robust identification of thermal models for in-production High-Performance-Computing clusters with machine learning-based data selection

no code implementations • 3 Oct 2018 • Federico Pittino, Roberto Diversi, Luca Benini, Andrea Bartolini

However, we also show that: 1) not all real workloads allow for the identification of a good model; 2) starting from the theory of system identification it is very difficult to evaluate if a trace of data leads to a good estimated model.


Extended Bit-Plane Compression for Convolutional Neural Network Accelerators

1 code implementation • 1 Oct 2018 • Lukas Cavigelli, Luca Benini

After the tremendous success of convolutional neural networks in image classification, object detection, speech recognition, etc., there is now rising demand for deployment of these compute-intensive ML models on tightly power constrained embedded and mobile systems at low cost as well as for pushing the throughput in data centers.

Image Classification Object Detection +1

One-shot Learning for iEEG Seizure Detection Using End-to-end Binary Operations: Local Binary Patterns with Hyperdimensional Computing

no code implementations • 6 Sep 2018 • Alessio Burrello, Kaspar Schindler, Luca Benini, Abbas Rahimi

This paper presents an efficient binarized algorithm for both learning and classification of human epileptic seizures from intracranial electroencephalography (iEEG).

One-Shot Learning Seizure Detection +1

CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams

2 code implementations • 15 Aug 2018 • Lukas Cavigelli, Luca Benini

The last few years have brought advances in computer vision at an amazing pace, grounded on new findings in deep neural network construction and training as well as the availability of large labeled datasets.

Object Detection Semantic Segmentation

Hardware Optimizations of Dense Binary Hyperdimensional Computing: Rematerialization of Hypervectors, Binarized Bundling, and Combinational Associative Memory

1 code implementation • 20 Jul 2018 • Manuel Schmuck, Luca Benini, Abbas Rahimi

In this paper, we propose hardware techniques for optimizations of HD computing, in a synthesizable VHDL library, to enable co-located implementation of both learning and classification tasks on only a small portion of Xilinx(R) UltraScale(TM) FPGAs: (1) We propose simple logical operations to rematerialize the hypervectors on the fly rather than loading them from memory.

XNOR Neural Engine: a Hardware Accelerator IP for 21.6 fJ/op Binary Neural Network Inference

1 code implementation • 9 Jul 2018 • Francesco Conti, Pasquale Davide Schiavone, Luca Benini

Binary Neural Networks (BNNs) are promising to deliver accuracy comparable to conventional deep neural networks at a fraction of the cost in terms of memory and energy.

COUNTDOWN - three, two, one, low power! A Run-time Library for Energy Saving in MPI Communication Primitives

1 code implementation • 19 Jun 2018 • Daniele Cesarini, Andrea Bartolini, Pietro Bonfà, Carlo Cavazzoni, Luca Benini

Power consumption is a looming threat in today's computing progress.

Distributed, Parallel, and Cluster Computing

Fast and Accurate Multiclass Inference for MI-BCIs Using Large Multiscale Temporal and Spectral Features

2 code implementations • 18 Jun 2018 • Michael Hersche, Tino Rellstab, Pasquale Davide Schiavone, Lukas Cavigelli, Luca Benini, Abbas Rahimi

Accurate, fast, and reliable multiclass classification of electroencephalography (EEG) signals is a challenging task towards the development of motor imagery brain-computer interface (MI-BCI) systems.


A 64mW DNN-based Visual Navigation Engine for Autonomous Nano-Drones

3 code implementations • 4 May 2018 • Daniele Palossi, Antonio Loquercio, Francesco Conti, Eric Flamand, Davide Scaramuzza, Luca Benini

As part of our general methodology we discuss the software mapping techniques that enable the state-of-the-art deep convolutional neural network presented in [1] to be fully executed on-board within a strict 6 fps real-time constraint with no compromise in terms of flight results, while all processing is done with only 64 mW on average.

Autonomous Navigation Visual Navigation

A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets

no code implementations • 19 Feb 2018 • Fabian Schuiki, Michael Schaffner, Frank K. Gürkaynak, Luca Benini

Most investigations into near-memory hardware accelerators for deep neural networks have primarily focused on inference, while the potential of accelerating training has received relatively little attention so far.

Distributed, Parallel, and Cluster Computing Hardware Architecture

HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA

2 code implementations • 18 Dec 2017 • Andreas Kurth, Pirmin Vogel, Alessandro Capotondi, Andrea Marongiu, Luca Benini

Heterogeneous embedded systems on chip (HESoCs) co-integrate a standard host processor with programmable manycore accelerators (PMCAs) to combine general-purpose computing with domain-specific, efficient processing capabilities.

Hardware Architecture Distributed, Parallel, and Cluster Computing

NEURAghe: Exploiting CPU-FPGA Synergies for Efficient and Flexible CNN Inference Acceleration on Zynq SoCs

no code implementations • 4 Dec 2017 • Paolo Meloni, Alessandro Capotondi, Gianfranco Deriu, Michele Brian, Francesco Conti, Davide Rossi, Luigi Raffo, Luca Benini

Deep convolutional neural networks (CNNs) obtain outstanding results in tasks that require human-level understanding of data, like image or speech recognition.

Speech Recognition

Design Automation for Binarized Neural Networks: A Quantum Leap Opportunity?

no code implementations • 21 Nov 2017 • Manuele Rusci, Lukas Cavigelli, Luca Benini

Design automation in general, and in particular logic synthesis, can play a key role in enabling the design of application-specific Binarized Neural Networks (BNN).

CBinfer: Change-Based Inference for Convolutional Neural Networks on Video Data

1 code implementation • 14 Apr 2017 • Lukas Cavigelli, Philippe Degen, Luca Benini

Extracting per-frame features using convolutional neural networks for real-time processing of video data is currently mainly performed on powerful GPU-accelerated workstations and compute clusters.

Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes

no code implementations • 23 Jan 2017 • Erfan Azarkhish, Davide Rossi, Igor Loi, Luca Benini

Our codesign approach consists of a network of Smart Memory Cubes (modular extensions to the standard HMC) each augmented with a many-core PIM platform called NeuroCluster.

Hardware Architecture Emerging Technologies

An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

4 code implementations • 18 Dec 2016 • Francesco Conti, Robert Schilling, Pasquale Davide Schiavone, Antonio Pullini, Davide Rossi, Frank Kagan Gürkaynak, Michael Muehlberghuber, Michael Gautschi, Igor Loi, Germain Haugou, Stefan Mangard, Luca Benini

Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load, but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline.

EEG Face Detection +1

CAS-CNN: A Deep Convolutional Neural Network for Image Compression Artifact Suppression

1 code implementation • 22 Nov 2016 • Lukas Cavigelli, Pascal Hager, Luca Benini

Lossy image compression algorithms are pervasively used to reduce the size of images transmitted over the web and recorded on data storage media.

Image Compression

Computationally Efficient Target Classification in Multispectral Image Data with Deep Neural Networks

no code implementations • 9 Nov 2016 • Lukas Cavigelli, Dominic Bernath, Michele Magno, Luca Benini

The required communication links and archiving of the video data are still expensive and this setup excludes preemptive actions to respond to imminent threats.

General Classification Scene Labeling

Deep Structured Features for Semantic Segmentation

no code implementations • 26 Sep 2016 • Michael Tschannen, Lukas Cavigelli, Fabian Mentzer, Thomas Wiatowski, Luca Benini

We propose a highly structured neural network architecture for semantic segmentation with an extremely small model size, suitable for low-power embedded and mobile platforms.

General Classification Semantic Segmentation

YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration

no code implementations • 17 Jun 2016 • Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini

Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy.

General Classification Image Classification

Origami: A 803 GOp/s/W Convolutional Network Accelerator

no code implementations • 14 Dec 2015 • Lukas Cavigelli, Luca Benini

An ever increasing number of computer vision and image/video processing challenges are being approached using deep convolutional neural networks, obtaining state-of-the-art results in object recognition and detection, semantic segmentation, action recognition, optical flow and superresolution.

Action Recognition Object Recognition +2
