Search Results for author: Davide Rossi

Found 18 papers, 7 papers with code

A Heterogeneous RISC-V based SoC for Secure Nano-UAV Navigation

no code implementations7 Jan 2024 Luca Valente, Alessandro Nadalini, Asif Veeran, Mattia Sinigaglia, Bruno Sa, Nils Wistoff, Yvan Tortorella, Simone Benatti, Rafail Psiakis, Ari Kulmala, Baker Mohammad, Sandro Pinto, Daniele Palossi, Luca Benini, Davide Rossi

To the best of the authors' knowledge, it is the first silicon prototype of a ULP SoC coupling the RV64 and RV32 cores in a heterogeneous host+accelerator architecture fully based on the RISC-V ISA.

Marsellus: A Heterogeneous RISC-V AI-IoT End-Node SoC with 2-to-8b DNN Acceleration and 30%-Boost Adaptive Body Biasing

1 code implementation15 May 2023 Francesco Conti, Gianna Paulin, Angelo Garofalo, Davide Rossi, Alfio Di Mauro, Georg Rutishauser, Gianmarco Ottavi, Manuel Eggimann, Hayate Okuhara, Luca Benini

We present Marsellus, an all-digital heterogeneous SoC for AI-IoT end-nodes fabricated in GlobalFoundries 22nm FDX that combines 1) a general-purpose cluster of 16 RISC-V Digital Signal Processing (DSP) cores attuned for the execution of a diverse range of workloads exploiting 4-bit and 2-bit arithmetic extensions (XpulpNN), combined with fused MAC&LOAD operations and floating-point support; 2) a 2-8bit Reconfigurable Binary Engine (RBE) to accelerate 3x3 and 1x1 (pointwise) convolutions in DNNs; 3) a set of On-Chip Monitoring (OCM) blocks connected to an Adaptive Body Biasing (ABB) generator and a hardware control loop, enabling on-the-fly adaptation of transistor threshold voltages.

Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space

no code implementations15 Mar 2023 Michael Rogenmoser, Yvan Tortorella, Davide Rossi, Francesco Conti, Luca Benini

To mitigate the overheads of traditional radiation hardening and modular redundancy approaches, we present a novel Hybrid Modular Redundancy (HMR) approach, a redundancy scheme that features a cluster of RISC-V processors with a flexible on-demand dual-core and triple-core lockstep grouping of computing cores with runtime split-lock capabilities.

RedMule: A Mixed-Precision Matrix-Matrix Operation Engine for Flexible and Energy-Efficient On-Chip Linear Algebra and TinyML Training Acceleration

1 code implementation10 Jan 2023 Yvan Tortorella, Luca Bertaccini, Luca Benini, Davide Rossi, Francesco Conti

The increasing interest in TinyML, i. e., near-sensor machine learning on power budgets of a few tens of mW, is currently pushing toward enabling TinyML-class training as opposed to inference only.

GVSoC: A Highly Configurable, Fast and Accurate Full-Platform Simulator for RISC-V based IoT Processors

1 code implementation20 Jan 2022 Nazareno Bruschi, Germain Haugou, Giuseppe Tagliavini, Francesco Conti, Luca Benini, Davide Rossi

The last few years have seen the emergence of IoT processors: ultra-low power systems-on-chips (SoCs) combining lightweight and flexible micro-controller units (MCUs), often based on open-ISA RISC-V cores, with application-specific accelerators to maximize performance and energy efficiency.

A Heterogeneous In-Memory Computing Cluster For Flexible End-to-End Inference of Real-World Deep Neural Networks

no code implementations4 Jan 2022 Angelo Garofalo, Gianmarco Ottavi, Francesco Conti, Geethan Karunaratne, Irem Boybat, Luca Benini, Davide Rossi

Furthermore, we explore the requirements for end-to-end inference of a full mobile-grade DNN (MobileNetV2) in terms of IMC array resources, by scaling up our heterogeneous architecture to a multi-array accelerator.

A Fully-Integrated 5mW, 0.8Gbps Energy-Efficient Chip-to-Chip Data Link for Ultra-Low-Power IoT End-Nodes in 65-nm CMOS

no code implementations5 Sep 2021 Hayate Okuhara, Ahmed Elnaqib, Martino Dazzi, Pierpaolo Palestri, Simone Benatti, Luca Benini, Davide Rossi

The increasing complexity of Internet-of-Things (IoT) applications and near-sensor processing algorithms is pushing the computational power of low-power, battery-operated end-node systems.

DORY: Automatic End-to-End Deployment of Real-World DNNs on Low-Cost IoT MCUs

1 code implementation17 Aug 2020 Alessio Burrello, Angelo Garofalo, Nazareno Bruschi, Giuseppe Tagliavini, Davide Rossi, Francesco Conti

In this work, we propose DORY (Deployment Oriented to memoRY) - an automatic tool to deploy DNNs on low cost MCUs with typically less than 1MB of on-chip SRAM memory.

C++ code Tiling & Deployment

Always-On 674uW @ 4GOP/s Error Resilient Binary Neural Networks with Aggressive SRAM Voltage Scaling on a 22nm IoT End-Node

no code implementations17 Jul 2020 Alfio Di Mauro, Francesco Conti, Pasquale Davide Schiavone, Davide Rossi, Luca Benini

On a prototype in 22nm FDX technology, we demonstrate that both the logic and SRAM voltage can be dropped to 0. 5Vwithout any accuracy penalty on a BNN trained for the CIFAR-10 dataset, improving energy efficiency by 2. 2X w. r. t.

PICO

Enabling Mixed-Precision Quantized Neural Networks in Extreme-Edge Devices

2 code implementations15 Jul 2020 Nazareno Bruschi, Angelo Garofalo, Francesco Conti, Giuseppe Tagliavini, Davide Rossi

The deployment of Quantized Neural Networks (QNN) on advanced microcontrollers requires optimized software to exploit digital signal processing (DSP) extensions of modern instruction set architectures (ISA).

Hardware Architecture Image and Video Processing

A 0.5GHz 0.35mW LDO-Powered Constant-Slope Phase Interpolator with 0.22$\%$ INL

no code implementations15 Jul 2020 Ahmed Elnaqib, Hayate Okuhara, Taekwang Jang, Davide Rossi, Luca Benini

Clock generators are an essential and critical building block of any communication link, whether it be wired or wireless, and they are increasingly critical given the push for lower I/O power and higher bandwidth in Systems-on-Chip (SoCs) for the Internet-of-Things (IoT).

PULP-NN: Accelerating Quantized Neural Networks on Parallel Ultra-Low-Power RISC-V Processors

1 code implementation29 Aug 2019 Angelo Garofalo, Manuele Rusci, Francesco Conti, Davide Rossi, Luca Benini

We present PULP-NN, an optimized computing library for a parallel ultra-low-power tightly coupled cluster of RISC-V processors.

Quantization

Neurostream: Scalable and Energy Efficient Deep Learning with Smart Memory Cubes

no code implementations23 Jan 2017 Erfan Azarkhish, Davide Rossi, Igor Loi, Luca Benini

Our codesign approach consists of a network of Smart Memory Cubes (modular extensions to the standard HMC) each augmented with a many-core PIM platform called NeuroCluster.

Hardware Architecture Emerging Technologies

An IoT Endpoint System-on-Chip for Secure and Energy-Efficient Near-Sensor Analytics

4 code implementations18 Dec 2016 Francesco Conti, Robert Schilling, Pasquale Davide Schiavone, Antonio Pullini, Davide Rossi, Frank Kagan Gürkaynak, Michael Muehlberghuber, Michael Gautschi, Igor Loi, Germain Haugou, Stefan Mangard, Luca Benini

Near-sensor data analytics is a promising direction for IoT endpoints, as it minimizes energy spent on communication and reduces network load - but it also poses security concerns, as valuable data is stored or sent over the network at various stages of the analytics pipeline.

EEG Electroencephalogram (EEG) +2

YodaNN: An Architecture for Ultra-Low Power Binary-Weight CNN Acceleration

no code implementations17 Jun 2016 Renzo Andri, Lukas Cavigelli, Davide Rossi, Luca Benini

Convolutional neural networks (CNNs) have revolutionized the world of computer vision over the last few years, pushing image classification beyond human accuracy.

General Classification Image Classification

Cannot find the paper you are looking for? You can Submit a new open access paper.