Search Results for author: Stylianos I. Venieris

Found 36 papers, 5 papers with code

Sparse-DySta: Sparsity-Aware Dynamic and Static Scheduling for Sparse Multi-DNN Workloads

1 code implementation • 17 Oct 2023 • Hongxiang Fan, Stylianos I. Venieris, Alexandros Kouris, Nicholas D. Lane

Running multiple deep neural networks (DNNs) in parallel has become an emerging workload both on edge devices, such as mobile phones where multiple tasks serve a single user's daily activities, and in data centers, where millions of users issue diverse requests, as seen with large language models.

Scheduling

Journey Towards Tiny Perceptual Super-Resolution

2 code implementations • ECCV 2020 • Royson Lee, Łukasz Dudziak, Mohamed Abdelfattah, Stylianos I. Venieris, Hyeji Kim, Hongkai Wen, Nicholas D. Lane

Recent works in single-image perceptual super-resolution (SR) have demonstrated unprecedented performance in generating realistic textures by means of deep convolutional networks.

Neural Architecture Search • Super-Resolution

Meta-Learned Kernel For Blind Super-Resolution Kernel Estimation

1 code implementation • 15 Dec 2022 • Royson Lee, Rui Li, Stylianos I. Venieris, Timothy Hospedales, Ferenc Huszár, Nicholas D. Lane

Recent image degradation estimation methods have enabled single-image super-resolution (SR) approaches to better upsample real-world images.

Blind Super-Resolution • Image Super-Resolution

f-CNN$^{\text{x}}$: A Toolflow for Mapping Multi-CNN Applications on FPGAs

no code implementations • 25 May 2018 • Stylianos I. Venieris, Christos-Savvas Bouganis

The predictive power of Convolutional Neural Networks (CNNs) has been an integral factor for emerging latency-sensitive applications, such as autonomous drones and vehicles.

Scheduling

CascadeCNN: Pushing the performance limits of quantisation

no code implementations • 22 May 2018 • Alexandros Kouris, Stylianos I. Venieris, Christos-Savvas Bouganis

This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, to perform high-throughput inference by exploiting the computation time-accuracy trade-off.
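As a rough sketch of the computation time-accuracy trade-off being exploited (an illustration, not CascadeCNN's actual toolflow; the uniform quantiser and confidence threshold below are assumptions for the example):

```python
import numpy as np

def quantise(w, bits):
    """Illustrative uniform symmetric quantisation to the given bit-width."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

def cascade_infer(x, weights, low_bits=4, conf_threshold=0.7):
    """Run a cheap low-precision pass first; re-run at full precision only
    when the low-precision prediction is not confident enough."""
    def forward(w):
        logits = x @ w
        e = np.exp(logits - logits.max())
        return e / e.sum()                        # softmax probabilities
    probs = forward(quantise(weights, low_bits))  # fast low-precision stage
    if probs.max() >= conf_threshold:
        return int(probs.argmax())                # keep the cheap result
    return int(forward(weights).argmax())         # high-precision fallback

# Toy single-layer "model": a confident input never leaves the fast stage.
w = np.array([[2.0, -1.0], [-1.0, 2.0]])
print(cascade_infer(np.array([1.0, 0.0]), w))  # 0
```

Most samples exit after the cheap pass, so average latency drops while the high-precision stage preserves accuracy on the uncertain ones.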

Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions

no code implementations • 15 Mar 2018 • Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis

In the past decade, Convolutional Neural Networks (CNNs) have demonstrated state-of-the-art performance in various Artificial Intelligence tasks.

Approximate FPGA-based LSTMs under Computation Time Constraints

no code implementations • 7 Jan 2018 • Michalis Rizakis, Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis

Recurrent Neural Networks and in particular Long Short-Term Memory (LSTM) networks have demonstrated state-of-the-art accuracy in several emerging Artificial Intelligence tasks.

Autonomous Vehicles • Image Captioning +1

fpgaConvNet: A Toolflow for Mapping Diverse Convolutional Neural Networks on Embedded FPGAs

no code implementations • 23 Nov 2017 • Stylianos I. Venieris, Christos-Savvas Bouganis

By selectively optimising for throughput, latency or multiobjective criteria, the presented tool is able to efficiently explore the design space and generate hardware designs from high-level ConvNet specifications, explicitly optimised for the performance metric of interest.

Deploying Deep Neural Networks in the Embedded Space

no code implementations • 22 Jun 2018 • Stylianos I. Venieris, Alexandros Kouris, Christos-Savvas Bouganis

Recently, Deep Neural Networks (DNNs) have emerged as the dominant model across various AI applications.

CascadeCNN: Pushing the Performance Limits of Quantisation in Convolutional Neural Networks

no code implementations • 13 Jul 2018 • Alexandros Kouris, Stylianos I. Venieris, Christos-Savvas Bouganis

This work presents CascadeCNN, an automated toolflow that pushes the quantisation limits of any given CNN model, aiming to perform high-throughput inference.

Approximate LSTMs for Time-Constrained Inference: Enabling Fast Reaction in Self-Driving Cars

no code implementations • 2 May 2019 • Alexandros Kouris, Stylianos I. Venieris, Michail Rizakis, Christos-Savvas Bouganis

The need to recognise long-term dependencies in sequential data such as video streams has made Long Short-Term Memory (LSTM) networks a prominent Artificial Intelligence model for many emerging applications.

Autonomous Navigation • Self-Driving Cars

EmBench: Quantifying Performance Variations of Deep Neural Networks across Modern Commodity Devices

no code implementations • 17 May 2019 • Mario Almeida, Stefanos Laskaridis, Ilias Leontiadis, Stylianos I. Venieris, Nicholas D. Lane

In recent years, advances in deep learning have resulted in unprecedented leaps in diverse tasks spanning from speech and object recognition to context awareness and health monitoring.

Object Recognition

MobiSR: Efficient On-Device Super-Resolution through Heterogeneous Mobile Processors

no code implementations • 21 Aug 2019 • Royson Lee, Stylianos I. Venieris, Łukasz Dudziak, Sourav Bhattacharya, Nicholas D. Lane

In recent years, convolutional networks have demonstrated unprecedented performance in the image restoration task of super-resolution (SR).

Cloud Computing • Image Restoration +2

Multi-Precision Policy Enforced Training (MuPPET): A precision-switching strategy for quantised fixed-point training of CNNs

no code implementations • 16 Jun 2020 • Aditya Rajagopal, Diederik Adriaan Vink, Stylianos I. Venieris, Christos-Savvas Bouganis

Large-scale convolutional neural networks (CNNs) suffer from very long training times, spanning from hours to weeks, limiting the productivity and experimentation of deep learning practitioners.

HAPI: Hardware-Aware Progressive Inference

no code implementations • 10 Aug 2020 • Stefanos Laskaridis, Stylianos I. Venieris, Hyeji Kim, Nicholas D. Lane

Convolutional neural networks (CNNs) have recently become the state-of-the-art in a diversity of AI tasks.

SPINN: Synergistic Progressive Inference of Neural Networks over Device and Cloud

no code implementations • 14 Aug 2020 • Stefanos Laskaridis, Stylianos I. Venieris, Mario Almeida, Ilias Leontiadis, Nicholas D. Lane

Despite the soaring use of convolutional neural networks (CNNs) in mobile applications, uniformly sustaining high-performance inference on mobile has been elusive due to the excessive computational demands of modern CNNs and the increasing diversity of deployed devices.

Collaborative Inference

Neural Enhancement in Content Delivery Systems: The State-of-the-Art and Future Directions

no code implementations • 12 Oct 2020 • Royson Lee, Stylianos I. Venieris, Nicholas D. Lane

In recent years, advances in the field of deep learning on tasks such as super-resolution and image enhancement have led to unprecedented performance in generating high-quality images from low-quality ones, a process we refer to as neural enhancement.

Image Enhancement • Super-Resolution

It's always personal: Using Early Exits for Efficient On-Device CNN Personalisation

no code implementations • 2 Feb 2021 • Ilias Leontiadis, Stefanos Laskaridis, Stylianos I. Venieris, Nicholas D. Lane

On-device machine learning is becoming a reality thanks to the availability of powerful hardware and model compression techniques.

Model Compression

unzipFPGA: Enhancing FPGA-based CNN Engines with On-the-Fly Weights Generation

no code implementations • 9 Mar 2021 • Stylianos I. Venieris, Javier Fernandez-Marques, Nicholas D. Lane

Single computation engines have become a popular design choice for FPGA-based convolutional neural networks (CNNs) enabling the deployment of diverse models without fabric reconfiguration.

DynO: Dynamic Onloading of Deep Neural Networks from Cloud to Device

no code implementations • 20 Apr 2021 • Mario Almeida, Stefanos Laskaridis, Stylianos I. Venieris, Ilias Leontiadis, Nicholas D. Lane

Recently, there has been an explosive growth of mobile and embedded applications using convolutional neural networks (CNNs).

Deep Neural Network-based Enhancement for Image and Video Streaming Systems: A Survey and Future Directions

no code implementations • 7 Jun 2021 • Royson Lee, Stylianos I. Venieris, Nicholas D. Lane

In recent years, advances in the field of deep learning on tasks such as super-resolution and image enhancement have led to unprecedented performance in generating high-quality images from low-quality ones, a process we refer to as neural enhancement.

Image Enhancement • Super-Resolution

Multi-Exit Semantic Segmentation Networks

no code implementations • 7 Jun 2021 • Alexandros Kouris, Stylianos I. Venieris, Stefanos Laskaridis, Nicholas D. Lane

At the same time, the heterogeneous capabilities of the target platforms and the diverse constraints of different applications require the design and training of multiple target-specific segmentation models, leading to excessive maintenance costs.

Robot Navigation • Segmentation +2

OODIn: An Optimised On-Device Inference Framework for Heterogeneous Mobile Devices

no code implementations • 8 Jun 2021 • Stylianos I. Venieris, Ioannis Panopoulos, Iakovos S. Venieris

Radical progress in the field of deep learning (DL) has led to unprecedented accuracy in diverse inference tasks.

How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures

no code implementations • 21 Jun 2021 • Stylianos I. Venieris, Ioannis Panopoulos, Ilias Leontiadis, Iakovos S. Venieris

Collectively, these results highlight the critical need for further exploration of how the various cross-stack solutions can best be combined in order to bring the latest advances in deep learning close to users in a robust and efficient manner.

Speech Recognition

Multi-DNN Accelerators for Next-Generation AI Systems

no code implementations • 19 May 2022 • Stylianos I. Venieris, Christos-Savvas Bouganis, Nicholas D. Lane

As the use of AI-powered applications widens across multiple domains, so too do the computational demands increase.

Adaptable Butterfly Accelerator for Attention-based NNs via Hardware and Algorithm Co-design

no code implementations • 20 Sep 2022 • Hongxiang Fan, Thomas Chau, Stylianos I. Venieris, Royson Lee, Alexandros Kouris, Wayne Luk, Nicholas D. Lane, Mohamed S. Abdelfattah

By jointly optimizing the algorithm and hardware, our FPGA-based butterfly accelerator achieves a 14.2-23.2x speedup over state-of-the-art accelerators normalized to the same computational budget.

Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs

no code implementations • 27 Sep 2022 • Alexandros Kouris, Stylianos I. Venieris, Stefanos Laskaridis, Nicholas D. Lane

With deep neural networks (DNNs) emerging as the backbone in a multitude of computer vision tasks, their adoption in real-world applications broadens continuously.

Autonomous Vehicles • Scheduling

The Future of Consumer Edge-AI Computing

no code implementations • 19 Oct 2022 • Stefanos Laskaridis, Stylianos I. Venieris, Alexandros Kouris, Rui Li, Nicholas D. Lane

In the last decade, Deep Learning has rapidly infiltrated the consumer end, mainly thanks to hardware acceleration across devices.

NAWQ-SR: A Hybrid-Precision NPU Engine for Efficient On-Device Super-Resolution

no code implementations • 15 Dec 2022 • Stylianos I. Venieris, Mario Almeida, Royson Lee, Nicholas D. Lane

In recent years, image and video delivery systems have begun integrating deep learning super-resolution (SR) approaches, leveraging their unprecedented visual enhancement capabilities while reducing reliance on networking conditions.

Quantization • Super-Resolution

Exploring the Performance and Efficiency of Transformer Models for NLP on Mobile Devices

no code implementations • 20 Jun 2023 • Ioannis Panopoulos, Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris

Deep learning (DL) is characterised by its dynamic nature, with new deep neural network (DNN) architectures and approaches emerging every few years, driving the field's advancement.

MultiTASC: A Multi-Tenancy-Aware Scheduler for Cascaded DNN Inference at the Consumer Edge

no code implementations • 22 Jun 2023 • Sokratis Nikolaidis, Stylianos I. Venieris, Iakovos S. Venieris

Cascade systems comprise a two-model sequence, with a lightweight model processing all samples and a heavier, higher-accuracy model conditionally refining harder samples to improve accuracy.
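The two-model cascade described above can be sketched as follows (a minimal illustration of the general pattern, not MultiTASC's actual scheduler; the models and confidence threshold are hypothetical):

```python
import numpy as np

def cascade_predict(x, light_model, heavy_model, conf_threshold=0.8):
    """Two-model cascade: the lightweight model handles every sample and
    only low-confidence ("hard") samples are refined by the heavy model."""
    probs = light_model(x)                 # cheap first pass on all samples
    if np.max(probs) >= conf_threshold:
        return int(np.argmax(probs))       # accept the cheap prediction
    return int(np.argmax(heavy_model(x)))  # escalate the hard sample

# Toy stand-in models: the "light" model is unsure about input 1.
light = lambda x: np.array([0.9, 0.1]) if x == 0 else np.array([0.55, 0.45])
heavy = lambda x: np.array([0.1, 0.9])

print(cascade_predict(0, light, heavy))  # 0: light model is confident
print(cascade_predict(1, light, heavy))  # 1: escalated to the heavy model
```

The threshold controls the accuracy-latency trade-off: raising it sends more samples to the heavy model, which is exactly the knob a multi-tenant scheduler must manage across concurrent cascades.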

Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation

no code implementations • 25 Jul 2023 • Stylianos I. Venieris, Javier Fernandez-Marques, Nicholas D. Lane

In this work, we investigate the implications in terms of CNN engine design for a class of models that introduce a pre-convolution stage to decompress the weights at run time.

LifeLearner: Hardware-Aware Meta Continual Learning System for Embedded Computing Platforms

no code implementations • 19 Nov 2023 • Young D. Kwon, Jagmohan Chauhan, Hong Jia, Stylianos I. Venieris, Cecilia Mascolo

With respect to the state-of-the-art (SOTA) Meta CL method, LifeLearner drastically reduces the memory footprint by 178.7x, end-to-end latency by 80.8-94.2%, and energy consumption by 80.9-94.2%.

Continual Learning • Meta-Learning
