Search Results for author: Massoud Pedram

Found 44 papers, 7 papers with code

Scalable Superconductor Neuron with Ternary Synaptic Connections for Ultra-Fast SNN Hardware

no code implementations • 26 Feb 2024 • Mustafa Altay Karamuftuoglu, Beyza Zeynep Ucpinar, Arash Fayyazi, Sasan Razmkhah, Mehdi Kamal, Massoud Pedram

A novel high-fan-in differential superconductor neuron structure designed for ultra-high-performance Spiking Neural Network (SNN) accelerators is presented.

4k • Efficient Neural Network

Memory-Efficient Vision Transformers: An Activation-Aware Mixed-Rank Compression Strategy

no code implementations • 8 Feb 2024 • Seyedarmin Azizi, Mahdi Nazemi, Massoud Pedram

This paper addresses this memory limitation by introducing an activation-aware model compression methodology that uses selective low-rank weight tensor approximations of different layers to reduce the parameter count of ViTs.

Model Compression
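
As a point of reference, the building block behind this kind of compression, replacing a dense weight matrix with a low-rank factorization, can be sketched in a few lines of NumPy. This is a generic truncated-SVD factorization with illustrative layer shapes and rank, not the paper's activation-aware, per-layer rank selection.

```python
import numpy as np

def low_rank_factorize(W: np.ndarray, rank: int) -> tuple[np.ndarray, np.ndarray]:
    """Split W (out_dim x in_dim) into two thin factors via truncated SVD."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]   # (out_dim x rank)
    B = Vt[:rank, :]             # (rank x in_dim)
    return A, B

# Toy example: an MLP weight with ViT-Base-like dimensions (assumed shapes).
rng = np.random.default_rng(0)
W = rng.standard_normal((768, 3072)).astype(np.float32)
A, B = low_rank_factorize(W, rank=64)
rel_err = np.linalg.norm(W - A @ B) / np.linalg.norm(W)
print(f"params: {W.size} -> {A.size + B.size}, relative error {rel_err:.3f}")
```

Replacing the layer's forward pass `x @ W.T` with `(x @ B.T) @ A.T` then trades approximation error for a smaller parameter count; the paper's methodology concerns choosing where and how aggressively to apply such approximations.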

Low-Precision Mixed-Computation Models for Inference on Edge

no code implementations • 3 Dec 2023 • Seyedarmin Azizi, Mahdi Nazemi, Mehdi Kamal, Massoud Pedram

This paper presents a mixed-computation neural network processing approach for edge applications that incorporates low-precision (low-width) Posit and low-precision fixed point (FixP) number systems.

Quantization
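
For intuition, the fixed-point (FixP) half of such a scheme can be illustrated with a plain symmetric quantizer; Posit arithmetic and the paper's assignment of number systems to different parts of the network are not shown, and the bit-widths below are arbitrary.

```python
import numpy as np

def fixp_quantize(x: np.ndarray, total_bits: int, frac_bits: int) -> np.ndarray:
    """Round x onto a signed fixed-point grid with `frac_bits` fractional bits."""
    scale = 2.0 ** frac_bits
    qmin, qmax = -(2 ** (total_bits - 1)), 2 ** (total_bits - 1) - 1
    q = np.clip(np.round(x * scale), qmin, qmax)   # stored integer code
    return q / scale                               # value actually used in computation

x = np.linspace(-2.0, 2.0, 9)
print(fixp_quantize(x, total_bits=3, frac_bits=1))  # note the top value saturates at +1.5
```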

Sensitivity-Aware Mixed-Precision Quantization and Width Optimization of Deep Neural Networks Through Cluster-Based Tree-Structured Parzen Estimation

no code implementations • 12 Aug 2023 • Seyedarmin Azizi, Mahdi Nazemi, Arash Fayyazi, Massoud Pedram

As a result, our proposed method represents a leap forward in neural network design optimization, paving the way for quick model design and implementation in settings with limited resources, thereby propelling the potential of scalable deep learning solutions.

Quantization
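
The search component can be pictured with a generic Tree-structured Parzen Estimator loop. The sketch below uses the Optuna library's TPE sampler and a made-up sensitivity/size proxy purely for illustration; it is not the paper's cluster-based estimator or its actual objective.

```python
import optuna

# Made-up per-layer sensitivity scores (higher = less tolerant of low precision).
SENSITIVITY = [0.9, 0.4, 0.7, 0.2]

def objective(trial: optuna.Trial) -> float:
    bits = [trial.suggest_int(f"bits_layer{i}", 2, 8) for i in range(len(SENSITIVITY))]
    acc_penalty = sum(s / b for s, b in zip(SENSITIVITY, bits))  # proxy accuracy loss
    footprint = sum(bits)                                        # proxy model size
    return acc_penalty + 0.05 * footprint

study = optuna.create_study(direction="minimize",
                            sampler=optuna.samplers.TPESampler(seed=0))
study.optimize(objective, n_trials=50)
print(study.best_params)  # tends to give the more sensitive layers higher bit-widths
```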

Brain Tumor Detection using Convolutional Neural Networks with Skip Connections

no code implementations • 14 Jul 2023 • Aupam Hamran, Marzieh Vaeztourshizi, Amirhossein Esmaili, Massoud Pedram

Different CNN architecture optimization techniques, such as widening and deepening the network and adding skip connections, are applied to improve the accuracy of the network.

A Fast Training-Free Compression Framework for Vision Transformers

1 code implementation • 4 Mar 2023 • Jung Hwan Heo, Arash Fayyazi, Mahdi Nazemi, Massoud Pedram

Token pruning has emerged as an effective solution to speed up the inference of large Transformer models.
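
As a rough illustration of token pruning in general (not this framework's specific scoring or its training-free calibration), one common recipe keeps the patch tokens that receive the most attention from the [CLS] token; the shapes below are assumptions.

```python
import torch

def prune_tokens(tokens: torch.Tensor, cls_attn: torch.Tensor, keep_ratio: float) -> torch.Tensor:
    """Keep the patch tokens that receive the most attention from the [CLS] token.

    tokens:   (batch, num_tokens, dim) patch tokens, [CLS] excluded
    cls_attn: (batch, num_tokens) attention weights from [CLS] to each patch token
    """
    num_keep = max(1, int(tokens.shape[1] * keep_ratio))
    idx = cls_attn.topk(num_keep, dim=1).indices                 # (batch, num_keep)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.shape[-1])     # broadcast over feature dim
    return tokens.gather(1, idx)

tokens, cls_attn = torch.randn(2, 196, 768), torch.rand(2, 196)
print(prune_tokens(tokens, cls_attn, keep_ratio=0.5).shape)  # torch.Size([2, 98, 768])
```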

Efficient Compilation and Mapping of Fixed Function Combinational Logic onto Digital Signal Processors Targeting Neural Network Inference and Utilizing High-level Synthesis

no code implementations • 30 Jul 2022 • Soheil Nazar Shahsavani, Arash Fayyazi, Mahdi Nazemi, Massoud Pedram

Recent efforts for improving the performance of neural network (NN) accelerators that meet today's application requirements have given rise to a new trend of logic-based NN inference relying on fixed function combinational logic.

Sparse Periodic Systolic Dataflow for Lowering Latency and Power Dissipation of Convolutional Neural Network Accelerators

no code implementations • 30 Jun 2022 • Jung Hwan Heo, Arash Fayyazi, Amirhossein Esmaili, Massoud Pedram

This paper introduces the sparse periodic systolic (SPS) dataflow, which advances the state-of-the-art hardware accelerator for supporting lightweight neural networks.

A Fast and Efficient Conditional Learning for Tunable Trade-Off between Accuracy and Robustness

no code implementations • 28 Mar 2022 • Souvik Kundu, Sairam Sundaresan, Massoud Pedram, Peter A. Beerel

In this paper, we present a fast learnable once-for-all adversarial training (FLOAT) algorithm which, instead of the existing FiLM-based conditioning, presents a unique weight-conditioned learning that requires no additional layers, thereby incurring no significant increase in parameter count, training time, or network latency compared to standard adversarial training.

Image Classification

BMPQ: Bit-Gradient Sensitivity Driven Mixed-Precision Quantization of DNNs from Scratch

no code implementations • 24 Dec 2021 • Souvik Kundu, Shikai Wang, Qirui Sun, Peter A. Beerel, Massoud Pedram

Compared to the baseline FP-32 models, BMPQ can yield models that have 15.4x fewer parameter bits with a negligible drop in accuracy.

Quantization

Analyzing the Confidentiality of Undistillable Teachers in Knowledge Distillation

no code implementations • NeurIPS 2021 • Souvik Kundu, Qirui Sun, Yao Fu, Massoud Pedram, Peter Beerel

Knowledge distillation (KD) has recently been identified as a method that can unintentionally leak private information regarding the details of a teacher model to an unauthorized student.

Knowledge Distillation

HIRE-SNN: Harnessing the Inherent Robustness of Energy-Efficient Deep Spiking Neural Networks by Training with Crafted Input Noise

1 code implementation • ICCV 2021 • Souvik Kundu, Massoud Pedram, Peter A. Beerel

Low-latency deep spiking neural networks (SNNs) have become a promising alternative to conventional artificial neural networks (ANNs) because of their potential for increased energy efficiency on event-driven neuromorphic hardware.

Towards Low-Latency Energy-Efficient Deep SNNs via Attention-Guided Compression

no code implementations • 16 Jul 2021 • Souvik Kundu, Gourav Datta, Massoud Pedram, Peter A. Beerel

To evaluate the merits of our approach, we performed experiments with variants of VGG and ResNet, on both CIFAR-10 and CIFAR-100, and VGG16 on Tiny-ImageNet. The SNN models generated through the proposed technique yield SOTA compression ratios of up to 33.4x with no significant drops in accuracy compared to baseline unpruned counterparts.

Sparse Learning

NullaNet Tiny: Ultra-low-latency DNN Inference Through Fixed-function Combinational Logic

no code implementations • 7 Apr 2021 • Mahdi Nazemi, Arash Fayyazi, Amirhossein Esmaili, Atharva Khare, Soheil Nazar Shahsavani, Massoud Pedram

While there is a large body of research on efficient processing of deep neural networks (DNNs), ultra-low-latency realization of these models for applications with stringent, sub-microsecond latency requirements continues to be an unresolved, challenging problem.

A2P-MANN: Adaptive Attention Inference Hops Pruned Memory-Augmented Neural Networks

no code implementations • 24 Jan 2021 • Mohsen Ahmadzadeh, Mehdi Kamal, Ali Afzali-Kusha, Massoud Pedram

In this work, to limit the number of required attention inference hops in memory-augmented neural networks, we propose an online adaptive approach called A2P-MANN.

Question Answering

BRDS: An FPGA-based LSTM Accelerator with Row-Balanced Dual-Ratio Sparsification

no code implementations • 7 Jan 2021 • Seyed Abolfazl Ghasemzadeh, Erfan Bank Tavakoli, Mehdi Kamal, Ali Afzali-Kusha, Massoud Pedram

In this paper, first, a hardware-friendly pruning algorithm for reducing energy consumption and improving the speed of Long Short-Term Memory (LSTM) neural network accelerators is presented.

Sentiment Analysis • Sentiment Classification +2

A Tunable Robust Pruning Framework Through Dynamic Network Rewiring of DNNs

1 code implementation • 3 Nov 2020 • Souvik Kundu, Mahdi Nazemi, Peter A. Beerel, Massoud Pedram

This paper presents a dynamic network rewiring (DNR) method to generate pruned deep neural network (DNN) models that are robust against adversarial attacks yet maintain high accuracy on clean images.

Image Classification • Model Compression

SynergicLearning: Neural Network-Based Feature Extraction for Highly-Accurate Hyperdimensional Learning

no code implementations • 30 Jul 2020 • Mahdi Nazemi, Amirhossein Esmaili, Arash Fayyazi, Massoud Pedram

The proposed hybrid machine learning model has the same level of accuracy (i.e., ±1%) as NNs while achieving at least 10% improvement in accuracy compared to HD learning models.

BIG-bench Machine Learning • Computational Efficiency

Deep-PowerX: A Deep Learning-Based Framework for Low-Power Approximate Logic Synthesis

1 code implementation • 3 Jul 2020 • Ghasem Pasandi, Mackenzie Peterson, Moises Herrera, Shahin Nazarian, Massoud Pedram

This paper aims at integrating three powerful techniques, namely Deep Learning, Approximate Computing, and Low Power Design, into a strategy to optimize logic at the synthesis level.

NN-PARS: A Parallelized Neural Network Based Circuit Simulation Framework

no code implementations • 13 Feb 2020 • Mohammad Saeed Abrishami, Hao Ge, Justin F. Calderon, Massoud Pedram, Shahin Nazarian

The shrinking of transistor geometries, as well as the increasing complexity of integrated circuits, significantly aggravates nonlinear design behavior.

Scheduling

CSM-NN: Current Source Model Based Logic Circuit Simulation -- A Neural Network Approach

no code implementations • 13 Feb 2020 • Mohammad Saeed Abrishami, Massoud Pedram, Shahin Nazarian

The miniaturization of transistors down to 5 nm and beyond, together with the increasing complexity of integrated circuits, significantly aggravates short-channel effects and demands the analysis and optimization of more design corners and modes.

Efficient Training of Deep Convolutional Neural Networks by Augmentation in Embedding Space

no code implementations • 12 Feb 2020 • Mohammad Saeed Abrishami, Amir Erfan Eshratifar, David Eigen, Yanzhi Wang, Shahin Nazarian, Massoud Pedram

However, fine-tuning a transfer model with data augmentation in the raw input space has a high computational cost to run the full network for every augmented input.

Data Augmentation • Transfer Learning

Pre-defined Sparsity for Low-Complexity Convolutional Neural Networks

1 code implementation • 29 Jan 2020 • Souvik Kundu, Mahdi Nazemi, Massoud Pedram, Keith M. Chugg, Peter A. Beerel

We also compared the performance of our proposed architectures with that of ShuffleNet and MobileNetV2.

Runtime Deep Model Multiplexing for Reduced Latency and Energy Consumption Inference

no code implementations • 14 Jan 2020 • Amir Erfan Eshratifar, Massoud Pedram

The proposed algorithm allows the mobile device to detect the inputs that can be processed locally and the ones that require a larger model and should be sent to a cloud server.

Energy-aware Scheduling of Jobs in Heterogeneous Cluster Systems Using Deep Reinforcement Learning

no code implementations • 11 Dec 2019 • Amirhossein Esmaili, Massoud Pedram

Energy consumption is one of the most critical concerns in designing computing devices, ranging from portable embedded systems to computer cluster systems.

Management • Reinforcement Learning (RL) +1

BottleNet: A Deep Learning Architecture for Intelligent Mobile Cloud Computing Services

no code implementations • 4 Feb 2019 • Amir Erfan Eshratifar, Amirhossein Esmaili, Massoud Pedram

Recent studies have shown that the latency and energy consumption of deep neural networks can be significantly improved by splitting the network between the mobile device and the cloud.

Cloud Computing

Towards Collaborative Intelligence Friendly Architectures for Deep Learning

no code implementations • 1 Feb 2019 • Amir Erfan Eshratifar, Amirhossein Esmaili, Massoud Pedram

In this approach, referred to as collaborative intelligence, intermediate features computed on the mobile device are offloaded to the cloud instead of the raw input data of the network, reducing the size of the data needed to be sent to the cloud.

Distributed, Parallel, and Cluster Computing

Approximate Logic Synthesis: A Reinforcement Learning-Based Technology Mapping Approach

no code implementations • 1 Feb 2019 • Ghasem Pasandi, Shahin Nazarian, Massoud Pedram

Approximate Logic Synthesis (ALS) is the process of synthesizing and mapping a given Boolean network to a library of logic cells so that the magnitude/rate of error between outputs of the approximate and initial (exact) Boolean netlists is bounded from above by a predetermined total error threshold.

Hardware Architecture

Space Expansion of Feature Selection for Designing more Accurate Error Predictors

no code implementations • 30 Dec 2018 • Shayan Tabatabaei Nikkhah, Mehdi Kamal, Ali Afzali-Kusha, Massoud Pedram

The results on various benchmarks demonstrate significant improvements in the prediction accuracy compared to the prior works which used only the accelerator inputs for the prediction.

feature selection • Scheduling

Modeling Processor Idle Times in MPSoC Platforms to Enable Integrated DPM, DVFS, and Task Scheduling Subject to a Hard Deadline

1 code implementation • 19 Dec 2018 • Amirhossein Esmaili, Mahdi Nazemi, Massoud Pedram

Energy efficiency is one of the most critical design criteria for modern embedded systems such as multiprocessor system-on-chips (MPSoCs).

Operating Systems • Distributed, Parallel, and Cluster Computing

Gradient Agreement as an Optimization Objective for Meta-Learning

no code implementations • 18 Oct 2018 • Amir Erfan Eshratifar, David Eigen, Massoud Pedram

Therefore, the degree of the contribution of a task to the parameter updates is controlled by introducing a set of weights on the loss function of the tasks.

Meta-Learning

A Meta-Learning Approach for Custom Model Training

no code implementations • 21 Sep 2018 • Amir Erfan Eshratifar, Mohammad Saeed Abrishami, David Eigen, Massoud Pedram

Transfer-learning and meta-learning are two effective methods to apply knowledge learned from large data sources to new tasks.

Meta-Learning • Transfer Learning

NullaNet: Training Deep Neural Networks for Reduced-Memory-Access Inference

no code implementations • 23 Jul 2018 • Mahdi Nazemi, Ghasem Pasandi, Massoud Pedram

Deep neural networks have been successfully deployed in a wide variety of applications including computer vision and speech recognition.

Speech Recognition

Deploying Customized Data Representation and Approximate Computing in Machine Learning Applications

no code implementations • 3 Jun 2018 • Mahdi Nazemi, Massoud Pedram

Lop allows researchers and designers to quickly compare the quality of their models under various data representations and arithmetic operations in Python, and to contrast the hardware cost of viable representations by synthesizing them on their target platforms (e.g., FPGA or ASIC).

BIG-bench Machine Learning

A Hardware-Friendly Algorithm for Scalable Training and Deployment of Dimensionality Reduction Models on FPGA

no code implementations • 11 Jan 2018 • Mahdi Nazemi, Amir Erfan Eshratifar, Massoud Pedram

With ever-increasing application of machine learning models in various domains such as image classification, speech recognition and synthesis, and health care, designing efficient hardware for these models has gained a lot of popularity.

BIG-bench Machine Learning • Dimensionality Reduction +4

FFT-Based Deep Learning Deployment in Embedded Systems

no code implementations • 13 Dec 2017 • Sheng Lin, Ning Liu, Mahdi Nazemi, Hongjia Li, Caiwen Ding, Yanzhi Wang, Massoud Pedram

The large model size of DNNs, while providing excellent accuracy, also burdens the embedded platforms with intensive computation and storage.

Speech Recognition
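
The mathematical identity such FFT-based deployments rely on is the convolution theorem: convolution in the spatial or temporal domain becomes a pointwise product in the frequency domain. A minimal 1-D sanity check (not the paper's DNN deployment pipeline) looks like this:

```python
import numpy as np

def fft_circular_conv(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Circular 1-D convolution computed as a pointwise product in the FFT domain."""
    n = len(x)
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(w, n)))

rng = np.random.default_rng(0)
x = rng.standard_normal(16)
w = np.array([0.25, 0.5, 0.25])
# direct circular convolution for comparison
direct = np.array([sum(w[k] * x[(i - k) % len(x)] for k in range(len(w)))
                   for i in range(len(x))])
print(np.allclose(fft_circular_conv(x, w), direct))  # True
```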

High-Performance FPGA Implementation of Equivariant Adaptive Separation via Independence Algorithm for Independent Component Analysis

no code implementations • 6 Jul 2017 • Mahdi Nazemi, Shahin Nazarian, Massoud Pedram

Independent Component Analysis (ICA) is a dimensionality reduction technique that can boost the efficiency of machine learning models that deal with probability density functions, e.g., Bayesian neural networks.

BIG-bench Machine Learning • Dimensionality Reduction

HEBS: Histogram Equalization for Backlight Scaling

1 code implementation • 25 Oct 2007 • Ali Iranli, Hanif Fatemi, Massoud Pedram

In this paper, a method is proposed for finding a pixel transformation function that maximizes backlight dimming while maintaining a pre-specified image distortion level for a liquid crystal display.

Other Computer Science
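
The underlying trade-off can be sketched without the paper's histogram-equalization machinery: dimming the backlight by a factor b and multiplying pixel values by 1/b preserves perceived brightness until pixels saturate, and that saturation is the image distortion the method bounds. The image size and dimming levels below are arbitrary.

```python
import numpy as np

def compensate(img: np.ndarray, backlight: float) -> np.ndarray:
    """Boost pixel values to offset a dimmed backlight (0 < backlight <= 1).

    Perceived brightness is roughly backlight * displayed value, so pixels are
    scaled by 1/backlight; values that overflow the 8-bit range saturate, which
    is where image distortion comes from.
    """
    return np.clip(np.round(img.astype(np.float64) / backlight), 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
for b in (1.0, 0.8, 0.6):
    out = compensate(img, b)
    saturated = np.mean(out == 255)   # pixels stuck at the top of the 8-bit range
    print(f"backlight {b:.1f}: {saturated:.1%} of pixels saturate")
```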
