We propose Hardware-Aware Latency Pruning (HALP), which formulates structural pruning as a global resource allocation optimization problem, aiming to maximize accuracy while constraining latency under a predefined budget on the target device.
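The latency-constrained allocation described above can be sketched as a 0/1 knapsack over prunable channel groups: each group carries an importance score and a latency cost, and we keep the subset that maximizes total importance within the budget. This is a hypothetical simplification for illustration; HALP's actual grouping, scoring, and solver differ.

```python
# Illustrative sketch (not HALP's actual solver): keep the subset of
# channel groups maximizing total importance under a latency budget.
# Latency costs are assumed to be integers (e.g., microseconds).

def select_groups(importance, latency_cost, budget):
    """0/1 knapsack: dp[b] = (best importance, chosen group set)
    achievable with latency budget b."""
    dp = [(0.0, frozenset())] * (budget + 1)
    for i in range(len(importance)):
        cost = latency_cost[i]
        for b in range(budget, cost - 1, -1):
            cand = dp[b - cost][0] + importance[i]
            if cand > dp[b][0]:
                dp[b] = (cand, dp[b - cost][1] | {i})
    return sorted(dp[budget][1])

# Four groups with importance scores and latency costs; budget of 7
kept = select_groups([3.0, 1.0, 4.0, 2.0], [4, 3, 5, 2], budget=7)
```

Here the best choice keeps the two groups whose combined importance (4.0 + 2.0) is highest among subsets fitting the budget, rather than greedily keeping the single most important group.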
We propose global context vision transformer (GC ViT), a novel architecture that enhances parameter and compute utilization for computer vision tasks.
In this work we demonstrate the vulnerability of vision transformers (ViTs) to gradient-based inversion attacks.
Federated learning (FL) allows the collaborative training of AI models without needing to share raw data.
A-ViT achieves this by automatically reducing the number of tokens processed in the network as inference proceeds.
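A minimal sketch of depth-adaptive token reduction in this spirit: each token accumulates a per-layer halting score, and once its cumulative score crosses a threshold the token is dropped from subsequent layers, so compute shrinks with depth. The scores below are hand-written placeholders; in A-ViT they are learned.

```python
# Hypothetical token-halting sketch: tokens with high cumulative halting
# scores stop being processed, shrinking later layers' workload.

def run_with_halting(tokens, halting_scores, threshold=1.0):
    """tokens: list of token ids; halting_scores[layer][token] in [0, 1].
    Returns the list of active tokens after each layer."""
    cumulative = {t: 0.0 for t in tokens}
    active = list(tokens)
    active_per_layer = []
    for layer_scores in halting_scores:
        for t in active:
            cumulative[t] += layer_scores[t]
        active = [t for t in active if cumulative[t] < threshold]
        active_per_layer.append(active)
    return active_per_layer

# Token 0 is "informative" (low halting score); tokens 1 and 2 halt early.
scores = [
    {0: 0.2, 1: 0.6, 2: 0.9},
    {0: 0.2, 1: 0.6, 2: 0.9},
    {0: 0.2, 1: 0.6, 2: 0.9},
]
surviving = run_with_halting([0, 1, 2], scores)
```

After the second layer, only token 0 remains active, so the third layer processes a third of the original tokens.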
Through extensive experiments on ImageNet, we show that EPI enables quick tracking of early training epochs suitable for pruning, offering the same efficacy as an otherwise "oracle" grid search that scans through epochs and requires orders of magnitude more compute.
On ImageNet-1K, we prune the DeiT-Base (Touvron et al., 2021) model to a 2.6x FLOPs reduction, 5.1x parameter reduction, and 1.9x run-time speedup with only 0.07% loss in accuracy.
In the second phase, it solves the combinatorial selection of efficient operations using a novel constrained integer linear optimization approach.
Prior works usually assume that SC offers privacy benefits as only intermediate features, instead of private data, are shared from devices to the cloud.
We analyze three popular network architectures: EfficientNetV1, EfficientNetV2, and ResNeSt, and achieve accuracy improvements for all models (up to $3.0\%$) when compressing larger models to the latency level of smaller models.
We study the problem of quantizing N sorted, scalar datapoints with a fixed codebook containing K entries that are allowed to be rescaled.
In this work, we introduce GradInversion, with which input images from a larger batch (8-48 images) can also be recovered for large networks such as ResNets (50 layers), on complex datasets such as ImageNet (1000 classes, 224x224 px).
At the patient level, MHDeep DNNs achieve an accuracy of 100%, 100%, and 90.0% for the three mental health disorders, respectively.
Modern deep neural networks are powerful and widely applicable models that extract task-relevant information through multi-level abstraction.
These large, deep models are often unsuitable for real-world applications, due to their massive computational cost, high memory bandwidth, and long latency.
We introduce DeepInversion, a new method for synthesizing images from the image distribution used to train a deep neural network.
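The core mechanism behind this kind of inversion can be sketched in one dimension: freeze the network's weights and optimize the *input* by gradient descent so that the network's internal statistics match statistics stored from training (in DeepInversion, BatchNorm running means and variances). Everything below is a hypothetical toy stand-in for a real CNN.

```python
# Toy input-optimization sketch: starting from a "noise" input, descend
# on the input (weights frozen) until a stored feature statistic matches.

def feature(x):
    return x * x                 # frozen "layer" producing a feature

stored_mean = 4.0                # statistic remembered from training data

def loss(x):
    return (feature(x) - stored_mean) ** 2

x = 0.5                          # random initialization ("noise image")
lr = 0.01
for _ in range(500):
    eps = 1e-5
    grad = (loss(x + eps) - loss(x - eps)) / (2 * eps)  # numeric gradient
    x -= lr * grad
# x converges toward 2.0, an input consistent with the stored statistic
```

In the real method the loss combines a classification term with feature-statistic regularizers across many layers, but the optimization variable is the same: the image, not the weights.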
For server-side (edge-side) inference, we achieve a 96.3% (95.3%) accuracy in classifying diabetics against healthy individuals, and a 95.7% (94.6%) accuracy in distinguishing among type-1 diabetic, type-2 diabetic, and healthy individuals.
Deep neural networks (DNNs) have become a widely deployed model for numerous machine learning applications.
In this work, we propose a hardware-guided symbiotic training methodology for compact, accurate, yet execution-efficient inference models.
We formulate platform-aware NN architecture search in an optimization framework and propose a novel algorithm to search for optimal architectures aided by efficient accuracy and resource (latency and/or energy) predictors.
To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to the LSTM's original one-level non-linear control gates.
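The idea of deepening a gate can be illustrated with a single gate: instead of one affine layer followed by a sigmoid, the gate passes the concatenated input and hidden state through a small hidden (here, ReLU) layer before the sigmoid. The weights below are arbitrary placeholders, and this sketch shows only one gate, not the full H-LSTM cell.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def hidden_gate(x_and_h, w1, b1, w2, b2):
    """Gate with an extra hidden layer (hypothetical H-LSTM-style gate).
    x_and_h: concatenated input and hidden state (list of floats);
    w1/b1: hidden layer; w2/b2: output layer."""
    hidden = [max(0.0, sum(wi * v for wi, v in zip(row, x_and_h)) + bi)
              for row, bi in zip(w1, b1)]            # ReLU hidden layer
    z = sum(wi * h for wi, h in zip(w2, hidden)) + b2
    return sigmoid(z)                                # gate value in (0, 1)

g = hidden_gate([1.0, -0.5],
                w1=[[0.3, 0.1], [0.2, -0.4]], b1=[0.0, 0.1],
                w2=[0.5, -0.2], b2=0.0)
```

Compared with a standard LSTM gate `sigmoid(W . [x, h] + b)`, the extra layer lets each gate realize a richer control function without widening the cell state.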
To address these problems, we introduce a network growth algorithm that complements network pruning to learn both weights and compact DNN architectures during training.