no code implementations • 7 Aug 2023 • Jordan Dotzel, Gang Wu, Andrew Li, Muhammad Umar, Yun Ni, Mohamed S. Abdelfattah, Zhiru Zhang, Liqun Cheng, Martin G. Dixon, Norman P. Jouppi, Quoc V. Le, Sheng Li
With the proposed integer quantization search, we increase the accuracy of ResNet-18 on ImageNet by 1.31 percentage points and ResNet-50 by 0.90 percentage points at equivalent model cost compared to previous methods.
Specifically, the schedule works on a PyTorch model and uses a set of schedule primitives to convert the model for common model training optimizations such as high-performance kernels, effective 3D parallelism, and efficient activation checkpointing.
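To make the schedule idea concrete, here is a minimal sketch of declaring optimizations separately from a PyTorch model definition. The primitive names (`create_schedule`, `replace_with_fused_kernel`, `checkpoint`) are hypothetical stand-ins for illustration, not the paper's actual API.

```python
# Illustrative sketch only: the primitive names below are hypothetical
# stand-ins, not the paper's actual schedule interface.
import torch
import torch.nn as nn

class ToyBlock(nn.Module):
    def __init__(self, dim=1024):
        super().__init__()
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

def create_schedule(model):
    """Wrap a PyTorch model so optimizations are declared separately
    from the model definition (the schedule idea)."""
    return {"model": model, "ops": []}

def replace_with_fused_kernel(sched, name):
    # Record that a submodule should be swapped for a high-performance kernel.
    sched["ops"].append(("replace", name))

def checkpoint(sched, name):
    # Record that a submodule's activations should be recomputed in backward.
    sched["ops"].append(("checkpoint", name))

sched = create_schedule(ToyBlock())
replace_with_fused_kernel(sched, "fc1")
checkpoint(sched, "fc2")
print(sched["ops"])  # [('replace', 'fc1'), ('checkpoint', 'fc2')]
```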
In this work, we propose a novel binarization technique for Transformers applied to machine translation (BMT), the first of its kind.
To understand whether persistent memory is a good fit for GNNRecSys training, we perform an in-depth characterization of GNNRecSys workloads and a comprehensive analysis of their performance on a persistent memory device, namely, Intel Optane.
In addition, since PreCropping compresses CNNs at initialization, the computational and memory costs of CNNs are reduced for both training and inference on commodity hardware.
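As a rough illustration of cropping at initialization, the sketch below shrinks a convolution layer's filter count before any training, so both the training and inference cost drop. The keep ratio and the L1-norm scoring rule are placeholders, not the paper's actual PreCropping criterion.

```python
# Toy sketch of cropping channels at initialization (before any training).
# The keep ratio and scoring rule are placeholders, not the paper's method.
import torch
import torch.nn as nn

def crop_conv_at_init(conv: nn.Conv2d, keep_ratio: float = 0.5) -> nn.Conv2d:
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    # Placeholder importance score: L1 norm of each output filter at init.
    scores = conv.weight.detach().abs().sum(dim=(1, 2, 3))
    keep = scores.topk(n_keep).indices
    cropped = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                        conv.stride, conv.padding, bias=conv.bias is not None)
    with torch.no_grad():
        cropped.weight.copy_(conv.weight[keep])
        if conv.bias is not None:
            cropped.bias.copy_(conv.bias[keep])
    return cropped

layer = crop_conv_at_init(nn.Conv2d(64, 128, 3, padding=1))
print(layer)  # fewer output filters to train and to execute at inference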
Using representation theory, we characterize which similarity matrices can be "expressed" by finite group VSA hypervectors, and we show how these VSAs can be constructed.
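For intuition, here is a toy numerical sketch of the cyclic-group special case, where binding of hypervectors is circular convolution (the familiar HRR construction); this is only one instance, not the paper's general group-theoretic characterization.

```python
# Toy sketch: the cyclic-group special case of a group VSA, where binding
# is circular convolution and unbinding is circular correlation.
import numpy as np

rng = np.random.default_rng(0)
d = 1024
a = rng.standard_normal(d) / np.sqrt(d)
b = rng.standard_normal(d) / np.sqrt(d)

def bind(x, y):
    # Circular convolution via FFT (binding for the cyclic group Z_d).
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(y)))

def unbind(x, y):
    # Correlate with y to approximately recover the other bound factor.
    return np.real(np.fft.ifft(np.fft.fft(x) * np.conj(np.fft.fft(y))))

c = bind(a, b)
a_hat = unbind(c, b)
print(np.corrcoef(a, a_hat)[0, 1])  # well above chance: a is recoverable after cleanup
```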
Graph neural networks (GNNs) have been increasingly deployed in various applications that involve learning on non-Euclidean data.
In order to enable joint optimization of the cost together with accuracy, we define arithmetic computation effort (ACE), a hardware- and energy-inspired cost metric for quantized and binarized networks.
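A minimal sketch of a bitwidth-product cost in the spirit of ACE is below: each multiply-accumulate is charged by the product of its two operand bit widths, so binarized layers become dramatically cheaper than int8 ones. The exact definition in the paper may differ; treat this as an assumption for illustration.

```python
# Sketch of an ACE-style cost: charge each multiply-accumulate by the
# product of its operand bit widths (assumed definition, for illustration).
def conv2d_macs(c_in, c_out, k, h_out, w_out):
    return c_in * c_out * k * k * h_out * w_out

def ace_cost(macs, act_bits, weight_bits):
    return macs * act_bits * weight_bits

macs = conv2d_macs(c_in=64, c_out=64, k=3, h_out=56, w_out=56)
print(ace_cost(macs, act_bits=8, weight_bits=8))  # int8 layer
print(ace_cost(macs, act_bits=1, weight_bits=1))  # binarized layer: 64x cheaper
```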
Ranked #1 on Binarization on ImageNet
In this paper, we propose GARNET, a scalable spectral method to boost the adversarial robustness of GNN models for both homophilic and heterophilic graphs.
Our key insights are that 1) pointwise convolutions commute with frequency transformation and thus can be computed in the frequency domain without modification, 2) each channel within a given layer has a different level of sensitivity to frequency domain pruning, and 3) each channel's sensitivity to frequency pruning is approximately monotonic with respect to frequency.
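The first insight can be checked numerically: a 1x1 convolution mixes channels at each spatial position, while a 2D spatial frequency transform acts on each channel independently, so the two operations commute. The small NumPy check below is an illustrative verification, not the paper's implementation.

```python
# Numerical check that a pointwise (1x1) convolution commutes with a
# spatial frequency transform (here a 2D FFT over H and W).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))   # (C_in, H, W)
w = rng.standard_normal((4, 8))        # (C_out, C_in) pointwise weights

conv_then_fft = np.fft.fft2(np.einsum("oc,chw->ohw", w, x))
fft_then_conv = np.einsum("oc,chw->ohw", w, np.fft.fft2(x))
print(np.allclose(conv_then_fft, fft_then_conv))  # True
```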
Artificial intelligence (AI) technologies have dramatically advanced in recent years, resulting in revolutionary changes in people's lives.
A black-box spectral method is introduced for evaluating the adversarial robustness of a given machine learning (ML) model.
We design an efficient FPGA-based accelerator for our novel BNN model that supports the fractional activations.
1 code implementation • 4 Dec 2020 • Shubham Rai, Walter Lau Neto, Yukio Miyasaka, Xinpei Zhang, Mingfei Yu, Qingyang Yi, Masahiro Fujita, Guilherme B. Manske, Matheus F. Pontes, Leomar S. da Rosa Junior, Marilton S. de Aguiar, Paulo F. Butzen, Po-Chun Chien, Yu-Shan Huang, Hoa-Ren Wang, Jie-Hong R. Jiang, Jiaqi Gu, Zheng Zhao, Zixuan Jiang, David Z. Pan, Brunno A. de Abreu, Isac de Souza Campos, Augusto Berndt, Cristina Meinhardt, Jonata T. Carvalho, Mateus Grellert, Sergio Bampi, Aditya Lohana, Akash Kumar, Wei Zeng, Azadeh Davoodi, Rasit O. Topaloglu, Yuan Zhou, Jordan Dotzel, Yichi Zhang, Hanyu Wang, Zhiru Zhang, Valerio Tenace, Pierre-Emmanuel Gaillardon, Alan Mishchenko, Satrajit Chatterjee
If the function is incompletely specified, the implementation only has to be correct on the care set.
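A small hand-made example of this idea follows: the specification covers only some input patterns (the care set), and a candidate implementation is accepted as long as it matches on those, regardless of what it outputs elsewhere. The truth table and candidate function are illustrative, not from the contest benchmarks.

```python
# Toy illustration of an incompletely-specified function: the candidate
# circuit only needs to match the specification on the care set; the
# remaining input patterns are don't-cares.
care_set = {          # 3-input function specified on only 5 of 8 patterns
    (0, 0, 0): 0,
    (0, 1, 1): 1,
    (1, 0, 1): 1,
    (1, 1, 0): 1,
    (1, 1, 1): 1,
}

def candidate(a, b, c):
    # One possible implementation: majority of the three inputs.
    return int(a + b + c >= 2)

ok = all(candidate(*x) == y for x, y in care_set.items())
print(ok)  # True: outputs off the care set are unconstrained
```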
This paper proposes GuardNN, a secure DNN accelerator that provides hardware-based protection for user data and model parameters even in an untrusted environment.
FeatGraph provides a flexible programming interface to express diverse GNN models by composing coarse-grained sparse templates with fine-grained user-defined functions (UDFs) on each vertex/edge.
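The template-plus-UDF split can be pictured with the schematic sketch below: a coarse-grained sparse aggregation template over a CSR graph, with per-edge and per-vertex behavior supplied as user-defined functions. The function names are hypothetical and the real FeatGraph interface is built on TVM rather than plain Python.

```python
# Schematic sketch of the template-plus-UDF idea (hypothetical names; the
# real FeatGraph interface is built on TVM, not plain Python loops).
import numpy as np

def spmm_template(indptr, indices, src_feat, edge_udf, reduce_udf):
    """Coarse-grained sparse aggregation template; per-edge and per-vertex
    behavior is supplied by user-defined functions."""
    n_dst = len(indptr) - 1
    out = np.zeros((n_dst, src_feat.shape[1]))
    for v in range(n_dst):
        msgs = [edge_udf(src_feat[u]) for u in indices[indptr[v]:indptr[v + 1]]]
        if msgs:
            out[v] = reduce_udf(np.stack(msgs))
    return out

# Example: mean aggregation with a per-edge scaling UDF.
indptr = np.array([0, 2, 3])     # 2 destination vertices
indices = np.array([0, 1, 1])    # their incoming neighbors
feat = np.arange(6, dtype=float).reshape(3, 2)
out = spmm_template(indptr, indices, feat,
                    edge_udf=lambda h: 0.5 * h,
                    reduce_udf=lambda m: m.mean(axis=0))
print(out)
```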
The proposed approach is applicable to a variety of DNN architectures and significantly reduces the computational cost of DNN execution with almost no accuracy loss.
Specialized hardware for handling activation outliers can enable low-precision neural networks, but at the cost of nontrivial area overhead.
GraphZoom first performs graph fusion to generate a new graph that effectively encodes the topology of the original graph and the node attribute information.
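A rough sketch of such a fusion step is shown below: a feature-similarity graph built from node attributes is added to the original adjacency, so the fused graph reflects both topology and attributes. The kNN construction and the weighting factor are placeholders, not GraphZoom's exact procedure.

```python
# Rough sketch of graph fusion: combine the original adjacency with a
# feature-similarity graph. The kNN rule and weighting are placeholders.
import numpy as np

def fuse_graph(adj, feats, k=2, beta=1.0):
    # Cosine similarity between node attribute vectors.
    normed = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    sim = normed @ normed.T
    np.fill_diagonal(sim, 0.0)
    # Keep each node's top-k most similar nodes as feature edges.
    feat_adj = np.zeros_like(sim)
    for i in range(sim.shape[0]):
        for j in np.argsort(sim[i])[-k:]:
            feat_adj[i, j] = feat_adj[j, i] = sim[i, j]
    return adj + beta * feat_adj

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(fuse_graph(adj, feats))
```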
The physical design process commonly takes hours to days for large designs, and routing is known to be the most critical step.
The majority of existing literature focuses on training quantized DNNs, while this work examines the less-studied topic of quantizing a floating-point model without (re)training.
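As a point of reference, the sketch below shows the simplest form of post-training quantization: pick a scale from the tensor's observed range and round to int8, with no (re)training involved. Practical methods refine this with calibration data and per-channel scales; this is only a baseline illustration, not the paper's technique.

```python
# Minimal post-training quantization sketch: max-abs scaling to int8,
# with no retraining. Illustrative baseline only.
import numpy as np

def quantize_int8(w):
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

w = np.random.default_rng(0).standard_normal((4, 4)).astype(np.float32)
q, s = quantize_int8(w)
print(np.abs(w - dequantize(q, s)).max())  # small quantization error
```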
UGConvs generalize two disparate ideas in CNN architecture, channel shuffling (i.e., ShuffleNet) and block-circulant networks (i.e., CirCNN), and provide unifying insights that lead to a deeper understanding of each technique.
Combining our method with knowledge distillation reduces the compute cost of ResNet-18 by 2.6$\times$ without accuracy drop on ImageNet.
State-of-the-art convolutional neural networks are enormously costly in both compute and memory, demanding massively parallel GPUs for execution.