Prior art often discretizes the network weights by carefully tuning quantization hyper-parameters (e.g., non-uniform stepsizes and layer-wise bitwidths), a process that is complicated and sub-optimal because of the large discrepancy between the full-precision and low-precision models.
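For context, here is a minimal sketch of the kind of layer-wise uniform quantization this excerpt alludes to; the function name and the symmetric max-abs stepsize rule are illustrative assumptions, not the paper's scheme:

```python
import numpy as np

def quantize_uniform(w, num_bits=8):
    """Uniformly quantize a weight tensor to signed num_bits integers.

    Illustrative assumption: the stepsize is derived per layer from the
    weight range, which is why each layer (and each bitwidth) needs its
    own calibration.
    """
    qmax = 2 ** (num_bits - 1) - 1          # e.g. 127 for 8 bits
    step = np.abs(w).max() / qmax           # layer-wise stepsize
    q = np.clip(np.round(w / step), -qmax, qmax)
    return q * step                          # de-quantized weights

w = np.random.randn(256, 256).astype(np.float32)
w_q = quantize_uniform(w, num_bits=4)
print("max abs error:", np.abs(w - w_q).max())
```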
Modern neural networks achieve state-of-the-art performance on tasks across computer vision, natural language processing, and related verticals.
In this work, we analyze the effect of various compression techniques on UAP (universal adversarial perturbation) attacks, including different forms of pruning and quantization.
The experimental results, obtained on an AMD server with four GeForce RTX 2080Ti GPUs, show that our algorithm achieves a 3x speedup plus 19% energy savings on VGG distillation, and a 3.5x speedup plus 29% energy savings on ResNet distillation, both with negligible accuracy loss.
While knowledge distillation (transfer) has been attracting attention from the research community, recent developments in the field have heightened the need for reproducible studies and highly generalized frameworks that lower the barriers to such high-quality, reproducible deep learning research.
In this paper, we propose to modify the structure and training process of DNN models for complex image classification tasks to achieve in-network compression in the early network layers.
Bayesian optimization (BO) is a sample-efficient global optimization algorithm for black-box functions that are expensive to evaluate.
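As a hedged illustration of the BO loop described here (not the excerpted paper's algorithm), the sketch below fits a Gaussian-process surrogate and picks the next sample by the standard expected-improvement acquisition; the helper names, the 1-D search space, and the random candidate grid are assumptions made for brevity:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(x_cand, gp, y_best):
    """Expected improvement (for minimization) at candidate points."""
    mu, sigma = gp.predict(x_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)          # guard against zero variance
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(f, bounds, n_init=5, n_iter=20, seed=0):
    """Minimal BO loop: GP surrogate + expected-improvement acquisition."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(*bounds, size=(n_init, 1))
    y = np.array([f(x[0]) for x in X])
    for _ in range(n_iter):
        gp = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = rng.uniform(*bounds, size=(256, 1))   # random candidate grid
        x_next = cand[np.argmax(expected_improvement(cand, gp, y.min()))]
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next[0]))               # one expensive evaluation
    return X[np.argmin(y)], y.min()

x_best, y_best = bayes_opt(lambda x: (x - 2.0) ** 2, bounds=(-5.0, 5.0))
print(x_best, y_best)
```

The point of the loop is sample efficiency: each iteration spends cheap surrogate computation to decide where the single expensive evaluation should go.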
In this paper, the problem of pruning and compressing the weights of various layers of deep neural networks is investigated.
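As a point of reference for layer-wise weight pruning, here is a minimal magnitude-pruning baseline; it is illustrative only and not the specific method investigated in the excerpted paper:

```python
import numpy as np

def magnitude_prune(w, sparsity=0.9):
    """Zero out the smallest-magnitude fraction of weights in a layer.

    One of the simplest pruning baselines: keep only the largest
    (1 - sparsity) fraction of weights by absolute value.
    """
    k = int(sparsity * w.size)
    threshold = np.partition(np.abs(w), k, axis=None)[k]
    return np.where(np.abs(w) < threshold, 0.0, w)

w = np.random.randn(512, 512).astype(np.float32)
w_pruned = magnitude_prune(w, sparsity=0.9)
print("fraction zeroed:", np.mean(w_pruned == 0.0))
```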
Second-order information, in the form of Hessian- or Inverse-Hessian-vector products, is a fundamental tool for solving optimization problems.
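A minimal sketch of one common way to get a Hessian-vector product without ever materializing the Hessian: central finite differences of the gradient (in practice autodiff via Pearlmutter's trick is usually preferred). The quadratic test function and its analytic gradient are assumptions for illustration:

```python
import numpy as np

def hvp_fd(grad_f, x, v, eps=1e-5):
    """Hessian-vector product via finite differences of the gradient:
    H(x) v ~= (grad_f(x + eps*v) - grad_f(x - eps*v)) / (2*eps).
    Only two gradient evaluations; the full Hessian is never formed.
    """
    return (grad_f(x + eps * v) - grad_f(x - eps * v)) / (2.0 * eps)

# Toy check on f(x) = 0.5 * x^T A x, whose Hessian is exactly A.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 50))
A = 0.5 * (A + A.T)                      # symmetrize so grad f = A x
grad_f = lambda x: A @ x                 # analytic gradient of f
x, v = rng.standard_normal(50), rng.standard_normal(50)
print(np.allclose(hvp_fd(grad_f, x, v), A @ v, atol=1e-4))
```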
Linear layers still occupy a significant portion of the parameters in recurrent neural networks (RNNs).
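A quick arithmetic check of that claim for a vanilla LSTM, where the four gate projections are all dense linear maps; the layer sizes below are illustrative assumptions, not taken from the excerpted paper:

```python
def lstm_param_count(input_size, hidden_size):
    """Parameter count of a single vanilla LSTM layer."""
    # Per gate: W_x (hidden x input) + W_h (hidden x hidden) + bias.
    per_gate = hidden_size * input_size + hidden_size * hidden_size + hidden_size
    return 4 * per_gate  # input, forget, cell, output gates

d, h = 512, 1024
total = lstm_param_count(d, h)
linear = 4 * (h * d + h * h)             # weight matrices only, no biases
print(f"total: {total:,}, share in linear maps: {linear / total:.1%}")
```

With these sizes, the dense weight matrices account for well over 99% of the layer's parameters, which is why compressing the linear layers is where the leverage is.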