Search Results for author: Michael Mahoney

Found 14 papers, 7 papers with code

Error Estimation for Sketched SVD

no code implementations ICML 2020 Miles Lopes, N. Benjamin Erichson, Michael Mahoney

In order to compute fast approximations to the singular value decompositions (SVD) of very large matrices, randomized sketching algorithms have become a leading approach.

What’s Hidden in a One-layer Randomly Weighted Transformer?

1 code implementation EMNLP 2021 Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, Michael Mahoney

Hidden within a one-layer randomly weighted Transformer, we find that subnetworks that can achieve 29. 45/17. 29 BLEU on IWSLT14/WMT14.

Machine Translation Translation

GACT: Activation Compressed Training for Generic Network Architectures

1 code implementation22 Jun 2022 Xiaoxuan Liu, Lianmin Zheng, Dequan Wang, Yukuo Cen, Weize Chen, Xu Han, Jianfei Chen, Zhiyuan Liu, Jie Tang, Joey Gonzalez, Michael Mahoney, Alvin Cheung

Training large neural network (NN) models requires extensive memory resources, and Activation Compressed Training (ACT) is a promising approach to reduce training memory footprint.

LocalNewton: Reducing Communication Bottleneck for Distributed Learning

no code implementations16 May 2021 Vipul Gupta, Avishek Ghosh, Michal Derezinski, Rajiv Khanna, Kannan Ramchandran, Michael Mahoney

To enhance practicability, we devise an adaptive scheme to choose L, and we show that this reduces the number of local iterations in worker machines between two model synchronizations as the training proceeds, successively refining the model quality at the master.

Distributed Optimization

PyHessian: Neural Networks Through the Lens of the Hessian

2 code implementations16 Dec 2019 Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael Mahoney

To illustrate this, we analyze the effect of residual connections and Batch Normalization layers on the trainability of neural networks.

ANODEV2: A Coupled Neural ODE Evolution Framework

no code implementations10 Jun 2019 Tianjun Zhang, Zhewei Yao, Amir Gholami, Kurt Keutzer, Joseph Gonzalez, George Biros, Michael Mahoney

It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE).

HAWQ: Hessian AWare Quantization of Neural Networks with Mixed-Precision

1 code implementation ICCV 2019 Zhen Dong, Zhewei Yao, Amir Gholami, Michael Mahoney, Kurt Keutzer

Another challenge is a similar factorial complexity for determining block-wise fine-tuning order when quantizing the model to a target precision.


Trust Region Based Adversarial Attack on Neural Networks

2 code implementations CVPR 2019 Zhewei Yao, Amir Gholami, Peng Xu, Kurt Keutzer, Michael Mahoney

To address this problem, we present a new family of trust region based adversarial attacks, with the goal of computing adversarial perturbations efficiently.

Adversarial Attack

Parameter Re-Initialization through Cyclical Batch Size Schedules

no code implementations4 Dec 2018 Norman Mu, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael Mahoney

We demonstrate the ability of our method to improve language modeling performance by up to 7. 91 perplexity and reduce training iterations by up to $61\%$, in addition to its flexibility in enabling snapshot ensembling and use with adversarial training.

General Classification Image Classification +2

Large batch size training of neural networks with adversarial training and second-order information

1 code implementation ICLR 2019 Zhewei Yao, Amir Gholami, Daiyaan Arfeen, Richard Liaw, Joseph Gonzalez, Kurt Keutzer, Michael Mahoney

Our method exceeds the performance of existing solutions in terms of both accuracy and the number of SGD iterations (up to 1\% and $5\times$, respectively).

Second-order methods

Statistical and Algorithmic Perspectives on Randomized Sketching for Ordinary Least-Squares -- ICML

no code implementations25 May 2015 Garvesh Raskutti, Michael Mahoney

We then consider the statistical prediction efficiency (PE) and the statistical residual efficiency (RE) of the sketched LS estimator; and we use our framework to provide upper bounds for several types of random projection and random sampling algorithms.

Quasi-Monte Carlo Feature Maps for Shift-Invariant Kernels

no code implementations29 Dec 2014 Haim Avron, Vikas Sindhwani, Jiyan Yang, Michael Mahoney

These approximate feature maps arise as Monte Carlo approximations to integral representations of shift-invariant kernel functions (e. g., Gaussian kernel).

A Statistical Perspective on Randomized Sketching for Ordinary Least-Squares

no code implementations23 Jun 2014 Garvesh Raskutti, Michael Mahoney

Prior results show that, when using sketching matrices such as random projections and leverage-score sampling algorithms, with $p < r \ll n$, the WC error is the same as solving the original problem, up to a small constant.

Cannot find the paper you are looking for? You can Submit a new open access paper.