no code implementations • 12 Jan 2025 • J. Antonio Lara Benitez, Junyi Guo, Kareem Hegazy, Ivan Dokmanić, Michael W. Mahoney, Maarten V. de Hoop
We introduce Neural Discrete Equilibrium (NeurDE), a machine learning (ML) approach for long-term forecasting of flow phenomena that relies on a "lifting" of physical conservation laws into the framework of kinetic theory.
no code implementations • 10 Jan 2025 • Malcolm L. Wolff, Shenghao Yang, Kari Torkkola, Michael W. Mahoney
Pre-trained Large Language Models (LLMs) encapsulate large amounts of knowledge and take enormous amounts of compute to train.
no code implementations • 24 Dec 2024 • Siavash Ameli, Siyuan Zhuang, Ion Stoica, Michael W. Mahoney
Large language models (LLMs) have transformed natural language processing, with frameworks like Chatbot Arena providing pioneering platforms for evaluating these models.
no code implementations • 17 Dec 2024 • Tiankai Xie, Jiaqing Chen, Yaoqing Yang, Caleb Geniesse, Ge Shi, Ajinkya Chaudhari, John Kevin Cava, Michael W. Mahoney, Talita Perciano, Gunther H. Weber, Ross Maciejewski
Modern machine learning often relies on optimizing a neural network's parameters using a loss function to learn complex features.
no code implementations • 6 Dec 2024 • Luca Masserano, Abdul Fatir Ansari, Boran Han, Xiyuan Zhang, Christos Faloutsos, Michael W. Mahoney, Andrew Gordon Wilson, Youngsuk Park, Syama Rangapuram, Danielle C. Maddix, Yuyang Wang
To address this question, we develop WaveToken, a wavelet-based tokenizer that allows models to learn complex representations directly in the space of time-localized frequencies.
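To make the tokenizer idea concrete, here is a rough, self-contained sketch of the general mechanism (not the authors' WaveToken implementation): decompose a series with a discrete wavelet transform and uniformly quantize the coefficients into a finite vocabulary of time-frequency tokens. The wavelet family, decomposition level, and bin count below are illustrative assumptions.

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_tokenize(series, wavelet="db4", level=3, n_bins=256):
    """Illustrative wavelet 'tokenizer': DWT coefficients -> discrete token ids."""
    coeffs = pywt.wavedec(series, wavelet, level=level)       # list of coefficient arrays
    flat = np.concatenate(coeffs)                              # time-localized frequency content
    lo, hi = flat.min(), flat.max()
    scale = (hi - lo) / (n_bins - 1) if hi > lo else 1.0
    tokens = np.round((flat - lo) / scale).astype(int)         # ids in [0, n_bins)
    meta = (lo, scale, [len(c) for c in coeffs], wavelet)
    return tokens, meta

def wavelet_detokenize(tokens, meta):
    lo, scale, lengths, wavelet = meta
    flat = tokens * scale + lo
    coeffs, start = [], 0
    for n in lengths:
        coeffs.append(flat[start:start + n])
        start += n
    return pywt.waverec(coeffs, wavelet)

t = np.linspace(0, 1, 256)
x = np.sin(2 * np.pi * 5 * t) + 0.1 * np.random.default_rng(0).standard_normal(256)
tokens, meta = wavelet_tokenize(x)
x_hat = wavelet_detokenize(tokens, meta)
print("max reconstruction error:", np.abs(x - x_hat[: len(x)]).max())
```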
no code implementations • 3 Dec 2024 • Hanyu Zhang, Chuck Arvin, Dmitry Efimov, Michael W. Mahoney, Dominique Perrault-Joncas, Shankar Ramasubramanian, Andrew Gordon Wilson, Malcolm Wolff
Modern time-series forecasting models often fail to make full use of rich unstructured information about the time series themselves.
no code implementations • 2 Dec 2024 • Chaoran Cheng, Boran Han, Danielle C. Maddix, Abdul Fatir Ansari, Andrew Stuart, Michael W. Mahoney, Yuyang Wang
Generative models that satisfy hard constraints are crucial in many scientific and engineering applications where physical laws or system requirements must be strictly respected.
no code implementations • 19 Nov 2024 • Caleb Geniesse, Jiaqing Chen, Tiankai Xie, Ge Shi, Yaoqing Yang, Dmitriy Morozov, Talita Perciano, Michael W. Mahoney, Ross Maciejewski, Gunther H. Weber
After describing this new topological landscape profile representation, we show how the shape of loss landscapes can reveal new details about model performance and learning dynamics, highlighting several use cases, including image segmentation (e.g., UNet) and scientific machine learning (e.g., physics-informed neural networks).
1 code implementation • 14 Nov 2024 • Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Monishwaran Maheswaran, June Paik, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
During inference, we compare query tokens from the user input with the centroids to predict which of the keys from the fixed context are semantically relevant and need to be loaded during inference.
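A toy numpy/scikit-learn sketch of the centroid-lookup idea: offline, the keys of the fixed context are clustered and only the centroids are kept as a cheap index; online, the query is scored against the centroids and only keys from the best-matching clusters are loaded for attention. The clustering method and the top-cluster cutoff are illustrative choices, not the paper's algorithm.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
d, n_keys, n_clusters, top_c = 64, 4096, 32, 4

keys = rng.standard_normal((n_keys, d)).astype(np.float32)    # cached context keys
values = rng.standard_normal((n_keys, d)).astype(np.float32)

# Offline: cluster the fixed-context keys and keep the centroids as an index.
km = KMeans(n_clusters=n_clusters, n_init=4, random_state=0).fit(keys)
centroids, assignment = km.cluster_centers_, km.labels_

def sparse_attention(query):
    """Score the query against centroids, then attend only over keys from the top clusters."""
    centroid_scores = centroids @ query                        # (n_clusters,)
    keep = np.argsort(centroid_scores)[-top_c:]                # predicted-relevant clusters
    mask = np.isin(assignment, keep)
    k, v = keys[mask], values[mask]                            # only these keys/values are "loaded"
    logits = k @ query / np.sqrt(d)
    w = np.exp(logits - logits.max()); w /= w.sum()
    return w @ v, mask.mean()

out, frac_loaded = sparse_attention(rng.standard_normal(d).astype(np.float32))
print(f"attended over {frac_loaded:.1%} of the cached keys")
```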
no code implementations • 14 Nov 2024 • Tiankai Xie, Caleb Geniesse, Jiaqing Chen, Yaoqing Yang, Dmitriy Morozov, Michael W. Mahoney, Ross Maciejewski, Gunther H. Weber
Characterizing the loss of a neural network with respect to model parameters, i.e., the loss landscape, can provide valuable insights into properties of that model.
no code implementations • 6 Nov 2024 • Malcolm Wolff, Kin G. Olivares, Boris Oreshkin, Sunny Ruan, Sitan Yang, Abhinav Katoch, Shankar Ramasubramanian, Youxin Zhang, Michael W. Mahoney, Dmitry Efimov, Vincent Quenneville-Bélair
Neural networks like MQCNN and MQT overreact to demand peaks by carrying over the elevated PE demand into subsequent Post-Peak-Event (PPE) periods, resulting in significantly over-biased forecasts.
no code implementations • 1 Nov 2024 • Hyunsuk Kim, Liam Hodgkinson, Ryan Theisen, Michael W. Mahoney
(2) The error of the majority vote classifier is considered under restricted entropy conditions, and we present a tight upper bound that indicates that the disagreement is linearly correlated with the target, and that the slope is linear in the polarization.
1 code implementation • 14 Oct 2024 • Haiquan Lu, Yefan Zhou, Shiwei Liu, Zhangyang Wang, Michael W. Mahoney, Yaoqing Yang
Existing LLM pruning strategies typically assign uniform pruning ratios across layers, limiting overall pruning ability; and recent work on layerwise pruning of LLMs is often based on heuristics that can easily lead to suboptimal performance.
no code implementations • 4 Oct 2024 • Soon Hoe Lim, Yijin Wang, Annan Yu, Emma Hart, Michael W. Mahoney, Xiaoye S. Li, N. Benjamin Erichson
Flow matching has recently emerged as a powerful paradigm for generative modeling and has been extended to probabilistic time series forecasting in latent spaces.
1 code implementation • 3 Oct 2024 • Mansi Sakarvadia, Aswathy Ajith, Arham Khan, Nathaniel Hudson, Caleb Geniesse, Kyle Chard, Yaoqing Yang, Ian Foster, Michael W. Mahoney
Language models (LMs) can "memorize" information, i.e., encode training data in their weights in such a way that inference-time queries can lead to verbatim regurgitation of that data.
no code implementations • 2 Oct 2024 • Annan Yu, Dongwei Lyu, Soon Hoe Lim, Michael W. Mahoney, N. Benjamin Erichson
State space models (SSMs) leverage linear, time-invariant (LTI) systems to effectively learn sequences with long-range dependencies.
no code implementations • 24 Sep 2024 • Yuchen Fang, Sen Na, Michael W. Mahoney, Mladen Kolar
To converge to first-order stationary points, our method computes a gradient step in each iteration defined by minimizing a quadratic approximation of the objective subject to a (relaxed) linear approximation of the problem constraints and a trust-region constraint.
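For readers unfamiliar with the subproblem structure, the sketch below solves one such step with scipy: minimize a quadratic model of the objective subject to linearized equality constraints and an l2 trust-region bound. The stochastic gradient estimates and the constraint-relaxation scheme of the actual method are omitted; the radius here is simply chosen large enough that the linearized constraints stay feasible.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, m = 5, 2
g = rng.standard_normal(n)             # (estimated) objective gradient
B = np.eye(n)                          # Hessian approximation of the Lagrangian
J = rng.standard_normal((m, n))        # constraint Jacobian at the current iterate
c = rng.standard_normal(m)             # constraint values at the current iterate

# Choose a radius large enough that the linearized constraints are feasible inside
# the trust region; the paper instead relaxes the constraints when they are not.
d_feasible = -np.linalg.pinv(J) @ c
delta = 2.0 * np.linalg.norm(d_feasible) + 1.0

res = minimize(
    lambda d: g @ d + 0.5 * d @ B @ d,                        # quadratic model of the objective
    x0=d_feasible,
    method="SLSQP",
    constraints=[
        {"type": "eq", "fun": lambda d: c + J @ d},           # linearized equality constraints
        {"type": "ineq", "fun": lambda d: delta**2 - d @ d},  # trust region ||d||^2 <= delta^2
    ],
)
print("step:", res.x)
print("linearized constraint residual:", np.linalg.norm(c + J @ res.x))
```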
1 code implementation • 21 Jul 2024 • Pu Ren, Rie Nakata, Maxime Lacour, Ilan Naiman, Nori Nakata, Jialin Song, Zhengfa Bi, Osman Asif Malik, Dmitriy Morozov, Omri Azencot, N. Benjamin Erichson, Michael W. Mahoney
Predicting high-fidelity ground motions for future earthquakes is crucial for seismic hazard assessment and infrastructure resilience.
1 code implementation • 19 Jul 2024 • Matthias Karlbauer, Danielle C. Maddix, Abdul Fatir Ansari, Boran Han, Gaurav Gupta, Yuyang Wang, Andrew Stuart, Michael W. Mahoney
Remarkable progress in the development of Deep Learning Weather Prediction (DLWP) models positions them to become competitive with traditional numerical weather prediction (NWP) models.
no code implementations • 17 Jul 2024 • Haiquan Lu, Xiaotian Liu, Yefan Zhou, Qunli Li, Kurt Keutzer, Michael W. Mahoney, Yujun Yan, Huanrui Yang, Yaoqing Yang
We discover a trade-off between sharpness and diversity: minimizing the sharpness in the loss landscape tends to diminish the diversity of individual members within the ensemble, adversely affecting the ensemble's improvement.
no code implementations • 27 Jun 2024 • Tommaso Baldi, Javier Campos, Ben Hawks, Jennifer Ngadiuba, Nhan Tran, Daniel Diaz, Javier Duarte, Ryan Kastner, Andres Meza, Melissa Quinnan, Olivia Weng, Caleb Geniesse, Amir Gholami, Michael W. Mahoney, Vladimir Loncar, Philip Harris, Joshua Agar, Shuyu Qin
Extreme data rate scientific experiments create massive amounts of data that require efficient ML edge processing.
no code implementations • 17 Jun 2024 • Michał Dereziński, Michael W. Mahoney
Large matrices arise in many machine learning and data analysis applications, including as representations of datasets, graphs, model weights, and first and second-order derivatives.
1 code implementation • 14 Jun 2024 • Konstantin Schürholt, Michael W. Mahoney, Damian Borth
Learning representations of well-trained neural network models holds the promise to provide an understanding of the inner workings of those models.
no code implementations • 30 May 2024 • Dongwei Lyu, Rie Nakata, Pu Ren, Michael W. Mahoney, Arben Pitarka, Nori Nakata, N. Benjamin Erichson
To improve early warning, we propose a novel AI-enabled framework, WaveCastNet, for forecasting ground motions from large earthquakes.
no code implementations • 22 May 2024 • Annan Yu, Michael W. Mahoney, N. Benjamin Erichson
To achieve state-of-the-art performance, an SSM often needs a specifically designed initialization, and the training of state matrices is on a logarithmic scale with a very small learning rate.
1 code implementation • 22 Mar 2024 • Nicholas Lee, Thanakul Wattanawong, Sehoon Kim, Karttikeya Mangalam, Sheng Shen, Gopala Anumanchipalli, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
To address this, we propose LLM2LLM, a targeted and iterative data augmentation strategy that uses a teacher LLM to enhance a small seed dataset by augmenting additional data that can be used for fine-tuning on a specific task.
no code implementations • 21 Mar 2024 • Amir Gholami, Zhewei Yao, Sehoon Kim, Coleman Hooper, Michael W. Mahoney, Kurt Keutzer
The availability of unprecedented unsupervised training data, along with neural scaling laws, has resulted in an unprecedented surge in model size and compute requirements for serving/training LLMs.
1 code implementation • 15 Mar 2024 • S. Chandra Mouli, Danielle C. Maddix, Shima Alizadeh, Gaurav Gupta, Andrew Stuart, Michael W. Mahoney, Yuyang Wang
Existing work in scientific machine learning (SciML) has shown that data-driven learning of solution operators can provide a fast approximate alternative to classical numerical partial differential equation (PDE) solvers.
5 code implementations • 12 Mar 2024 • Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, Xiyuan Zhang, Pedro Mercado, Huibin Shen, Oleksandr Shchur, Syama Sundar Rangapuram, Sebastian Pineda Arango, Shubham Kapoor, Jasper Zschiegner, Danielle C. Maddix, Hao Wang, Michael W. Mahoney, Kari Torkkola, Andrew Gordon Wilson, Michael Bohlke-Schneider, Yuyang Wang
We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models.
1 code implementation • 24 Feb 2024 • Wuyang Chen, Jialin Song, Pu Ren, Shashank Subramanian, Dmitriy Morozov, Michael W. Mahoney
To reduce the need for training data with heavy simulation costs, we mine unlabeled PDE data without simulated solutions, and we pretrain neural operators with physics-inspired reconstruction-based proxy tasks.
2 code implementations • 31 Jan 2024 • Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Michael W. Mahoney, Yakun Sophia Shao, Kurt Keutzer, Amir Gholami
LLMs are seeing growing use for applications which require large context windows, and with these large context windows KV cache activations surface as the dominant contributor to memory consumption during inference.
no code implementations • 30 Dec 2023 • Ali Eshragh, Luke Yerbury, Asef Nazari, Fred Roosta, Michael W. Mahoney
We demonstrate that, with high probability, the accuracy of SALSA's approximations is within $(1 + O(\varepsilon))$ of the true leverage scores.
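As background, leverage scores of a tall matrix are the squared row norms of its thin orthogonal factor, and sketching gives a cheap approximation. The numpy comparison below illustrates the generic sketch-based estimator; SALSA's time-series-specific estimator and its guarantees are only loosely represented by this.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 2000, 10, 200
A = rng.standard_normal((n, d))

# Exact leverage scores: squared row norms of Q from a thin QR of A.
Q, _ = np.linalg.qr(A)
exact = np.sum(Q**2, axis=1)

# Sketched approximation: factor the much smaller matrix S @ A, then use
# R^{-1} to (approximately) orthogonalize the rows of A.
S = rng.standard_normal((m, n)) / np.sqrt(m)
_, R = np.linalg.qr(S @ A)
approx = np.sum((A @ np.linalg.inv(R)) ** 2, axis=1)

print("max relative error:", np.max(np.abs(approx - exact) / exact))
```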
1 code implementation • 7 Dec 2023 • Sehoon Kim, Suhong Moon, Ryan Tabrizi, Nicholas Lee, Michael W. Mahoney, Kurt Keutzer, Amir Gholami
To address this, we introduce LLMCompiler, which executes functions in parallel to efficiently orchestrate multiple function calls.
1 code implementation • NeurIPS 2023 • Yefan Zhou, Tianyu Pang, Keqin Liu, Charles H. Martin, Michael W. Mahoney, Yaoqing Yang
In particular, the learning rate, which can be interpreted as a temperature-like parameter within the statistical mechanics of learning, plays a crucial role in neural network training.
no code implementations • 21 Nov 2023 • Luis Oala, Manil Maskey, Lilith Bat-Leah, Alicia Parrish, Nezihe Merve Gürel, Tzu-Sheng Kuo, Yang Liu, Rotem Dror, Danilo Brajovic, Xiaozhe Yao, Max Bartolo, William A Gaviria Rojas, Ryan Hileman, Rainier Aliment, Michael W. Mahoney, Meg Risdal, Matthew Lease, Wojciech Samek, Debojyoti Dutta, Curtis G Northcutt, Cody Coleman, Braden Hancock, Bernard Koch, Girmaw Abebe Tadesse, Bojan Karlaš, Ahmed Alaa, Adji Bousso Dieng, Natasha Noy, Vijay Janapa Reddi, James Zou, Praveen Paritosh, Mihaela van der Schaar, Kurt Bollacker, Lora Aroyo, Ce Zhang, Joaquin Vanschoren, Isabelle Guyon, Peter Mattson
Drawing from discussions at the inaugural DMLR workshop at ICML 2023 and meetings prior, in this report we outline the relevance of community engagement and infrastructure development for the creation of next-generation public datasets that will advance machine learning science.
no code implementations • 13 Nov 2023 • Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney
Deep learning is renowned for its theory-practice gap, whereby principled theory typically fails to provide much beneficial guidance for implementation in practice.
1 code implementation • 9 Oct 2023 • Da Long, Wei W. Xing, Aditi S. Krishnapriyan, Robert M. Kirby, Shandian Zhe, Michael W. Mahoney
To overcome the computational challenge of kernel regression, we place the function values on a mesh and induce a Kronecker product construction, and we use tensor algebra to enable efficient computation and optimization.
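A minimal numpy sketch of the kind of Kronecker-structured solve described above: when the function values sit on a 2-D mesh, the full kernel matrix factors as $K_x \otimes K_y$, so the regularized system can be solved with two small eigendecompositions instead of one large factorization. The kernel and regularization value are placeholders.

```python
import numpy as np

def rbf(x, y, ell=0.2):
    return np.exp(-(x[:, None] - y[None, :]) ** 2 / (2 * ell**2))

rng = np.random.default_rng(0)
nx, ny, lam = 60, 50, 1e-2
xs, ys = np.linspace(0, 1, nx), np.linspace(0, 1, ny)
Y = rng.standard_normal((ny, nx))                   # function values on the mesh

Kx, Ky = rbf(xs, xs), rbf(ys, ys)                   # small per-axis kernel matrices
lx, Ux = np.linalg.eigh(Kx)
ly, Uy = np.linalg.eigh(Ky)

# Solve (Kx ⊗ Ky + lam*I) vec(alpha) = vec(Y) with tensor algebra:
# (Ux ⊗ Uy)^T vec(Y) = vec(Uy^T Y Ux), and the middle factor is diagonal.
inner = Uy.T @ Y @ Ux
alpha = Uy @ (inner / (np.outer(ly, lx) + lam)) @ Ux.T

# Check against the explicit (nx*ny x nx*ny) dense solve.
K = np.kron(Kx, Ky)
alpha_dense = np.linalg.solve(K + lam * np.eye(nx * ny), Y.reshape(-1, order="F"))
print(np.allclose(alpha.reshape(-1, order="F"), alpha_dense, atol=1e-6))
```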
1 code implementation • 4 Oct 2023 • Ilan Naiman, N. Benjamin Erichson, Pu Ren, Michael W. Mahoney, Omri Azencot
In this work, we introduce Koopman VAE (KoVAE), a new generative framework that is based on a novel design for the model prior, and that can be optimized for either regular and irregular training data.
no code implementations • 2 Oct 2023 • Annan Yu, Arnur Nigmetov, Dmitriy Morozov, Michael W. Mahoney, N. Benjamin Erichson
An example is the structured state-space sequence (S4) layer, which uses the diagonal-plus-low-rank structure of the HiPPO initialization framework.
1 code implementation • 30 Aug 2023 • Younghyun Cho, James W. Demmel, Michał Dereziński, Haoyun Li, Hengrui Luo, Michael W. Mahoney, Riley J. Murray
Algorithms from Randomized Numerical Linear Algebra (RandNLA) are known to be effective in handling high-dimensional computational problems, providing high-quality empirical performance as well as strong probabilistic guarantees.
no code implementations • 19 Jul 2023 • Kin G. Olivares, Geoffrey Négiar, Ruijun Ma, O. Nangba Meetei, Mengfei Cao, Michael W. Mahoney
The factor model samples can be differentiated with respect to the model parameters, allowing optimization on arbitrary differentiable learning objectives that align with the forecasting system's goals, including quantile loss and the scaled Continuous Ranked Probability Score (CRPS).
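To make the "differentiable samples" point concrete, here is a tiny PyTorch sketch (not the paper's factor model): a Gaussian factor model is sampled with the reparameterization trick, so a quantile (pinball) loss evaluated on the samples can be backpropagated to the loadings and noise scales. Dimensions and the training target are placeholders.

```python
import torch

torch.manual_seed(0)
n_series, n_factors, n_samples = 8, 3, 200

# Learnable factor-model parameters: loadings, per-series noise scale, and mean.
loadings = torch.randn(n_series, n_factors, requires_grad=True)
log_sigma = torch.zeros(n_series, requires_grad=True)
mu = torch.zeros(n_series, requires_grad=True)

def sample_forecasts():
    """Reparameterized samples y = mu + f @ L^T + sigma * eps, differentiable in all parameters."""
    f = torch.randn(n_samples, n_factors)
    eps = torch.randn(n_samples, n_series)
    return mu + f @ loadings.T + torch.exp(log_sigma) * eps

def quantile_loss(samples, target, q=0.9):
    """Pinball loss evaluated at the empirical q-quantile of the samples."""
    pred_q = torch.quantile(samples, q, dim=0)
    diff = target - pred_q
    return torch.mean(torch.maximum(q * diff, (q - 1) * diff))

target = torch.randn(n_series)
opt = torch.optim.Adam([loadings, log_sigma, mu], lr=0.05)
for _ in range(100):
    opt.zero_grad()
    loss = quantile_loss(sample_forecasts(), target)
    loss.backward()                                  # gradients flow through the samples
    opt.step()
print("final quantile loss:", loss.item())
```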
no code implementations • 15 Jul 2023 • Liam Hodgkinson, Chris van der Heide, Robert Salomone, Fred Roosta, Michael W. Mahoney
The problem of model selection is considered for the setting of interpolating estimators, where the number of model parameters exceeds the size of the dataset.
no code implementations • 7 Jul 2023 • Sitan Yang, Malcolm Wolff, Shankar Ramasubramanian, Vincent Quenneville-Belair, Ronak Metha, Michael W. Mahoney
Encoder-decoder deep neural networks have been increasingly studied for multi-horizon time series forecasting, especially in real-world applications.
1 code implementation • 24 Jun 2023 • Pu Ren, N. Benjamin Erichson, Shashank Subramanian, Omer San, Zarija Lukic, Michael W. Mahoney
Super-Resolution (SR) techniques aim to enhance data resolution, enabling the retrieval of finer details, and improving the overall quality and fidelity of the data representation.
3 code implementations • 13 Jun 2023 • Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer
When applied to the LLaMA models, our 3-bit quantization significantly reduces the perplexity gap from the FP16 baseline by up to 2.1x as compared to the state-of-the-art methods with the same memory requirement.
1 code implementation • 28 May 2023 • Ilgee Hong, Sen Na, Michael W. Mahoney, Mladen Kolar
Our method adaptively controls the accuracy of the randomized solver and the penalty parameters of the exact augmented Lagrangian, to ensure that the inexact Newton direction is a descent direction of the exact augmented Lagrangian.
1 code implementation • 28 May 2023 • Yefan Zhou, Yaoqing Yang, Arin Chang, Michael W. Mahoney
Our approach uses temperature-like and load-like parameters to model the impact of neural network (NN) training hyperparameters on pruning performance.
no code implementations • 13 Apr 2023 • Javier Campos, Zhen Dong, Javier Duarte, Amir Gholami, Michael W. Mahoney, Jovan Mitrevski, Nhan Tran
We develop an end-to-end workflow for the training and implementation of co-designed neural networks (NNs) for efficient field-programmable gate array (FPGA) and application-specific integrated circuit (ASIC) hardware.
no code implementations • 27 Feb 2023 • Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, Hasan Genc, Grace Dinh, Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Yakun Sophia Shao, Amir Gholami
In this work, we survey different approaches for efficient Transformer inference, including: (i) analysis and profiling of the bottlenecks in existing Transformer architectures and their similarities and differences with previous convolutional models; (ii) implications of Transformer architecture on hardware, including the impact of non-linear operations such as Layer Normalization, Softmax, and GELU, as well as linear operations, on hardware design; (iii) approaches for optimizing a fixed Transformer architecture; (iv) challenges in finding the right mapping and scheduling of operations for Transformer models; and (v) approaches for optimizing Transformer models by adapting the architecture using neural architecture search.
1 code implementation • 21 Feb 2023 • Derek Hansen, Danielle C. Maddix, Shima Alizadeh, Gaurav Gupta, Michael W. Mahoney
We provide a detailed analysis of ProbConserv on learning with the Generalized Porous Medium Equation (GPME), a widely-applicable parameterized family of PDEs that illustrates the qualitative properties of both easier and harder PDEs.
1 code implementation • NeurIPS 2023 • Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer
To address this, we propose Big Little Decoder (BiLD), a framework that can improve inference efficiency and latency for a wide range of text generation applications.
no code implementations • 1 Dec 2022 • N. Benjamin Erichson, Soon Hoe Lim, Michael W. Mahoney
We prove the existence and uniqueness of solutions for the continuous-time model, and we demonstrate that the proposed feedback mechanism can help improve the modeling of long-term dependencies.
1 code implementation • 29 Nov 2022 • Yuchen Fang, Sen Na, Michael W. Mahoney, Mladen Kolar
We propose a trust-region stochastic sequential quadratic programming algorithm (TR-StoSQP) to solve nonlinear optimization problems with stochastic objectives and deterministic equality constraints.
no code implementations • 14 Oct 2022 • Liam Hodgkinson, Chris van der Heide, Fred Roosta, Michael W. Mahoney
One prominent issue is the curse of dimensionality: it is commonly believed that the marginal likelihood should be reminiscent of cross-validation metrics and that both should deteriorate with larger input dimensions.
1 code implementation • 2 Oct 2022 • T. Konstantin Rusch, Benjamin P. Chamberlain, Michael W. Mahoney, Michael M. Bronstein, Siddhartha Mishra
We present Gradient Gating (G$^2$), a novel framework for improving the performance of Graph Neural Networks (GNNs).
Ranked #3 on Node Classification on arXiv-year
no code implementations • 18 Jul 2022 • Geoffrey Négiar, Michael W. Mahoney, Aditi S. Krishnapriyan
Our method leverages differentiable optimization and the implicit function theorem to effectively enforce physical constraints.
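The paper pairs differentiable optimization with the implicit function theorem; as a much-simplified stand-in, the sketch below enforces a linear constraint $Ax = b$ with a closed-form Euclidean projection layer whose output is differentiable in the network's unconstrained prediction. This illustrates predictions that satisfy a constraint exactly while training end to end, not the authors' general formulation.

```python
import torch

torch.manual_seed(0)
n, m = 6, 2
A = torch.randn(m, n)
b = torch.randn(m)

def project_onto_constraint(x):
    """Euclidean projection of x onto {z : A z = b} (closed form, differentiable)."""
    AAt_inv = torch.linalg.inv(A @ A.T)
    return x - A.T @ (AAt_inv @ (A @ x - b))

net = torch.nn.Sequential(torch.nn.Linear(4, 32), torch.nn.Tanh(), torch.nn.Linear(32, n))
opt = torch.optim.Adam(net.parameters(), lr=1e-2)

inp, target = torch.randn(16, 4), torch.randn(16, n)
for _ in range(200):
    opt.zero_grad()
    raw = net(inp)                                            # unconstrained prediction
    constrained = torch.stack([project_onto_constraint(r) for r in raw])
    loss = torch.mean((constrained - target) ** 2)
    loss.backward()                                           # gradients flow through the projection
    opt.step()

print("max |A x - b| over the batch:",
      (A @ constrained.T - b[:, None]).abs().max().item())
```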
1 code implementation • 8 Jul 2022 • Shashank Subramanian, Robert M. Kirby, Michael W. Mahoney, Amir Gholami
We find that training vanilla PINNs for these problems can result in up to 70% prediction error in the solution, especially in the regime of low collocation points.
2 code implementations • 12 Jun 2022 • Zhengming Zhang, Ashwinee Panda, Linyue Song, Yaoqing Yang, Michael W. Mahoney, Joseph E. Gonzalez, Kannan Ramchandran, Prateek Mittal
In this type of attack, the goal of the attacker is to use poisoned updates to implant so-called backdoors into the learned model such that, at test time, the model's outputs can be fixed to a given target for certain inputs.
4 code implementations • 2 Jun 2022 • Sehoon Kim, Amir Gholami, Albert Shaw, Nicholas Lee, Karttikeya Mangalam, Jitendra Malik, Michael W. Mahoney, Kurt Keutzer
After re-examining the design choices for both the macro and micro-architecture of Conformer, we propose Squeezeformer which consistently outperforms the state-of-the-art ASR models under the same training schemes.
Ranked #38 on Speech Recognition on LibriSpeech test-clean
1 code implementation • 27 May 2022 • Sen Na, Michael W. Mahoney
To reduce dominant computational cost of the method, we inexactly solve the quadratic program in each iteration by employing an iterative sketching solver.
no code implementations • 16 May 2022 • Feynman Liang, Liam Hodgkinson, Michael W. Mahoney
While fat-tailed densities commonly arise as posterior and marginal distributions in robust models and scale mixtures, they present challenges when Gaussian-based variational inference fails to capture tail decay accurately.
1 code implementation • 20 Apr 2022 • Sen Na, Michał Dereziński, Michael W. Mahoney
Remarkably, we show that there exists a universal weighted averaging scheme that transitions to local convergence at an optimal stage, and still exhibits a superlinear convergence rate nearly (up to a logarithmic factor) matching that of uniform Hessian averaging.
2 code implementations • 29 Mar 2022 • Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, Amir Gholami
To address this, we propose a fast post-training pruning framework for Transformers that does not require any retraining.
no code implementations • 28 Feb 2022 • Francesco Quinzan, Rajiv Khanna, Moshik Hershcovitch, Sarel Cohen, Daniel G. Waddington, Tobias Friedrich, Michael W. Mahoney
We study the fundamental problem of selecting optimal features for model construction.
no code implementations • 17 Feb 2022 • Aditi S. Krishnapriyan, Alejandro F. Queiruga, N. Benjamin Erichson, Michael W. Mahoney
Dynamical systems that evolve continuously over time are ubiquitous throughout science and engineering.
1 code implementation • 6 Feb 2022 • Yaoqing Yang, Ryan Theisen, Liam Hodgkinson, Joseph E. Gonzalez, Kannan Ramchandran, Charles H. Martin, Michael W. Mahoney
Our analyses consider (I) hundreds of Transformers trained in different settings, in which we systematically vary the amount of data, the model size and the optimization hyperparameters, (II) a total of 51 pretrained Transformers from eight families of Huggingface NLP models, including GPT2, BERT, etc., and (III) a total of 28 existing and novel generalization metrics.
no code implementations • 2 Feb 2022 • N. Benjamin Erichson, Soon Hoe Lim, Winnie Xu, Francisco Utrera, Ziang Cao, Michael W. Mahoney
For many real-world applications, obtaining stable and robust statistical performance is more important than simply achieving state-of-the-art predictive test accuracy, and thus robustness of neural networks is an increasingly important topic.
no code implementations • 27 Nov 2021 • Luca Pion-Tonachini, Kristofer Bouchard, Hector Garcia Martin, Sean Peisert, W. Bradley Holtz, Anil Aswani, Dipankar Dwivedi, Haruko Wainwright, Ghanshyam Pilania, Benjamin Nachman, Babetta L. Marrone, Nicola Falco, Prabhat, Daniel Arnold, Alejandro Wolf-Yadlin, Sarah Powers, Sharlee Climer, Quinn Jackson, Ty Carlson, Michael Sohn, Petrus Zwart, Neeraj Kumar, Amy Justice, Claire Tomlin, Daniel Jacobson, Gos Micklem, Georgios V. Gkoutos, Peter J. Bickel, Jean-Baptiste Cazier, Juliane Müller, Bobbie-Jo Webb-Robertson, Rick Stevens, Mark Anderson, Ken Kreutz-Delgado, Michael W. Mahoney, James B. Brown
We outline emerging opportunities and challenges to enhance the utility of AI for scientific discovery.
1 code implementation • ICLR 2022 • T. Konstantin Rusch, Siddhartha Mishra, N. Benjamin Erichson, Michael W. Mahoney
We propose a novel method called Long Expressive Memory (LEM) for learning long-term sequential dependencies.
Ranked #1 on Time Series Classification on EigenWorms
2 code implementations • ICLR 2022 • Soon Hoe Lim, N. Benjamin Erichson, Francisco Utrera, Winnie Xu, Michael W. Mahoney
We introduce Noisy Feature Mixup (NFM), an inexpensive yet effective method for data augmentation that combines the best of interpolation based training and noise injection schemes.
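A small PyTorch sketch of the recipe as described in the abstract: mixup-interpolate pairs of inputs (or hidden features) and labels, then inject additive and multiplicative noise into the mixed features. The specific noise scales and the layer at which mixing happens are illustrative assumptions.

```python
import torch

def noisy_feature_mixup(x, y, alpha=1.0, add_std=0.1, mult_std=0.1):
    """Mixup interpolation followed by additive and multiplicative noise injection."""
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    x_mix = lam * x + (1 - lam) * x[perm]                 # interpolation-based augmentation
    y_mix = lam * y + (1 - lam) * y[perm]
    noise_add = add_std * torch.randn_like(x_mix)         # additive noise injection
    noise_mult = 1 + mult_std * torch.randn_like(x_mix)   # multiplicative noise injection
    return x_mix * noise_mult + noise_add, y_mix

# Usage with soft (one-hot) labels so that label interpolation is well-defined.
x = torch.randn(32, 3, 32, 32)
y = torch.nn.functional.one_hot(torch.randint(0, 10, (32,)), 10).float()
x_aug, y_aug = noisy_feature_mixup(x, y)
print(x_aug.shape, y_aug.shape)
```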
no code implementations • ICLR 2022 • Majid Jahani, Sergey Rusakov, Zheng Shi, Peter Richtárik, Michael W. Mahoney, Martin Takáč
We present a novel adaptive optimization algorithm for large-scale machine learning problems.
1 code implementation • 8 Sep 2021 • Sheng Shen, Zhewei Yao, Douwe Kiela, Kurt Keutzer, Michael W. Mahoney
Hidden within a one-layer randomly weighted Transformer, we find subnetworks that can achieve 29.45/17.29 BLEU on IWSLT14/WMT14.
2 code implementations • NeurIPS 2021 • Aditi S. Krishnapriyan, Amir Gholami, Shandian Zhe, Robert M. Kirby, Michael W. Mahoney
We provide evidence that the soft regularization in PINNs, which involves PDE-based differential operators, can introduce a number of subtle problems, including making the problem more ill-conditioned.
no code implementations • 2 Aug 2021 • Liam Hodgkinson, Umut Şimşekli, Rajiv Khanna, Michael W. Mahoney
Despite the ubiquitous use of stochastic optimization algorithms in machine learning, the precise impact of these algorithms and their dynamics on generalization performance in realistic non-convex settings is still poorly understood.
1 code implementation • NeurIPS 2021 • Yaoqing Yang, Liam Hodgkinson, Ryan Theisen, Joe Zou, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney
Viewing neural network models in terms of their loss landscapes has a long history in the statistical mechanics approach to learning, and in recent years it has received attention within machine learning proper.
1 code implementation • NeurIPS 2021 • Michał Dereziński, Jonathan Lacotte, Mert Pilanci, Michael W. Mahoney
In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration.
3 code implementations • NeurIPS 2021 • Alejandro Queiruga, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney
The recently-introduced class of ordinary differential equation networks (ODE-Nets) establishes a fruitful connection between deep learning and dynamical systems.
no code implementations • 1 Jun 2021 • Charles H. Martin, Michael W. Mahoney
Our results highlight the subtlety of comparing models when both architectures and hyperparameters are varied; the complementary role of implicit scale versus implicit shape parameters in understanding NN model quality; and the need to go beyond one-size-fits-all metrics based on upper bounds from generalization theory to describe the performance of NN models.
1 code implementation • 30 May 2021 • Zhewei Yao, Xiaoxia Wu, Linjian Ma, Sheng Shen, Kurt Keutzer, Michael W. Mahoney, Yuxiong He
Moreover, in order to reduce hyperparameter tuning, a novel adaptive regularization coefficient is deployed to control the regularization penalty adaptively.
4 code implementations • 29 Apr 2021 • Jianfei Chen, Lianmin Zheng, Zhewei Yao, Dequan Wang, Ion Stoica, Michael W. Mahoney, Joseph E. Gonzalez
On all these tasks, ActNN compresses the activation to 2 bits on average, with negligible accuracy loss.
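To illustrate what compressing activations for the backward pass means mechanically, here is a stripped-down PyTorch autograd function for a linear layer that saves only a 2-bit, per-row min-max quantized copy of its input and dequantizes it when computing the weight gradient. This is a simplification; ActNN's randomized, per-group, mixed-precision scheme is more involved.

```python
import torch

BITS = 2
LEVELS = 2 ** BITS - 1            # codes in {0, 1, 2, 3}

def quantize(x):
    """Per-row min-max quantization of activations to 2-bit codes."""
    lo = x.min(dim=1, keepdim=True).values
    hi = x.max(dim=1, keepdim=True).values
    scale = (hi - lo).clamp_min(1e-8) / LEVELS
    codes = torch.round((x - lo) / scale).to(torch.uint8)
    return codes, lo, scale

def dequantize(codes, lo, scale):
    return codes.float() * scale + lo

class MemoryEfficientLinear(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, weight):
        codes, lo, scale = quantize(x)
        ctx.save_for_backward(codes, lo, scale, weight)   # only the compressed activation is kept
        return x @ weight.t()

    @staticmethod
    def backward(ctx, grad_out):
        codes, lo, scale, weight = ctx.saved_tensors
        x_hat = dequantize(codes, lo, scale)              # approximate activation
        grad_x = grad_out @ weight
        grad_w = grad_out.t() @ x_hat                     # weight gradient uses the 2-bit reconstruction
        return grad_x, grad_w

x = torch.randn(64, 128)
w = torch.randn(32, 128, requires_grad=True)
MemoryEfficientLinear.apply(x, w).sum().backward()
print("weight grad shape:", w.grad.shape)
```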
1 code implementation • 31 Mar 2021 • Sehoon Kim, Amir Gholami, Zhewei Yao, Nicholas Lee, Patrick Wang, Aniruddha Nrusimha, Bohan Zhai, Tianren Gao, Michael W. Mahoney, Kurt Keutzer
End-to-end neural network models achieve improved performance on various automatic speech recognition (ASR) tasks.
no code implementations • 25 Mar 2021 • Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
Thus, it is not surprising that quantization has emerged recently as an important and very active sub-area of research in the efficient implementation of computations associated with Neural Networks.
no code implementations • NeurIPS 2021 • Zhenyu Liao, Michael W. Mahoney
Given an optimization problem, the Hessian matrix and its eigenspectrum can be used in many ways, ranging from designing more efficient second-order algorithms to performing model analysis and regression diagnostics.
no code implementations • 18 Feb 2021 • Omri Azencot, N. Benjamin Erichson, Mirela Ben-Chen, Michael W. Mahoney
In this work, we employ tools and insights from differential geometry to offer a novel perspective on orthogonal RNNs.
1 code implementation • NeurIPS 2021 • Soon Hoe Lim, N. Benjamin Erichson, Liam Hodgkinson, Michael W. Mahoney
We provide a general framework for studying recurrent neural networks (RNNs) trained by injecting noise into hidden states.
6 code implementations • 5 Jan 2021 • Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
Transformer based models, like BERT and RoBERTa, have achieved state-of-the-art results in many Natural Language Processing tasks.
no code implementations • NeurIPS 2020 • Michal Derezinski, Rajiv Khanna, Michael W. Mahoney
The Column Subset Selection Problem (CSSP) and the Nyström method are among the leading tools for constructing small low-rank approximations of large datasets in machine learning and scientific computing.
no code implementations • 21 Nov 2020 • Michał Dereziński, Zhenyu Liao, Edgar Dobriban, Michael W. Mahoney
For a tall $n\times d$ matrix $A$ and a random $m\times n$ sketching matrix $S$, the sketched estimate of the inverse covariance matrix $(A^\top A)^{-1}$ is typically biased: $E[(\tilde A^\top\tilde A)^{-1}]\ne(A^\top A)^{-1}$, where $\tilde A=SA$.
1 code implementation • 20 Nov 2020 • Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael W. Mahoney, Kurt Keutzer
Current low-precision quantization algorithms often have the hidden cost of conversion back and forth from floating point to quantized integer values.
2 code implementations • NeurIPS 2020 • Jianfei Chen, Yu Gai, Zhewei Yao, Michael W. Mahoney, Joseph E. Gonzalez
We show that the FQT gradient is an unbiased estimator of the QAT gradient, and we discuss the impact of gradient quantization on its variance.
Ranked #9 on Semantic Textual Similarity on STS Benchmark
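The unbiasedness hinges on the quantizer: with stochastic rounding, the expected quantized value equals the input, so gradients computed from quantized tensors remain unbiased at the cost of extra variance. A quick numpy check of that property (independent of the paper's specific training framework):

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(x, step=0.25):
    """Round to a grid of spacing `step`, up or down with probability proportional
    to the distance to each grid point, so that E[q(x)] = x."""
    scaled = x / step
    floor = np.floor(scaled)
    prob_up = scaled - floor
    return (floor + (rng.random(x.shape) < prob_up)) * step

x = rng.standard_normal(5000)
reps = np.stack([stochastic_round(x) for _ in range(1000)])

bias = np.abs(reps.mean(axis=0) - x).max()
added_var = reps.var(axis=0).mean()
print(f"max empirical bias = {bias:.4f}, mean added variance = {added_var:.4f}")
```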
no code implementations • 18 Oct 2020 • Vipul Gupta, Dhruv Choudhary, Ping Tak Peter Tang, Xiaohan Wei, Xing Wang, Yuzhen Huang, Arun Kejariwal, Kannan Ramchandran, Michael W. Mahoney
This is done by identifying and updating only the most relevant neurons of the neural network for each training sample in the data.
1 code implementation • EMNLP 2020 • Qinxin Wang, Hao Tan, Sheng Shen, Michael W. Mahoney, Zhewei Yao
Phrase localization is a task that studies the mapping from textual phrases to regions of an image.
no code implementations • ICLR 2021 • Zhenyu Liao, Romain Couillet, Michael W. Mahoney
Given a large data matrix, sparsifying, quantizing, and/or performing other entry-wise nonlinear operations can have numerous benefits, ranging from speeding up iterative algorithms for core numerical linear algebra problems to providing nonlinear filters to design state-of-the-art neural network models.
1 code implementation • 26 Aug 2020 • Zhengming Zhang, Yaoqing Yang, Zhewei Yao, Yujun Yan, Joseph E. Gonzalez, Michael W. Mahoney
Replacing BN with the recently-proposed Group Normalization (GN) can reduce gradient diversity and improve test accuracy.
4 code implementations • 5 Aug 2020 • Alejandro F. Queiruga, N. Benjamin Erichson, Dane Taylor, Michael W. Mahoney
We first show that ResNets fail to be meaningful dynamical integrators in this richer sense.
no code implementations • 31 Jul 2020 • N. Benjamin Erichson, Dane Taylor, Qixuan Wu, Michael W. Mahoney
The ubiquity of deep neural networks (DNNs), cloud-based training, and transfer learning is giving rise to a new cybersecurity frontier in which unsecure DNNs have 'structural malware' (i.e., compromised weights and activation pathways).
1 code implementation • ICLR 2021 • Francisco Utrera, Evan Kravitz, N. Benjamin Erichson, Rajiv Khanna, Michael W. Mahoney
Transfer learning has emerged as a powerful methodology for adapting pre-trained deep neural networks on image recognition tasks to new domains.
1 code implementation • NeurIPS 2020 • Yaoqing Yang, Rajiv Khanna, Yaodong Yu, Amir Gholami, Kurt Keutzer, Joseph E. Gonzalez, Kannan Ramchandran, Michael W. Mahoney
Using these observations, we show that noise-augmentation on mixup training further increases boundary thickness, thereby combating vulnerability to various forms of adversarial attacks and OOD transforms.
no code implementations • NeurIPS 2020 • Michał Dereziński, Burak Bartan, Mert Pilanci, Michael W. Mahoney
In distributed second order optimization, a standard strategy is to average many local estimates, each of which is based on a small sketch or batch of the data.
1 code implementation • ICLR 2021 • N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W. Mahoney
Viewing recurrent neural networks (RNNs) as continuous-time dynamical systems, we propose a recurrent unit that describes the hidden state's evolution with two parts: a well-understood linear component plus a Lipschitz nonlinearity.
no code implementations • 22 Jun 2020 • Ryan Theisen, Jason M. Klusowski, Michael W. Mahoney
Inspired by the statistical mechanics approach to learning, we formally define and develop a methodology to compute precisely the full distribution of test errors among interpolating classifiers from several model classes.
no code implementations • NeurIPS 2020 • Michał Dereziński, Feynman Liang, Zhenyu Liao, Michael W. Mahoney
It is often desirable to reduce the dimensionality of a large dataset by projecting it onto a low-dimensional subspace.
no code implementations • 11 Jun 2020 • Liam Hodgkinson, Michael W. Mahoney
Although stochastic optimization is central to modern machine learning, the precise mechanisms underlying its success, and in particular, the precise role of the stochasticity, still remain unclear.
no code implementations • NeurIPS 2020 • Zhenyu Liao, Romain Couillet, Michael W. Mahoney
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable.
4 code implementations • 1 Jun 2020 • Zhewei Yao, Amir Gholami, Sheng Shen, Mustafa Mustafa, Kurt Keutzer, Michael W. Mahoney
We introduce ADAHESSIAN, a second order stochastic optimization algorithm which dynamically incorporates the curvature of the loss function via ADAptive estimates of the HESSIAN.
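The core ingredient is a cheap estimate of the Hessian diagonal via Hutchinson's method: draw a Rademacher vector $z$, form the Hessian-vector product $Hz$ with a double backward pass, and average $z \odot Hz$. The sketch below shows just that estimator on a small least-squares problem where the exact diagonal is known; the full optimizer's moving averages, spatial averaging, and Hessian power are omitted.

```python
import torch

torch.manual_seed(0)
w = torch.randn(5, requires_grad=True)
x, y = torch.randn(100, 5), torch.randn(100)

loss = torch.mean((x @ w - y) ** 2)
grad = torch.autograd.grad(loss, w, create_graph=True)[0]

def hutchinson_diag_hessian(n_probes=50):
    """Estimate diag(H) as the average of z * (H z) over Rademacher probes z."""
    est = torch.zeros_like(w)
    for _ in range(n_probes):
        z = (torch.randint(0, 2, w.shape) * 2 - 1).to(w.dtype)
        hz = torch.autograd.grad(grad, w, grad_outputs=z, retain_graph=True)[0]
        est += z * hz
    return est / n_probes

estimate = hutchinson_diag_hessian()
exact = 2 * torch.mean(x**2, dim=0)      # exact Hessian diagonal for this quadratic loss
print("estimate:", estimate)
print("exact:   ", exact)
```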
no code implementations • 7 May 2020 • Michał Dereziński, Michael W. Mahoney
For example, random sampling with a DPP leads to new kinds of unbiased estimators for least squares, enabling more refined statistical and inferential understanding of these algorithms; a DPP is, in some sense, an optimal randomized algorithm for the Nyström method; and a RandNLA technique called leverage score sampling can be derived as the marginal distribution of a DPP.
1 code implementation • ICML 2020 • Sheng Shen, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
To address this, we propose Power Normalization (PN), a novel normalization scheme that resolves this issue by (i) relaxing zero-mean normalization in BN, (ii) incorporating a running quadratic mean instead of per batch statistics to stabilize fluctuations, and (iii) using an approximate backpropagation for incorporating the running statistics in the forward pass.
Ranked #12 on Machine Translation on WMT2014 English-German
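A minimal module illustrating item (ii) above: scale features by a running estimate of the quadratic mean rather than per-batch statistics. The relaxed zero-mean component and PN's approximate backward pass are intentionally left out, so this is only a caricature of the method.

```python
import torch
import torch.nn as nn

class RunningRMSNorm(nn.Module):
    """Normalize by a running quadratic mean instead of per-batch statistics."""
    def __init__(self, dim, momentum=0.05, eps=1e-5):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))
        self.momentum, self.eps = momentum, eps
        self.register_buffer("running_sq_mean", torch.ones(dim))

    def forward(self, x):                                    # x: (batch, dim)
        if self.training:
            batch_sq_mean = (x.detach() ** 2).mean(dim=0)    # per-batch quadratic mean
            self.running_sq_mean.mul_(1 - self.momentum).add_(self.momentum * batch_sq_mean)
        denom = torch.sqrt(self.running_sq_mean + self.eps)  # running statistic stabilizes fluctuations
        return self.gamma * x / denom + self.beta

layer = RunningRMSNorm(16)
print(layer(torch.randn(8, 16)).shape)
```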
no code implementations • 10 Mar 2020 • Miles E. Lopes, N. Benjamin Erichson, Michael W. Mahoney
In order to compute fast approximations to the singular value decompositions (SVD) of very large matrices, randomized sketching algorithms have become a leading approach.
1 code implementation • ICML 2020 • Omri Azencot, N. Benjamin Erichson, Vanessa Lin, Michael W. Mahoney
Recurrent neural networks are widely used on time series data, yet such models often ignore the underlying physical structures in such sequences.
no code implementations • 24 Feb 2020 • Ping Ma, Xinlian Zhang, Xin Xing, Jingyi Ma, Michael W. Mahoney
In this article, we develop an asymptotic analysis to derive the distribution of RandNLA sampling estimators for the least-squares problem.
no code implementations • NeurIPS 2020 • Liam Hodgkinson, Chris van der Heide, Fred Roosta, Michael W. Mahoney
We introduce stochastic normalizing flows, an extension of continuous normalizing flows for maximum likelihood estimation and variational inference (VI) using stochastic differential equations (SDEs).
no code implementations • 21 Feb 2020 • Michał Dereziński, Rajiv Khanna, Michael W. Mahoney
The Column Subset Selection Problem (CSSP) and the Nyström method are among the leading tools for constructing small low-rank approximations of large datasets in machine learning and scientific computing.
1 code implementation • 17 Feb 2020 • Charles H. Martin, Tongsu Peng, Michael W. Mahoney
We find that norm based metrics correlate well with reported test accuracies for well-trained models, but that they often cannot distinguish well-trained versus poorly-trained models.
3 code implementations • CVPR 2020 • Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
Importantly, ZeroQ has a very low computational overhead, and it can finish the entire quantization process in less than 30s (0.5% of one epoch training time of ResNet50 on ImageNet).
Ranked #1 on Data Free Quantization on CIFAR10 (CIFAR-10 W8A8 Top-1 Accuracy metric)
no code implementations • NeurIPS 2020 • Michał Dereziński, Feynman Liang, Michael W. Mahoney
We provide the first exact non-asymptotic expressions for double descent of the minimum norm linear estimator.
1 code implementation • NeurIPS 2019 • Tianjun Zhang, Zhewei Yao, Amir Gholami, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney, George Biros
It has been observed that residual networks can be viewed as the explicit Euler discretization of an Ordinary Differential Equation (ODE).
no code implementations • 27 Nov 2019 • Ali Eshragh, Fred Roosta, Asef Nazari, Michael W. Mahoney
We first develop a new fast algorithm to estimate the leverage scores of an autoregressive (AR) model in big data regimes.
2 code implementations • NeurIPS 2020 • Zhen Dong, Zhewei Yao, Yaohui Cai, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
However, the search space for a mixed-precision quantization is exponential in the number of layers.
no code implementations • 29 Sep 2019 • Keith Levin, Fred Roosta, Minh Tang, Michael W. Mahoney, Carey E. Priebe
In both cases, we prove that when the underlying graph is generated according to a latent space model called the random dot product graph, which includes the popular stochastic block model as a special case, an out-of-sample extension based on a least-squares objective obeys a central limit theorem about the true latent position of the out-of-sample vertex.
no code implementations • 12 Sep 2019 • Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Zhewei Yao, Amir Gholami, Michael W. Mahoney, Kurt Keutzer
In particular, we propose a new group-wise quantization scheme, and we use a Hessian based mix-precision method to compress the model further.
Ranked #13 on Semantic Textual Similarity on STS Benchmark
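For reference, the sketch below shows plain group-wise symmetric quantization of a weight matrix: each group of rows gets its own scale derived from the group's maximum magnitude. This is only the basic mechanism that a group-wise scheme builds on; the Hessian-based mixed-precision bit assignment is not shown.

```python
import numpy as np

def groupwise_quantize(w, bits=4, group_size=32):
    """Symmetric per-group quantization: each group of rows has its own scale."""
    qmax = 2 ** (bits - 1) - 1
    w_deq = np.empty_like(w)
    scales = []
    for start in range(0, w.shape[0], group_size):
        group = w[start:start + group_size]
        scale = np.abs(group).max() / qmax
        codes = np.clip(np.round(group / scale), -qmax - 1, qmax)   # integer codes
        w_deq[start:start + group_size] = codes * scale             # dequantized, for error check
        scales.append(scale)
    return w_deq, np.array(scales)

rng = np.random.default_rng(0)
w = rng.standard_normal((128, 768)).astype(np.float32)
w_deq, scales = groupwise_quantize(w, bits=4, group_size=32)
print("groups:", len(scales), " mean abs quantization error:", np.abs(w - w_deq).mean())
```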
no code implementations • 19 Jul 2019 • Rajiv Khanna, Liam Hodgkinson, Michael W. Mahoney
The rate of convergence of weighted kernel herding (WKH) and sequential Bayesian quadrature (SBQ), two kernel-based sampling algorithms for estimating integrals with respect to some target probability measure, is investigated.
no code implementations • 11 Jun 2019 • Wooseok Ha, Kimon Fountoulakis, Michael W. Mahoney
In this paper, we adopt a statistical perspective on local graph clustering, and we analyze the performance of the l1-regularized PageRank method (Fountoulakis et al.).
1 code implementation • 10 Jun 2019 • Michał Dereziński, Feynman Liang, Michael W. Mahoney
In experimental design, we are given $n$ vectors in $d$ dimensions, and our goal is to select $k\ll n$ of them to perform expensive measurements, e.g., to obtain labels/responses, for a linear regression task.
no code implementations • 31 May 2019 • Kai Rothauge, Zhewei Yao, Zixi Hu, Michael W. Mahoney
We regard pre-trained residual networks (ResNets) as nonlinear systems and use linearization, a common method used in the qualitative analysis of nonlinear systems, to understand the behavior of the networks under small perturbations of the input images.
no code implementations • NeurIPS 2019 • Michał Dereziński, Michael W. Mahoney
In distributed optimization and distributed numerical linear algebra, we often encounter an inversion bias: if we want to compute a quantity that depends on the inverse of a sum of distributed matrices, then the sum of the inverses does not equal the inverse of the sum.
no code implementations • 26 May 2019 • N. Benjamin Erichson, Michael Muehlebach, Michael W. Mahoney
In addition to providing high-profile successes in computer vision and natural language processing, neural networks also provide an emerging set of techniques for scientific problems.
no code implementations • ICLR 2019 • Charles H. Martin, Michael W. Mahoney
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet.
1 code implementation • 7 Apr 2019 • N. Benjamin Erichson, Zhewei Yao, Michael W. Mahoney
To complement these approaches, we propose a very simple and inexpensive strategy which can be used to "retrofit" a previously-trained network to improve its resilience to adversarial attacks.
1 code implementation • 21 Mar 2019 • Vipul Gupta, Swanand Kadhe, Thomas Courtade, Michael W. Mahoney, Kannan Ramchandran
Motivated by recent developments in serverless systems for large-scale computation as well as improvements in scalable randomized matrix algorithms, we develop OverSketched Newton, a randomized Hessian-based optimization algorithm to solve large-scale convex optimization problems in serverless systems.
no code implementations • 14 Mar 2019 • Linjian Ma, Gabe Montague, Jiayu Ye, Zhewei Yao, Amir Gholami, Kurt Keutzer, Michael W. Mahoney
In stochastic optimization, using large batch sizes during training can leverage parallel resources to produce faster wall-clock training times per training epoch.
1 code implementation • 20 Feb 2019 • N. Benjamin Erichson, Lionel Mathelin, Zhewei Yao, Steven L. Brunton, Michael W. Mahoney, J. Nathan Kutz
In many applications, it is important to reconstruct a fluid flow field, or some other high-dimensional state, from limited measurements and limited data.
no code implementations • 4 Feb 2019 • Michał Dereziński, Kenneth L. Clarkson, Michael W. Mahoney, Manfred K. Warmuth
In the process, we develop a new algorithm for a joint sampling distribution called volume sampling, and we propose a new i.i.d.
2 code implementations • 24 Jan 2019 • Charles H. Martin, Michael W. Mahoney
Random Matrix Theory (RMT) is applied to analyze the weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet.
no code implementations • 24 Jan 2019 • Charles H. Martin, Michael W. Mahoney
In this paper, we show how to use a new Theory of Heavy-Tailed Self-Regularization (HT-SR) to answer this.
no code implementations • 30 Nov 2018 • Noah Golmant, Nikita Vemuri, Zhewei Yao, Vladimir Feinberg, Amir Gholami, Kai Rothauge, Michael W. Mahoney, Joseph Gonzalez
Increasing the mini-batch size for stochastic gradient descent offers significant opportunities to reduce wall-clock training time, but there are a variety of theoretical and systems challenges that impede the widespread success of this technique.
1 code implementation • 17 Oct 2018 • Kimon Fountoulakis, David F. Gleich, Michael W. Mahoney
Scalability problems led to the development of local graph clustering algorithms that come with a variety of theoretical guarantees.
3 code implementations • 2 Oct 2018 • Charles H. Martin, Michael W. Mahoney
Random Matrix Theory (RMT) is applied to analyze weight matrices of Deep Neural Networks (DNNs), including both production quality, pre-trained models such as AlexNet and Inception, and smaller models trained from scratch, such as LeNet5 and a miniature-AlexNet.
no code implementations • 30 Sep 2018 • Fred Roosta, Yang Liu, Peng Xu, Michael W. Mahoney
We consider a variant of inexact Newton Method, called Newton-MR, in which the least-squares sub-problems are solved approximately using Minimum Residual method.
1 code implementation • 18 Jul 2018 • Chih-Hao Fang, Sudhir B. Kylasa, Fred Roosta, Michael W. Mahoney, Ananth Grama
First-order optimization methods, such as stochastic gradient descent (SGD) and its variants, are widely used in machine learning applications due to their simplicity and low per-iteration costs.
no code implementations • ICML 2018 • Miles E. Lopes, Shusen Wang, Michael W. Mahoney
As a more practical alternative, we propose a bootstrap method to compute a posteriori error estimates for randomized LS algorithms.
no code implementations • 26 Feb 2018 • Sudhir B. Kylasa, Farbod Roosta-Khorasani, Michael W. Mahoney, Ananth Grama
In particular, in convex settings, we consider variants of classical Newton's method in which the Hessian and/or the gradient are randomly sub-sampled.
6 code implementations • NeurIPS 2018 • Zhewei Yao, Amir Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney
Extensive experiments on multiple networks show that saddle-points are not the cause for generalization gap of large batch size training, and the results consistently show that large batch converges to points with noticeably higher Hessian spectrum.
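Measurements of the Hessian spectrum behind this kind of comparison are typically done matrix-free. The sketch below estimates the top Hessian eigenvalue of a small model's loss by power iteration on Hessian-vector products, which is one standard way (not necessarily the authors' exact tooling) to compare sharpness across training regimes.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Sequential(torch.nn.Linear(10, 32), torch.nn.Tanh(), torch.nn.Linear(32, 1))
x, y = torch.randn(256, 10), torch.randn(256, 1)
loss = torch.nn.functional.mse_loss(model(x), y)

params = [p for p in model.parameters() if p.requires_grad]
grads = torch.autograd.grad(loss, params, create_graph=True)

def hvp(vecs):
    """Hessian-vector product via a second backward pass."""
    return torch.autograd.grad(grads, params, grad_outputs=vecs, retain_graph=True)

# Power iteration on the (implicit) Hessian.
v = [torch.randn_like(p) for p in params]
for _ in range(50):
    hv = hvp(v)
    norm = torch.sqrt(sum((h * h).sum() for h in hv))
    v = [h / norm for h in hv]

top_eig = sum((h * w).sum() for h, w in zip(hvp(v), v)).item()
print("estimated top Hessian eigenvalue:", top_eig)
```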
no code implementations • ICML 2018 • Keith Levin, Farbod Roosta-Khorasani, Michael W. Mahoney, Carey E. Priebe
Many popular dimensionality reduction procedures have out-of-sample extensions, which allow a practitioner to apply a learned embedding to observations not seen in the initial training sample.
1 code implementation • 24 Dec 2017 • Petros Drineas, Michael W. Mahoney
This chapter is based on lectures on Randomized Numerical Linear Algebra from the 2016 Park City Mathematics Institute summer school on The Mathematics of Data.
no code implementations • 17 Dec 2017 • Aditya Devarakonda, Kimon Fountoulakis, James Demmel, Michael W. Mahoney
Parallel computing has played an important role in speeding up convex optimization methods for big data analytics and large-scale machine learning (ML).
no code implementations • 15 Dec 2017 • Ion Stoica, Dawn Song, Raluca Ada Popa, David Patterson, Michael W. Mahoney, Randy Katz, Anthony D. Joseph, Michael Jordan, Joseph M. Hellerstein, Joseph E. Gonzalez, Ken Goldberg, Ali Ghodsi, David Culler, Pieter Abbeel
With the increasing commoditization of computer vision, speech recognition and machine translation systems and the widespread deployment of learning-based back-end technologies such as digital advertising and intelligent infrastructures, AI (Artificial Intelligence) has moved from research labs to production.
no code implementations • ICLR 2018 • Charles H. Martin, Michael W. Mahoney
Using this model, we describe how a very simple application of ideas from the statistical mechanics theory of generalization provides a strong qualitative description of recently-observed empirical results regarding the inability of deep neural networks not to overfit training data, discontinuous learning and sharp transitions in the generalization properties of learning algorithms, etc.
no code implementations • 17 Oct 2017 • Evgeniy Faerman, Felix Borutta, Kimon Fountoulakis, Michael W. Mahoney
For larger graphs with flat NCPs that are strongly expander-like, existing methods lead to random walks that expand rapidly, touching many dissimilar nodes, thereby leading to lower-quality vector representations that are less useful for downstream tasks.
no code implementations • NeurIPS 2018 • Shusen Wang, Farbod Roosta-Khorasani, Peng Xu, Michael W. Mahoney
For distributed computing environment, we consider the empirical risk minimization problem and propose a distributed and communication-efficient Newton-type optimization method.
no code implementations • 25 Aug 2017 • Peng Xu, Farbod Roosta-Khorasani, Michael W. Mahoney
While first-order optimization methods such as stochastic gradient descent (SGD) are popular in machine learning (ML), they come with well-known deficiencies, including relatively-slow convergence, sensitivity to the settings of hyper-parameters such as learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points.
no code implementations • 23 Aug 2017 • Peng Xu, Fred Roosta, Michael W. Mahoney
In this light, we consider the canonical problem of finite-sum minimization, provide appropriate uniform and non-uniform sub-sampling strategies to construct such Hessian approximations, and obtain optimal iteration complexity for the corresponding sub-sampled trust-region and cubic regularization methods.
no code implementations • 6 Aug 2017 • Miles E. Lopes, Shusen Wang, Michael W. Mahoney
In recent years, randomized methods for numerical linear algebra have received growing interest as a general approach to large-scale problems.
no code implementations • ICML 2017 • Di Wang, Kimon Fountoulakis, Monika Henzinger, Michael W. Mahoney, Satish Rao
As an application, we use our CRD Process to develop an improved local algorithm for graph clustering.
no code implementations • ACL 2017 • Alex Gittens, Dimitris Achlioptas, Michael W. Mahoney
An unexpected "side-effect" of such models is that their vectors often exhibit compositionality, i.e., adding two word-vectors results in a vector that is only a small angle away from the vector of a word representing the semantic composite of the original words, e.g., "man" + "royal" = "king".