no code implementations • 26 Feb 2018 • Sudhir B. Kylasa, Farbod Roosta-Khorasani, Michael W. Mahoney, Ananth Grama
In particular, in convex settings, we consider variants of classical Newton's method in which the Hessian and/or the gradient are randomly sub-sampled.
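A minimal sketch of the Hessian sub-sampling idea, assuming a finite-sum objective $f(w) = \frac{1}{n}\sum_{i=1}^{n} f_i(w)$; the callables `grad_f` and `hess_f_i` are hypothetical placeholders, not the paper's interface:

```python
import numpy as np

def subsampled_newton_step(w, grad_f, hess_f_i, n, sample_size, rng):
    """One Newton step with a Hessian estimated from a random sub-sample.

    grad_f(w)      -> full gradient, shape (d,)
    hess_f_i(w, i) -> Hessian of the i-th component, shape (d, d)
    """
    idx = rng.choice(n, size=sample_size, replace=False)
    H = sum(hess_f_i(w, i) for i in idx) / sample_size  # sub-sampled Hessian
    g = grad_f(w)                                       # exact full gradient
    return w - np.linalg.solve(H, g)                    # Newton direction

```

The paper also considers sub-sampling the gradient; the sketch keeps it exact for clarity.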
no code implementations • ICML 2018 • Keith Levin, Farbod Roosta-Khorasani, Michael W. Mahoney, Carey E. Priebe
Many popular dimensionality reduction procedures have out-of-sample extensions, which allow a practitioner to apply a learned embedding to observations not seen in the initial training sample.
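One standard construction of such an extension is the Nyström formula, which maps a new point into a learned spectral embedding using only its kernel values against the training sample; this sketch is illustrative and not necessarily the mechanism studied in the paper:

```python
import numpy as np

def nystrom_extension(K_train, k_new, embed_dim):
    """Out-of-sample embedding coordinates for a new point.

    K_train : (n, n) kernel matrix over the training sample
    k_new   : (n,) kernel evaluations between the new point and the sample
    """
    vals, vecs = np.linalg.eigh(K_train)
    vals, vecs = vals[-embed_dim:], vecs[:, -embed_dim:]  # top eigenpairs
    return (k_new @ vecs) / vals  # scale each coordinate by 1/eigenvalue
```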
no code implementations • ICML 2018 • Russell Tsuchida, Farbod Roosta-Khorasani, Marcus Gallagher
An interesting approach to analyzing neural networks that has received renewed attention is to examine the equivalent kernel of the neural network.
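For a single hidden layer of ReLU units with i.i.d. standard Gaussian weights, the equivalent kernel has a known closed form (Cho & Saul's order-1 arc-cosine kernel); a sketch of that classical Gaussian-weight case:

```python
import numpy as np

def relu_equivalent_kernel(x, y):
    """E[relu(w @ x) * relu(w @ y)] for w ~ N(0, I): the arc-cosine
    kernel of order 1 (Cho & Saul)."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos_t = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos_t)
    return nx * ny * (np.sin(theta) + (np.pi - theta) * cos_t) / (2 * np.pi)
```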
no code implementations • NeurIPS 2018 • Shusen Wang, Farbod Roosta-Khorasani, Peng Xu, Michael W. Mahoney
For distributed computing environments, we consider the empirical risk minimization problem and propose a distributed, communication-efficient Newton-type optimization method.
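A rough single-iteration sketch of one such scheme, in the spirit of averaging local Newton directions across workers (helper names are hypothetical; a real implementation would communicate via an allreduce or RPC layer):

```python
import numpy as np

def distributed_newton_step(w, shards, grad_on, hess_on, lr=1.0):
    """Each worker solves its local Newton system against the global
    gradient; the driver averages the resulting directions, so only
    O(d) vectors cross the network per round."""
    g = np.mean([grad_on(w, s) for s in shards], axis=0)        # round 1: global gradient
    dirs = [np.linalg.solve(hess_on(w, s), g) for s in shards]  # local Newton solves
    return w - lr * np.mean(dirs, axis=0)                       # round 2: average directions
```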
no code implementations • 25 Aug 2017 • Peng Xu, Farbod Roosta-Khorasani, Michael W. Mahoney
While first-order optimization methods such as stochastic gradient descent (SGD) are popular in machine learning (ML), they come with well-known deficiencies, including relatively slow convergence, sensitivity to hyper-parameter settings such as the learning rate, stagnation at high training errors, and difficulty in escaping flat regions and saddle points.
no code implementations • NeurIPS 2017 • Kristofer E. Bouchard, Alejandro F. Bujan, Farbod Roosta-Khorasani, Shashanka Ubaru, Prabhat, Antoine M. Snijders, Jian-Hua Mao, Edward F. Chang, Michael W. Mahoney, Sharmodeep Bhattacharyya
The increasing size and complexity of scientific data could dramatically enhance discovery and prediction for basic scientific applications.
no code implementations • NeurIPS 2016 • Peng Xu, Jiyan Yang, Farbod Roosta-Khorasani, Christopher Ré, Michael W. Mahoney
Since second-order methods are effective at finding the minimizer to high precision, in this work we propose randomized Newton-type algorithms that exploit non-uniform sub-sampling of $\{\nabla^2 f_i(w)\}_{i=1}^{n}$, as well as inexact updates, as a means to reduce the computational complexity.
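A sketch of a non-uniform sampling estimator, assuming sampling probabilities `probs` (e.g., proportional to per-component Hessian norms or bounds on them); the reweighting by $1/(n\,p_i)$ keeps the estimate unbiased:

```python
import numpy as np

def nonuniform_subsampled_hessian(w, hess_f_i, probs, sample_size, rng):
    """Unbiased estimate of H(w) = (1/n) * sum_i hess_f_i(w, i),
    drawing index i with probability probs[i]."""
    n = len(probs)
    idx = rng.choice(n, size=sample_size, replace=True, p=probs)
    return sum(hess_f_i(w, i) / (n * probs[i]) for i in idx) / sample_size
```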
no code implementations • 26 May 2016 • Xiang Cheng, Farbod Roosta-Khorasani, Stefan Palombo, Peter L. Bartlett, Michael W. Mahoney
We consider first-order gradient methods for effectively optimizing a composite objective in the form of a sum of smooth and, potentially, non-smooth functions.
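The canonical template for this setting is the proximal-gradient iteration: a forward gradient step on the smooth part followed by a proximal (backward) step on the non-smooth part. A minimal sketch of that classical template, not necessarily the paper's specific algorithm, with the lasso's soft-thresholding as the example prox:

```python
import numpy as np

def proximal_gradient(w, grad_f, prox_g, step, n_iters=100):
    """Minimize f(w) + g(w), with f smooth and g prox-friendly."""
    for _ in range(n_iters):
        w = prox_g(w - step * grad_f(w), step)
    return w

def soft_threshold(v, t):
    """Prox of t * ||.||_1, the non-smooth term in the lasso."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)
```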
no code implementations • 18 Jan 2016 • Farbod Roosta-Khorasani, Michael W. Mahoney
As a remedy, for all of our algorithms, we also give global convergence results for the case of inexact updates, where such linear systems are solved only approximately.
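In practice the inexact update is often obtained by running conjugate gradient on the Newton system $H p = g$ until a relative-residual tolerance is met; a sketch, assuming $H$ is positive definite as in the convex setting:

```python
import numpy as np

def inexact_newton_direction(H, g, tol=0.1, max_iter=50):
    """Approximately solve H p = g by conjugate gradient, stopping
    once ||H p - g|| <= tol * ||g||."""
    p = np.zeros_like(g)
    r = g.copy()   # residual g - H @ p (p starts at 0)
    d = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Hd = H @ d
        alpha = rs / (d @ Hd)
        p += alpha * d
        r -= alpha * Hd
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(g):
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p
```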
no code implementations • 18 Jan 2016 • Farbod Roosta-Khorasani, Michael W. Mahoney
In such problems, sub-sampling as a way to reduce $n$ can offer substantial gains in computational efficiency.
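On the gradient side, the same idea gives an unbiased estimate that touches only a fraction of the $n$ terms; a minimal sketch with uniform sampling (the callable `grad_f_i` is an illustrative placeholder):

```python
import numpy as np

def subsampled_gradient(w, grad_f_i, n, sample_size, rng):
    """Unbiased estimate of (1/n) * sum_i grad_f_i(w, i) from a
    uniform sub-sample of the component gradients."""
    idx = rng.choice(n, size=sample_size, replace=False)
    return sum(grad_f_i(w, i) for i in idx) / sample_size
```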